133 Commits

Author SHA1 Message Date
Krzysztof Drewniak
3446ff1e67
[mlir] Update all-reduce (& vector tests) to use workgroup barriers (#178285)
This commit updates the lowering of all-reduce operations to annotate
the generated barriers with `memfence [#gpu.address_space<workgroup>]`
so that these barriers do not force unrelated global memory operations
to complete. It similarly sets up the warp synchronization function in
the vectory distribuhte tests, since they also only read/write shared
memory.

In additon, this commit adds convenience builders for gpu.barrier, which
will allow it to either fence on a given address space or on the address
space of a provided memref.
2026-01-27 13:09:18 -08:00
Longsheng Mou
a46cb15b42
[mlir][vector] Fix typo in vector.contract mnemonic (NFC) (#173661) 2025-12-31 09:31:59 +08:00
Nishant Patel
71ee84acc4
[MLIR][Vector] Add unroll pattern for vector.constant_mask (#171518)
This PR adds unrolling for vector.constant_mask op based on the
targetShape. Each unrolled vector computes its local mask size in each
dimension (d) as:
min(max(originalMaskSize[d] - offset[d], 0), unrolledMaskSize[d]).
2025-12-11 13:16:55 -08:00
Nishant Patel
7931e2fd52
[MLIR][Vector] Add unroll pattern for vector.create_mask (#169119)
This PR adds unrolling for vector.create_mask op based on the
targetShape. Each unrolled vector computes its local mask size in each
dimension (d) as:
min(max(originalMaskSize[d] - offset[d], 0), unrolledMaskSize[d]).
2025-12-03 13:38:17 -08:00
Nishant Patel
af73aeaa19
[MLIR][Vector] Add unroll pattern for vector.shape_cast (#167738)
This PR adds pattern for unrolling shape_cast given a targetShape. This
PR is a follow up of #164010 which was very general and was using
inserts and extracts on each element (which is also
LowerVectorShapeCast.cpp is doing).
After doing some more research on use cases, we (me and @Jianhui-Li )
realized that the previous version in #164010 is unnecessarily generic
and doesn't fit our performance needs.

Our use case requires that targetShape is contiguous in both source and
result vector.

This pattern only applies when contiguous slices can be extracted from
the source vector and inserted into the result vector such that each
slice remains in vector form with targetShape (and not decompose to
scalars). In these cases, the unrolling proceeds as:

vector.extract_strided_slice -> vector.shape_cast (on the slice
unrolled) -> vector.insert_strided_slice
2025-11-19 16:16:44 -08:00
Diego Caballero
7bdd88c1e3
[mlir][Vector] Add patterns to lower vector.shuffle (#157611)
This PR adds patterns to lower `vector.shuffle` with inputs with
different vector sizes more efficiently. The current LLVM lowering for
these cases degenerates to a sequence of `vector.extract` and
`vector.insert` operations. With this PR, the smaller input is promoted
to larger vector size by introducing an extra `vector.shuffle`.
2025-09-16 18:01:30 -07:00
Nishant Patel
0e5c32bd6d
[MLIR][Vector] Add unrolling pattern for vector StepOp (#157752)
This PR adds unrolling pattern for vector.step op to VectorUnroll
transform.
2025-09-16 15:33:08 -07:00
Andrzej Warzyński
0ee7c9434a
[mlir][vector] Tidy-up testing for to/from_elements unrolling (#158309)
1. Remove `TestUnrollVectorToElements` and
   `TestUnrollVectorFromElements` test passes - these are not required.
2. Make "vector-from-elements-lowering.mlir" use TD Op for testing (for
   consistency "vector-to-elements-lowering.mlir" and to make sure that
   the TD Op, `transform.apply_patterns.vector.unroll_from_elements`, is
   tested).
3. Unify `CHECK` prefixes (`CHECK-UNROLL` -> `CHECK`).
4. Rename `@to_elements_1d` as `@negative_unroll_to_elements_1d`, for
   consistency with it's counterpart for `vector.from_elements` and to
   align with our testing guide (*).

(*)
https://mlir.llvm.org/getting_started/TestingGuide/#after-step-3-add-the-newly-identified-missing-case
2025-09-15 08:35:03 +01:00
Andrzej Warzyński
77596b78e5
[mlir][vector] Add a new TD op to wrap unit-dim collapsing patterns (#157507)
Adds a new TD Op,
  * `apply_patterns.vector.drop_inner_most_unit_dims_from_xfer_ops`,

which wraps the following Vector patterns:

  * `DropInnerMostUnitDimsTransferRead`
  * `DropInnerMostUnitDimsTransferWrite`

This complements other existing unit-dimension–related patterns.

To reduce duplication, the
`TestVectorTransferCollapseInnerMostContiguousDims`
pass has been removed. That pass was only used for testing, and its
functionality is now covered by the newly added TD Op.
2025-09-12 11:09:06 +01:00
Erick Ochoa Lopez
9d19250610
[mlir][vector] Add vector.to_elements unrolling (#157142)
This PR adds support for unrolling `vector.to_element`'s source operand.

It transforms

```mlir
%0:8 = vector.to_elements %v : vector<2x2x2xf32>
```

to

```mlir
%v0 = vector.extract %v[0] : vector<2x2xf32> from vector<2x2x2xf32>
%v1 = vector.extract %v[1] : vector<2x2xf32> from vector<2x2x2xf32>
%0:4 = vector.to_elements %v0 : vector<2x2xf32>
%1:4 = vector.to_elements %v1 : vector<2x2xf32>
// %0:8 = %0:4 - %1:4
```

This pattern will be applied until there are only 1-D vectors left.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Co-authored-by: hanhanW <hanhan0912@gmail.com>
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2025-09-11 13:56:57 -04:00
Yang Bai
4eb1a07d7d
[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering (#151175)
This patch introduces a new unrolling-based approach for lowering
multi-dimensional `vector.from_elements` operations.

**Implementation Details:**
1. **New Transform Pattern**: Added `UnrollFromElements` that unrolls a
N-D(N>=2) from_elements op to a (N-1)-D from_elements op align the
outermost dimension.
2. **Utility Functions**: Added `unrollVectorOp` to reuse the unroll
algo of vector.gather for vector.from_elements.
3. **Integration**: Added the unrolling pattern to the
convert-vector-to-llvm pass as a temporal transformation.
4. Use direct LLVM dialect operations instead of intermediate
vector.insert operations for efficiency in `VectorFromElementsLowering`.

**Example:**
```mlir
// unroll
%v = vector.from_elements  %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%vec_1d_0 = vector.from_elements %e0, %e1 : vector<2xf32>
%vec_2d_0 = vector.insert %vec_1d_0, %poison_2d [0] : vector<2xf32> into vector<2x2xf32>
%vec_1d_1 = vector.from_elements %e2, %e3 : vector<2xf32>
%result = vector.insert %vec_1d_1, %vec_2d_0 [1] : vector<2xf32> into vector<2x2xf32>

// convert-vector-to-llvm
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%poison_2d_cast = builtin.unrealized_conversion_cast %poison_2d : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>>
%poison_1d_0 = llvm.mlir.poison : vector<2xf32>
%c0_0 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_0_0 = llvm.insertelement %e0, %poison_1d_0[%c0_0 : i64] : vector<2xf32>
%c1_0 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_0_1 = llvm.insertelement %e1, %vec_1d_0_0[%c1_0 : i64] : vector<2xf32>
%vec_2d_0 = llvm.insertvalue %vec_1d_0_1, %poison_2d_cast[0] : !llvm.array<2 x vector<2xf32>>
%poison_1d_1 = llvm.mlir.poison : vector<2xf32>
%c0_1 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_1_0 = llvm.insertelement %e2, %poison_1d_1[%c0_1 : i64] : vector<2xf32>
%c1_1 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_1_1 = llvm.insertelement %e3, %vec_1d_1_0[%c1_1 : i64] : vector<2xf32>
%vec_2d_1 = llvm.insertvalue %vec_1d_1_1, %vec_2d_0[1] : !llvm.array<2 x vector<2xf32>>
%result = builtin.unrealized_conversion_cast %vec_2d_1 : !llvm.array<2 x vector<2xf32>> to vector<2x2xf32>
```

---------

Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: James Newling <james.newling@gmail.com>
Co-authored-by: Diego Caballero <dieg0ca6aller0@gmail.com>
2025-08-18 10:09:12 -07:00
Maksim Levental
258daf5395
[mlir][NFC] update mlir create APIs (34/n) (#150660)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25 12:36:54 -05:00
Maksim Levental
258d04c810
[mlir][NFC] update mlir/Dialect create APIs (28/n) (#150641)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-25 11:48:00 -05:00
Andrzej Warzyński
6ecb6a8a8c
[mlir][vector][nfc] Rename populateVectorTransferCollapseInnerMostContiguousDimsPatterns (#145228)
Renames `populateVectorTransferCollapseInnerMostContiguousDimsPatterns`
as `populateDropInnerMostUnitDimsXferOpPatterns` + updates the
corresponding comments.

This addresses a TODO and makes the difference between these two
`populate*` methods clearer:
 * `populateDropUnitDimWithShapeCastPatterns`,
 * `populateDropInnerMostUnitDimsXferOpPatterns`.
2025-07-02 20:07:35 +01:00
Nishant Patel
9c1ce31f54
[mlir][vector] Add unroll patterns for vector.load and vector.store (#143420)
This PR adds unroll patterns for vector.load and vector.store. This PR is follow up of #137558
2025-06-20 13:50:25 -07:00
Chao Chen
def37f7e3a
[mlir][vector] add unroll pattern for broadcast (#142011)
This PR adds `UnrollBroadcastPattern` to `VectorUnroll` transform. 
To support this, it also extends `BroadcastOp` definition with
`VectorUnrollOpInterface`
2025-06-05 12:42:16 -05:00
Kazu Hirata
c180d0cfad
[mlir] Use llvm::any_of (NFC) (#141317) 2025-05-23 23:59:48 -07:00
James Newling
3d6d5dfed2
[mlir][vector] Address linearization comments (post commit) (#138075)
This PR adds some documentation to address comments in
https://github.com/llvm/llvm-project/pull/136581 

This PR adds a test for linearization across scf.for. This new test
might be considered redundant by more experienced MLIRers, but might
help newer users understand how to linearize scf/cf/func operations
easily

The documentation added in this PR also tightens our definition of
linearization, to now exclude unrolling (which creates multiple ops from
1 op). We hadn't really specified what linearization meant before.
2025-05-15 07:52:53 -07:00
Nishant Patel
1778d3b824
[mlir] [vector] Add linearization pattern for vector.create_mask (#138214)
This PR is a breakdown [3 / 4] of the PR #136193 
The PR adds linearization patterns for vector.create_mask
2025-05-14 15:53:58 -07:00
Mehdi Amini
c02aa91939
Revert "[mlir][MemRef] Remove integer address space builders" (#138853)
Reverts llvm/llvm-project#138579

An integration test is broken on the mlir-nvidia* bots.
2025-05-07 13:43:55 +02:00
Krzysztof Drewniak
c7c1283ab2
[mlir][MemRef] Remove integer address space builders (#138579)
The forms of the MemRef builder that took an integer argument instead of
an attribute have been deprecated for years now, and have almost no
upstream uses (the remaining ones are handled in this PR). Therefore,
remove them.
2025-05-06 11:22:15 -05:00
James Newling
bad8bf56d3
[mlir][vector] Linearization: push 'bit width' logic out of patterns (#136581)
[NFC]
Vector linearization is a collection of rewrite patterns that reduce the
rank of vector operands and results.

In https://github.com/llvm/llvm-project/pull/83314 an option to ignore
(make 'legal') operations with large inner-most dimensions was added.
This current PR is a step towards making that option live outside of
upstream MLIR. The motivation is to remove non-core functionality (I
would like to use this pass, but would prefer not to deal with
'targetVectorBitWidth` at all).

As a follow-up to this PR, I propose that user(s) of the
`targetVectorBitWidth` move the relevant code (now in
mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp) to their code
bases, and then eventually remove it from upstream. In addition the tests need to
split out (I've intentionally not modified the lit tests here, to make
it easier to confirm that this is a NFC). I'm happy to help make it
easier to do this final step!

The approach I've used is to move the logic pertaining to
`targetVectorBitWidth` out the patterns, and into the conversion target,
which the end user can control outside of core MLIR.
2025-04-30 09:05:40 -07:00
Ivan Butygin
e87aa0c6ab
[mlir][vector] Sink vector.extract/splat into load/store ops (#134389)
```
vector.load %arg0[%arg1] : memref<?xf32>, vector<4xf32>
vector.extract %0[1] : f32 from vector<4xf32>
```
Gets converted to:
```
%c1 = arith.constant 1 : index
%0 = arith.addi %arg1, %c1 overflow<nsw> : index
%1 = memref.load %arg0[%0] : memref<?xf32>
```

```
%0 = vector.splat %arg2 : vector<1xf32>
vector.store %0, %arg0[%arg1] : memref<?xf32>, vector<1xf32>
```
Gets converted to:
```
memref.store %arg2, %arg0[%arg1] : memref<?xf32>
```
2025-04-22 17:18:54 +03:00
Kunwar Grover
cf0efb3188
[mlir][vector] Decouple unrolling gather and gather to llvm lowering (#132206)
This patch decouples unrolling vector.gather and lowering vector.gather
to llvm.masked.gather.

This is consistent with how vector.load, vector.store,
vector.maskedload, vector.maskedstore lower to LLVM.

Some interesting test changes from this patch:

- 2D vector.gather lowering to llvm tests are deleted. This is
consistent with other memory load/store ops.
- There are still tests for 2D vector.gather, but the constant mask for
these test is modified. This is because with the updated lowering, one
of the unrolled vector.gather disappears because it is masked off (also
demonstrating why this is a better lowering path)

Overall, this makes vector.gather take the same consistent path for
lowering to LLVM as other load/store ops.

Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
2025-03-24 12:25:17 +00:00
Kazu Hirata
3041fa6c7a
[mlir] Use *Set::insert_range (NFC) (#132326)
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range.  This patch replaces:

  Dest.insert(Src.begin(), Src.end());

with:

  Dest.insert_range(Src);

This patch does not touch custom begin like succ_begin for now.
2025-03-20 22:24:17 -07:00
Jacques Pienaar
09dfc5713d
[mlir] Enable decoupling two kinds of greedy behavior. (#104649)
The greedy rewriter is used in many different flows and it has a lot of
convenience (work list management, debugging actions, tracing, etc). But
it combines two kinds of greedy behavior 1) how ops are matched, 2)
folding wherever it can.

These are independent forms of greedy and leads to inefficiency. E.g.,
cases where one need to create different phases in lowering and is
required to applying patterns in specific order split across different
passes. Using the driver one ends up needlessly retrying folding/having
multiple rounds of folding attempts, where one final run would have
sufficed.

Of course folks can locally avoid this behavior by just building their
own, but this is also a common requested feature that folks keep on
working around locally in suboptimal ways.

For downstream users, there should be no behavioral change. Updating
from the deprecated should just be a find and replace (e.g., `find ./
-type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety)
as the API arguments hasn't changed between the two.
2024-12-20 08:15:48 -08:00
Kai Sasaki
017c75bfac
[mlir] Fix typo in test vector transform pass descriptions (#118194)
Fix some typos in the description of vector transform passes.
2024-12-01 15:48:02 +09:00
Petr Kurapov
ecaf2c335c
[MLIR] Move warp_execute_on_lane_0 from vector to gpu (#116994)
Please see the related RFC here:
https://discourse.llvm.org/t/rfc-move-execute-on-lane-0-from-vector-to-gpu-dialect/82989.

This patch does exactly one thing - moves the op to gpu.
2024-11-22 15:30:47 +01:00
Benoit Jacob
a9ebdbb5ac
[MLIR] Vector: turn the ExtractStridedSlice rewrite pattern from #111541 into a canonicalization (#111614)
This is a reasonable canonicalization because `extract` is more
constrained than `extract_strided_slices`, so there is no loss of
semantics here, just lifting an op to a special-case higher/constrained
op. And the additional `shape_cast` is merely adding leading unit dims
to match the original result type.

Context: discussion on #111541. I wasn't sure how this would turn out,
but in the process of writing this PR, I discovered at least 2 bugs in
the pattern introduced in #111541, which shows the value of shared
canonicalization patterns which are exercised on a high number of
testcases.

---------

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
2024-10-09 09:24:23 -04:00
Benoit Jacob
10054ba4ac
[mlir][vector] Add pattern to rewrite contiguous ExtractStridedSlice into Extract (#111541)
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2024-10-08 11:51:01 -04:00
Andrzej Warzyński
42944da5ba
[mlir][vector] Group re-order patterns together (#102856)
Group all patterns that re-order vector.transpose and vector.broadcast
Ops (*) under `populateSinkVectorOpsPatterns`. These patterns are
normally used to "sink" redundant Vector Ops, hence grouping together.
Example:

```mlir
%at = vector.transpose %a, [1, 0]: vector<4x2xf32> to vector<2x4xf32>
%bt = vector.transpose %b, [1, 0]: vector<4x2xf32> to vector<2x4xf32>
%r = arith.addf %at, %bt : vector<2x4xf32>
```
would get converted to:
```mlir
%0 = arith.addf %a, %b : vector<4x2xf32>
%r = vector.transpose %0, [1, 0] : vector<2x4xf32>
```

This patch also moves all tests for these patterns so that all of them
are:
  * run under one test-flag: `test-vector-sink-patterns`,
  * located in one file: "vector-sink.mlir".

To facilitate this change:
  * `-test-sink-vector-broadcast` is renamed as
    `test-vector-sink-patterns`,
  * "sink-vector-broadcast.mlir" is renamed as "vector-sink.mlir",
  * tests for `ReorderCastOpsOnBroadcast` and
    `ReorderElementwiseOpsOnTranspose` patterns are moved from
    "vector-reduce-to-contract.mlir" to "vector-sink.mlir",
  * `ReorderElementwiseOpsOnTranspose` patterns are removed from
    `populateVectorReductionToContractPatterns` and added to (newly
    created) `populateSinkVectorOpsPatterns`,
  * `ReorderCastOpsOnBroadcast` patterns are removed from
    `populateVectorReductionToContractPatterns` - these are already
    present in `populateSinkVectorOpsPatterns`.

This should allow us better layering and more straightforward testing.
For the latter, the goal is to be able to easily identify which pattern
a particular test is exercising (especially when it's a specific
pattern).

NOTES FOR DOWNSTREAM USERS

In order to preserve the current functionality, please make sure to add
  * `populateSinkVectorOpsPatterns`,

wherever you are using `populateVectorReductionToContractPatterns`.
Also, rename `populateSinkVectorBroadcastPatterns` as
`populateSinkVectorOpsPatterns`.

(*) I didn't notice any other re-order patterns.
2024-08-16 16:53:53 +01:00
Benjamin Maxwell
9b06e25e73
[mlir][vector] Add mask elimination transform (#99314)
This adds a new transform `eliminateVectorMasks()` which aims at
removing scalable `vector.create_masks` that will be all-true at
runtime. It attempts to do this by simply pattern-matching the mask
operands (similar to some canonicalizations), if that does not lead to
an answer (is all-true? yes/no), then value bounds analysis will be used
to find the lower bound of the unknown operands. If the lower bound is
>= to the corresponding mask vector type dim, then that dimension of the
mask is all true.

Note that the pattern matching prevents expensive value-bounds analysis
in cases where the mask won't be all true.

For example:
```mlir
%mask = vector.create_mask %dynamicValue, %c2 : vector<8x4xi1>
```
From looking at `%c2` we can tell this is not going to be an all-true
mask, so we don't need to run the value-bounds analysis for
`%dynamicValue` (and can exit the transform early).

Note: Eliminating create_masks here means replacing them with all-true
constants (which will then lead to the masks folding away).
2024-08-09 10:51:49 +01:00
Kazu Hirata
5262865aac
[mlir] Construct SmallVector with ArrayRef (NFC) (#101896) 2024-08-04 11:43:05 -07:00
Charitha Saumya
c577f91d26
[mlir][vector] Add support for linearizing Extract, ExtractStridedSlice, Shuffle VectorOps in VectorLinearize (#88204)
This PR adds support for converting `vector.extract_strided_slice` and
`vector.extract` operations to equivalent `vector.shuffle` operations
that operates on linearized (1-D) vectors. `vector.shuffle` operations
operating on n-D (n > 1) are also converted to equivalent shuffle
operations working on linearized vectors.
2024-04-18 21:13:49 +03:00
Andrzej Warzyński
d3aa92ed14
[mlir][vector] Add support for scalable vectors to VectorLinearize (#86786)
Adds support for scalable vectors to patterns defined in
VectorLineralize.cpp.

Linearization is disable in 2 notable cases:
  * vectors with more than 1 scalable dimension (we cannot represent
    vscale^2),
  * vectors initialised with arith.constant that's not a vector splat
    (such arith.constant Ops cannot be flattened).
2024-03-28 14:53:21 +00:00
Balaji V. Iyer
6f5c4f2eac
[mlir][vector]Add Vector bitwidth target to Linearize Vectorizable and Constant Ops (#83314)
Added a new flag `targetVectorBitwidth` to capture bit-width input.
2024-03-04 19:17:51 -06:00
Quinn Dawkins
c2b952926f
[mlir][vector] Fix n-d transfer write distribution (#83215)
Currently n-d transfer write distribution can be inconsistent with
distribution of reductions if a value has multiple users, one of which
is a transfer_write with a non-standard distribution map, and the other
of which is a vector.reduction.

We may want to consider removing the distribution map functionality in
the future for this reason.
2024-02-28 00:11:28 -05:00
Diego Caballero
71441ed171
[mlir][Vector] Add vector bitwidth target to xfer op flattening (#81966)
This PR adds an optional bitwidth parameter to the vector xfer op
flattening transformation so that the flattening doesn't happen if the
trailing dimension of the read/writen vector is larger than this
bitwidth (i.e., we are already able to fill at least one vector register
with that size).
2024-02-21 09:22:48 -08:00
Ivan Butygin
35ef3994bf
[mlir][vector] ND vectors linearization pass (#81159)
Common backends (LLVM, SPIR-V) only supports 1D vectors, LLVM conversion
handles ND vectors (N >= 2) as `array<array<... vector>>` and SPIR-V
conversion doesn't handle them at all at the moment. Sometimes it's
preferable to treat multidim vectors as linearized 1D. Add pass to do
this. Only constants and simple elementwise ops are supported for now.

@krzysz00 I've extracted yours result type conversion code from
LegalizeToF32 and moved it to common place.

Also, add ConversionPattern class operating on traits.
2024-02-13 15:30:58 +03:00
Jakub Kuderski
07677113ff
[mlir][vector] Add pattern to break down reductions into arith ops (#75727)
The number of vector elements considered 'small' enough to extract is
parameterized.                                                   
                                                                 
This is to avoid going into specialized reduction lowering when a
single/couple of arith ops can do. Targets without dedicated reduction  
intrinsics can use that as an emulation path too.                  
                                                                   
Depends on https://github.com/llvm/llvm-project/pull/75846.
2023-12-18 17:54:54 -05:00
Hsiangkai Wang
f643eec892
[mlir][vector] Add emulation patterns for vector masked load/store (#74834)
In this patch, it will convert

```
vector.maskedload %base[%idx_0, %idx_1], %mask, %pass_thru
```

to

```
%ivalue = %pass_thru
%m = vector.extract %mask[0]
%result0 = scf.if %m {
  %v = memref.load %base[%idx_0, %idx_1]
  %combined = vector.insert %v, %ivalue[0]
  scf.yield %combined
} else {
  scf.yield %ivalue
}
%m = vector.extract %mask[1]
%result1 = scf.if %m {
  %v = memref.load %base[%idx_0, %idx_1 + 1]
  %combined = vector.insert %v, %result0[1]
  scf.yield %combined
} else {
  scf.yield %result0
}
...
```

It will convert

```
vector.maskedstore %base[%idx_0, %idx_1], %mask, %value
```

to

```
%m = vector.extract %mask[0]
scf.if %m {
  %extracted = vector.extract %value[0]
  memref.store %extracted, %base[%idx_0, %idx_1]
}
%m = vector.extract %mask[1]
scf.if %m {
  %extracted = vector.extract %value[1]
  memref.store %extracted, %base[%idx_0, %idx_1 + 1]
}
...
```
2023-12-15 11:35:48 +00:00
Andrzej Warzyński
c02d07fdf0
[mlir][vector] Add pattern to drop unit dim from elementwise(a, b)) (#74817)
For vectors with either leading or trailing unit dim, replaces:

    elementwise(a, b)

with:

    sc_a = shape_cast(a)
    sc_b = shape_cast(b)
    res = elementwise(sc_a, sc_b)
    return shape_cast(res)

The newly inserted shape_cast Ops fold (before elementwise Op) and then
restore (after elementwise Op) the unit dim. Vectors `a` and `b` are
required to be rank > 1.

Example:
```mlir
  %mul = arith.mulf %B_row, %A_row : vector<1x[4]xf32>
  %cast = vector.shape_cast %mul : vector<1x[4]xf32> to vector<[4]xf32>
```

gets converted to:

```mlir
  %B_row_sc = vector.shape_cast %B_row : vector<1x[4]xf32> to vector<[4]xf32>
  %A_row_sc = vector.shape_cast %A_row : vector<1x[4]xf32> to vector<[4]xf32>
  %mul = arith.mulf %B_row_sc, %A_row_sc : vector<[4]xf32>
  %mul_sc = vector.shape_cast %mul : vector<[4]xf32> to vector<1x[4]xf32>
  %cast = vector.shape_cast %mul_sc : vector<1x[4]xf32> to vector<[4]xf32>
```

In practice, the bottom 2 shape_cast(s) will be folded away.
2023-12-13 20:29:12 +00:00
Jakub Kuderski
8063622721
[mlir][vector] Allow vector distribution with multiple written elements (#75122)
Add a configuration option to allow vector distribution with multiple
elements written by a single lane.

This is so that we can perform vector multi-reduction with multiple
results per workgroup.
2023-12-12 13:15:17 -05:00
Andrzej Warzyński
2eb9e33cc5
[mlir][Vector] Update patterns for flattening vector.xfer Ops (2/N) (#73523)
Updates patterns for flattening `vector.transfer_read` by relaxing the
requirement that the "collapsed" indices are all zero. This enables
collapsing cases like this one:

```mlir
  %2 = vector.transfer_read %arg4[%c0, %arg0, %arg1, %c0] ... :
    memref<1x43x4x6xi32>, vector<1x2x6xi32>
```

Previously only the following case would be consider for collapsing
(all indices are 0):

```mlir
  %2 = vector.transfer_read %arg4[%c0, %c0, %c0, %c0] ... :
    memref<1x43x4x6xi32>, vector<1x2x6xi32>
```

Also adds some new comments and renames the `firstContiguousInnerDim`
parameter as `firstDimToCollapse` (the latter better matches the actual
meaning).

Similar updates for `vector.transfer_write` will be implemented in a
follow-up patch.
2023-12-05 08:35:58 +00:00
Jakub Kuderski
d33bad66d8
[mlir][vector] Add patterns to simplify chained reductions (#73048)
Chained reductions get created during vector unrolling. These patterns
simplify them into a series of adds followed by a final reductions.

This is preferred on GPU targets like SPIR-V/Vulkan where vector
reduction gets lowered into subgroup operations that are generally more
expensive than simple vector additions.

For now, only the `add` combining kind is handled.
2023-11-22 10:30:04 -05:00
Quinn Dawkins
df49a97ab2
[mlir][vector] Root the transfer write distribution pattern on the warp op (#71868)
Currently when there is a mix of transfer read ops and transfer write
ops that need to be distributed, because the pattern for write
distribution is rooted on the transfer write, it is hard to guarantee
that the write gets distributed after the read when the two aren't
directly connected by SSA. This is likely still relatively unsafe when
there are undistributable ops, but structurally these patterns are a bit
difficult to work with. For now pattern benefits give fairly good
guarantees for happy paths.
2023-11-10 08:49:33 -05:00
Mehdi Amini
830b9b072d Update some uses of getAttr() to be explicit about Inherent vs Discardable (NFC) 2023-09-12 01:33:47 -07:00
Andrzej Warzynski
576b184d6e [mlir][vector] Add support for scalable vectors in trimLeadingOneDims
This patch updates one specific hook in "VectorDropLeadUnitDim.cpp" to
make sure that "scalable dims" are handled correctly. While this change
affects multiple patterns, I am only adding one regression tests that
captures one specific case that affects me right now.

I am also adding Vector dialect to the list of dependencies of
`-test-vector-to-vector-lowering`. Otherwise my test case won't work as
a standalone test.

Differential Revision: https://reviews.llvm.org/D157993
2023-08-22 08:45:59 +00:00
Andrzej Warzynski
4d339ec91e [mlir][Vector] Add pattern to reorder elementwise and broadcast ops
The new pattern will replace elementwise(broadcast) with
broadcast(elementwise) when safe.

This change affects tests for vectorising nD-extract. In one case
("vectorize_nd_tensor_extract_with_tensor_extract") I just trimmed the
test and only preserved the key parts (scalar and contiguous load from
the original Op). We could do the same with some other tests if that
helps maintainability.

Differential Revision: https://reviews.llvm.org/D152812
2023-06-15 10:13:41 +01:00
Matthias Springer
faae4d5d81 [mlir][vector][transform] Expose tensor slice -> transfer folding patterns
Add a new transform op to populate patterns: ApplyFoldTensorSliceIntoTransferPatternsOp.

Differential Revision: https://reviews.llvm.org/D152531
2023-06-09 16:23:25 +02:00