Zhuoran Yin 642763c553
[AMDGPU] Adding FoldMemRefOpsIntoTransposeLoadOp pattern (#183330)
Before this change, a trivial `memref.expand_shape` feeding an
`amdgpu.transpose_load` was not folded into the load's index
computation. The expand_shape was later forced to materialize as a
`memref.extract_strided_metadata` plus a `memref.reinterpret_cast`,
which is unnecessary. The example below shows source IR that could not
be folded before this change.

```mlir
%expanded = memref.expand_shape %buf [[0, 1], [2, 3]]
    : memref<32x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>
    into memref<1x32x8x16xf16, strided<..., offset: ?>, #gpu.address_space<workgroup>>
amdgpu.transpose_load %expanded[%i, %j, %k, %l]
    : memref<1x32x8x16xf16, ...> -> vector<4xf16>
```

With this pattern, which mirrors the ones in the more generic
`FoldMemRefAliasOpsPass`, the expand_shape now folds into the
transpose_load op, just as it already does for other loads and stores.
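As a rough sketch, the folded IR for the example above indexes the original `%buf` directly and linearizes each reassociation group. This assumes the index math is materialized as `affine.apply`. The actual pattern may emit different index ops, such as `affine.linearize_index`, so treat this as illustrative rather than the exact output:

```mlir
// Hypothetical folded form (illustrative only):
// group [0, 1] (sizes 1x32) maps to source dim 0 -> %i * 32 + %j
// group [2, 3] (sizes 8x16) maps to source dim 1 -> %k * 16 + %l
%row = affine.apply affine_map<(d0, d1) -> (d0 * 32 + d1)>(%i, %j)
%col = affine.apply affine_map<(d0, d1) -> (d0 * 16 + d1)>(%k, %l)
%v = amdgpu.transpose_load %buf[%row, %col]
    : memref<32x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>> -> vector<4xf16>
```

Because the load now addresses `%buf` directly, the expand_shape has no remaining users and does not need to be materialized.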

The current `FoldMemRefAliasOps` pass does not yet use a generic
interface; it still relies on hardcoded per-op overloads. This PR
therefore continues the pragmatic approach of providing its own folding
pass (as is already done for `GatherToLDSOp`).
2026-02-25 16:58:44 -05:00