llvm-project

[AMDGPU] Adding FoldMemRefOpsIntoTransposeLoadOp pattern (#183330)

Before this fix, a trivial expand_shape was not folded into the index
computation. The expand_shape was later forced to materialize into an
extract_strided_metadata and a reinterpret_cast unnecessarily. The
example below shows source IR that does not fold today.

```mlir
%expanded = memref.expand_shape %buf [[0, 1], [2, 3]]
    : memref<32x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>
    into memref<1x32x8x16xf16, strided<..., offset: ?>, #gpu.address_space<workgroup>>
amdgpu.transpose_load %expanded[%i, %j, %k, %l]
    : memref<1x32x8x16xf16, ...> -> vector<4xf16>
```

With this pattern, which mirrors the more generic
`FoldMemRefAliasOpsPass`, the expand_shape can now fold into the
transpose_load op, as it already does for other loads and stores.
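
As a rough sketch of the intended result (not taken from the PR's
tests), the folded form of the example above could look like the
following. The function name `@folded_sketch`, the value names `%row`
and `%col`, and the use of `affine.apply` are assumptions made for
illustration; the actual pattern may emit a different but equivalent
index computation.

```mlir
// Hypothetical wrapper function so the snippet is self-contained.
func.func @folded_sketch(
    %buf: memref<32x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>,
    %i: index, %j: index, %k: index, %l: index) -> vector<4xf16> {
  // Reassociation group [0, 1] has sizes 1x32, so row = i * 32 + j.
  %row = affine.apply affine_map<()[s0, s1] -> (s0 * 32 + s1)>()[%i, %j]
  // Reassociation group [2, 3] has sizes 8x16, so col = k * 16 + l.
  %col = affine.apply affine_map<()[s0, s1] -> (s0 * 16 + s1)>()[%k, %l]
  // The load reads %buf directly; no expand_shape is left to materialize.
  %v = amdgpu.transpose_load %buf[%row, %col]
      : memref<32x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>
      -> vector<4xf16>
  return %v : vector<4xf16>
}
```

Because the four expanded indices are linearized back onto the two
source dimensions, the load no longer depends on the expanded memref
type at all.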

The current `FoldMemRefAliasOps` pass doesn't use a more generic
interface yet; it still uses hardcoded overloads. This PR continues
the pragmatic approach of providing its own folding pattern (as was
done for `GatherToLDSOp`).

2026-02-25 16:58:44 -05:00

[mlir][AMDGPU] Allow packing of exactly 4 elements. (#181843)

2026-02-17 12:01:18 -05:00

Transforms

[AMDGPU] Adding FoldMemRefOpsIntoTransposeLoadOp pattern (#183330)

2026-02-25 16:58:44 -05:00

Utils

[mlir][amdgpu] Align Chipset with TargetParser (#107720)

2024-09-09 11:12:26 -04:00

CMakeLists.txt

[mlir][amdgpu] Remove shared memory optimization pass (#88225)

2024-04-11 11:07:17 -04:00