Before this fix, we would not fold a trivial expand_shape into index
computation. This later forces the expand_shape to materialize into an
extract_strided_metadata and a reinterpret_cast, which is unnecessary. The
example below shows source IR that motivates the change and that cannot
be folded today.
```mlir
%expanded = memref.expand_shape %buf [[0, 1], [2, 3]]
: memref<32x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>>
into memref<1x32x8x16xf16, strided<..., offset: ?>, #gpu.address_space<workgroup>>
amdgpu.transpose_load %expanded[%i, %j, %k, %l]
: memref<1x32x8x16xf16, ...> -> vector<4xf16>
```
With this pattern, which mirrors the one in the more generic
`FoldMemRefAliasOpsPass`, the expand_shape can now be folded into the
transpose_load op, just as it is for other loads and stores.
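For illustration, here is a sketch of roughly what the IR from the example above could look like after the fold. It assumes the new pattern resolves indices the same way the generic expand_shape folding does: each reassociation group of expanded indices is linearized back into a single index of `%buf`. The value names `%row`, `%col` and `%v` are hypothetical, and the pattern may emit `affine.linearize_index` or a differently shaped `affine.apply` rather than the exact ops shown.

```mlir
// Sketch only (assumed output form): map indices into the 1x32x8x16 view back
// to the 32x128 source. Group [0, 1] (sizes 1 and 32) forms source dim 0;
// group [2, 3] (sizes 8 and 16) forms source dim 1.
%row = affine.apply affine_map<(d0, d1) -> (d0 * 32 + d1)>(%i, %j)
%col = affine.apply affine_map<(d0, d1) -> (d0 * 16 + d1)>(%k, %l)
// The load reads %buf directly; no expand_shape is left to materialize.
%v = amdgpu.transpose_load %buf[%row, %col]
  : memref<32x128xf16, strided<[128, 1], offset: ?>, #gpu.address_space<workgroup>> -> vector<4xf16>
```

Because the source strides are `[128, 1]`, element `(%i, %j, %k, %l)` of the expanded view and element `(%row, %col)` of `%buf` share the same offset, `(%i * 32 + %j) * 128 + %k * 16 + %l`. The fold is therefore pure index arithmetic.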
The current `FoldMemRefAliasOps` pass doesn't use a more generic
interface yet; it still relies on hardcoded overloads. This PR therefore
continues the pragmatic approach of providing its own folding pass (as
was done for `GatherToLDSOp`).