This PR adds the [`vector.contract(transpose_a/transpose_b)` decomposition patterns](3215645b8d/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp (L1263)) from the `vector-to-gpu` pass to the `vector-to-xegpu` pass.

`populatePrepareVectorToMMAPatterns` adds two patterns:
1. `PrepareContractToGPUMMA`, which splits `vector.contract(transpose)` into `vector.transpose + vector.contract`.
2. `CombineTransferReadOpTranspose`, which fuses `vector.transpose` into the permutation map of `vector.transfer_read`.

The second pattern does not always produce the desired result (`xegpu.load_nd + vector.transpose + xegpu.dpas`), since [not all data types are supported](1237bd6df0/mlir/lib/Conversion/VectorToXeGPU/VectorToXeGPU.cpp (L570-L575)) for the transposed-read case.

A second PR (#182875) addresses this by adding a decomposition pattern for the unsupported types. It might seem strange that we first fuse and then decompose `transfer_read` + `transpose`, but this way we avoid duplicating code between the `vector-to-gpu` and `vector-to-xegpu` passes and still cover all functional cases.

---------

Signed-off-by: dchigarev <dmitry.chigarev@intel.com>