This reverts commit 03483737a7a2d72a257a5ab6ff01748ad9cf0f75 and
99c8557, which is a fix-up on top of the former.
I'm reverting because this commit broke two tests:
mlir/test/python/integration/dialects/linalg/opsrun.py
mlir/test/python/integration/dialects/transform.py
See https://lab.llvm.org/buildbot/#/builders/138/builds/4872
I'm not familiar with the tests, so I'm leaving it to the original author
to either remove or adapt the broken tests, as discussed here:
https://github.com/llvm/llvm-project/pull/104783#issuecomment-2406390905
The main goal of this patch is to extend the semantic of 'linalg.matmul'
named op to include per operand transpose semantic while also laying out
a way to move ops definition from OpDSL to tablegen. Hence, it is
implemented in tablegen. Transpose semantic is as follows.
By default 'linalg.matmul' behavior will remain as is. Transpose
semantics can be appiled on per input operand by specifying the optional
permutation attributes (namely 'permutationA' for 1st input and
'permutationB' for 2nd input) for each operand explicitly as needed. By
default, no transpose is mandated for any of the input operand.
Example:
```
%val = linalg.matmul ins(%arg0, %arg1 : memref<5x3xf32>,
memref<5x7xf32>)
outs(%arg2: memref<3x7xf32>)
permutationA = [1, 0]
permutationB = [0, 1]
```
This adds patterns to convert from the Linalg matmul and batch_matmul
ops to the transposed variants. By default the LHS matrix is transposed.
Our work enabling a lowering path from linalg.matmul to ArmSME has
revealed the current lowering results in non-contiguous memory accesses
for the A matrix and very poor performance.
These patterns provide a simple option to fix this.