Fixes#184906
The SPIRV and DXIL backends assume matrices are provided in column-major
order when lowering matrix transpose and matrix multiplication
intrinsics.
To support row-major order matrices from Clang/HLSL, we therefore need
to convert row-major order matrices into column-major order matrices
before applying matrix transpose and multiplication. A conversion from
column-major order back to row-major order is also required for
correctness after a matrix transpose or matrix multiply.
For the matrix transpose case on row-major order matrices, the last two
matrix memory layout transforms cancel each other out. So a row-major
order matrix transpose is simply a column-major order transpose with the
row and column dimensions swapped.
For the matrix multiply case, this PR adds helper functions to the
MatrixBuilder to convert a NxM row-/column-major order matrix into a NxM
column-/row-major order matrix by applying a matrix transpose.
These transformations take advantage of the fact that a row-major order
matrix of NxM dimensions `rNxM` interpreted in column-major order is
equivalent to its transpose in column-major order.
Example: Let `r3x2 = [ 0, 1, 2, 3, 4, 5 ]`. The 3x2 row-major order
matrix is visualized as
```
0 1
2 3
4 5
```
When `r3x2`, or `[ 0, 1, 2, 3, 4, 5 ]` is interpreted as a 2x3
column-major order matrix, it is visualized as:
```
0 2 4
1 3 5
```
which is equal to the transpose of `r3x2` but in column-major order.
These matrix memory layout transformations are inserted before and after
the matrix multiply and transpose intrinsics when lowering HLSL mul and
transpose.
We don't simplify the matrix multiply case because HLSL in Clang will
eventually need to support the `row_major` and `column_major` keywords
that allow matrices to independently be row-major or column-major
regardless of the default matrix memory layout.
While this method of supporting row-major order matrices is not
performant, it is correct and will suffice for now until benchmarks are
created and performance becomes a primary concern.
Assisted-by: GitHub Copilot (powered by Claude Opus 4.6)