llvm-project/llvm/test/Transforms/LowerMatrixIntrinsics
Florian Hahn 54177e95d1
[Matrix] Use tiled loops automatically for large kernels. (#179325)
Update LowerMatrixIntrinsics to use tiled loops automatically for
larger matrices. The fully unrolled codegen creates a huge amount of
code, which performs noticeably worse than the tiled loop nest variant.
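
For readers unfamiliar with the two shapes, here is a minimal C++
sketch of what a tiled loop nest for a matrix multiply looks like at the
source level. It is only an illustration of the technique, not the IR
the pass emits: the Matrix helper, multiplyTiled, and the TileSize value
are all hypothetical names and choices made for this example. Matrices
are stored column-major, which is the default layout of the matrix
intrinsics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: column-major matrix in a flat vector, so element
// (Row, Col) lives at index Col * Rows + Row.
struct Matrix {
  std::size_t Rows, Cols;
  std::vector<double> Data;
  Matrix(std::size_t R, std::size_t C) : Rows(R), Cols(C), Data(R * C, 0.0) {}
  double &at(std::size_t R, std::size_t C) { return Data[C * Rows + R]; }
  double at(std::size_t R, std::size_t C) const { return Data[C * Rows + R]; }
};

// Hypothetical tile size, chosen only for illustration.
constexpr std::size_t TileSize = 4;

// Computes A * B with three outer loops over tiles and three inner loops
// over the elements of each tile. The code size stays constant no matter
// how large the matrices are, unlike a fully unrolled multiply.
Matrix multiplyTiled(const Matrix &A, const Matrix &B) {
  assert(A.Cols == B.Rows && "inner dimensions must match");
  Matrix Result(A.Rows, B.Cols);
  for (std::size_t J0 = 0; J0 < B.Cols; J0 += TileSize)
    for (std::size_t I0 = 0; I0 < A.Rows; I0 += TileSize)
      for (std::size_t K0 = 0; K0 < A.Cols; K0 += TileSize) {
        // Clamp the tile bounds so partial tiles at the edges are handled.
        const std::size_t JEnd = std::min(J0 + TileSize, B.Cols);
        const std::size_t IEnd = std::min(I0 + TileSize, A.Rows);
        const std::size_t KEnd = std::min(K0 + TileSize, A.Cols);
        for (std::size_t J = J0; J < JEnd; ++J)
          for (std::size_t K = K0; K < KEnd; ++K)
            for (std::size_t I = I0; I < IEnd; ++I)
              Result.at(I, J) += A.at(I, K) * B.at(K, J);
      }
  return Result;
}
```

A fully unrolled version would instead spell out every multiply-add as
straight-line code, so its size grows with the product of the three
dimensions.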

We now try to estimate the number of instructions needed for the
multiply, and if it is too large, tiled loops are used. The current
threshold is roughly anything larger than a 6x6x6 double multiply.
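
As a rough illustration of this kind of check, the C++ sketch below
estimates the number of vector multiply-add operations for an unrolled
RxKxC multiply and compares it against a threshold. The cost formula,
the function names, the 128-bit vector width, and the threshold value
are all assumptions made for this example; they are not the estimate or
the constant used in the patch. The threshold is simply set to what this
formula yields for a 6x6x6 double multiply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimate: each result column needs ceil(R / Lanes) vectors,
// and each of those needs K multiply-adds, repeated for all C columns.
std::uint64_t estimateUnrolledOps(unsigned R, unsigned K, unsigned C,
                                  unsigned ElemBits, unsigned VectorBits) {
  assert(ElemBits != 0 && VectorBits >= ElemBits && "invalid vector shape");
  const unsigned Lanes = VectorBits / ElemBits;
  const std::uint64_t VectorsPerColumn = (R + Lanes - 1) / Lanes;
  return VectorsPerColumn * C * K;
}

// Hypothetical threshold: the estimate above for a 6x6x6 double multiply
// with 128-bit vectors (3 vectors per column * 6 columns * 6 = 108).
constexpr std::uint64_t AssumedThreshold = 108;

// Returns true if the unrolled multiply is estimated to be too large.
bool shouldUseTiledLoops(unsigned R, unsigned K, unsigned C,
                         unsigned ElemBits, unsigned VectorBits) {
  return estimateUnrolledOps(R, K, C, ElemBits, VectorBits) > AssumedThreshold;
}
```

Under these assumptions a 6x6x6 double multiply stays unrolled, while an
8x8x8 double multiply (estimate 256) switches to tiled loops.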

Eventually I think we want to only generate tiled loops. This patch is a
first step, trying to opt in for cases where we know it is beneficial.
Checked on AArch64, but should help on other architectures similarly,
and should also drastically reduce binary size and compile time.

PR: https://github.com/llvm/llvm-project/pull/179325
2026-02-11 15:36:34 +00:00