llvm-project


[Matrix] Use tiled loops automatically for large kernels. (#179325)

Update LowerMatrixIntrinsics to use tiled loops automatically for
larger matrices. The fully unrolled codegen creates a huge amount of
code, which performs noticeably worse than the tiled loop nest variant.
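
To illustrate the difference, here is a minimal C++ sketch of a tiled loop nest for a column-major multiply. It is not the IR that LowerMatrixIntrinsics emits. The `Matrix` alias, the function names and the `Tile` parameter are all hypothetical. The point is that the tiled loop body stays the same size however large the matrices are, while a fully unrolled lowering emits code that grows with the product of the dimensions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major storage: element (r, c) of an R x C matrix lives at index c * R + r.
using Matrix = std::vector<double>;

// Reference: plain triple loop computing Res (R x C) = A (R x K) * B (K x C).
Matrix naiveMultiply(const Matrix &A, const Matrix &B, std::size_t R,
                     std::size_t K, std::size_t C) {
  Matrix Res(R * C, 0.0);
  for (std::size_t c = 0; c < C; ++c)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t r = 0; r < R; ++r)
        Res[c * R + r] += A[k * R + r] * B[c * K + k];
  return Res;
}

// Tiled variant (hypothetical tile size `Tile`): walk Tile x Tile blocks of the
// result and accumulate Tile-wide slices of the inner dimension. The code size
// does not depend on R, K or C; edge tiles are clamped with std::min.
Matrix tiledMultiply(const Matrix &A, const Matrix &B, std::size_t R,
                     std::size_t K, std::size_t C, std::size_t Tile) {
  assert(Tile > 0 && "tile size must be non-zero");
  assert(A.size() == R * K && B.size() == K * C && "shape mismatch");
  Matrix Res(R * C, 0.0);
  for (std::size_t c0 = 0; c0 < C; c0 += Tile)
    for (std::size_t r0 = 0; r0 < R; r0 += Tile)
      for (std::size_t k0 = 0; k0 < K; k0 += Tile) {
        std::size_t cEnd = std::min(c0 + Tile, C);
        std::size_t rEnd = std::min(r0 + Tile, R);
        std::size_t kEnd = std::min(k0 + Tile, K);
        // Inner block: same body regardless of the overall matrix size.
        for (std::size_t c = c0; c < cEnd; ++c)
          for (std::size_t k = k0; k < kEnd; ++k)
            for (std::size_t r = r0; r < rEnd; ++r)
              Res[c * R + r] += A[k * R + r] * B[c * K + k];
      }
  return Res;
}
```

For a fixed result element, the tiled version visits the inner dimension in the same order as the reference loop. The two therefore agree exactly, which makes the naive version a convenient oracle when testing the tiled one.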

We now try to estimate the number of instructions needed for the
multiply, and if it is too large, tiled loops are used. The current
threshold is roughly anything larger than a 6x6x6 double multiply.
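
A cost check along these lines can be sketched as follows. The formula and the threshold constant are hypothetical stand-ins, not the estimate used by the patch. The sketch makes two assumptions. First, each result column of an unrolled R x K by K x C multiply needs ceil(R / VF) vector multiply-adds per element of the inner dimension, where VF is the number of elements per vector register. Second, the threshold is set so that a 6x6x6 double multiply with two doubles per vector sits exactly at the limit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified cost model (not the formula used in LowerMatrixIntrinsics):
// each of the C result columns is built from ceil(R / VF) vector slices, and each
// slice needs one multiply-add per element of the inner dimension K.
std::size_t estimateUnrolledOps(std::size_t R, std::size_t K, std::size_t C,
                                std::size_t VF) {
  assert(VF > 0 && "vector width must be non-zero");
  std::size_t SlicesPerColumn = (R + VF - 1) / VF;
  return SlicesPerColumn * K * C;
}

// Hypothetical threshold: a 6x6x6 double multiply with VF = 2 (e.g. a 128-bit
// register holding two doubles) costs 3 * 6 * 6 = 108 and is the cut-off point.
constexpr std::size_t kHypotheticalThreshold = 3 * 6 * 6;

// Use tiled loops only when the unrolled estimate exceeds the threshold.
bool shouldUseTiledLoops(std::size_t R, std::size_t K, std::size_t C,
                         std::size_t VF) {
  return estimateUnrolledOps(R, K, C, VF) > kHypotheticalThreshold;
}
```

Under these assumptions, a 6x6x6 double multiply (108 operations) stays fully unrolled. A 7x7x7 one (4 * 7 * 7 = 196) would switch to the tiled loop nest.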

Eventually I think we want to only generate tiled loops. This patch is a
first step, trying to opt in for cases where we know it is beneficial.
This was checked on AArch64, but it should help other architectures
similarly and also drastically reduce binary size and compile time.

PR: https://github.com/llvm/llvm-project/pull/179325

2026-02-11 15:36:34 +00:00

after-transpose-opts.ll

…

analysis-invalidation.ll

…

bigger-expressions-double.ll

…

binop.ll

…

const-gep.ll

…

data-layout-multiply-fused.ll

[Matrix] Use tiled loops automatically for large kernels. (#179325)

2026-02-11 15:36:34 +00:00

data-layout.ll

…

dot-product-float.ll

…

dot-product-int-also-fusable-multiply.ll

…

dot-product-int-row-major.ll

…

dot-product-int.ll

…

dot-product-transpose-int.ll

…

flatten.ll

…

load-align-volatile.ll

…

multiply-add-sub-double-row-major.ll

…

multiply-double-contraction-fmf.ll

…

multiply-double-contraction.ll

…

multiply-double-row-major.ll

…

multiply-double.ll

…

multiply-float-contraction-fmf.ll

…

multiply-float-contraction.ll

…

multiply-float.ll

…

multiply-fused-dominance.ll

[Matrix] Use tiled loops automatically for large kernels. (#179325)

2026-02-11 15:36:34 +00:00

multiply-fused-lifetime-ends.ll

…

multiply-fused-loops-large-matrixes.ll

[Matrix] Use tiled loops automatically for large kernels. (#179325)

2026-02-11 15:36:34 +00:00

multiply-fused-loops.ll

[Matrix] Use tiled loops automatically for large kernels. (#179325)

2026-02-11 15:36:34 +00:00

multiply-fused-multiple-blocks.ll

…

multiply-fused-volatile.ll

[Matrix] Use tiled loops automatically for large kernels. (#179325)

2026-02-11 15:36:34 +00:00

multiply-fused.ll

[Matrix] Use tiled loops automatically for large kernels. (#179325)

2026-02-11 15:36:34 +00:00

multiply-i32-row-major.ll

…

multiply-i32.ll

…

multiply-left-transpose-row-major.ll

…

multiply-minimal.ll

…

multiply-remainder-rm.ll

[LMI] Support non-power-of-2 types for the matmul remainder (#163987)

2025-10-17 18:42:30 +00:00

multiply-remainder.ll

[LMI] Support non-power-of-2 types for the matmul remainder (#163987)

2025-10-17 18:42:30 +00:00

multiply-right-transpose.ll

…

phi.ll

…

preserve-existing-fast-math-flags.ll

…

propagate-backward.ll

…

propagate-backwards-unsupported.ll

…

propagate-forward.ll

…

propagate-mixed-users.ll

…

propagate-multiple-iterations.ll

…

remarks-inlining.ll

[DebugInfo] Add Verifier check for incorrectly-scoped retainedNodes (#166855)

2025-11-10 13:13:49 +01:00

remarks-shared-subtrees.ll

…

remarks.ll

[DebugInfo] Add Verifier check for incorrectly-scoped retainedNodes (#166855)

2025-11-10 13:13:49 +01:00

select.ll

…

shape-verification.ll

…

store-align-volatile.ll

…

strided-load-double.ll

…

strided-load-float.ll

…

strided-load-i32.ll

…

strided-store-double.ll

…

strided-store-float.ll

…

strided-store-i32.ll

…

transpose-double-row-major.ll

…

transpose-double.ll

…

transpose-float-row-major.ll

…

transpose-float.ll

…

transpose-fold-store.ll

…

transpose-fold.ll

…

transpose-i32-row-major.ll

…

transpose-i32.ll

…

transpose-opts-iterator-invalidation.ll

…

transpose-opts-lifting-constant-folds.ll

…

transpose-opts-lifting.ll

…

transpose-opts.ll

…

unary.ll

…