ZhaoQi ca4886bf96
[LoongArch] Impl TTI hooks for LoongArch to support LoopDataPrefetch pass (#118437)
Inspired by https://reviews.llvm.org/D146600, this commit adds
some TTI hooks for LoongArch to make LoopDataPrefetch pass
really work. Including:

- `getCacheLineSize()`: 64 for loongarch64.
- `getPrefetchDistance()`: After testing SPEC CPU 2017, improvements
taken by prefetching are more obvious when set PrefetchDistance to
200(results shown blow), although different benchmarks fit for different
best choice.
- `enableWritePrefetching()`: store prefetch is supported by LoongArch,
so set WritePrefetching to true in default.
- `getMinPrefetchStride()` and `getMaxPrefetchIterationsAhead()` still
use default values: 1 and UINT_MAX, so not override them.

After this commit, the test added by https://reviews.llvm.org/D146600
can generate llvm.prefetch intrinsic IR correctly.

Results of spec2017rate benchmarks (testing date: ref, copies: 1):
- For all C/C++ benchmarks, compared to O3+novec/lsx/lasx, prefetch can
bring about -1.58%/0.31%/0.07% performance improvement for int
benchmarks and 3.26%/3.73%/3.78% improvement for floating point
benchmarks. (Only O3+novec+prefetch decreases when testing intrate.)
- But prefetch results in performance reduction almost for every Fortran
benchmark compiled by flang. While considering all C/C++/Fortran
benchmarks, prefetch performance will decrease about 1% ~ 5%.

FIXME: Keep `loongarch-enable-loop-data-prefetch` option default to
false for now due to the bad effect for Fortran.
2025-01-20 16:20:15 +08:00
..