ZhaoQi ca4886bf96
[LoongArch] Impl TTI hooks for LoongArch to support LoopDataPrefetch pass (#118437)
Inspired by https://reviews.llvm.org/D146600, this commit adds
some TTI hooks for LoongArch so that the LoopDataPrefetch pass
actually takes effect (a sketch of the overrides follows the list below):

- `getCacheLineSize()`: 64 for loongarch64.
- `getPrefetchDistance()`: After testing on SPEC CPU 2017, the improvements
from prefetching are most noticeable with the prefetch distance set to
200 (results shown below), although different benchmarks favor different
values.
- `enableWritePrefetching()`: LoongArch supports store prefetching, so
write prefetching is enabled by default.
- `getMinPrefetchStride()` and `getMaxPrefetchIterationsAhead()`: keep their
default values (1 and UINT_MAX), so they are not overridden.
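
A minimal sketch of what these overrides might look like, assuming the usual
LLVM target layout (a LoongArchTTIImpl class in
LoongArchTargetTransformInfo.h); illustrative only, not the verbatim patch:

  // Sketch only: class skeleton simplified, existing members elided.
  class LoongArchTTIImpl : public BasicTTIImplBase<LoongArchTTIImpl> {
  public:
    // 64-byte cache lines on loongarch64.
    unsigned getCacheLineSize() const { return 64; }
    // Prefetch distance tuned on SPEC CPU 2017; 200 gave the clearest gains.
    unsigned getPrefetchDistance() const { return 200; }
    // LoongArch supports store prefetch, so prefetch writes as well as reads.
    bool enableWritePrefetching() const { return true; }
    // getMinPrefetchStride() and getMaxPrefetchIterationsAhead() keep the
    // base-class defaults (1 and UINT_MAX), so they are not overridden here.
  };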

After this commit, the test added in https://reviews.llvm.org/D146600
generates the llvm.prefetch intrinsic IR correctly.

Results of the spec2017rate benchmarks (test data: ref, copies: 1):
- For all C/C++ benchmarks, compared to O3+novec/lsx/lasx, prefetching
brings a -1.58%/0.31%/0.07% performance change for the integer
benchmarks and a 3.26%/3.73%/3.78% improvement for the floating-point
benchmarks. (Only O3+novec+prefetch regresses when testing intrate.)
- However, prefetching causes a performance regression for almost every
Fortran benchmark compiled with flang. Considering all C/C++/Fortran
benchmarks together, prefetching decreases performance by about 1% ~ 5%.

FIXME: Keep the `loongarch-enable-loop-data-prefetch` option defaulting to
false for now because of the regressions on the Fortran benchmarks.
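
For context, a minimal sketch of how such an option typically gates the pass
in the backend pipeline. The wiring below (placement in
LoongArchTargetMachine.cpp, the LoongArchPassConfig class, and the use of
createLoopDataPrefetchLegacyPass) is assumed from how other targets enable
LoopDataPrefetch, not quoted from this patch:

  // Sketch only: surrounding code and exact placement are assumptions.
  static cl::opt<bool> EnableLoopDataPrefetch(
      "loongarch-enable-loop-data-prefetch", cl::Hidden,
      cl::desc("Enable the loop data prefetch pass"),
      cl::init(false)); // kept off by default due to the Fortran regressions

  void LoongArchPassConfig::addIRPasses() {
    // Only schedule LoopDataPrefetch when optimizing and explicitly requested.
    if (TM->getOptLevel() != CodeGenOptLevel::None && EnableLoopDataPrefetch)
      addPass(createLoopDataPrefetchLegacyPass());
    TargetPassConfig::addIRPasses();
  }
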
2025-01-20 16:20:15 +08:00

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt --mtriple=loongarch64 -mattr=+d --passes=loop-data-prefetch -S < %s | FileCheck %s
define void @foo(ptr %a, ptr %b) {
; CHECK-LABEL: define void @foo(
; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[ENTRY:.*]]:
; CHECK-NEXT: br label %[[FOR_BODY:.*]]
; CHECK: [[FOR_BODY]]:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], 200
; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[TMP2]], 200
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[B]], i64 [[TMP3]]
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[B]], i64 [[INDVARS_IV]]
; CHECK-NEXT: call void @llvm.prefetch.p0(ptr [[SCEVGEP]], i32 0, i32 3, i32 1)
; CHECK-NEXT: [[TMP4:%.*]] = load double, ptr [[ARRAYIDX]], align 8
; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP4]], 1.000000e+00
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, ptr [[A]], i64 [[INDVARS_IV]]
; CHECK-NEXT: call void @llvm.prefetch.p0(ptr [[SCEVGEP1]], i32 1, i32 3, i32 1)
; CHECK-NEXT: store double [[ADD]], ptr [[ARRAYIDX2]], align 8
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1600
; CHECK-NEXT: br i1 [[EXITCOND]], label %[[FOR_END:.*]], label %[[FOR_BODY]]
; CHECK: [[FOR_END]]:
; CHECK-NEXT: ret void
;
entry:
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds double, ptr %b, i64 %indvars.iv
  %0 = load double, ptr %arrayidx, align 8
  %add = fadd double %0, 1.000000e+00
  %arrayidx2 = getelementptr inbounds double, ptr %a, i64 %indvars.iv
  store double %add, ptr %arrayidx2, align 8
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 1600
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body
  ret void
}