
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC, and the call graph updates performed as a result of the inliner take a long time. This is because RefSCC::removeInternalRefEdges() gets called many times, each call removing one function from the RefSCC, yet each call also costs time proportional to the size of the RefSCC. removeInternalRefEdges() is called from two places: updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

1) Since LazyCallGraph can tolerate spurious ref edges (edges that exist in the graph but not in the IR), we can simply skip the call to removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

2) With the above change, LazyCallGraph::removeDeadFunction() still accounts for the brunt of compile time, for the original reason. So instead we batch all the dead function removals, which lets us call removeInternalRefEdges() just once. This requires changing callers of removeDeadFunction() to not actually erase the function from the module, but to defer erasure until we batch-delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges remain accurate. I had also tried deleting dead functions after visiting each RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to no longer performing unnecessary revisits of an SCC, because we no longer remove some ref edges (the CGSCC infrastructure deems ref edge refinements unimportant when deciding whether to revisit SCCs, although that seems to not be consistently true given these changes). Specifically, devirt-invalidated.ll seems to expose an inlining order issue with the inliner; that is probably unimportant for this type of intentionally weird call graph.

Compile time: https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u
; When an SCC gets split due to inlining, we have two mechanisms for reprocessing the updated SCC: the first is UR.UpdatedC,
; which repeatedly reruns the new, current SCC; the second is a worklist for all newly split SCCs. We need to avoid rerunning
; the same SCC when the SCC is set to be processed by both mechanisms back to back. In pathological cases, such extra,
; redundant reruns can cause exponential size growth due to inlining along cycles.
;
; The test cases here illustrate a potential redundant rerun and how it is prevented; however, there is no extra inlining
; even if we allow the redundant rerun. In real code, when the inliner makes different decisions for different call sites
; of the same caller-callee edge, we could end up with more recursive inlining without SCC mutation.
;
; REQUIRES: asserts
; RUN: opt < %s -passes='cgscc(inline)' -inline-threshold=500 -debug-only=cgscc -S 2>&1 | FileCheck %s

; CHECK: Running an SCC pass across the RefSCC: [(test1_a, test1_b, test1_c)]
; CHECK: Enqueuing the existing SCC in the worklist:(test1_b)
; CHECK: Enqueuing a newly formed SCC:(test1_c)
; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_c'
; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_a'
; CHECK: Re-running SCC passes after a refinement of the current SCC: (test1_c, test1_a)
; CHECK: Skipping redundant run on SCC: (test1_c, test1_a)

declare void @external(i32 %seed)

define void @test1_a(i32 %num) {
entry:
  call void @test1_b(i32 %num)
  call void @external(i32 %num)
  ret void
}

define void @test1_b(i32 %num) {
entry:
  call void @test1_c(i32 %num)
  call void @test1_a(i32 %num)
  call void @external(i32 %num)
  ret void
}

define void @test1_c(i32 %num) #0 {
  call void @test1_a(i32 %num)
  ret void
}

attributes #0 = { noinline nounwind optnone }