Back-ports additional tests (eb9febb4a6b0, dc697de12792), refactoring
(43c9c14577db) and functional change (18f1369297f4) in a single PR.
https://github.com/llvm/llvm-project/pull/114990 allowed more aggressive
tail duplication for computed-gotos in both pre- and post-regalloc tail
duplication.
In some cases, performing tail-duplication too early can lead to worse
results, especially if we duplicate blocks with a number of phi nodes.
This is causing a ~3% performance regression in some workloads using
Python 3.12.
This patch updates TailDup to delay aggressive tail-duplication for
computed gotos to after register allocation.
This means we can keep the non-duplicated version for a bit longer
throughout the backend, which should reduce compile-time as well as
allowing a number of optimizations and simplifications to trigger before
drastically expanding the CFG.
For the case in https://github.com/llvm/llvm-project/issues/106846, I
get the same performance with and without this patch on Skylake.
PR: https://github.com/llvm/llvm-project/pull/150911