This patch adjusts the build process for building the toolchain for the
CI container to perform more rigorous perf-training for PGO,
particularly building the entirety of LLVM as that is what showed the
best results while benchmarking. This patch also splits the job into two
stages to avoid timeouts due to the large increase in buildtime. There
are a couple other hacks added in here to make things work that we can
do away with eventually once we're able to run jobs like this on more
powerful self-hosted runners.
Only the stage2-distribution target is built by default for the
stage2 distribution installation target. This means that we don't get a
BOLT optimized binary. This patch explicitly builds the
stage2-clang-bolt target before the distribution installation target so
that the clang binary is optimized before it gets installed.
This patch adjusts the Docker container intended for CI use to contain a
PGO+ThinLTO+BOLT optimized clang. The toolchain is built within a Github
action and takes ~3.5 hours. No caching is utilized. The current PGO
optimization is fairly minimal, only running clang over hello world.
This can be adjusted as needed.