PR #117514 refactored BPSectionOrderer to be used by the ELF port
but introduced some inefficiency:
* BPSectionBase/BPSymbol are wrappers around a single pointer.
The number of sections and symbols can be huge, and the extra
allocations are memory inefficient.
* Reconstructing the returned DenseMap (since BPSectionBase != InputSection)
is wasteful.
This patch refactors BPSectionOrderer using the Curiously Recurring
Template Pattern (CRTP) and eliminates the inefficiency. In addition,
`symbolToSectionIdxs` is removed and `rootSymbolToSectionIdxs` building
is moved to lld/MachO: while getting sections for symbols is cheap in
Mach-O, it is awkward and inefficient in the ELF port.
While here, add a file-level comment and replace some `StringMap<*>`
(which copies strings) with `DenseMap<CachedHashStringRef, *>`.
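
For illustration only, here is a minimal CRTP sketch of the idea. It is not
the code in this patch: `ToySection`, `ToyOrdererBase`, `ToyOrderer`, and
`getSize` are hypothetical names, and sorting by size is a placeholder for
the real ordering algorithm. The point is that the generic base works
directly on the port's own section type through `const Section *`, so no
per-section wrapper object is allocated and the result needs no conversion.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a port's native section type.
struct ToySection {
  std::string name;
  std::size_t size;
};

// CRTP base: port-specific queries are resolved statically through Derived,
// so sections are handled as plain pointers with no wrapper allocation.
template <class Derived, class Section> struct ToyOrdererBase {
  std::vector<const Section *>
  order(const std::vector<Section> &sections) const {
    const Derived &self = static_cast<const Derived &>(*this);
    std::vector<const Section *> result;
    result.reserve(sections.size());
    for (const Section &sec : sections)
      result.push_back(&sec);
    // Placeholder policy (increasing size), not Balanced Partitioning.
    std::stable_sort(result.begin(), result.end(),
                     [&](const Section *a, const Section *b) {
                       return self.getSize(*a) < self.getSize(*b);
                     });
    return result;
  }
};

// A port supplies only the accessors the generic code needs.
struct ToyOrderer : ToyOrdererBase<ToyOrderer, ToySection> {
  std::size_t getSize(const ToySection &sec) const { return sec.size; }
};
```

The returned vector already holds pointers to the port's own section type,
which is the property that makes rebuilding a map of wrappers unnecessary.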
Pull Request: https://github.com/llvm/llvm-project/pull/124482
--order_file, call graph profile, and BalancedPartitioning currently
build the section order vector by decreasing priority (from SIZE_MAX to
0). However, it's conventional to use an increasing key (see
OutputSection::inputOrder).
Switch to increasing priorities, remove the global variable
highestAvailablePriority, and remove the highestAvailablePriority
parameter from BPSectionOrderer. Change size_t to int.
This improves consistency with the ELF and COFF ports. The ELF port
utilizes negative priorities for --symbol-ordering-file and call graph
profile, and non-negative priorities for --shuffle-sections (no Mach-O
counterpart yet).
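
A rough sketch of the increasing-key convention, under assumptions:
`sortByPriority` is a hypothetical helper, sections are modeled as plain
names, and unlisted sections are assumed to get priority 0 so that negative
priorities sort ahead of them. The actual lld data structures differ.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Sorts names by increasing int priority. Names without an entry are
// assumed to have priority 0 and keep their relative input order.
std::vector<std::string>
sortByPriority(std::vector<std::string> names,
               const std::unordered_map<std::string, int> &priority) {
  auto key = [&](const std::string &name) {
    auto it = priority.find(name);
    return it == priority.end() ? 0 : it->second;
  };
  std::stable_sort(names.begin(), names.end(),
                   [&](const std::string &a, const std::string &b) {
                     return key(a) < key(b);
                   });
  return names;
}
```

With a signed key, ordered sections can take negative values and no global
"highest available" counter has to be threaded through the callers.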
Pull Request: https://github.com/llvm/llvm-project/pull/121727
Refactor some code in `BPSectionOrderer.cpp` in preparation for
https://github.com/llvm/llvm-project/pull/107348.
* Rename `constructNodesForCompression()` -> `getUnitsForCompression()`
and return a `SmallVector` directly rather than populating a vector
alias
* Pass `duplicateSectionIdxs` as a pointer to make it possible to skip
finding (nearly) duplicate sections
* Combine `duplicate{Function,Data}SectionIdxs` into one variable
* Compute all `BPFunctionNode` vectors at the end (like
`nodesForStartup`)
There should be no functional change.
Add the "Separate" option `--irpgo-profile-sort <profile>` in addition to
the "Joined" option `--irpgo-profile-sort=<profile>`. This is useful if
the path contains a `,`, which would break
`-Wl,--irpgo-profile-sort=<profile-with-comma>`.
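
To illustrate why the comma is a problem, here is a toy model of how a
compiler driver expands a `-Wl,` argument. `expandWl` is a hypothetical
function written for this note, not driver code. With the "Separate" form,
the flag and the path can presumably be forwarded as two arguments (for
example via `-Xlinker`), so the path is never split on commas.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model: "-Wl,a,b" is forwarded to the linker as {"a", "b"}.
// Returns an empty vector if the argument does not start with "-Wl,".
std::vector<std::string> expandWl(const std::string &arg) {
  const std::string prefix = "-Wl,";
  std::vector<std::string> out;
  if (arg.compare(0, prefix.size(), prefix) != 0)
    return out;
  std::size_t start = prefix.size();
  while (true) {
    std::size_t comma = arg.find(',', start);
    if (comma == std::string::npos) {
      out.push_back(arg.substr(start));
      break;
    }
    out.push_back(arg.substr(start, comma - start));
    start = comma + 1;
  }
  return out;
}
```

In this model a profile path such as `a,b.profdata` is cut in two, so the
linker sees a truncated path followed by a stray positional argument.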
While I'm here, use `static_cast<>` instead of the C-style cast
introduced in https://github.com/llvm/llvm-project/pull/100627.
Add the lld flags `--irpgo-profile-sort=<profile>` and
`--compression-sort={function,data,both}` to order functions to improve
startup time, and functions or data to improve compressed size,
respectively.
We use Balanced Partitioning to determine the best section order using
traces from IRPGO profiles (see
https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
for details) to improve startup time and using hashes of section
contents to improve compressed size.
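
As a rough illustration of "hashes of section contents", one way to turn
contents into comparable features is to hash fixed-size sliding windows, so
that sections sharing many windows can be placed near each other. This is a
sketch under assumptions: `windowHashes`, `sharedHashes`, the window size
of 4, and the use of `std::hash` are choices made for this example, not
necessarily what the patch does.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>

// Hashes every `window`-byte substring of `contents`. Sections shorter
// than one window produce no hashes.
std::set<std::size_t> windowHashes(const std::string &contents,
                                   std::size_t window = 4) {
  std::set<std::size_t> hashes;
  if (window == 0 || contents.size() < window)
    return hashes;
  for (std::size_t i = 0; i + window <= contents.size(); ++i)
    hashes.insert(std::hash<std::string>{}(contents.substr(i, window)));
  return hashes;
}

// Counts hashes present in both sets, a crude similarity score.
std::size_t sharedHashes(const std::set<std::size_t> &a,
                         const std::set<std::size_t> &b) {
  std::size_t shared = 0;
  for (std::size_t h : a)
    shared += b.count(h);
  return shared;
}
```

Sections with a high shared count contain repeated byte sequences, which a
compressor can exploit more easily when they sit close together.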
In our recent LLVM talk (https://www.youtube.com/watch?v=yd4pbSTjwuA),
we showed that this can reduce page faults during startup by 40% on a
large iOS app and we can reduce compressed size by 0.8-3%.
More details can be found in https://dl.acm.org/doi/10.1145/3660635
---------
Co-authored-by: Vincent Lee <thevinster@users.noreply.github.com>