239 Commits

Author SHA1 Message Date
Amir Ayupov
b5af667b01
[BOLT] Map branch source address to the containing basic block in BAT YAML
Fix an issue where the profile for all branches that have a BRANCHENTRY
is dropped. If the branch has an entry in BAT, it will be translated to
its input offset. We used to only permit the basic block offset as a
branch source. Perform a lookup of containing basic block instead.

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: maksfb, dcci, rafaelauler, ayermolo

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/91273
2024-05-12 17:11:09 -07:00
Amir Ayupov
4f127667ca
[BOLT] Set entry counts in BAT YAML profile (#91775)
Align with DataReader::readProfile that sets entry block counts from
FuncBranchData->EntryData.

Test Plan: updated bolt-address-translation-yaml.test
2024-05-10 22:23:45 -07:00
Amir Ayupov
bbcdd4f4b2
[BOLT] Use disambiguated local names in BAT YAML
Align BAT YAML to fdata profile.

Test Plan: updated register-fragments-bolt-symbols.s

Reviewers: dcci, rafaelauler, ayermolo, maksfb

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/91773
2024-05-10 22:18:50 -07:00
Amir Ayupov
db29f20fdd
[BOLT] Ignore returns in DataAggregator
Returns are ignored in perf/pre-aggregated/fdata profile reader (see
DataReader::convertBranchData). They are also omitted in
YAMLProfileWriter by virtue of not having the profile attached to them
in the reader, and YAMLProfileWriter converting the profile attached to
BinaryFunctions. Thus, return profile is universally ignored across all
profile types except BAT YAML.

To make returns ignored for YAML produced in BAT mode, we can:
1) ignore them in YAMLProfileReader,
2) omit them from YAML profile in profile conversion/writing.

The first option is prone to profile staleness issue, where the profiled
binary doesn't match the one to be optimized, and thus returns in the
profile can no longer be reliably detected (as we don't distinguish them
from calls in the profile).

The second option is robust to staleness but requires disassembling the
branch source instruction.

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/90807
2024-05-08 12:02:18 -07:00
Amir Ayupov
f2d7130579
[BOLT][NFC] Simplify DataAggregator::getFallthroughsInTrace (#90752) 2024-05-01 21:53:49 +02:00
Amir Ayupov
5fb59e7447
[BOLT] Print program stats in perf2bolt/aggregate-only mode (#89763) 2024-04-25 19:08:51 +02:00
Kazu Hirata
8927ac8657 [BOLT] Fix a warning
This patch fixes:

  bolt/lib/Profile/BoltAddressTranslation.cpp:380:37: error: operator
  '<<' has lower precedence than '+'; '+' will be evaluated first
  [-Werror,-Wshift-op-parentheses]
2024-04-15 08:56:57 -07:00
Amir Ayupov
b79b6f9cf0
[BOLT] Use offset deduplication for cold fragments
Apply deduplication for uniformity and BAT section size reduction.

Changes BAT section size to:
- large binary: 39541552 bytes (1.02x original),
- medium binary: 3828996 bytes (0.64x),
- small binary: 928 bytes (0.65x).

Test Plan: Updated bolt-address-translation.test

Reviewers: rafaelauler, dcci, ayermolo, JDevlieghere, maksfb

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/87853
2024-04-15 09:50:12 +02:00
Amir Ayupov
3997f0eb81
[BOLT] Cover all call sites in writeBATYAML
Call site information setting was conditioned on branch information
presence for a given block. However, it's possible to have sampled
profile lacking one or the other for a given basic block.

Iterate over branch profiles and call profiles independently to cover
all recorded profile data.

Depends on https://github.com/llvm/llvm-project/pull/87569

Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s

Reviewers: ayermolo, dcci, maksfb, rafaelauler

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/87743
2024-04-11 21:15:04 +02:00
Amir Ayupov
8840992667
[BOLT][BAT] Fix handling of split functions
Move BAT parent function lookup outside `getLocationName`, to the
scope where we retrieve `FuncBranchData` linked with the function.

Previously DataAggregator would store branch profile recorded in the
split fragment in `FuncBranchData` associated with the fragment, and
perform name translation in `getLocationName` for symbol name only.
This works for fdata profile which is printed out as-is, but doesn't
work with BAT YAML profile writer which requires a combined profile.

The issue necessitated `fixupBATProfile` which partially addressed the
issue (reassigned inter-fragment calls back into intra-function
branches). However, `fixupBATProfile` fails to address disjoint
profiles (i.e. doesn't merge `FuncBranchData` for fragments back
into parent). This diff eliminates the need for `fixupBATProfile` by
removing the root cause of the issue.

Test Plan: NFC for existing tests

Reviewers: ayermolo, dcci, rafaelauler, maksfb

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/87569
2024-04-11 21:07:36 +02:00
Amir Ayupov
e64eede0dc
[BOLT][BAT] Fix encoded NumBasicBlocks
Emit the recorded number of blocks, not the number of basic block
hashes. There might be differences in corner cases (openssl
BN_BLINDING_convert_ex function).

Test Plan:
Updated openssl.test in https://github.com/rafaelauler/bolt-tests/pull/31

Reviewers: rafaelauler, ayermolo, maksfb, dcci

Reviewed By: ayermolo

Pull Request: https://github.com/llvm/llvm-project/pull/87830
2024-04-05 17:54:07 -07:00
Amir Ayupov
0227623915
[BOLT][BAT] Support multi-way split functions
BAT writeMaps encoded the assumption that functions are only split into
two fragments (hot and cold). However, BOLT supports splitting into
arbitrary number of fragments. Relax that assumption and look up primary
(hot) fragment explicitly.

Depends on: https://github.com/llvm/llvm-project/pull/86219

Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s

Reviewers: ayermolo, rafaelauler, maksfb, dcci

Reviewed By: maksfb, dcci

Pull Request: https://github.com/llvm/llvm-project/pull/87123
2024-04-05 17:50:50 -07:00
Amir Ayupov
2d3c827c05
[BOLT] Use BAT for YAML profile call target information
Provide a mechanism to resolve call target information for calls from non-BAT
functions to BAT functions (`YAMLProfileWriter::convert`). Make it generic for
future use in BAT-to-BAT calls.

Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: ayermolo, maksfb, rafaelauler, dcci

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/86219
2024-04-05 16:08:59 -07:00
Amir Ayupov
fd38366e45
[BOLT][NFC] Clean includes, add license headers (#87200) 2024-03-31 19:29:45 -07:00
Amir Ayupov
385e3e26c1
[BOLT] Set EntryDiscriminator in YAML profile for indirect calls
Indirect call handling missed setting an `EntryDiscriminator` while it's
set for direct calls and tail calls.

Improve YAML profile accuracy by unifying the destination setting
between direct and indirect calls into `setCSIDestination` method.

Depends on: https://github.com/llvm/llvm-project/pull/86848

Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s

Reviewers: ayermolo, maksfb, rafaelauler

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/82128
2024-03-27 16:40:38 -07:00
Amir Ayupov
213eda157a
[BOLT] Add CallSiteInfo entries in YAMLBAT (#76896)
Attach call counters to YAML profile, covering inter-function control
flow.

Depends on: https://github.com/llvm/llvm-project/pull/86218

Test Plan: 
Updated bolt/test/X86/bolt-address-translation-yaml.test
2024-03-25 16:23:21 -07:00
Amir Ayupov
1b763f230a
[BOLT] Add secondary entry points to BAT
Provide secondary entry points for `EntryDiscriminator` call info field
in YAML profile.

Increases BAT section size to:
- large binary: 39655300 bytes (1.03x the original),
- medium binary: 3834328 bytes (0.65x),
- small binary: 924 bytes (0.64x).

Depends on: https://github.com/llvm/llvm-project/pull/76911

Test Plan:
- Updated bolt-address-translation{,-yaml}.test
- Added openssl test: https://github.com/rafaelauler/bolt-tests/pull/30

Reviewers: dcci, rafaelauler, maksfb, ayermolo

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/86218
2024-03-25 15:14:33 -07:00
Amir Ayupov
d7d2f7ca62
[BOLT] Emit intra-function control flow in YAMLBAT
Attach branch counters to YAML profile, covering intra-function control
flow.

Depends on: https://github.com/llvm/llvm-project/pull/86353

Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76911
2024-03-23 19:11:49 -07:00
Amir Ayupov
a91cd53de3
[BOLT][NFC] Refactor BAT metadata data structures
Hide the implementations of `FuncHashes` and `BBHashMap` classes,
getting rid of `at` accessors that could throw an exception.

Test Plan: NFC

Reviewers: ayermolo, maksfb, dcci, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/86353
2024-03-23 16:08:31 -07:00
Amir Ayupov
ceba3a38e8
[BOLT] Add number of basic blocks to BAT
YAML profile reader checks the number of basic blocks in regular,
no-stale-matching mode. Add it to BAT.

This increases the size of BAT section to:
- large binary: 39583080 bytes (1.02x of the original),
- medium binary: 3816492 bytes (0.64x),
- small binary: 920 bytes (0.64x, no change due to alignment).

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: rafaelauler, ayermolo, maksfb, dcci

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/86045
2024-03-22 08:46:48 -07:00
Amir Ayupov
b0e23639c5 [BOLT] Add BB index to BAT
Add input basic block index to BAT metadata. This addresses the case
where some basic blocks are eliminated, and output index is not equal
to the input block index. These indices are used in non-stale-matching
mode.

Increases BAT section size to:
- large binary: 39521512 bytes (1.02x original),
- medium binary: 3799988 bytes (0.64x),
- small binary: 920 bytes (0.64x).

Test Plan:
Updated bolt-address-translation{,-yaml}.test

Pull Request: https://github.com/llvm/llvm-project/pull/86044
2024-03-22 08:42:58 -07:00
Amir Ayupov
f66d631bf8 Revert "[BOLT] Add BB index to BAT (#86044)"
This reverts commit 3b3de48fd84b8269d5f45ee0a9dc6b7448368424.
2024-03-22 08:38:40 -07:00
Amir Ayupov
3b3de48fd8
[BOLT] Add BB index to BAT (#86044) 2024-03-22 06:07:17 -07:00
Amir Ayupov
6280681137
[BOLT] Output basic YAML profile in BAT mode
Relax assumptions that YAML output is not supported in BAT mode.
Set up basic infrastructure for emitting YAML for functions not covered
by BAT, such as from `.bolt.org.text` section (code identical to input binary
sans external refs), or non-rewritten functions in non-relocation mode (where
the function stays in the same section but BAT mapping is not emitted).

This diff only produces YAML profile for non-BAT functions (skipped,
non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with
  internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile
  (calls profile).

Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: ayermolo, dcci, maksfb, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76910
2024-03-21 14:32:13 -07:00
Kazu Hirata
aa7e4ba3ca [BOLT] Fix an unused variable warning
This patch fixes:

  bolt/lib/Profile/BoltAddressTranslation.cpp:26:12: error: unused
  variable 'HotFuncAddress' [-Werror,-Wunused-variable]
2024-03-20 17:46:02 -07:00
Amir Ayupov
ad00e7e5ed
[BOLT] Write and parse BF/BB hashes in BAT
This increases BAT section size to:
- large binary: 34832976 bytes (0.90x original),
- medium binary: 3586800 bytes (0.60x original),
- small binary: 816 bytes (0.57x original).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76907
2024-03-20 16:24:21 -07:00
Amir Ayupov
de0abc0983
[BOLT][NFC] Simplify YAMLProfileWriter::convert
Use `getAnnotationWithDefault` instead of testing if the annotation is
set. If the default value is used, and `CSI.Count` is set to zero, the
target is discarded by a check below.

Test Plan: NFC

Reviewers: maksfb, dcci, rafaelauler, ayermolo

Reviewed By: ayermolo

Pull Request: https://github.com/llvm/llvm-project/pull/82129
2024-03-20 14:39:28 -07:00
Amir Ayupov
061b408964
[BOLT][NFC] Expose YAMLProfileWriter::convert function
The function is to be used by YAML profile emission in BAT mode for
BinaryFunctions not covered by BAT tables (same as in original binary).

Test Plan: NFC

Reviewers: rafaelauler, ayermolo, dcci, maksfb

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/76909
2024-03-20 14:35:47 -07:00
Amir Ayupov
b431546d41
[BOLT] Check BF state in stale matching (#85339)
Only apply stale matching if the binary function is in CFG state, i.e.
has basic blocks.

Test Plan:
Updated bolt/test/X86/reader-stale-yaml.test
2024-03-15 10:55:53 -07:00
Amir Ayupov
d2c9a19dd8
[BOLT][NFC] Pass BF/BB hashes to BAT
Test Plan: NFC

Reviewers: dcci, rafaelauler, maksfb, ayermolo

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76906
2024-02-15 12:49:43 -08:00
Amir Ayupov
52cf07116b
[BOLT][NFC] Log through JournalingStreams (#81524)
Make core BOLT functionality more friendly to being used as a
library instead of in our standalone driver llvm-bolt. To
accomplish this, we augment BinaryContext with journaling streams
that are to be used by most BOLT code whenever something needs to
be logged to the screen. Users of the library can decide if logs
should be printed to a file, no file or to the screen, as
before. To illustrate this, this patch adds a new option
`--log-file` that allows the user to redirect BOLT logging to a
file on disk or completely hide it by using
`--log-file=/dev/null`. Future BOLT code should now use
`BinaryContext::outs()` for printing important messages instead of
`llvm::outs()`. A new test log.test enforces this by verifying that
no strings are print to screen once the `--log-file` option is
used.

In previous patches we also added a new BOLTError class to report
common and fatal errors, so code shouldn't call exit(1) now. To
easily handle problems as before (by quitting with exit(1)),
callers can now use
`BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code
needs to deal with BOLT errors. To test this, we have fatal.s
that checks we are correctly quitting and printing a fatal error
to the screen.

Because this is a significant change by itself, not all code was
yet ported. Code from Profiler libs (DataAggregator and friends)
still print errors directly to screen.

Co-authored-by: Rafael Auler <rafaelauler@fb.com>

Test Plan: NFC
2024-02-12 14:53:53 -08:00
Amir Ayupov
3c64b24ed3
[BOLT] Add extra staleness logging (#80225)
Report two extra metrics:
- # of stale functions with matching block count,
- # of stale blocks with matching instruction count.
2024-02-01 07:16:40 -08:00
Maksim Panchenko
2abcbbd96a
[BOLT] Detect Linux kernel based on ELF program headers (#80086)
Check if program header addresses fall into the kernel space to detect a
Linux kernel binary on x86-64.

Delete opts::LinuxKernelMode and use BinaryContext::IsLinuxKernel
instead.
2024-01-30 18:04:29 -08:00
Amir Ayupov
df7d2b2f90
[BOLT] Deduplicate equal offsets in BAT (#76905)
Encode BRANCHENTRY bits as bitmask for deduplicated entries.

Reduces BAT section size:
- large binary: to 11834216 bytes (0.31x original),
- medium binary: to 1565584 bytes (0.26x original),
- small binary: to 336 bytes (0.23x original).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-25 15:37:47 -08:00
Amir Ayupov
8f1d94aaea
[BOLT] Use continuous output addresses in delta encoding in BAT
Make output function addresses be delta-encoded wrt last offset in the
previous function. This reduces the deltas in function start addresses.

Test Plan:
Reduces BAT section size to:
- large binary: 12218860 bytes (0.32x original),
- medium binary: 1606580 bytes (0.27x original),
- small binary: 404 bytes (0.28x original),

Reviewers: rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/76904
2024-01-18 13:49:44 -08:00
Amir Ayupov
dcba077146
[BOLT] Embed cold mapping info into function entry in BAT (#76903)
Reduces BAT section size:
- large binary: to 12283500 bytes (0.32x original size),
- medium binary: to 1616020 bytes (0.27x original size),
- small binary: to 404 bytes (0.28x original size).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-12 13:02:32 -08:00
Amir Ayupov
8fb8ad66c9
[BOLT] Delta-encode function start addresses in BAT (#76902)
Further reduce the size of BAT section:
- large binary: to 12716312 bytes (0.33x original),
- medium binary: to 1649472 bytes (0.28x original),
- small binary: to 428 bytes (0.30x original).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-11 14:35:37 -08:00
Amir Ayupov
bbe07989d7
[BOLT] Delta-encode offsets in BAT (#76900)
This change further reduces the size of BAT:
- large binary: to 13073904 bytes (0.34x original),
- medium binary: to 1703116 bytes (0.29x original),
- small binary: to 436 bytes (0.30x original).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-11 14:29:46 -08:00
Amir Ayupov
565f40d66b [BOLT] Encode BAT using ULEB128 (#76899)
Reduces BAT section size, bytes:
- large binary: 38676872 -> 23262524 (0.60x),
- medium binary (trunk clang): 5938004 -> 3213504 (0.54x),
- small binary (X86/bolt-address-translation.test): 1436 -> 680 (0.47x).

Test Plan: Updated bolt/test/X86/bolt-address-translation.test
2024-01-11 12:16:30 -08:00
Kazu Hirata
ad8fd5b185 [BOLT] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 23:34:49 -08:00
Amir Ayupov
b039ccc684
[BOLT] Provide backwards compatibility for YAML profile with std::hash (#74253)
Provide backwards compatibility for YAML profile that uses `std::hash`:
xxh3 hash is the default for newly produced profile (sets `std-hash:
false`),
whereas the profile that doesn't specify `std-hash` will be treated as
`std-hash: true`, preserving old behavior.
2023-12-11 12:27:32 -08:00
spupyrev
e7dd596c68
[BOLT] Use deterministic xxh3 for computing BF/BB hashes (#72542)
std::hash and ADT/Hashing::hash_value are non-deterministic functions
whose
results might vary across implementation/process/execution. Using xxh3
instead
for computing hashes of BinaryFunctions and BinaryBasicBlock for stale
profile
matching.
(A possible alternative is to use ADT/StableHashing.h based on FNV
hashing but
xxh3 seems to be more popular in LLVM)

This is to address https://github.com/llvm/llvm-project/issues/65241.
2023-11-27 14:45:46 -08:00
Ho Cheung
3af586f797
[BOLT] Fix type mismatch error (#73016)
Fix build issue on Windows.

Fixes #73006
2023-11-21 19:13:46 -08:00
Maksim Panchenko
2db9b6a93f
[BOLT] Make instruction size a first-class annotation (#72167)
When NOP instructions are used to reserve space in the code, e.g. for
patching, it becomes critical to preserve their original size while
emitting the code. On x86, we rely on "Size" annotation for NOP
instructions size, as the original instruction size is lost in the
disassembly/assembly process.

This change makes instruction size a first-class annotation and is
affectively NFCI. A follow-up diff will use the annotation for code
emission.
2023-11-13 14:33:39 -08:00
Jonathan Davies
22bea0c521
[BOLT] Add itrace aggregation for AUX data (#70426)
If you have a perf.data with Arm ETM data the only way to use perf2bolt
with Branch Aggregation is to first run `perf inject --itrace=l64i1us -o
perf-brstack.data` and then pass the new perf-brstack.data into
perf2bolt. perf2bolt then runs `perf script -F pid,ip,brstack` to
produce the brstacks.

This PR adds `--itrace` arg to perf2bolt to enable Itrace Aggregation.
It takes a string which is what is passed to the `perf script -F
pid,ip,brstack --itrace={0}`. This command produces the brstacks without
having to run perf inject and creating a new perf.data file.
2023-11-06 12:40:04 +01:00
Jonathan Davies
5db75d74a1
[BOLT] Filter itrace from perf script mmap & task events (#69585)
perf2bolt launches a few perf script commands and stores the output in
temporary files before processing the output and cleaning them up before
it exits.

The command `perf script --show-mmap-events` outputs PERF_RECORD_MMAP2
and instruction tracing data but when processed it only looks for
PERF_RECORD_MMAP2 and the instruction tracing data is ignored. This is
fine for small amounts of instruction trace data but when I've recorded
Arm ETM or Intel PT AUX I get lots of it

By adding `--no-itrace` is will just show the PERF_RECORD_MMAP2 records
and will save on time running the `perf script`, disk space storing the
output & time parsing the output.

It is the same for `perf script --show-task-events` where BOLT is only
interested in the PERF_RECORD_COMM & PERF_RECORD_FORK records.

### Data

| Perf Record | Perf Data Size  | MMap Size | MMap No Itrace Size |
|---|---|---|---|
| perf record -e cs_etm/@tmc_etr0/u | 137K | 4468K | 0.632K |
| perf record -e intel_pt//u | 890K | 33378K | 0.673K |
2023-10-20 10:00:05 +02:00
Amir Ayupov
6a1cf545cc [BOLT][YAML] Only read first profile per function
D159460 regressed the bugfix in D156644. Fix that and emit a warning.
Add a test case.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D159529
2023-09-18 20:40:47 -07:00
Amir Ayupov
4627446d38 [BOLT] Fix AutoFDO output format after D154120
AutoFDO profile has no leading 0x in hex dumps.

Reviewed By: #bolt, rafauler

Differential Revision: https://reviews.llvm.org/D159507
2023-09-12 13:58:25 -07:00
Amir Ayupov
7b750943d7 [BOLT][NFC] Speedup YAML profile processing
Reduce YAML profile processing times:
- preprocessProfile: speed up buildNameMaps by replacing ProfileNameToProfile
  mapping with ProfileFunctionNames set and ProfileBFs vector.
  Pre-look up YamlBF->BF correspondence, memoize in ProfileBFs.
- readProfile: replace iteration over all functions in the binary by iteration
  over profile functions (strict match and LTO name match).

On a large binary (1.9M functions) and large YAML profile (121MB, 30k functions)
reduces profile steps runtime:
pre-process profile data: 12.4953s -> 10.7123s
process profile data: 9.8195s -> 5.6639s

Compared to fdata profile reading:
pre-process profile data: 8.0268s
process profile data: 1.0265s
process profile data pre-CFG: 0.1644s

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D159460
2023-09-11 16:07:57 -07:00
Amir Ayupov
ffef4fe0db [BOLT][NFC] Use formatv in DataAggregator/DataReader prints
Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D154120
2023-09-11 16:01:02 -07:00