117 Commits

Author SHA1 Message Date
Amir Ayupov
9fed480f18
[BOLT] Explicitly check for returns when extending call continuation profile (#143295)
Call continuation logic relies on assumptions about fall-through origin:
- the branch is external to the function,
- fall-through start is at the beginning of the block,
- the block is not an entry point or a landing pad.

Leverage trace information to explicitly check whether the origin is a
return instruction, and defer to checks above only in case of
DSO-external branch source.

This covers both regular and BAT cases, addressing call continuation
fall-through undercounting in the latter mode, which improves BAT
profile quality metrics. For example, for one large binary:
- CFG discontinuity 21.83% -> 0.00%,
- CFG flow imbalance 10.77%/100.00% -> 3.40%/13.82% (weighted/worst)
- CG flow imbalance 8.49% —> 8.49%.

Depends on #143289.

Test Plan: updated callcont-fallthru.s
2025-06-17 06:28:27 -07:00
Amir Ayupov
7e6c1bd3ed
[BOLT][NFCI] Simplify DataAggregator using traces (#143289)
Consistently apply traces as defined in #127125 for branch profile
aggregation. This combines branches and fall-through records into one.

With large input binaries/profiles, the speed up in aggregation time
(`-time-aggr`, wall time):
- perf.data, pre-BOLT input: 154.5528s -> 144.0767s
- pre-aggregated data, pre-BOLT input: 15.1026s -> 9.0711s
- pre-aggregated data, BOLTed input: 15.4871s -> 10.0077s

Test Plan: NFC
2025-06-16 23:54:40 -07:00
Amir Ayupov
902a991e12
[BOLT] Make memory profile parsing optional (#129585)
Introduce `parse-mem-profile` option to limit overheads processing
tracing data (Intel PT or ARM ETM). By default, it's enabled for
perf data (existing behavior), unless `itrace` is passed to parse
tracing data where it's extremely expensive. In this case, the flag
needs to be set explicitly if needed.
2025-06-12 14:46:37 -07:00
Amir Ayupov
0c77468288
[BOLT] Expose external entry count for functions (#141674)
Record the number of function invocations from external code - code
outside the binary, which may include JIT code and DSOs. Accounting
external entry counts improves the fidelity of call graph flow 
conservation analysis.

Test Plan: updated shrinkwrapping.test
2025-06-10 14:31:22 -07:00
Amir Ayupov
03bbd04bb7
[BOLT][NFCI] Skip validation in parseLBRSample (#143288)
Parsed branches and fall-throughs are validated in `doBranch` and
`doTrace` respectively. Simplify parseLBRSample by omitting the
validation. This also speeds up perf data processing as checks are only
done once for aggregated branches/fall-throughs and not individual LBR
entries.

Since invalid/external addresses are no longer sanitized during parsing,
sanitize them in `doBranch`.

Test Plan: updated X86/pre-aggregated-perf.test
2025-06-08 17:50:02 -07:00
Amir Ayupov
dcd2ac7ef2
[BOLT] Sort EntryData (#143308)
Aggregated branch data has two containers: `Data` for local branches,
and `EntryData` for external branches. Fix the omission and sort
`EntryData` to ensure stable output fdata profiles.

Test Plan: updated pre-aggregated-perf.test
2025-06-08 17:43:44 -07:00
Amir Ayupov
c480dcddd9
[BOLT][NFC] Move LBREntry from DataReader to DataAggregator (#143287)
LBREntry is only used in DataAggregator.

Test Plan: NFC
2025-06-08 17:41:46 -07:00
Amir Ayupov
dc513fa8dc
[BOLT] Zero initialize pre-aggregated counters (#142698)
#140196 introduced UB by using uninitialized misprediction count for
pre-aggregated traces. Fix by zero initializing both counters.

Test Plan: updated entry-point-fallthru.s
2025-06-03 20:50:19 -07:00
Amir Ayupov
18e51314c4
[BOLT] Support pre-aggregated basic sample profile (#140196)
Define a pre-aggregated basic sample format:
```
E <event name>
S <location> <count>
```

`-nl` flag is required to use parsed basic samples.

Test Plan: update pre-aggregated-perf.test
2025-06-02 11:43:48 -07:00
Amir Ayupov
5047a33cd8
[BOLT][heatmap] Produce zoomed-out heatmaps (#140153)
Add a capability to produce multiple heatmaps with given bucket sizes.

The default heatmap block size (64B) could be too fine-grained for
large binaries. Extend the option `block-size` to accept a list of
bucket sizes for additional heatmaps with coarser granularity. The
heatmap is simply rescaled so provided sizes should be multiples of
each other. Human-readable suffixes can be used, e.g. 4K, 16kb, 1MiB.

New defaults: 64B (base bucket size), 4KB (default page size),
256KB (for large binaries).

Test Plan: updated heatmap-preagg.test
2025-05-30 16:20:19 -07:00
Amir Ayupov
9d5d715330
[BOLT][heatmap] Add synthetic hot text section (#139824)
In heatmap mode, report samples and utilization of the section(s)
between hot text markers `[__hot_start, __hot_end)`.

The intended use is with multi-way splitting where there are several
sections that contain "hot" code (e.g. `.text.warm` with CDSplit).

Addresses the comment on #139193

https://github.com/llvm/llvm-project/pull/139193#pullrequestreview-2835274682

Test Plan: updated heatmap-preagg.test
2025-05-14 09:47:14 -07:00
Amir Ayupov
616489e2ee
[BOLT] Drop perf2bolt cold samples diagnostic (#139337)
Cold samples diagnostics in perf2bolt are superseded by
`perf2bolt --heatmap` option (#139194). It provides a superset of stats
and works without BAT section which is not emitted by default.

Test Plan: NFC
2025-05-13 13:24:56 -07:00
Amir Ayupov
0289ca09be
[BOLT] Print heatmap from perf2bolt (#139194)
Add perf2bolt `--heatmap` option to produce heatmaps during profile
aggregation.

Distinguish exclusive mode (`llvm-bolt-heatmap`) and optional mode 
(`perf2bolt --heatmap`), which impacts perf.data handling:
exclusive mode covers all addresses, whereas optional mode consumes
attached profile only covering function addresses.

Test Plan: updated per2bolt tests:
- pre-aggregated-perf.test: pre-aggregated data,
- bolt-address-translation-yaml.test: pre-aggregated + BOLTed input,
- perf_test.test: no-LBR perf data.
2025-05-13 13:23:18 -07:00
Amir Ayupov
fbdb5aeff6
[BOLT] Build heatmap with pre-aggregated data (#138798)
Reuse data structures used by perf data reader for pre-aggregated data.
Combined with #136531 this allows using pre-aggregated data for heatmap.

Test Plan: heatmap-preagg.test
2025-05-12 18:04:10 -07:00
Amir Ayupov
f2351d9e7f
[BOLT][heatmap] Use parsed basic/branch events (#136531)
Remove duplicate profile parsing in heatmap construction, switching to
using parsed profile. #138798 adds support for using pre-aggregated
profile for heatmap construction.

Test Plan: added heatmap.test in
0868850a15
2025-05-12 17:33:30 -07:00
Amir Ayupov
e039d16ee5
[BOLT][NFC] Disambiguate sample as basic sample (#139350)
Sample is a general term covering both basic (IP) and branch (LBR)
profiles. Find and replace ambiguous uses of sample in a basic sample
sense.

Rename `RawBranchCount` into `RawSampleCount` reflecting its use for
both kinds of profile.

Rename `PF_LBR` profile type as `PF_BRANCH` reflecting non-LBR based
branch profiles (non-brstack SPE, synthesized brstack ETM/PT).

Follow-up to #137644.

Test Plan: NFC
2025-05-12 17:15:16 -07:00
Amir Ayupov
8f31c6dde7
[BOLT] Support profile density with basic samples (#137644)
For profile with LBR samples, binary function profile density is
computed as a ratio of executed bytes to function size in bytes.

For profile with IP samples, use the size of basic block containing the
sample IP as a numerator.

Test Plan: updated perf_test.test
2025-05-10 21:01:49 -07:00
Kazu Hirata
193135c800
[BOLT] Remove an unused local variable (NFC) (#139392) 2025-05-10 10:21:59 -07:00
Amir Ayupov
54aa16d293
[BOLT] Drop converting return profile to call cont (#129477)
The workaround was not implemented for BAT case, and it is no longer
needed with pre-aggregated traces, alternatively, the effect can be
achieved with `infer-fall-throughs` with old pre-aggregated format
(branches + ranges).

Test Plan: updated callcont-fallthru.s
2025-05-06 22:21:53 -07:00
Amir Ayupov
5d0afacd1b
[BOLT][NFCI] Emit uniform diagnostics in DataAggregator (#136530)
DataAggregator supports reading different kinds of profile data:
- perf data: branch records or IP samples,
- pre-aggregated branch data.

Make profile quality reporting uniform across all kinds of input:
- out-of-range and mismatching samples,
- samples in cold code in BAT mode (profiled BOLTed binary).

Test Plan: NFCI
2025-04-24 13:51:18 -07:00
Amir Ayupov
fa4ac19f0f
[BOLT] Accept PLT fall-throughs as valid traces (#129481)
We used to report PLT traces as invalid (mismatching disassembled
function contents) because PLT functions are marked as pseudo and
ignored, thus missing CFG. However, such traces are not mismatching
the function contents. Accept them without attaching the profile.

Test Plan: updated callcont-fallthru.s
2025-04-11 21:26:19 -07:00
Amir Ayupov
f567524399
[BOLT] Fix doTrace in BAT mode (#128546)
When processing BOLTed binaries with BAT section, we used to
indiscriminately use `BAT->getFallthroughsInTrace` to record
fall-throughs, even if the function is not covered by BAT.

Fix that by using non-BAT CFG-based `getFallthroughsInTrace` if the
function is not in BAT.

Test Plan: updated bolt-address-translation-yaml.test
2025-02-25 10:56:13 -08:00
Amir Ayupov
61acfb07e8
[BOLT] Add pre-aggregated trace support (#127125)
Traces are triplets of branch source, target, and fall-through end (next
branch).

Traces simplify differentiation of fall-throughs into local- and
external-origin, which improves performance over profile with
undifferentiated fall-throughs by eliminating profile discontinuity in
call to continuation fall-throughs. This makes it possible to avoid
converting return profile into call to continuation profile which may
introduce statistical biases.

The existing format makes provisions for local- (F) and external- (f)
origin fall-throughs, but the profile producer needs to know function
boundaries. BOLT has that information readily available, so providing
the origin branch of a fall-through is a functional replacement of the
fall-through kind (f or F). This also has an effect of combining
branches and fall-throughs into a single record.

As traces subsume other pre-aggregated profile kinds, BOLT may drop
support for them soon. Users of pre-aggregated profile format are
advised to migrate to the trace format.

Test Plan: Updated callcont-fallthru.s
2025-02-13 15:14:56 -08:00
Amir Ayupov
e6c9cd9c06
[BOLT] Drop parsing sample PC when processing LBR perf data (#123420)
Remove options to generate autofdo data (unused) and `use-event-pc`
(not beneficial).

Cuts down perf2bolt time for 11GB perf.data by 40s (11:10->10:30).
2025-01-21 09:04:49 -08:00
Paschalis Mpeis
51003076eb
Reapply [BOLT] DataAggregator support for binaries with multiple text segments (#118023)
When a binary has multiple text segments, the Size is computed as the
difference of the last address of these segments from the BaseAddress.
The base addresses of all text segments must be the same.

Introduces flag 'perf-script-events' for testing, which allows passing
perf events without BOLT having to parse them by invoking 'perf script'.
The flag is used to pass a mock perf profile that has two memory
mappings for a mock binary that has two text segments. The mapping
size is updated as `parseMMapEvents` now processes all text segments.
2024-12-02 09:20:40 +00:00
Hans Wennborg
537343dea4 Revert "[BOLT] DataAggregator support for binaries with multiple text segments (#92815)"
This caused test failures, see comment on the PR:

  Failed Tests (2):
    BOLT-Unit :: Core/./CoreTests/AArch64/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0
    BOLT-Unit :: Core/./CoreTests/X86/MemoryMapsTester/MultipleSegmentsMismatchedBaseAddress/0

> When a binary has multiple text segments, the Size is computed as the
> difference of the last address of these segments from the BaseAddress.
> The base addresses of all text segments must be the same.
>
> Introduces flag 'perf-script-events' for testing. It allows passing perf events
> without BOLT having to parse them using 'perf script'. The flag is used to
> pass a mock perf profile that has two memory mappings for a mock binary
> that has two text segments. The size of the mapping is updated as this
> change `parseMMapEvents` processes all text segments.

This reverts commit 4b71b3782d217db0138b701c4514bd2168ca1659.
2024-11-26 14:59:30 +01:00
Paschalis Mpeis
4b71b3782d
[BOLT] DataAggregator support for binaries with multiple text segments (#92815)
When a binary has multiple text segments, the Size is computed as the
difference of the last address of these segments from the BaseAddress.
The base addresses of all text segments must be the same.

Introduces flag 'perf-script-events' for testing. It allows passing perf events
without BOLT having to parse them using 'perf script'. The flag is used to
pass a mock perf profile that has two memory mappings for a mock binary
that has two text segments. The size of the mapping is updated as this
change `parseMMapEvents` processes all text segments.
2024-11-25 13:12:43 +00:00
Amir Ayupov
74e6478f81
[BOLT] Set call to continuation count in pre-aggregated profile
#109683 identified an issue with pre-aggregated profile where a call to
continuation fallthrough edge count is missing (profile discontinuity).

This issue only affects pre-aggregated profile but not perf data since
LBR stack has the necessary information to determine if the trace (fall-
through) starts at call continuation, whereas pre-aggregated fallthrough
lacks this information.

The solution is to look at branch records in pre-aggregated profiles
that correspond to returns and assign counts to call to continuation
fallthrough:
- BranchFrom is in another function or DSO,
- BranchTo may be a call continuation site:
  - not an entry point/landing pad.

Note that we can't directly check if BranchFrom corresponds to a return
instruction if it's in external DSO.

Keep call continuation handling for perf data (`getFallthroughsInTrace`)
[1] as-is due to marginally better performance. The difference is that
return-converted call to continuation fallthrough is slightly more
frequent than other fallthroughs since the former only requires one LBR
address while the latter need two that belong to the profiled binary.
Hence return-converted fallthroughs have larger "weight" which affects
code layout.

[1] `DataAggregator::getFallthroughsInTrace`
fea18afeed/bolt/lib/Profile/DataAggregator.cpp (L906-L915)

Test Plan: added callcont-fallthru.s

Reviewers: maksfb, ayermolo, ShatianWang, dcci

Reviewed By: maksfb, ShatianWang

Pull Request: https://github.com/llvm/llvm-project/pull/109486
2024-11-07 16:20:19 -08:00
Amir Ayupov
6ee5ff95ab
[BOLT] Add profile density computation
Reuse the definition of profile density from llvm-profgen (#92144):
- the density is computed in perf2bolt using raw samples (perf.data or
  pre-aggregated data),
- function density is the ratio of dynamically executed function bytes
  to the static function size in bytes,
- profile density:
  - functions are sorted by density in decreasing order, accumulating
    their respective sample counts,
  - profile density is the smallest density covering 99% of total sample
    count.

In other words, BOLT binary profile density is the minimum amount of
profile information per function (excluding functions in tail 1% sample
count) which is sufficient to optimize the binary well.

The density threshold of 60 was determined through experiments with
large binaries by reducing the sample count and checking resulting
profile density and performance. The threshold is conservative.

perf2bolt would print the warning if the density is below the threshold
and suggest to increase the sampling duration and/or frequency to reach
a given density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```

Test Plan: updated pre-aggregated-perf.test

Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe, wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/101094
2024-10-24 18:30:59 -07:00
Amir Ayupov
08916cef7e
[BOLT] Set RawBranchCount in DataAggregator
Align DataAggregator (Linux perf and pre-aggregated profile reader) to
DataReader (fdata profile reader) behavior: set BF->RawBranchCount which
is used in profile density computation (#101094).

Reviewers: ayermolo, maksfb, dcci, rafaelauler, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/101093
2024-10-24 18:28:44 -07:00
Kristof Beyls
6d216fb7b8
[perf2bolt] Improve heuristic to map in-process addresses to specific… (#109397)
… segments in Elf binary.

The heuristic is improved by also taking into account that only
executable segments should contain instructions.

Fixes #109384.
2024-09-23 15:14:51 +02:00
Amir Ayupov
c00c62c113
[BOLT] Add pseudo probe inline tree to YAML profile
Add probe inline tree information to YAML profile, at function level:
- function GUID,
- checksum,
- parent node id,
- call site in the parent.

This information is used for pseudo probe block matching (#99891).

The encoding adds/changes probe information in multiple levels of
YAML profile:
- BinaryProfile: add pseudo_probe_desc with GUIDs and Hashes, which
  permits deduplication of data:
  - many GUIDs are duplicate as the same callee is commonly inlined
    into multiple callers,
  - hashes are also very repetitive, especially for functions with
    low block counts.
- FunctionProfile: add inline tree (see above). Top-level function
  is included as root of function inline tree, which makes guid and
  pseudo_probe_desc_hash fields redundant.
- BlockProfile: densely-encoded block probe information:
  - probes reference their containing inline tree node,
  - separate lists for block, call, indirect call probes,
  - block probe encoding is specialized: ids are encoded as bitset
    in uint64_t. If only block probe with id=1 is present, it's
    encoded as implicit entry (id=0, omitted).
  - inline tree nodes with identical probes share probe description
    where node indices are combined into a list.

On top of #107970, profile with new probe encoding has the following
characteristics (profile for a large binary):

- Profile without probe information: 33MB, 3.8MB compressed (baseline).
- Profile with inline tree information: 92MB, 14MB compressed.

Profile processing time (YAML parsing, inference, attaching steps):
- profile without pseudo probes: 5s,
- profile with pseudo probes, without pseudo probe matching: 11s,
- with pseudo probe matching: 12.5s.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: wlei-llvm, ayermolo, rafaelauler, dcci, maksfb

Reviewed By: wlei-llvm, rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/107137
2024-09-12 20:51:35 -07:00
Amir Ayupov
ccc7a072db
[BOLT] Drop blocks without profile in BAT YAML (#107970)
Align BAT YAML (DataAggregator) to YAMLProfileWriter which drops blocks
without profile:

61372fc5db/bolt/lib/Profile/YAMLProfileWriter.cpp (L162-L176)

Test Plan: NFCI
2024-09-11 16:36:47 -07:00
Amir Ayupov
c820bd3e33
[BOLT][NFC] Rename profile-use-pseudo-probes
The flag currently controls writing of probe information in YAML
profile. #99891 adds a separate flag to use probe information for stale
profile matching. Thus `profile-use-pseudo-probes` becomes a misnomer
and `profile-write-pseudo-probes` better captures the intent.

Reviewers: maksfb, WenleiHe, ayermolo, rafaelauler, dcci

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/106364
2024-09-11 16:27:33 -07:00
Amir Ayupov
ee09f7d1fc
[MC][NFC] Reduce Address2ProbesMap size
Replace the map from addresses to list of probes with a flat vector
containing probe references sorted by their addresses.

Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from
9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: https://github.com/llvm/llvm-project/pull/102904
2024-08-26 09:14:35 -07:00
Amir Ayupov
4d19676de4
[BOLT] Add profile-use-pseudo-probes option
Move pseudo probe profile generation under --profile-use-pseudo-probes
option. Note that updating pseudo probes is independent from this flag.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: maksfb, rafaelauler, ayermolo, dcci, WenleiHe

Reviewed By: WenleiHe

Pull Request: https://github.com/llvm/llvm-project/pull/100299
2024-07-24 07:31:01 -07:00
Amir Ayupov
c905db67a0
[BOLT] Attach pseudo probes to blocks in YAML profile
Read pseudo probes in regular and BAT YAML profile generation, and
attach them to YAML profile basic blocks. This exposes GUID, probe id,
and probe type in profile for future use in stale profile matching.

Test Plan: updated pseudoprobe-decoding-inline.test

Reviewers: dcci, rafaelauler, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/99554
2024-07-18 21:01:40 -07:00
Amir Ayupov
9b007a199d
[BOLT] Expose pseudo probe function checksum and GUID (#99389)
Add a BinaryFunction field for pseudo probe function GUID.
Populate it during pseudo probe section parsing, and emit it in YAML
profile (both regular and BAT), along with function checksum.

To be used for stale function matching.

Test Plan: update pseudoprobe-decoding-inline.test
2024-07-18 20:58:16 -07:00
Amir Ayupov
d1d9545ed3
[BOLT][BAT] Add entries for deleted basic blocks
Deleted basic blocks are required for correct mapping of branches
modified by SCTC.

Increases BAT size, bytes:
- large binary: 8622496 -> 8703244.
- small binary (X86/bolt-address-translation.test): 928 -> 940.

Test Plan: updated bb-with-two-tail-calls.s

Reviewers: ayermolo, dcci, maksfb, rafaelauler

Reviewed By: rafaelauler

Pull Request: https://github.com/llvm/llvm-project/pull/91906
2024-05-23 19:19:07 -07:00
Amir Ayupov
465bfd41fa
[BOLT][NFC] Simplify BBHashMapTy (#91812) 2024-05-22 16:00:51 -07:00
Amir Ayupov
1529ec085a
[BOLT][NFC] Move out PrintProgramStats from Profile into Rewrite (#93075)
Eliminate the dependence of Profile on Passes.

Test Plan: NFC
2024-05-22 13:53:41 -07:00
Amir Ayupov
97025bd9d5
[BOLT] Use getLocationName in YAMLProfileWriter (#92493)
Disambiguate local functions using the containing file symbol in BAT
mode. Make local function naming consistent across BAT fdata and YAML
profiles.

Test Plan: updated register-fragments-bolt-symbols.s
2024-05-21 20:24:46 -07:00
Amir Ayupov
a9b67490b2
[BOLT] Report adjusted program stats from perf2bolt in BAT mode (#91683) 2024-05-21 18:54:15 -07:00
Kazu Hirata
1486653dcf
[BOLT] Use StringRef::contains (NFC) (#92842) 2024-05-20 19:18:45 -07:00
Amir Ayupov
9f15aa009c
[BOLT][NFC] Rename DataAggregator::BranchInfo to TakenBranchInfo
Align the name to its counterpart `FTInfo` which avoids name aliasing
with llvm::bolt::BranchInfo and allows to drop namespace specifier.

Test Plan: NFC

Reviewers: maksfb, rafaelauler, ayermolo, dcci

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/92017
2024-05-16 20:02:51 -07:00
Amir Ayupov
4ecf2caf68
[BOLT] Use aggregated FuncBranchData in writeBATYAML
Switch from FuncBranchData intermediate maps (Intra/InterIndex)
to aggregated Data, same as one used by DataReader:
e62ce1f884/bolt/lib/Profile/DataReader.cpp (L385-L389)
This aligns the order of the output between YAMLProfileWriter and
writeBATYAML.

Test Plan: updated bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: ayermolo, maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/91289
2024-05-13 14:23:32 -07:00
Kazu Hirata
f841ca0c35
Use StringRef::operator== instead of StringRef::equals (NFC) (#91864)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  276 under llvm-project/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-12 23:08:40 -07:00
Amir Ayupov
b5af667b01
[BOLT] Map branch source address to the containing basic block in BAT YAML
Fix an issue where the profile for all branches that have a BRANCHENTRY
is dropped. If the branch has an entry in BAT, it will be translated to
its input offset. We used to only permit the basic block offset as a
branch source. Perform a lookup of containing basic block instead.

Test Plan: Updated bolt-address-translation-yaml.test

Reviewers: maksfb, dcci, rafaelauler, ayermolo

Reviewed By: maksfb

Pull Request: https://github.com/llvm/llvm-project/pull/91273
2024-05-12 17:11:09 -07:00
Amir Ayupov
4f127667ca
[BOLT] Set entry counts in BAT YAML profile (#91775)
Align with DataReader::readProfile that sets entry block counts from
FuncBranchData->EntryData.

Test Plan: updated bolt-address-translation-yaml.test
2024-05-10 22:23:45 -07:00
Amir Ayupov
bbcdd4f4b2
[BOLT] Use disambiguated local names in BAT YAML
Align BAT YAML to fdata profile.

Test Plan: updated register-fragments-bolt-symbols.s

Reviewers: dcci, rafaelauler, ayermolo, maksfb

Reviewed By: dcci

Pull Request: https://github.com/llvm/llvm-project/pull/91773
2024-05-10 22:18:50 -07:00