Reapply #147854 after fixes merged in #151398.
Change memory access histogram storage from uint64_t to uint16_t to
reduce profile size on disk. This change updates the raw profile format
to v5. Also add a histogram test in compiler-rt since we didn't have one
before. With this change, the raw MemProf histogram profile for the basic test
shrinks from 75KB to 20KB.
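The size reduction comes from storing each histogram bucket in 2 bytes instead of 8. As a minimal sketch of what narrowing a counter could look like: `NarrowHistogramCounter` is a hypothetical name, and saturating at the `uint16_t` maximum is an assumption of this example, not something the description above states.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: narrows a 64-bit access count to the 16-bit on-disk
// width. Saturating (rather than wrapping) is an assumption of this sketch.
constexpr std::uint16_t NarrowHistogramCounter(std::uint64_t count) {
  constexpr std::uint64_t kMax = std::numeric_limits<std::uint16_t>::max();
  return static_cast<std::uint16_t>(count > kMax ? kMax : count);
}
```

Small counts pass through unchanged, while anything above 65535 is clamped, so a hot bucket stays "hot" instead of wrapping around to a small value.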
## Purpose
The compiler-rt project `check-same-common-code.test` test case started
failing after #142861 was merged. This change addresses the failure.
## Overview
This patch replicates the changes made by #142861 to
llvm/include/llvm/ProfileData/InstrProfData.inc in the duplicated file
compiler-rt/include/profile/InstrProfData.inc. These files otherwise
match.
## Validation
Locally built `check-profile` target and verified
`check-same-common-code.test` now passes.
The XRay interface header uses no C++ specific features aside from using
the `std` namespace and including the C++ variant of C headers. Yet,
these changes prevent using `xray_interface.h` in external tools relying
on C for different reasons. Make this header C compliant by using C
headers, removing the `std` namespace from `std::size_t` and guard
`extern "C"`.
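The pattern described above (C headers, plain `size_t`, guarded `extern "C"`) can be sketched as follows. The type and function names are hypothetical and are not taken from `xray_interface.h`; the fragment is meant to compile both as C and as C++.

```cpp
#include <stddef.h> /* C header instead of <cstddef> */

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical record type; uses plain size_t rather than std::size_t. */
typedef struct ExampleEntry {
  size_t function_id;
} ExampleEntry;

/* Hypothetical accessor, usable from both C and C++ translation units. */
static inline size_t example_function_id(const ExampleEntry *entry) {
  return entry->function_id;
}

#ifdef __cplusplus
} /* extern "C" */
#endif
```

The `#ifdef __cplusplus` guard is what lets a C compiler skip the linkage specification it would otherwise reject.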
To make sure that further changes do not break the interface
accidentally, port one test from C++ to C. The test requires the C23
standard, which officially supports the attribute syntax used in this test
case.
Note that this only resolves this issue for `xray_interface.h`.
`xray_records.h` is also not C compliant, but requires more work to
port.
Fixes #139902
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
'list.size()' is determined at runtime, so using static_assert on it as
suggested by the TODO comment is not feasible and produces the following
error when done:
error: static assertion expression is not an integral constant
expression
initially referenced in https://github.com/bitcoin/bitcoin/pull/32024
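To illustrate the distinction, here is a minimal sketch with hypothetical names: a `std::array` carries its size in its type, so `static_assert` works, while a container sized at run time needs a run-time check.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// The size of a std::array is part of its type, so it is a constant
// expression and can be checked at compile time.
constexpr std::array<int, 3> kFixedList{{1, 2, 3}};
static_assert(kFixedList.size() == 3, "size is known at compile time");

// The size of a std::vector is only known at run time, so the check has to
// be a run-time assert; static_assert(list.size() == ...) would not compile.
inline void CheckRuntimeSize(const std::vector<int> &list,
                             std::size_t expected) {
  assert(list.size() == expected && "unexpected list size");
}
```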
Co-authored-by: Chand-ra <chandrapratap376@gmail.com>
Add the `__memprof_default_options_str` variable, initialized via the
`-memprof-runtime-default-options` LLVM flag, to hold the default options string
for memprof. This allows us to set these options during compile time in
the clang invocation.
Also update the docs to describe the various ways to set these options.
This patch extends the PGO infrastructure with an option to prefer the
instrumentation of loop entry blocks.
This option is a generalization of
19fb5b467b,
and helps to cover cases where the loop exit is never executed.
Event handling loops are an example where this can occur.
Note that this patch does NOT change the default behavior.
In C++ it's UB to use undeclared values as enum.
However, supporting `__ATOMIC_HLE_ACQUIRE` and
`__ATOMIC_HLE_RELEASE` requires such values.
So use `int` in the TSan interface, mask out the
irrelevant bits, and cast to the enum ASAP.
`ThreadSanitizer.cpp` already declares the morder parameters
of these functions as `i32`.
This may look like a slight behavior change, as we
previously didn't mask out additional bits for `fmo`
and the `NoTsanAtomic` call. But from the implementation
it's clear that they expect an exact enum value.
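A minimal sketch of the "take `int`, mask, then cast" approach follows. The enum, the constant names, the flag values, and the mask width are all assumptions made for illustration; they are not copied from the TSan sources.

```cpp
// Hypothetical stand-in for the runtime's internal memory-order enum.
enum class MemoryOrder : int {
  Relaxed = 0,
  Consume,
  Acquire,
  Release,
  AcqRel,
  SeqCst
};

// Extra hint bits a caller may OR into the order. The values are assumed
// for illustration only.
constexpr int kExampleHleAcquire = 1 << 16;
constexpr int kExampleHleRelease = 1 << 17;

// Keeps only the low bits that encode the ordering, then converts to the
// enum. The mask width is an assumption of this sketch.
constexpr MemoryOrder ToMemoryOrder(int raw_order) {
  return static_cast<MemoryOrder>(raw_order & 0x7fff);
}
```

Because the interface parameter is a plain `int`, the hint bits never exist as an out-of-range enum value; they are stripped before the conversion.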
Reverts llvm/llvm-project#115032
Reapply llvm/llvm-project#114724
In C++ it's UB to use undeclared values as enum.
However, supporting `__ATOMIC_HLE_ACQUIRE` and
`__ATOMIC_HLE_RELEASE` requires such values.
The internal implementation was switched to `enum class`,
where that behavior is defined. But the interface is C, so
we just switch to `int`.
Include `cstdlib`, which was originally included transitively. Changes
to `vector` in libc++ break new builds because the `cstdlib` header
needed for the `abort()` function call is no longer pulled in.
Add explicit symbol visibility macros to InstrProfData.inc
Annotating these symbols fixes missing symbols for InstrProfTest when
using shared library builds on Windows with explicit visibility macros
enabled.
Add an empty fallback definition for the LLVM_ABI macro so the code works in
compiler-rt.
This is part of the work to enable LLVM_BUILD_LLVM_DYLIB and plugins on
Windows.
```
llvm\lld-link : error : undefined symbol: public: void ValueProfData::deserializeTo(InstrProfRecord&, InstrProfSymtab*)
>>> referenced by unittests\ProfileData\InstrProfTest.cpp:1372 void ValueProfileReadWriteTest_value_prof_data_read_write_Test::TestBody()
```
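The empty-fallback idea can be sketched as below. `ExampleProfData` and `version()` are hypothetical stand-ins for the annotated symbols; only the `#ifndef`/`#define` pattern is the point.

```cpp
// Empty fallback: where nothing else defines LLVM_ABI, the annotation
// expands to nothing and the declarations below compile unchanged.
#ifndef LLVM_ABI
#define LLVM_ABI
#endif

// Hypothetical annotated type, standing in for the symbols declared in
// InstrProfData.inc.
struct ExampleProfData {
  LLVM_ABI constexpr int version() const { return 1; }
};
```

With the fallback in place, the same file can be shared between a build that defines a real export macro and one that does not.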
This fixes remaining issues in my previous PR #90959.
Changes:
- Removed dependency on LLVM header in `xray_interface.cpp`
- Fixed XRay patching for some targets due to missing changes in
architecture-specific patching functions
- Addressed some remaining compiler warnings that I missed in the
previous patch
- Formatting
I have tested these changes on `x86_64` (natively), as well as
`ppc64le`, `aarch64` and `arm32` (cross-compiled and emulated using
qemu).
**Original description:**
This PR introduces shared library (DSO) support for XRay based on a
revised version of the implementation outlined in [this
RFC](https://discourse.llvm.org/t/rfc-upstreaming-dso-instrumentation-support-for-xray/73000).
The feature enables the patching and handling of events from DSOs,
supporting both libraries linked at startup or explicitly loaded, e.g.
via `dlopen`.
This patch adds the following:
- The `-fxray-shared` flag to enable the feature (turned off by default)
- A small runtime library that is linked into every instrumented DSO,
providing position-independent trampolines and code to register with the
main XRay runtime
- Changes to the XRay runtime to support management and patching of
multiple objects
These changes are fully backward compatible, i.e. running without
instrumented DSOs will produce identical traces (in terms of recorded
function IDs) to the previous implementation.
Due to my limited ability to test on other architectures, this feature
is only implemented and tested with x86_64. Extending support to other
architectures is fairly straightforward, requiring only a
position-independent implementation of the architecture-specific
trampoline implementation (see
`compiler-rt/lib/xray/xray_trampoline_x86_64.S` for reference).
This patch does not include any functionality to resolve function IDs
from DSOs for the provided logging/tracing modes. These modes still work
and will record calls from DSOs, but symbol resolution for these
functions is not available. Getting this to work properly requires
recording information about the loaded DSOs and should IMO be discussed
in a separate RFC, as there are multiple feasible approaches.
---------
Co-authored-by: Sebastian Kreutzer <sebastian.kreutzer@tu-darmstadt.de>
This PR registers the writeout and reset functions for `gcov` for all
modules in the PGO runtime, instead of registering them
using global constructors in each module. The change is made for AIX
only, but the same mechanism works on Linux on Power.
When registering such functions using global constructors in each module
without `-ffunction-sections`, the AIX linker cannot garbage collect
unused undefined symbols, because such symbols are grouped in the same
section as the `__sinit` symbol. Keeping such undefined symbols causes
link errors (see test case
https://github.com/llvm/llvm-project/pull/108570/files#diff-500a7e1ba871e1b6b61b523700d5e30987900002add306e1b5e4972cf6d5a4f1R1
for this scenario). This PR implements the initialization in the
runtime, hence avoiding introducing `__sinit` into each module.
The implementation adds a new global variable `__llvm_covinit_functions`
to each module. This new global variable contains the function pointers
to the `Writeout` and `Reset` functions. `__llvm_covinit_functions`'s
section is the named section `__llvm_covinit`. The linker will aggregate
all the `__llvm_covinit` sections from each module
to form one single named section in the final binary. The pair of
functions
```
const __llvm_gcov_init_func_struct *__llvm_profile_begin_covinit();
const __llvm_gcov_init_func_struct *__llvm_profile_end_covinit();
```
are implemented to return the start and end address of this named
section in the final binary, and they are used in function
```
__llvm_profile_gcov_initialize()
```
(which is a constructor function in the runtime) so the runtime knows
the addresses of all the `Writeout` and `Reset` functions from all the
modules.
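The walk over the aggregated section can be sketched roughly as follows. `CovInitEntry`, `RegisterAll`, and the example array are hypothetical; the real record type and registration calls live in the runtime and may differ.

```cpp
#include <cstddef>

using VoidFn = void (*)();

// Hypothetical layout of one per-module record: the two callbacks named in
// the description above.
struct CovInitEntry {
  VoidFn writeout;
  VoidFn reset;
};

// Number of records between the section's start and end addresses.
constexpr std::size_t CountCovInitEntries(const CovInitEntry *begin,
                                          const CovInitEntry *end) {
  return static_cast<std::size_t>(end - begin);
}

// Hands every (writeout, reset) pair in [begin, end) to `register_fn`,
// mirroring what a runtime constructor could do at startup.
inline void RegisterAll(const CovInitEntry *begin, const CovInitEntry *end,
                        void (*register_fn)(VoidFn, VoidFn)) {
  for (const CovInitEntry *entry = begin; entry != end; ++entry)
    register_fn(entry->writeout, entry->reset);
}

// Stand-in for the linker-aggregated section: records from two modules.
inline void ExampleWriteout() {}
inline void ExampleReset() {}
constexpr CovInitEntry kExampleSection[] = {
    {ExampleWriteout, ExampleReset},
    {ExampleWriteout, ExampleReset},
};
```

The key property is that no module needs its own constructor: a single loop in the runtime sees every module's record because the linker has placed them back to back.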
One noticeable implementation detail relevant to AIX is that to preserve
the `__llvm_covinit` from the linker's garbage collection, a `.ref`
pseudo instruction is inserted into them, referring to the section that
contains the `__llvm_gcov_ctr` variables, which are used in the
instrumented code. The `__llvm_gcov_ctr` variables did not belong to
named sections before, but this PR added them to the
`__llvm_gcov_ctr_section` named section, so we can add a `.ref` pseudo
instruction that refers to them in the `__llvm_covinit` section.
This PR adds a `__sanitizer_copy_contiguous_container_annotations`
function, which copies annotations from one memory area to another. New
area is annotated in the same way as the old region at the beginning
(within limitations of ASan).
Overlapping case: The function supports overlapping containers, however
no assumptions should be made outside of no false positives in new
buffer area. (It doesn't modify old container annotations where it's not
necessary, false negatives may happen in edge granules of the new
container area.) I don't expect this function to be used with
overlapping buffers, but it's designed to work with them and not result
in incorrect ASan errors (false positives).
If buffers have granularity-aligned distance between them (`old_beg %
granularity == new_beg % granularity`), copying algorithm works faster.
If the distance is not granularity-aligned, annotations are copied byte
after byte.
```cpp
void __sanitizer_copy_contiguous_container_annotations(
    const void *old_storage_beg_p, const void *old_storage_end_p,
    const void *new_storage_beg_p, const void *new_storage_end_p);
```
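The fast-path condition quoted above (`old_beg % granularity == new_beg % granularity`) can be written as a small predicate. The helper name is hypothetical, and the default granularity of 8 is an assumption of this sketch.

```cpp
#include <cstdint>

// Shadow granularity used by this example; treated as an assumption here.
constexpr std::uintptr_t kExampleGranularity = 8;

// True when both buffers start at the same offset within a granule, which
// is the condition given above for the faster copy path.
constexpr bool HasGranuleAlignedDistance(
    std::uintptr_t old_beg, std::uintptr_t new_beg,
    std::uintptr_t granularity = kExampleGranularity) {
  return old_beg % granularity == new_beg % granularity;
}
```

For example, buffers starting at `0x1003` and `0x200b` both sit 3 bytes into a granule, so they would qualify, whereas `0x1003` and `0x2004` would fall back to the byte-by-byte path.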
This function aims to help with short string annotations and similar
container annotations. Right now we change trait types of
`std::basic_string` when compiling with ASan, and this function's purpose
is to revert that change as soon as possible.
87f3407856/libcxx/include/string (L738-L751)
The goal is to not change `__trivially_relocatable` when compiling with
ASan. If this function is accepted and upstreamed, the next step is
creating a function like `__memcpy_with_asan` that moves memory with ASan,
and then using that function instead of `__builtin_memcpy` while moving
trivially relocatable objects.
11a6799740/libcxx/include/__memory/uninitialized_algorithms.h (L644-L646)
---
I'm wondering whether there is a good way to address the fact that in a container
the new buffer is usually bigger than the previous one. We may add two
more arguments to the function to address it (the beginning and the end
of the whole buffer).
Another potential change is removing `new_storage_end_p`, as it's
redundant because we require the same size.
Potential future work is creating a function `__asan_unsafe_memmove`,
which would basically be memmove with instrumentation turned off
(so it would allow copying data from a poisoned area).
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
Removes a dependency on LLVM in `xray_interface.cpp` by replacing
`llvm_unreachable` with compiler-rt's `UNREACHABLE`.
Applies clang-format to some unformatted changes.
Original PR: #90959
This PR introduces shared library (DSO) support for XRay based on a
revised version of the implementation outlined in [this
RFC](https://discourse.llvm.org/t/rfc-upstreaming-dso-instrumentation-support-for-xray/73000).
The feature enables the patching and handling of events from DSOs,
supporting both libraries linked at startup or explicitly loaded, e.g.
via `dlopen`.
This patch adds the following:
- The `-fxray-shared` flag to enable the feature (turned off by default)
- A small runtime library that is linked into every instrumented DSO,
providing position-independent trampolines and code to register with the
main XRay runtime
- Changes to the XRay runtime to support management and patching of
multiple objects
These changes are fully backward compatible, i.e. running without
instrumented DSOs will produce identical traces (in terms of recorded
function IDs) to the previous implementation.
Due to my limited ability to test on other architectures, this feature
is only implemented and tested with x86_64. Extending support to other
architectures is fairly straightforward, requiring only a
position-independent implementation of the architecture-specific
trampoline implementation (see
`compiler-rt/lib/xray/xray_trampoline_x86_64.S` for reference).
This patch does not include any functionality to resolve function IDs
from DSOs for the provided logging/tracing modes. These modes still work
and will record calls from DSOs, but symbol resolution for these
functions is not available. Getting this to work properly requires
recording information about the loaded DSOs and should IMO be discussed
in a separate RFC, as there are multiple feasible approaches.
@petrhosek @jplehr
Pull request for issue #110823
Including the file which defines the macros we use here. This would let
user code only include this interface, rather than having to include two
files.
`counter_bias` is incompatible with Bitmap. The distance between Counters
and Bitmap differs between the in-memory sections and the profraw image.
A reference to `__llvm_profile_bitmap_bias` is generated only if both
`-fcoverage-mcdc` and `-runtime-counter-relocation` are specified. The
current implementation rejects this combination of options:
```
Runtime counter relocation is presently not supported for MC/DC bitmaps
```
This change adds a new weak API function which makes the sanitizer
ignore the call to free(), and implements the
functionality in ASan and HWAsan. The runtime that implements this hook
can then call free() at a later point again on the same pointer (and
making sure the hook returns zero so that the memory will actually be
freed) when it's actually ready for the memory to be cleaned up.
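As a rough sketch of the decision such a hook might make: the struct and function below are hypothetical, and "nonzero means ignore this free()" is inferred from the statement above that returning zero lets the memory actually be freed.

```cpp
// Hypothetical bookkeeping a runtime might keep per allocation.
struct ExampleBlock {
  unsigned live_references;  // raw pointers still pointing at the block
};

// Hypothetical hook body: a nonzero result asks the sanitizer to ignore
// this free(); zero lets the free proceed as usual.
constexpr int ExampleIgnoreFreeHook(const ExampleBlock &block) {
  return block.live_references > 0 ? 1 : 0;
}
```

A runtime following this scheme would call `free()` again on the same pointer once the last reference is gone, at which point the hook returns zero.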
This is needed in order to implement a sanitizer-compatible version
of Chrome's BackupRefPtr algorithm, since process-wide double-shimming
of malloc/free does not work on some platforms.
Requested and designed by @c01db33f (Mark) from Project Zero.
---------
Co-authored-by: Mark Brand <markbrand@google.com>
The MemProf preprocessor directives have diverged at some point. This
patch fixes that by copying an additional include from the LLVM side to
the compiler-rt side so the two files are identical again.
Adds compile time flag -mllvm -memprof-histogram and runtime flag
histogram=true|false to turn Histogram collection on and off. The
-memprof-histogram flag relies on -memprof-use-callbacks=true to work.
Updates shadow mapping logic in histogram mode from having one 8 byte
counter for 64 bytes, to 1 byte for 8 bytes, capped at 255. Only
supports this granularity as of now.
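The histogram-mode mapping (one 1-byte counter per 8 bytes, capped at 255) can be sketched as below. The names are hypothetical and the code only models the arithmetic, not the actual shadow memory layout.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBytesPerCounter = 8;  // one counter byte per 8 bytes
constexpr std::uint8_t kCounterMax = 255;    // counters are capped here

// Index of the 1-byte counter that covers `byte_offset` within an
// allocation.
constexpr std::size_t CounterIndex(std::size_t byte_offset) {
  return byte_offset / kBytesPerCounter;
}

// Saturating increment: once a counter reaches 255 it stays there instead
// of wrapping to 0.
constexpr std::uint8_t BumpCounter(std::uint8_t counter) {
  return counter == kCounterMax ? kCounterMax
                                : static_cast<std::uint8_t>(counter + 1);
}
```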
Updates the RawMemprofReader and serializing MemoryInfoBlocks to binary
format, including changing to a new version of the raw binary format
from version 3 to version 4.
Updates creating MemoryInfoBlocks with and without Histograms. When two
MemoryInfoBlocks are merged, AccessCounts are summed up and the shorter
Histogram is removed.
Adds a memprof_histogram test case.
Initial commit for adding AccessCountHistograms up until RawProfile for
memprof
This diff contains the compiler-rt changes / preparations for nsan.
Test plan:
1. cd build/runtimes/runtimes-bins && ninja check-nsan
2. ninja check-all
Update the folder titles for targets in the monorepository that have not
been taken care of for some time. These are the folders that targets are
organized into in Visual Studio and Xcode
(`set_property(TARGET <target> PROPERTY FOLDER "<title>")`)
when using the respective CMake IDE generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of
`set_property`/`set_target_property` calls, but they are still necessary when
`add_custom_target`, `add_executable`, `add_library`, etc. are used. A
LLVM_SUBPROJECT_TITLE definition is used for that in each subproject's
root CMakeLists.txt.
Fixes #83844.
This PR adds callbacks to mark futex syscalls as blocking. Unfortunately
we didn't have a mechanism before to mark syscalls as a blocking call,
so I had to implement it, but it mostly reuses the `BlockingCall`
implementation
[here](96819daa3d/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp (L362-L380)).
The issue has more details; the problem was discovered
because Rust uses futexes directly. So most likely we need to update
Rust as well to use these callbacks.
Also see the latest comments in #85188 for some context.
I also sent another PR #84162 to mark `pthread_*_lock` calls as
blocking.
New changes on top of the [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.
1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames`, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on a Mac laptop (for Darwin) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` return `NULL` to make it
more explicit that type profiling isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))
**Original Description**
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
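The "padded so the size is a multiple of 8" rule amounts to the following arithmetic. The helper names are hypothetical; the sketch only shows how a padding byte count could be derived from a section size.

```cpp
#include <cstdint>

// Bytes of padding needed to round `size_in_bytes` up to a multiple of 8.
constexpr std::uint64_t PaddingBytes(std::uint64_t size_in_bytes) {
  return (8 - size_in_bytes % 8) % 8;
}

// Section size after padding has been appended.
constexpr std::uint64_t PaddedSize(std::uint64_t size_in_bytes) {
  return size_in_bytes + PaddingBytes(size_in_bytes);
}
```

For instance, 13 bytes of compressed vtable names would need 3 bytes of padding, while a 16-byte section would need none.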
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/writer changes and llvm-profdata changes.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
This is a follow up of
7b9d73c2f9
and
5ef9ba7412
* The definition of `Type::getInt8PtrTy` is deleted. This doesn't cause
a compile error because the `Initializer` part of the macro doesn't run.
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/writer changes and llvm-profdata changes.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
There were buildbot failures when running memprof tests:
Failed Tests (12):
MemProfiler-x86_64-linux :: TestCases/interface_test.cpp
MemProfiler-x86_64-linux :: TestCases/log_path_test.cpp
MemProfiler-x86_64-linux :: TestCases/memprof_merge_mib.cpp
MemProfiler-x86_64-linux :: TestCases/memprof_profile_dump.cpp
MemProfiler-x86_64-linux :: TestCases/profile_reset.cpp
MemProfiler-x86_64-linux :: TestCases/unaligned_loads_and_stores.cpp
MemProfiler-x86_64-linux-dynamic :: TestCases/interface_test.cpp
MemProfiler-x86_64-linux-dynamic :: TestCases/log_path_test.cpp
MemProfiler-x86_64-linux-dynamic :: TestCases/memprof_merge_mib.cpp
MemProfiler-x86_64-linux-dynamic :: TestCases/memprof_profile_dump.cpp
MemProfiler-x86_64-linux-dynamic :: TestCases/profile_reset.cpp
MemProfiler-x86_64-linux-dynamic :: TestCases/unaligned_loads_and_stores.cpp
See
- https://lab.llvm.org/buildbot/#/builders/258/builds/8852
- https://lab.llvm.org/buildbot/#/builders/258/builds/12876
I suspect the failure occurs because, when building with
-DLLVM_ENABLE_RUNTIMES=compiler-rt -DCOMPILER_RT_BUILD_SANITIZERS=OFF,
the headers sanitizer/allocator_interface.h and
sanitizer/common_interface_defs.h
are neither copied to the build tree nor installed.
But in the failed memprof tests,
sanitizer/allocator_interface.h or sanitizer/memprof_interface.h is
included.
This patch adds sanitizer/allocator_interface.h and
sanitizer/memprof_interface.h to memprof headers if
COMPILER_RT_BUILD_SANITIZERS is false.
This PR exposes four PGO functions
- `__llvm_profile_set_filename`
- `__llvm_profile_reset_counters`
- `__llvm_profile_dump`
- `__llvm_orderfile_dump`
to user programs through the new header `instr_prof_interface.h` under
`compiler-rt/include/profile`. This way, the user can include the header
`profile/instr_prof_interface.h` to introduce these four names to their
programs.
Additionally, this PR defines macro `__LLVM_INSTR_PROFILE_GENERATE` when
the program is compiled with profile generation, and defines macro
`__LLVM_INSTR_PROFILE_USE` when the program is compiled with profile
use. `__LLVM_INSTR_PROFILE_GENERATE` together with
`instr_prof_interface.h` define the PGO functions only when the program
is compiled with profile generation. When profile generation is off,
these PGO functions are defined away and leave no trace in the user's
program.
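The "defined away" behavior can be sketched with a wrapper macro. `EXAMPLE_PROFILE_DUMP` and `DumpProfileIfEnabled` are hypothetical names used so that the example does not depend on the header being installed; only `__LLVM_INSTR_PROFILE_GENERATE` and `__llvm_profile_dump` come from the description above.

```cpp
// In an instrumented build the compiler defines
// __LLVM_INSTR_PROFILE_GENERATE and the call reaches the profile runtime;
// otherwise the wrapper collapses to a constant and leaves no call behind.
#ifdef __LLVM_INSTR_PROFILE_GENERATE
extern "C" int __llvm_profile_dump(void);
#define EXAMPLE_PROFILE_DUMP() __llvm_profile_dump()
#else
#define EXAMPLE_PROFILE_DUMP() (0)
#endif

// Hypothetical helper: returns true when the dump succeeded or when
// profiling is compiled out.
inline bool DumpProfileIfEnabled() { return EXAMPLE_PROFILE_DUMP() == 0; }
```

User code can therefore call the helper unconditionally, without wrapping every call site in its own `#ifdef`.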
Background:
https://discourse.llvm.org/t/pgo-are-the-llvm-profile-functions-stable-c-apis-across-llvm-releases/75832
## Motivation
Since we don't need the metadata sections at runtime, we can
keep them out of memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF, and it would be hard to add such artificial
debug info for every function to CodeView, which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform-independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contain only headers + counters.
llvm-profdata can be used to correlate raw profiles with the unstripped
binary to generate an indexed profile.
## Data
For chromium base_unittests with code coverage on Linux, the binary size
overhead due to instrumentation is reduced from 64M to 38.8M (39.4%) and
the raw profile file size is reduced from 128M to 68M (46.9%).
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filtering raw profiles by binary id yet,
so when a raw profile doesn't belong to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to merge only raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively pass the raw profiles with a matching binary id, along with the
binary, to llvm-profdata for merging.
2. In COFF, the metadata sections are currently still loaded into memory but not
used. I didn't change that in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should work with PGO when value profiling is disabled, as debug
info correlation currently does, though I haven't tested this yet.