319 Commits

Author SHA1 Message Date
Joseph Huber
ffd6a13b5f
[compiler-rt] Rework profile data handling for GPU targets (#187136)
Summary:
Currently, the GPU iterates through all of the present symbols and
copies them by prefix. This is inefficient as it requires a lot of small
high-latency data transfers rather than a few large ones. Additionally,
we force every single profiling symbol to have protected visibility.
This means potentially hundreds of unnecessary symbols in the symbol
table.

This PR changes the interface to move towards the start / stop section
handling. AMDGPU supports this natively as an ELF target, so we need
little changes. Instead of overriding visibility, we use a single table
to define the bounds that we can obtain with one contiguous load.

Using a table interface should also work for the in-progress HIP
implementation for this, as it wraps the start / stop sections into
standard void pointers which will be inside of an already mapped region
of memory, so they should be accessible from the HIP API.

NVPTX is more difficult as it is an ELF platform without this support. I
have hooked up the 'Other' handling to work around this, but even then
it's a bit of a stretch. I could remove this support here, but I wanted
to demonstrate that we can share the ABI. However, NVPTX will only work
if we force LTO and change the backend to emit variables in the same

TL;DR, we now do this:
```c
struct { start1, stop1, start2, stop2, start3, stop3, version; } device;
struct host = DtoH(lookup("device"));
counters = DtoH(host.stop - host.start)
version = DtoH(host.version);
```
2026-03-26 10:17:43 -05:00
Joseph Huber
d18a784d41
[compiler-rt] Define GPU specific handling of profiling functions (#185763)
Summary:
The changes in https://www.github.com/llvm/llvm-project/pull/185552
allowed us to
start building the standard `libclang_rt.profile.a` for GPU targets.
This PR expands this by adding an optimized GPU routine for counter
increment and removing the special-case handling of these functions in
the OpenMP runtime.

Vast majority of these functions are boilerplate, but we should be able
to do more interesting things with this in the future, like value or
memory profiling.
2026-03-19 10:51:48 -05:00
Fangrui Song
7d6a642161
Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options. NFC (#186044)
Similar to commit 557efc9a8b68628c2c944678c6471dac30ed9e8e (2022).

cl::ZeroOrMore is the default for cl::list and is unnecessary for
cl::opt
since the "may only occur zero or one times!" error was removed.
Also remove cl::init(false) on modified cl::opt<bool> lines.
2026-03-12 07:17:57 +00:00
Shilei Tian
70905e0afa
[RFC][IR] Remove Constant::isZeroValue (#181521)
`Constant::isZeroValue` currently behaves same as
`Constant::isNullValue` for all types except floating-point, where it
additionally returns true for negative zero (`-0.0`). However, in
practice, almost all callers operate on integer/pointer types where the
two are equivalent, and the few FP-relevant callers have no meaningful
dependence on the `-0.0` behavior.

This PR removes `isZeroValue` to eliminate the confusing API. All
callers are changed to `isNullValue` with no test failures.

`isZeroValue` will be reintroduced in a future change with clearer
semantics: when null pointers may have non-zero bit patterns,
`isZeroValue` will check for bitwise-all-zeros, while `isNullValue` will
check for the semantic null (which
may be non-zero).
2026-02-15 12:06:42 -05:00
Jameson Nash
d10b2b566a
[NFCI] replace getValueType with new getGlobalSize query (#177186)
Returns uint64_t to simplify callers. The goal is eventually replace
getValueType with this query, which should return the known minimum
reference-able size, as provided (instead of a Type) during create.
Additionally the common isSized query would be replaced with an
isExactKnownSize query to test if that size is an exact definition.
2026-01-22 13:55:53 -05:00
Aiden Grossman
86b0acd35f
[InstrProf] Mark __llvm_profile_runtime_user cold (#174174)
This function is only created to use a global so that it does not get
pruned by the compiler. We could probably get rid of it
(https://reviews.llvm.org/D98325), but there are some complications
around certain platforms. Given that it will never be called, mark it as
cold.
2026-01-05 08:06:39 -08:00
Ethan Luis McDonough
38cade7cc6
[PGO][Offload] Fix missing names bug in GPU PGO (#166444)
After #163011 was merged, the tests in
[`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1)
broke because the offload plugins were no longer able to find
`__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm`
visible to the host on GPU targets and reverses the changes made in
f7e9968a5ba99521e6e51161f789f0cc1745193f.
2025-11-10 10:11:53 -06:00
Ellis Hoag
cc1022ca0b
[InstrProf] Remove deprecated -debug-info-correlate flag (#165289) 2025-10-30 09:03:45 -07:00
Yi-Chi Lee
964b4abe6c
[Instrumentation] Fix typos across files in Transforms/Instrumentation (#165251)
Closes #165240.
2025-10-27 16:23:45 +01:00
Ethan Luis McDonough
67ff66e677
[PGO][Offload] Fix offload coverage mapping (#143490)
This pull request fixes coverage mapping on GPU targets. 

- It adds an address space cast to the coverage mapping generation pass.
- It reads the profiled function names from the ELF directly. Reading it
from public globals was causing issues in cases where multiple
device-code object files are linked together.
2025-06-10 20:19:38 -05:00
Andrew Rogers
b2584e0b17
[llvm] annotate interfaces in llvm/Transforms for DLL export (#143413)
## Purpose

This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Transforms`
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.

## Background

This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).

The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.

The following manual adjustments were also applied after running IDS on
Linux:
- Removed a redundant `operator<<` from Attributor.h. IDS only
auto-annotates the 1st declaration, and the 2nd declaration being
un-annotated resulted in an "inconsistent linkage" error on Windows when
building LLVM as a DLL.
- `#include` the `VirtualFileSystem.h` in PGOInstrumentation.h and
remove the local declaration of the `vfs::FileSystem` class. This is
required because exporting the `PGOInstrumentationUse` constructor
requires the class be fully defined because it is used by an argument.
- Add #include "llvm/Support/Compiler.h" to files where it was not
auto-added by IDS due to no pre-existing block of include statements.
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates.

## Validation

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
2025-06-10 08:10:17 -07:00
Kazu Hirata
d328510f23
[Instrumentation] Remove an unused local variable (NFC) (#138383) 2025-05-03 07:04:33 -07:00
Nikita Popov
eea1efed30 [InstrProfiling] Avoid unnecessary bitcast (NFC)
Not needed with opaque pointers.
2025-04-23 15:29:49 +02:00
Mircea Trofin
a6208ce4c1
[nfc] move isPresplitCoroSuspendExitEdge to Analysis/CFG (#135849) 2025-04-15 15:07:03 -07:00
Nikita Popov
979c275097
[IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Jeremy Morse
8e70273509
[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.

This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.

We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
2025-01-24 10:53:11 +00:00
Kazu Hirata
4d12a14357
[Instrumentation] Remove unused includes (NFC) (#115117)
Identified with misc-include-cleaner.
2024-11-06 08:36:34 -08:00
Michael O'Farrell
b4fcaa137f
[PGO][SampledInstr] Correct off by 1s and allow 100% sampling (#113350)
This corrects a couple off by ones related to the sampling of
**instrumented** counters, and enables setting 100% rates for burst
sampling (burst duration = period).

Off by ones:
Prior to this change it was impossible to set a period of 65535 because
this was converted to fast sampling which rollsover at USHRT_MAX + 1
(65536). Similarly the burst durations would collect burst duration + 1
counts as they used an ULE comparison.

100% sampling:
Although this is not useful for a productionized use case, it does allow
for more deterministic testing with the sampling checks in place. After
all the off by ones are fixed, allowing for 100% sampling is a matter of
letting burst duration = period.
2024-10-22 16:01:13 -07:00
Rahul Joshi
6924fc0326
[LLVM] Add Intrinsic::getDeclarationIfExists (#112428)
Add `Intrinsic::getDeclarationIfExists` to lookup an existing
declaration of an intrinsic in a `Module`.
2024-10-16 07:21:10 -07:00
Yuta Saito
d4efc3e097
[Coverage][WebAssembly] Add initial support for WebAssembly/WASI (#111332)
Currently, WebAssembly/WASI target does not provide direct support for
code coverage.
This patch set fixes several issues to unlock the feature. The main
changes are:

1. Port `compiler-rt/lib/profile` to WebAssembly/WASI.
2. Adjust profile metadata sections for Wasm object file format.
- [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections
instead of data segments.
    - [lld] Align the interval space of custom sections at link time.
- [llvm-cov] Copy misaligned custom section data if the start address is
not aligned.
    - [llvm-cov] Read `__llvm_prf_names` from data segments
3. [clang] Link with profile runtime libraries if requested

See each commit message for more details and rationale.
This is part of the effort to add code coverage support in Wasm target
of Swift toolchain.
2024-10-15 02:41:43 +09:00
NAKAMURA Takumi
6c331e50e4
[MC/DC] Rework tvbitmap.update to get rid of the inlined function (#110792)
Per the discussion in #102542, it is safe to insert BBs under
`lowerIntrinsics()` since #69535 has made tolerant of modifying BBs.

So, I can get rid of using the inlined function `rmw_or`, introduced in
#96040.
2024-10-03 17:57:03 +09:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Antonio Frighetto
2ae968a0d9
[Instrumentation] Move out to Utils (NFC) (#108532)
Utility functions have been moved out to Utils. Minor opportunity to
drop the header where not needed.
2024-09-15 21:07:40 -07:00
Ethan Luis McDonough
fde2d23ee2
[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (#102691)
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.

> This pull request is the first part of an ongoing effort to extends
PGO instrumentation to GPU device code. This PR makes the following
changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-08-22 01:10:54 -05:00
gulfemsavrun
f5b81aa6ec
[InstrProf] Support conditional counter updates (#102542)
This patch adds support for conditional counter updates in single byte
counters mode to reduce the write contention by first checking whether
the counter is set before overwriting it.

---------

Co-authored-by: Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>
2024-08-16 12:33:50 -07:00
NAKAMURA Takumi
d2f77eb8ec
[MC/DC][Coverage] Introduce "Bitmap Bias" for continuous mode (#96126)
`counter_bias` is incompatible to Bitmap. The distance between Counters
and Bitmap is different between on-memory sections and profraw image.

Reference to `__llvm_profile_bitmap_bias` is generated only if
`-fcoverge-mcdc` `-runtime-counter-relocation` are specified. The
current implementation rejected their options.

```
Runtime counter relocation is presently not supported for MC/DC bitmaps
```
2024-07-31 10:14:12 +09:00
xur-llvm
b1ca2a9546
[PGO] Sampled instrumentation in PGO to speed up instrumentation binary (#69535)
In comparison to non-instrumented binaries, PGO instrumentation binaries
can be significantly slower. For highly threaded programs, this slowdown
can
reach 10x due to data races or false sharing within counters.

This patch incorporates sampling into the PGO instrumentation process to
enhance the speed of instrumentation binaries. The fundamental concept
is similar to the one proposed in https://reviews.llvm.org/D63949.

Three sampling modes are introduced:
1. Simple Sampling: When '-sampled-instr-bust-duration' is set to 1.
2. Fast Burst Sampling: When not using simple sampling, and 
'-sampled-instr-period' is set to 65535. This is the default mode of
sampling.
3. Full Burst Sampling: When neither simple nor fast burst sampling is
used.

Utilizing this sampled instrumentation significantly improves the
binary's
execution speed. Measurements show up to 5x speedup with default
settings. Fast burst sampling now results in only around 20% to 30%
slowdown (compared to 8 to 10x slowdown without sampling).

Out tests show that profile quality remains good with sampling,
with edge counts typically showing more than 90% overlap.
For applications whose behavior changes due to binary speed,
sampling instrumentation can enhance performance.
Observations have shown some apps experiencing up to
a ~2% improvement in PGO.

A potential drawback of this patch is the increased binary size
and compilation time. The Sampling method in this patch does
not improve single threaded program instrumentation binary
speed.
2024-07-22 09:19:17 -07:00
NAKAMURA Takumi
cfc22605f6
InstrProf: Mark BiasLI as invariant. (#95588)
Bias doesn't change after startup.

The test is enhanced for optimized sequences and atomic ops.
2024-07-20 10:44:30 +09:00
Ethan Luis McDonough
2c8b912f63
Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587)"
This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.
2024-06-28 12:30:45 -05:00
Ethan Luis McDonough
5fd2af38e4
[PGO][OpenMP] Instrumentation for GPU devices (#76587)
This pull request is the first part of an ongoing effort to extends PGO
instrumentation to GPU device code. This PR makes the following changes:

- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)

These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-06-28 10:42:19 -05:00
NAKAMURA Takumi
b347a720bf
[MC/DC][Coverage] Make tvbitmapupdate capable of atomic write (#96042)
This also introduces "Test and conditional Read-Modify-Write". The flow
to `atomicrmw or` is marked as `unlikely`.
2024-06-26 12:07:56 +09:00
NAKAMURA Takumi
a0e1b4a244
[MC/DC][Coverage] Split out Read-modfy-Write to rmw_or(ptr,i8) (#96040)
`rmw_or` is defined as "private alwaysinline". At the moment, it has
just only simple "Read, Or, and Write", which is just same as the
current implementation.
2024-06-22 10:09:32 +09:00
NAKAMURA Takumi
a512854249
InstProfiling: Give the name to profc_bias. NFC. (#95587) 2024-06-19 10:02:56 +09:00
NAKAMURA Takumi
139f896c0f
InstrProfiling: Split creating Bias offset to getOrCreateBiasVar(Name). NFC. (#95692) 2024-06-19 08:54:08 +09:00
NAKAMURA Takumi
85a7bba7d2
Cleanup MC/DC intrinsics for #82448 (#95496)
3rd arg of `tvbitmap.update` was made unused. Remove 3rd arg.

Sweep `condbitmap.update`, since it is no longer used.
2024-06-16 09:04:51 +09:00
NAKAMURA Takumi
71f8b441ed Reapply: [MC/DC][Coverage] Loosen the limit of NumConds from 6 (#82448)
By storing possible test vectors instead of combinations of conditions,
the restriction is dramatically relaxed.

This introduces two options to `cc1`:

* `-fmcdc-max-conditions=32767`
* `-fmcdc-max-test-vectors=2147483646`

This change makes coverage mapping, profraw, and profdata incompatible
with Clang-18.

- Bitmap semantics changed. It is incompatible with previous format.
- `BitmapIdx` in `Decision` points to the end of the bitmap.
- Bitmap is packed per function.
- `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`.

RFC:
https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798

--
Change(s) since llvmorg-19-init-14288-g7ead2d8c7e91

- Update compiler-rt/test/profile/ContinuousSyncMode/image-with-mcdc.c
2024-06-14 19:31:56 +09:00
Hans Wennborg
b422fa6b62 Revert "[MC/DC][Coverage] Loosen the limit of NumConds from 6 (#82448)"
This broke the lit tests on Mac:
https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/1096/

> By storing possible test vectors instead of combinations of conditions,
> the restriction is dramatically relaxed.
>
> This introduces two options to `cc1`:
>
> * `-fmcdc-max-conditions=32767`
> * `-fmcdc-max-test-vectors=2147483646`
>
> This change makes coverage mapping, profraw, and profdata incompatible
> with Clang-18.
>
> - Bitmap semantics changed. It is incompatible with previous format.
> - `BitmapIdx` in `Decision` points to the end of the bitmap.
> - Bitmap is packed per function.
> - `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`.
>
> RFC:
> https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798

This reverts commit 7ead2d8c7e9114b3f23666209a1654939987cb30.
2024-06-14 10:47:41 +02:00
NAKAMURA Takumi
7ead2d8c7e
[MC/DC][Coverage] Loosen the limit of NumConds from 6 (#82448)
By storing possible test vectors instead of combinations of conditions,
the restriction is dramatically relaxed.

This introduces two options to `cc1`:

* `-fmcdc-max-conditions=32767`
* `-fmcdc-max-test-vectors=2147483646`

This change makes coverage mapping, profraw, and profdata incompatible
with Clang-18.

- Bitmap semantics changed. It is incompatible with previous format.
- `BitmapIdx` in `Decision` points to the end of the bitmap.
- Bitmap is packed per function.
- `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`.

RFC:
https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798
2024-06-13 20:09:02 +09:00
Mingming Liu
1351d17826
[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825)
(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)

* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
 
* Implement profile reader and writer support 
  * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
  * Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
  * Indexed profile writer collects the list of vtable names, and stores that to index profiles.
  * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.

rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
2024-04-01 08:52:35 -07:00
Mingming Liu
87e7140f72
[nfc][InstrProfiling]For comdat setting helper function, move comment closer to the code (#83757) 2024-03-04 09:46:34 -08:00
Mingming Liu
f83858f87c
[nfc][InstrProfiling]Compute a boolean state as a constant and use it everywhere (#83756) 2024-03-04 08:56:54 -08:00
NAKAMURA Takumi
cc53707a5c LLVMInstrumentation: Simplify mcdc.tvbitmap.update with GEP. 2024-02-25 11:21:46 +09:00
Petr Hosek
60c4f82d3c
[InstrProfiling] No runtime registration for ELF, COFF, Mach-O and XCOFF (#77225)
Whether runtime registration is needed is not dependent on the OS but
the file format. For ELF, COFF, Mach-O or XCOFF, we can always use the
linker support. This is important for baremetal platforms such as RTOS
and UEFI platforms where there is no OS but we still don't want to use
runtime registration and rely on linker support instead.
2024-01-07 16:07:17 -08:00
Zequan Wu
ab3430f891
[Profile] Add binary profile correlation for code coverage. (#69493)
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it's be hard to add such artificial
debug info for every function in to CodeView which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform independent option.

## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contains only headers + counters.
llvm-profdata can be used correlate raw profiles with the unstripped
binary to generate indexed profile.

## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and
the raw profile files size reduce from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
```

A few things to note:
1. llvm-profdata doesn't support filter raw profiles by binary id yet,
so when a raw profile doesn't belongs to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively choose the raw pnrofiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should works with PGO when value profiling is disabled as debug
info correlation currently doing, though I haven't tested this yet.
2023-12-14 14:16:38 -05:00
Mircea Trofin
a06c7d9e5f
[NFC][InstrProf] Rename internal InstrProfiling to InstrLowerer (#75139)
Captures its responsibility a bit better.
2023-12-12 10:58:17 -08:00
Mircea Trofin
6ed1daa0c9
[NFC][InstrProf] Move InstrProfiling to the .cpp file (#75018) 2023-12-11 15:42:57 -08:00
Mircea Trofin
1d608fc755
[NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970)
Akin other passes - refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part and the transformation part to avoid needing to initialize object state during `::run`.

A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.
2023-12-10 18:03:08 -08:00
Arthur Eubanks
66b919cb29 Reland [InstrProf][X86] Mark non-directly accessed globals as large (#74778)
We'd like to make various instrprof globals large to make them not
contribute to relocation pressure since there are no direct accesses
to them in the module.

Similar to what was done for asan_globals in #74514.

This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names
sections.

The reland fixes platform.ll.
2023-12-08 09:54:57 -08:00
Arthur Eubanks
96a5135e56 Revert "[InstrProf][X86] Mark non-directly accessed globals as large (#74778)"
This reverts commit 5507f70cc205a7ec21d264a64c703b3d314b998c.

Breaks bots, e.g. https://lab.llvm.org/buildbot/#/builders/232/builds/16374
2023-12-08 09:41:31 -08:00
Arthur Eubanks
5507f70cc2
[InstrProf][X86] Mark non-directly accessed globals as large (#74778)
We'd like to make various instrprof globals large to make them not
contribute to relocation pressure since there are no direct accesses
to them in the module.

Similar to what was done for asan_globals in #74514.

This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names
sections.
2023-12-08 09:33:40 -08:00