204 Commits

Author SHA1 Message Date
Ellis Hoag
11d3074267 [InstrProf] Add single byte coverage mode
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
  * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
  * We use a single byte per function rather than 8 bytes per block

The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.

When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).

[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D116180
2022-01-27 17:38:55 -08:00
Ellis Hoag
5b9358d774 [InstrProf][NFC] Add InstrProfInstBase base
The `InstrProfInstBase` class is for all `llvm.instrprof.*` intrinsics. In a
later diff we will add new instrinsic of this type. Also refactor some
logic in `InstrProfiling.cpp`.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D117261
2022-01-18 11:12:00 -08:00
Kazu Hirata
b932bdf59f [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-07 17:45:09 -08:00
Kazu Hirata
e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887ee47f3ec8a030e9211169ef4fb094c3.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Kazu Hirata
fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Ellis Hoag
a699b2f1c0 [InstrProf] Mark counters as used in debug correlation mode
In debug info correlation mode we do not emit the data globals so we
need to explicitly mark the counter globals as used so they don't get
stripped.

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D115981
2021-12-30 14:50:45 -08:00
Ellis Hoag
65d7fd0239 [Try2][InstrProf] Add Correlator class to read debug info
Extend `llvm-profdata` to read in a `.proflite` file and also a debug info file to generate a normal `.profdata` profile. This reduces the binary size by 8.4% when building an instrumented Clang binary without value profiling (164 MB vs 179 MB).

This work is part of the "lightweight instrumentation" RFC: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

This was first landed in https://reviews.llvm.org/D114566 but had to be reverted due to build errors.

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D115915
2021-12-17 10:45:59 -08:00
Ellis Hoag
bdc68ee70f Revert "[InstrProf] Add Correlator class to read debug info"
Also reverts an attempt to fix the build errors https://reviews.llvm.org/D115911

The original diff https://reviews.llvm.org/D114566 causes some build
errors that I need to investigate.

https://lab.llvm.org/buildbot/#/builders/118/builds/7037

This reverts commit 95946d2f8589b5450d8f9a9e9b4f7adf44386f8b.

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D115913
2021-12-16 16:28:19 -08:00
Ellis Hoag
95946d2f85 [InstrProf] Add Correlator class to read debug info
Extend `llvm-profdata` to read in a `.proflite` file and also a debug info file to generate a normal `.profdata` profile. This reduces the binary size by 8.4% when building an instrumented Clang binary without value profiling (164 MB vs 179 MB).

This work is part of the "lightweight instrumentation" RFC: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D114566
2021-12-16 15:18:12 -08:00
Ellis Hoag
58d9c1aec8 [Try2][InstrProf] Attach debug info to counters
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.

Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit.

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D115693
2021-12-16 14:20:30 -08:00
Ellis Hoag
c809da7d9c Revert "[InstrProf] Attach debug info to counters"
This reverts commit 800bf8ed29fbcaa9436540e83bc119ec92e7d40f.

The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was
failing because I forgot the `llc` commands are architecture specific.
I'll follow up with a fix.

Differential Revision: https://reviews.llvm.org/D115689
2021-12-13 18:15:17 -08:00
Ellis Hoag
800bf8ed29 [InstrProf] Attach debug info to counters
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.

Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D114565
2021-12-13 17:51:22 -08:00
Ellis Hoag
9e647806f3 [InstrProf][NFC] Refactor ProfileDataMap usage
Instead of using `DenseMap::find()` and `DenseMap::insert()`, use
`DenseMap::operator[]` to get a reference to the profile data and update
the reference. This simplifies the changes in D114565.

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D114828
2021-12-01 11:47:14 -08:00
Roland McGrath
b72b56016a NFC: clang-format lib/Transforms/Instrumentation/InstrProfiling.cpp
Differential Revision: https://reviews.llvm.org/D114343
2021-11-21 18:16:02 -08:00
Ellis Hoag
de11de308b [InstrProf] Use i32 for GEP index from lowering llvm.instrprof.increment
The `llvm.instrprof.increment` intrinsic uses `i32` for the index. We should use this same type for the index into the GEP instructions.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D114268
2021-11-19 15:45:14 -08:00
Simon Pilgrim
21661607ca [llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
2021-10-06 12:04:30 +01:00
Jinsong Ji
25c30324e9 [AIX] Change the linkage of profiling counter/data to be private
We generate symbols like `profc`/`profd` for each function, and put them into csects.
When there are weak functions,  we generate weak symbols for the functions as well,
with ELF (and some others),  linker (binder) will discard and only keep one copy of the weak symbols.

However, on AIX, the current binder can NOT discard the weak symbols if we put all of them into the same csect,
as binder can NOT discard a subset of a csect.

This creates a unique challenge for using those symbols to calculate some relative offsets.

This patch changed the linkage of `profc`/`profd` symbols to be private, so that all the profc/profd for each weak symbol will be *local* to objects, and all kept in the csect, so we won't have problem. Although only one of the counters will be used, all the pointer in the profd is correct.

The downside is that we won't be able to discard the duplicated counters and profile data,
but those can not be discarded even if we keep the weak linkage,
due to the binder limitation of not discarding a subsect of the csect either .

Reviewed By: Whitney, MaskRay

Differential Revision: https://reviews.llvm.org/D110422
2021-09-29 00:47:25 +00:00
Kazu Hirata
24c8eaec94 [Transforms] Use make_early_inc_range (NFC) 2021-09-15 19:55:24 -07:00
Fangrui Song
68745a557e [InstrProfiling] Use llvm.compiler.used if applicable for Mach-O
Similar to D97585.

D25456 used `S_ATTR_LIVE_SUPPORT` to ensure the data variable will be retained
or discarded as a unit with the counter variable, so llvm.compiler.used is
sufficient. It allows ld to dead strip unneeded profc and profd variables.

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D105445
2021-09-01 14:46:51 -07:00
Fangrui Song
9ab9a9595b [InstrProfiling] Keep profd non-private for non-renamable comdat functions
The NS==0 condition used by D103717 missed a corner case: if the current copy
does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a
different CFG) may exist. This is super rare, but is possible with pre-inlining
PGO instrumentation (which can make a weak_odr function inlines its callees
differently, sometimes with value profiling while sometimes without).

If the current copy with private profd is prevailing, the non-prevailing copy
may get an undefined symbol if a caller inlining the non-prevailing function
references its profd. If the other copy with non-private profd is prevailing,
the current copy may cause a "relocation to discarded section" linker error.

The fix is straightforward: just keep non-private profd in such a `DataReferencedByCode` case.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 0.08% larger (172431496/172286720-1).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 0.026% larger.
The majority of D103717's benefits remains.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D108432
2021-08-24 20:14:03 -07:00
Fangrui Song
32e2326cda Revert D108432 "[InstrProfiling] Keep profd non-private for non-renamable comdat functions"
This reverts commit f653beea88d2684cdc8117e662b321ba04666771.

It broke Windows coverage-inline.cpp because link.exe has a limitation
that external symbols in IMAGE_COMDAT_SELECT_ASSOCIATIVE don't work.

It essentially dropped the previous size optimization for coverage
because coverage doesn't rename comdat by default.
Needs more investigation what we should do.
2021-08-24 19:16:07 -07:00
Fangrui Song
f653beea88 [InstrProfiling] Keep profd non-private for non-renamable comdat functions
The NS==0 condition used by D103717 missed a corner case: if the current copy
does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a
different CFG) may exist. This is super rare, but is possible with pre-inlining
PGO instrumentation (which can make a weak_odr function inlines its callees
differently, sometimes with value profiling while sometimes without).

If the current copy with private profd is prevailing, the non-prevailing copy
may get an undefined symbol if a caller inlining the non-prevailing function
references its profd. If the other copy with non-private profd is prevailing,
the current copy may cause a "relocation to discarded section" linker error.

The fix is straightforward: just keep non-private profd in this case.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 0.08% larger (172431496/172286720-1).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 0.026% larger.
The majority of D103717's benefits remains.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D108432
2021-08-24 15:59:35 -07:00
Fangrui Song
77b435aaa1 Revert "[InstrProfiling] Make COFF use the ELF comdat scheme (drop link.exe compatibility)"
This reverts commit fbb8e772ec501a1b71643db90e9c6445e17d7cac.

Accidentally pushed.
2021-08-19 16:42:57 -07:00
Fangrui Song
fbb8e772ec [InstrProfiling] Make COFF use the ELF comdat scheme (drop link.exe compatibility)
The COFF specific `DataReferencedByCode` complexity (D103372 D103717) is due to
a link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE is
not really dropped, so it can cause duplicate definition error.
2021-08-19 16:38:32 -07:00
Petr Hosek
1c84167149 [InstrProfiling][NFC] Initialize MadeChange variable
This addresses an issue introduced in 389dc94d4be7 which triggers
a crash on Windows.
2021-08-17 23:33:38 -07:00
Petr Hosek
389dc94d4b [InstrProfiling] Generate runtime hook for Fuchsia
When none of the translation units in the binary have been instrumented
we shouldn't need to link the profile runtime. However, because we pass
-u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still
be pulled in and incur some overhead. On Fuchsia which uses runtime
counter relocation, it also means that we cannot reference the bias
variable unconditionally.

This change modifies the InstrProfiling pass to pull in the profile
runtime only when needed by declaring the __llvm_profile_runtime symbol
in the translation unit only when needed. For now we restrict this only
for Fuchsia, but this can be later expanded to other platforms. This
approach was already used prior to 9a041a75221ca, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation,
but that limitation may no longer apply, and it certainly doesn't apply
on platforms like Fuchsia.

Differential Revision: https://reviews.llvm.org/D98061
2021-08-10 23:21:15 -07:00
Petr Hosek
c0c1c3cf93 Revert "[InstrProfiling] Emit bias variable eagerly"
This reverts commit 6660cec568504df47d9becb0c552c20577880df8 since
it was superseded by https://reviews.llvm.org/D98061.
2021-08-10 23:21:15 -07:00
Petr Hosek
6660cec568 [InstrProfiling] Emit bias variable eagerly
Rather than emitting the bias variable lazily as needed, emit it
eagerly. This allows profile runtime to refer to this variable
unconditionally without having to use the weak reference. The bias
variable is in a COMDAT so there'll never be more than one instance,
and if it's not needed, linker should be able to GC it, so the overhead
should be minimal.

Differential Revision: https://reviews.llvm.org/D107377
2021-08-04 10:17:08 -07:00
Fangrui Song
a1532ed275 [InstrProfiling] Make CountersPtr in __profd_ relative
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.

```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo

# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo

# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```

(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059  # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```

The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.

Differential Revision: https://reviews.llvm.org/D104556
2021-07-30 11:52:18 -07:00
Fangrui Song
3924877932 [IR] Rename comdat noduplicates to comdat nodeduplicate
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.

In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics.  The name
can cause confusion to other binary formats (ELF, wasm) which have implemented/
want to implement the "no deduplication" selection kind. Rename it to be clearer.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106319
2021-07-20 12:47:10 -07:00
Petr Hosek
54902e00d1 [InstrProfiling] Use weak alias for bias variable
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted in which case the symbol selected
by the linker is going to depend on the order of inputs which can be
fragile.

This change replaces the use of weak definition inside the runtime with
a weak alias. We place the compiler generated symbol inside a COMDAT
group so dead definition can be garbage collected by the linker.

We also disable the use of runtime counter relocation on Darwin since
Mach-O doesn't support weak external references, but Darwin already uses
a different continous mode that relies on overmapping so runtime counter
relocation isn't needed there.

Differential Revision: https://reviews.llvm.org/D105176
2021-07-19 12:23:51 -07:00
Nico Weber
a92964779c Revert "[InstrProfiling] Use external weak reference for bias variable"
This reverts commit 33a7b4d9d8e6a113108aa71ed78ca32a83c68523.
Breaks check-profile on macOS, see comments on https://reviews.llvm.org/D105176
2021-07-02 09:05:12 -04:00
Petr Hosek
33a7b4d9d8 [InstrProfiling] Use external weak reference for bias variable
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted which leads to an issue where the
linker might choose either of the weak symbols potentially disabling the
runtime counter relocation.

This change replaces the use of weak definition inside the runtime with
an external weak reference to address the issue. We also place the
compiler generated symbol inside a COMDAT group so dead definition can
be garbage collected by the linker.

Differential Revision: https://reviews.llvm.org/D105176
2021-07-01 15:25:31 -07:00
Fangrui Song
3307240f05 [InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:

* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code

The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.

Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 2.5% smaller.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717
2021-06-18 17:01:17 -07:00
Fangrui Song
5798be8458 Revert D103717 "[InstrProfiling] Make __profd_ unconditionally private for ELF"
This reverts commit 76d0747e0807307780ba84cbd7e5c80b20c26bd7.

If a group has `__llvm_prf_vals` due to static value profiler counters
(`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text
section may reference `__llvm_prf_data` and will cause a `relocation refers to a
discarded section` linker error.

Note: while a `__profc_` group is non-prevailing, it may be referenced by a
prevailing text section due to inlining.

```
group section [   66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections:
   [Index]    Name
   [   67]   __llvm_prf_cnts
   [   68]   __llvm_prf_vals
   [   69]   __llvm_prf_data
   [   70]   .rela__llvm_prf_data
```
2021-06-17 23:38:17 -07:00
Fangrui Song
76d0747e08 [InstrProfiling] Make __profd_ unconditionally private for ELF
For ELF, since all counters/data are in a section group (either `comdat any` or
`comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the
D1003372 optimization prerequisite (linker GC cannot discard data variables
while the text section is retained) is always satisified, we can make __profd_
unconditionally private.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717
2021-06-17 14:16:54 -07:00
Fangrui Song
9e51d1f348 [InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat
`__profd_*` variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from `__profd_*` to
`__profc_*` because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.

(In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_*` symbols.)

On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many `/INCLUDE:` directives in `.drectve`.

Here is rnk's measurement for Chrome:
```
This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:

#BEFORE

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867

$ du -cksh base_unittests.exe
82M     base_unittests.exe
82M     total

# AFTER

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499

$ du -cksh base_unittests.exe
78M     base_unittests.exe
78M     total
```

The change is NFC for Mach-O.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103372
2021-06-04 13:27:56 -07:00
Nico Weber
e9a9c85098 Revert "[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat"
This reverts commit a14fc749aab2c8e1a45d19d512255ebfc69357c3.
Breaks check-profile on macOS. See https://reviews.llvm.org/D103372 for details.
2021-06-04 10:00:12 -04:00
Fangrui Song
a14fc749aa [InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat
`__profd_*` variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from `__profd_*` to
`__profc_*` because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.

(In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_*` symbols.)

On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many `/INCLUDE:` directives in `.drectve`.

Here is rnk's measurement for Chrome:
```
This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:

#BEFORE

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867

$ du -cksh base_unittests.exe
82M     base_unittests.exe
82M     total

# AFTER

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499

$ du -cksh base_unittests.exe
78M     base_unittests.exe
78M     total
```

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103372
2021-06-03 13:16:13 -07:00
Fangrui Song
87c43f3aa9 [InstrProfiling] Delete linkage/visibility toggling for Windows
The linkage/visibility of `__profn_*` variables are derived
from the profiled functions.

    extern_weak => linkonce
    available_externally => linkonce_odr
    internal => private
    extern => private
    _ => unchanged

The linkage/visibility of `__profc_*`/`__profd_*` variables are derived from
`__profn_*` with linkage/visibility wrestling for Windows.

The changes can be folded to the following without changing semantics.

```
if (TT.isOSBinFormatCOFF() && !NeedComdat) {
  Linkage = GlobalValue::InternalLinkage;
  Visibility = GlobalValue::DefaultVisibility;
}
```

That said, I think we can just delete the code block.

An extern/internal function will now use private `__profc_*`/`__profd_*`
variables, instead of internal ones. This saves some symbol table entries.

A non-comdat {linkonce,weak}_odr function will now use hidden external
`__profc_*`/`__profd_*` variables instead of internal ones.  There is potential
object file size increase because such symbols need `/INCLUDE:` directives.
However such non-comdat functions are rare (note that non-comdat weak
definitions don't prevent duplicate definition error).

The behavior changes match ELF.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D103355
2021-06-02 16:49:54 -07:00
Reid Kleckner
8f20ac9595 [PGO] Don't reference functions unless value profiling is enabled
This reduces the size of chrome.dll.pdb built with optimizations,
coverage, and line table info from 4,690,210,816 to 2,181,128,192, which
makes it possible to fit under the 4GB limit.

This change can greatly reduce binary size in coverage builds, which do
not need value profiling. IR PGO builds are unaffected. There is a minor
behavior change for frontend PGO.

PGO and coverage both use InstrProfiling to create profile data with
counters. PGO records the address of each function in the __profd_
global. It is used later to map runtime function pointer values back to
source-level function names. Coverage does not appear to use this
information.

Recording the address of every function with code coverage drastically
increases code size. Consider this program:

  void foo();
  void bar();
  inline void inlineMe(int x) {
    if (x > 0)
      foo();
    else
      bar();
  }
  int getVal();
  int main() { inlineMe(getVal()); }

With code coverage, the InstrProfiling pass runs before inlining, and it
captures the address of inlineMe in the __profd_ global. This greatly
increases code size, because now the compiler can no longer delete
trivial code.

One downside to this approach is that users of frontend PGO must apply
the -mllvm -enable-value-profiling flag globally in TUs that enable PGO.
Otherwise, some inline virtual method addresses may not be recorded and
will not be able to be promoted. My assumption is that this mllvm flag
is not popular, and most frontend PGO users don't enable it.

Differential Revision: https://reviews.llvm.org/D102818
2021-05-20 11:09:24 -07:00
Hans Wennborg
f50aef745c Revert "[InstrProfiling] Don't generate __llvm_profile_runtime_user"
This broke the check-profile tests on Mac, see comment on the code
review.

> This is no longer needed, we can add __llvm_profile_runtime directly
> to llvm.compiler.used or llvm.used to achieve the same effect.
>
> Differential Revision: https://reviews.llvm.org/D98325

This reverts commit c7712087cbb505d324e1149fa224f607c91a8c6a.

Also reverting the dependent follow-up commit:

Revert "[InstrProfiling] Generate runtime hook for ELF platforms"

> When using -fprofile-list to selectively apply instrumentation only
> to certain files or functions, we may end up with a binary that doesn't
> have any counters in the case where no files were selected. However,
> because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
> runtime would still be pulled in and incur some non-trivial overhead,
> especially in the case when the continuous or runtime counter relocation
> mode is being used. A better way would be to pull in the profile runtime
> only when needed by declaring the __llvm_profile_runtime symbol in the
> translation unit only when needed.
>
> This approach was already used prior to 9a041a75221ca, but we changed it
> to always generate the __llvm_profile_runtime due to a TAPI limitation.
> Since TAPI is only used on Mach-O platforms, we could use the early
> emission of __llvm_profile_runtime there, and on other platforms we
> could change back to the earlier approach where the symbol is generated
> later only when needed. We can stop passing -u__llvm_profile_runtime to
> the linker on Linux and Fuchsia since the generated undefined symbol in
> each translation unit that needed it serves the same purpose.
>
> Differential Revision: https://reviews.llvm.org/D98061

This reverts commit 87fd09b25f8892e07b7ba11525baa9c3ec3e5d3f.
2021-03-12 13:53:46 +01:00
Petr Hosek
87fd09b25f [InstrProfiling] Generate runtime hook for ELF platforms
When using -fprofile-list to selectively apply instrumentation only
to certain files or functions, we may end up with a binary that doesn't
have any counters in the case where no files were selected. However,
because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
runtime would still be pulled in and incur some non-trivial overhead,
especially in the case when the continuous or runtime counter relocation
mode is being used. A better way would be to pull in the profile runtime
only when needed by declaring the __llvm_profile_runtime symbol in the
translation unit only when needed.

This approach was already used prior to 9a041a75221ca, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation.
Since TAPI is only used on Mach-O platforms, we could use the early
emission of __llvm_profile_runtime there, and on other platforms we
could change back to the earlier approach where the symbol is generated
later only when needed. We can stop passing -u__llvm_profile_runtime to
the linker on Linux and Fuchsia since the generated undefined symbol in
each translation unit that needed it serves the same purpose.

Differential Revision: https://reviews.llvm.org/D98061
2021-03-11 12:29:01 -08:00
Petr Hosek
c7712087cb [InstrProfiling] Don't generate __llvm_profile_runtime_user
This is no longer needed, we can add __llvm_profile_runtime directly
to llvm.compiler.used or llvm.used to achieve the same effect.

Differential Revision: https://reviews.llvm.org/D98325
2021-03-10 22:33:51 -08:00
Fangrui Song
a84f4fc0df [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker does not let `__start_/__stop_` references retain their sections.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on COFF/Mach-O are not affected.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D97649
2021-03-03 11:32:24 -08:00
Nico Weber
64f5d7e972 Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF"
This reverts commit 04c3040f417683e7c31b3ee3381a3263106f48c5.
Breaks instrprof-value-merge.c in bootstrap builds.
2021-03-03 10:21:17 -05:00
Fangrui Song
04c3040f41 [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker no longer lets `__start_/__stop_` references retain them.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on other COFF/Mach-O are not affected.

Differential Revision: https://reviews.llvm.org/D97649
2021-03-01 13:43:23 -08:00
Fangrui Song
bf176c49e8 [InstrProfiling] Use llvm.compiler.used instead of llvm.used for ELF
Many optimizers (e.g.  GlobalOpt/ConstantMerge) do not respect linker semantics
for comdat and may not discard the sections as a unit.

The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF)
are similar to D97432: `__profd_` is not directly referenced, so
`__profd_` may be discarded while `__profc_` is retained, breaking the
interconnection.  We currently conservatively add all such sections to
`llvm.used` and let the linker do GC for ELF.

In D97448, we will change GlobalObject's in the llvm.used list to use SHF_GNU_RETAIN,
causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC).
Use `llvm.compiler.used` to retain the current GC behavior.

Differential Revision: https://reviews.llvm.org/D97585
2021-02-26 16:14:03 -08:00
James Y Knight
24539f1ef2 Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg.
And then push those change throughout LLVM.

Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).

The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.

Differential Revision: https://reviews.llvm.org/D97223
2021-02-25 18:29:42 -05:00
Petr Hosek
c24b7a16b1 [InstrProfiling] Use ELF section groups for counters, data and values
__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.

The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.

Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.

Differential Revision: https://reviews.llvm.org/D96757
2021-02-22 14:00:02 -08:00