New change on top of [reviewed
patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits
after this
one](d0757f46b3).
Previous commits are restored from the remote branch with timestamps.
1. Fix build breakage for non-ELF platforms, by defining the missing
functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`,
`__llvm_profile_begin_vtabnames `, `__llvm_profile_end_vtabnames`}
everywhere.
* Tested on mac laptop (for darwins) and Windows. Specifically,
functions in `InstrProfilingPlatformWindows.c` returns `NULL` to make it
more explicit that type prof isn't supported; see comments for the
reason.
* For the rest (AIX, other), mostly follow existing examples (like this
[one](f95b2f1acf))
2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section
name, and make returned pointers
[const](a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121))
**Original Description**
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
* Raw profile format
- Header: records the byte size of compressed vtable names, and the
number of profiled vtable entries (call it `VTableProfData`). Header
also records padded bytes of each section.
- Payload: adds a section for compressed vtable names, and a section to
store `VTableProfData`. Both sections are padded so the size is a
multiple of 8.
* Indexed profile format
- Header: records the byte offset of compressed vtable names.
- Payload: adds a section to store compressed vtable names. This section
is used by `llvm-profdata` to show the list of vtables profiled for an
instrumented site.
[The originally reviewed
patch](https://github.com/llvm/llvm-project/pull/66825) will have
profile reader/write change and llvm-profdata change.
- To ensure this PR has all the necessary profile format change along
with profile version bump, created a copy of the originally reviewed
patch in https://github.com/llvm/llvm-project/pull/80761. The copy
doesn't have profile format change, but it has the set of tests which
covers type profile generation, profile read and profile merge. Tests
pass there.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600
---------
Co-authored-by: modiking <modiking213@gmail.com>
Whether runtime registration is needed is not dependent on the OS but
the file format. For ELF, COFF, Mach-O or XCOFF, we can always use the
linker support. This is important for baremetal platforms such as RTOS
and UEFI platforms where there is no OS but we still don't want to use
runtime registration and rely on linker support instead.
In #74514 and #74778 we marked various instrumentation-added sections as
large. This causes an extra PT_LOAD segment if using the small code
model. Since people using the small code model presumably aren't hitting
relocation limits, disable this when using the small code model to avoid
the extra segment.
This uses Module::getCodeModel() which isn't necessarily reliable since
it reads module metadata (which right now only the clang frontend sets),
but it would be nice to get to a point where we reliably put this sort
of information (e.g. PIC/code model/etc) in the IR. This requires
duplicating the existing tests since opt/llc currently don't set these
metadata. If we get to a point where they do set the code model metadata
based on command line arguments then we can deduplicate these tests.
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it's be hard to add such artificial
debug info for every function in to CodeView which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contains only headers + counters.
llvm-profdata can be used correlate raw profiles with the unstripped
binary to generate indexed profile.
## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and
the raw profile files size reduce from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filter raw profiles by binary id yet,
so when a raw profile doesn't belongs to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively choose the raw pnrofiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should works with PGO when value profiling is disabled as debug
info correlation currently doing, though I haven't tested this yet.
We'd like to make various instrprof globals large to make them not
contribute to relocation pressure since there are no direct accesses
to them in the module.
Similar to what was done for asan_globals in #74514.
This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names
sections.
The reland fixes platform.ll.
We'd like to make various instrprof globals large to make them not
contribute to relocation pressure since there are no direct accesses
to them in the module.
Similar to what was done for asan_globals in #74514.
This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names
sections.
If we did, we couldn't lower symmetric transfer resumes to tail calls.
We can instrument the other 2 edges instead, as long as they also don't
point to the same basic block.
Fixes a bug introduced by
commit f95b2f1acf11 ("Reland [InstrProf][compiler-rt] Enable MC/DC
Support in LLVM Source-based Code Coverage (1/3)")
The InstrProfiling pass was refactored when introducing support for
MC/DC such that the creation of the data variable was abstracted and
called only once per function from ::run(). Because ::run() only
iterated over functions there were not fully inlined, and because it
only created the data variable for the first intrinsic that it saw, data
variables corresponding to functions fully inlined into other
instrumented callers would end up without a data variable, resulting in
loss of coverage information. This patch does the following:
1.) Move the call of createDataVariable() to getOrCreateRegionCounters()
so that the creation of the data variable will happen indirectly either
from ::new() or during profile intrinsic lowering when it is needed.
This effectively restores the behavior prior to the refactor and ensures
that all data variables are created when needed (and not duplicated).
2.) Process all MC/DC bitmap parameter intrinsics in ::run() prior to
calling getOrCreateRegionCounters(). This ensures bitmap regions are
created for each function including functions that are fully inlined. It
also ensures that the bitmap region is created for each function prior
to the creation of the data variable because it is referenced by the
data variable. Again, duplication is prevented if the same parameter
intrinsic is inlined into multiple functions.
3.) No longer pass the MC/DC intrinsic to createDataVariable(). This
decouples the creation of the data variable from a specific MC/DC
intrinsic. Instead, with #2 above, store the number of bitmap bytes
required in the PerFunctionProfileData in the ProfileDataMap along with
the function's CounterRegion and BitmapRegion variables. This ties the
bitmap information directly to the function to which it belongs, and the
data variable created for that function can reference that.
Fixes a bug introduced by
commit f95b2f1acf11 ("Reland [InstrProf][compiler-rt] Enable MC/DC
Support in LLVM Source-based Code Coverage (1/3)")
createDataVariable() needs to check that a data variable wasn't already
created before creating it. Previously, this was done inadvertantly in
getOrCreateRegionCounters(), which checked that the RegionCounters was
not created multiple times before creating the counter section and the
data variable. When the creation of the data variable was abstracted
into its own function (createDataVariable()), there was no corresponding
check. This was failing on a case in which an instrumented function was
being inlined into multiple functions and a duplicate data variable was
created, which led to a segfault in emitNameData(). Test case added
based on the repro that also ensures a single data variable was created
in this case.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
This reverts commit 32db121b29f78e4c41116b2a8f1c730f9522b202 and subsequent commits.
This causes time regression on llvm-cov even with debug info correlation off.
This seems to cause Clang to crash, see comments on the code review. Reverting
until the problem can be investigated.
> Part 1 of 3. This includes the LLVM back-end processing and profile
> reading/writing components. compiler-rt changes are included.
>
> Differential Revision: https://reviews.llvm.org/D138846
This reverts commit a50486fd736ab2fe03fcacaf8b98876db77217a7.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
Debug info correlation is an option in InstrProfiling pass, which is used by
both IR instrumentation and front-end instrumentation. So, Clang coverage can
also benefits the binary size saving from it.
Reviewed By: ellis
Differential Revision: https://reviews.llvm.org/D157913
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
As described in [0], this extends IRPGO to support //Temporal Profiling//.
When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.
Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.
In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.
Special thanks to Julian Mestre for helping with reservoir sampling.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D147287
`instrprof` currently does not set `__llvm_prf_vnds`'s alignment after creating it. The consequence is that the alignment is set to 16 later (c0f3ac1d00/llvm/lib/IR/DataLayout.cpp (L1019)). This can lead to undefined behaviour when we calculate `NumVNodes` in `lprofGetLoadModuleSignature` (c0f3ac1d00/compiler-rt/lib/profile/InstrProfilingMerge.c (L32)). The reason is that when the `__llvm_prf_vnds` array is 16 byte aligned, `__llvm_profile_end_vnodes() - __llvm_profile_begin_vnodes()` may not be a multiple of the size of ValueProfNode (which is 24, 20 on 32 bit targets).
This patch sets `__llvm_prf_vnds`'s alignment to its ABI alignment, which always divides its size. Then `__llvm_profile_end_vnodes() - __llvm_profile_begin_vnodes()` will be a multiple of `sizeof(ValueProfNode)`.
Reviewed By: w2yehia, MaskRay
Differential Revision: https://reviews.llvm.org/D144302
Class InstrProfIncrementInstStep inherits from InstrProfIncrementInst but cannot cast to InstrProfIncrementInst, because InstrProfIncrementInst::classof does not cover such circumstance。
Function InstrProfiling::run traverse all instruction in a module and try to cast them to InstrProfIncrementInst using dyn_cast, but it will return nullptr if the instruction is InstrProfIncrementInstStep(subclass of InstrProfIncrementInst).
Reviewed By: tejohnson, ellis
Differential Revision: https://reviews.llvm.org/D141579
1) Use a static array of pointer to retain the dummy vars.
2) Associate liveness of the array with that of the runtime hook variable
__llvm_profile_runtime.
3) Perform the runtime initialization through the runtime hook variable.
4) Preserve the runtime hook variable using the -u linker flag.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D136192
COFF has a verifier check that private global variables don't have a comdat of the same name.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D131043
Profiling stopped working for us after D98061, which was largely a
Fuschia-specific patch but in one place used `isOSBinFormatELF` to
make a decision. I'm adding a PS4/PS5 exception to that, so we can
get profiling to work again.
Differential Revision: https://reviews.llvm.org/D127506
Profiling stopped working for us after D98061, which was largely a
Fuschia-specific patch but in one place used `isOSBinFormatELF` to
make a decision. I'm adding a PS4/PS5 exception to that, so we can
get profiling to work again.
Differential Revision: https://reviews.llvm.org/D127506
This patch switches the PGO implementation on AIX from using the runtime
registration-based section tracking to the __start_SECNAME/__stop_SECNAME
based. In order to enable the recognition of __start_SECNAME/__stop_SECNAME
symbols in the AIX linker, the -bdbg:namedsects:ss needs to be used.
Reviewed By: jsji, MaskRay, davidxl
Differential Revision: https://reviews.llvm.org/D124857
LTO objects might compiled with different `mbranch-protection` flags which will cause an error in the linker.
Such a setup is allowed in the normal build with this change that is possible.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D123493
NVPTX does not support generating binary files, which is required for
these tests.
The majority of tests in 'DebugInfo/Generic' also require emitting
object files, so they all are disabled for NVPTX.
Differential Revision: https://reviews.llvm.org/D121996
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
* We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
* We use a single byte per function rather than 8 bytes per block
The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.
When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).
[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116180
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115693
This reverts commit 800bf8ed29fbcaa9436540e83bc119ec92e7d40f.
The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was
failing because I forgot the `llc` commands are architecture specific.
I'll follow up with a fix.
Differential Revision: https://reviews.llvm.org/D115689
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D114565
The `llvm.instrprof.increment` intrinsic uses `i32` for the index. We should use this same type for the index into the GEP instructions.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114268
We generate symbols like `profc`/`profd` for each function, and put them into csects.
When there are weak functions, we generate weak symbols for the functions as well,
with ELF (and some others), linker (binder) will discard and only keep one copy of the weak symbols.
However, on AIX, the current binder can NOT discard the weak symbols if we put all of them into the same csect,
as binder can NOT discard a subset of a csect.
This creates a unique challenge for using those symbols to calculate some relative offsets.
This patch changed the linkage of `profc`/`profd` symbols to be private, so that all the profc/profd for each weak symbol will be *local* to objects, and all kept in the csect, so we won't have problem. Although only one of the counters will be used, all the pointer in the profd is correct.
The downside is that we won't be able to discard the duplicated counters and profile data,
but those can not be discarded even if we keep the weak linkage,
due to the binder limitation of not discarding a subsect of the csect either .
Reviewed By: Whitney, MaskRay
Differential Revision: https://reviews.llvm.org/D110422