llvm-project

Author	SHA1	Message	Date
NAKAMURA Takumi	d2f77eb8ec	[MC/DC][Coverage] Introduce "Bitmap Bias" for continuous mode (#96126 ) `counter_bias` is incompatible to Bitmap. The distance between Counters and Bitmap is different between on-memory sections and profraw image. Reference to `__llvm_profile_bitmap_bias` is generated only if `-fcoverge-mcdc` `-runtime-counter-relocation` are specified. The current implementation rejected their options. ``` Runtime counter relocation is presently not supported for MC/DC bitmaps ```	2024-07-31 10:14:12 +09:00
xur-llvm	b1ca2a9546	[PGO] Sampled instrumentation in PGO to speed up instrumentation binary (#69535 ) In comparison to non-instrumented binaries, PGO instrumentation binaries can be significantly slower. For highly threaded programs, this slowdown can reach 10x due to data races or false sharing within counters. This patch incorporates sampling into the PGO instrumentation process to enhance the speed of instrumentation binaries. The fundamental concept is similar to the one proposed in https://reviews.llvm.org/D63949. Three sampling modes are introduced: 1. Simple Sampling: When '-sampled-instr-bust-duration' is set to 1. 2. Fast Burst Sampling: When not using simple sampling, and '-sampled-instr-period' is set to 65535. This is the default mode of sampling. 3. Full Burst Sampling: When neither simple nor fast burst sampling is used. Utilizing this sampled instrumentation significantly improves the binary's execution speed. Measurements show up to 5x speedup with default settings. Fast burst sampling now results in only around 20% to 30% slowdown (compared to 8 to 10x slowdown without sampling). Out tests show that profile quality remains good with sampling, with edge counts typically showing more than 90% overlap. For applications whose behavior changes due to binary speed, sampling instrumentation can enhance performance. Observations have shown some apps experiencing up to a ~2% improvement in PGO. A potential drawback of this patch is the increased binary size and compilation time. The Sampling method in this patch does not improve single threaded program instrumentation binary speed.	2024-07-22 09:19:17 -07:00
NAKAMURA Takumi	cfc22605f6	InstrProf: Mark BiasLI as invariant. (#95588 ) Bias doesn't change after startup. The test is enhanced for optimized sequences and atomic ops.	2024-07-20 10:44:30 +09:00
Ethan Luis McDonough	2c8b912f63	Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587 )" This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.	2024-06-28 12:30:45 -05:00
Ethan Luis McDonough	5fd2af38e4	[PGO][OpenMP] Instrumentation for GPU devices (#76587 ) This pull request is the first part of an ongoing effort to extends PGO instrumentation to GPU device code. This PR makes the following changes: - Adds blank registration functions to device RTL - Gives PGO globals protected visibility when targeting a supported GPU - Handles any addrspace casts for PGO calls - Implements PGO global extraction in GPU plugins (currently only dumps info) These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-06-28 10:42:19 -05:00
NAKAMURA Takumi	b347a720bf	[MC/DC][Coverage] Make tvbitmapupdate capable of atomic write (#96042 ) This also introduces "Test and conditional Read-Modify-Write". The flow to `atomicrmw or` is marked as `unlikely`.	2024-06-26 12:07:56 +09:00
NAKAMURA Takumi	a0e1b4a244	[MC/DC][Coverage] Split out Read-modfy-Write to rmw_or(ptr,i8) (#96040 ) `rmw_or` is defined as "private alwaysinline". At the moment, it has just only simple "Read, Or, and Write", which is just same as the current implementation.	2024-06-22 10:09:32 +09:00
NAKAMURA Takumi	a512854249	InstProfiling: Give the name to profc_bias. NFC. (#95587 )	2024-06-19 10:02:56 +09:00
NAKAMURA Takumi	139f896c0f	InstrProfiling: Split creating Bias offset to getOrCreateBiasVar(Name). NFC. (#95692 )	2024-06-19 08:54:08 +09:00
NAKAMURA Takumi	85a7bba7d2	Cleanup MC/DC intrinsics for #82448 (#95496 ) 3rd arg of `tvbitmap.update` was made unused. Remove 3rd arg. Sweep `condbitmap.update`, since it is no longer used.	2024-06-16 09:04:51 +09:00
NAKAMURA Takumi	71f8b441ed	Reapply: [MC/DC][Coverage] Loosen the limit of NumConds from 6 (#82448 ) By storing possible test vectors instead of combinations of conditions, the restriction is dramatically relaxed. This introduces two options to `cc1`: * `-fmcdc-max-conditions=32767` * `-fmcdc-max-test-vectors=2147483646` This change makes coverage mapping, profraw, and profdata incompatible with Clang-18. - Bitmap semantics changed. It is incompatible with previous format. - `BitmapIdx` in `Decision` points to the end of the bitmap. - Bitmap is packed per function. - `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`. RFC: https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798 -- Change(s) since llvmorg-19-init-14288-g7ead2d8c7e91 - Update compiler-rt/test/profile/ContinuousSyncMode/image-with-mcdc.c	2024-06-14 19:31:56 +09:00
Hans Wennborg	b422fa6b62	Revert "[MC/DC][Coverage] Loosen the limit of NumConds from 6 (#82448 )" This broke the lit tests on Mac: https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/1096/ > By storing possible test vectors instead of combinations of conditions, > the restriction is dramatically relaxed. > > This introduces two options to `cc1`: > > * `-fmcdc-max-conditions=32767` > * `-fmcdc-max-test-vectors=2147483646` > > This change makes coverage mapping, profraw, and profdata incompatible > with Clang-18. > > - Bitmap semantics changed. It is incompatible with previous format. > - `BitmapIdx` in `Decision` points to the end of the bitmap. > - Bitmap is packed per function. > - `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`. > > RFC: > https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798 This reverts commit 7ead2d8c7e9114b3f23666209a1654939987cb30.	2024-06-14 10:47:41 +02:00
NAKAMURA Takumi	7ead2d8c7e	[MC/DC][Coverage] Loosen the limit of NumConds from 6 (#82448 ) By storing possible test vectors instead of combinations of conditions, the restriction is dramatically relaxed. This introduces two options to `cc1`: * `-fmcdc-max-conditions=32767` * `-fmcdc-max-test-vectors=2147483646` This change makes coverage mapping, profraw, and profdata incompatible with Clang-18. - Bitmap semantics changed. It is incompatible with previous format. - `BitmapIdx` in `Decision` points to the end of the bitmap. - Bitmap is packed per function. - `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`. RFC: https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798	2024-06-13 20:09:02 +09:00
Mingming Liu	1351d17826	[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825 ) (The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691) * For InstrFDO value profiling, implement instrumentation and lowering for virtual table address. * This is controlled by `-enable-vtable-value-profiling` and off by default. * When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads. * Implement profile reader and writer support * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols. * Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't happen since IR is used to construct InstrProfSymtab. * Indexed profile writer collects the list of vtable names, and stores that to index profiles. * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type. * `llvm-profdata show -show-vtables <args> <profile>` is implemented. rfc in https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7	2024-04-01 08:52:35 -07:00
Mingming Liu	87e7140f72	[nfc][InstrProfiling]For comdat setting helper function, move comment closer to the code (#83757 )	2024-03-04 09:46:34 -08:00
Mingming Liu	f83858f87c	[nfc][InstrProfiling]Compute a boolean state as a constant and use it everywhere (#83756 )	2024-03-04 08:56:54 -08:00
NAKAMURA Takumi	cc53707a5c	LLVMInstrumentation: Simplify mcdc.tvbitmap.update with GEP.	2024-02-25 11:21:46 +09:00
Petr Hosek	60c4f82d3c	[InstrProfiling] No runtime registration for ELF, COFF, Mach-O and XCOFF (#77225 ) Whether runtime registration is needed is not dependent on the OS but the file format. For ELF, COFF, Mach-O or XCOFF, we can always use the linker support. This is important for baremetal platforms such as RTOS and UEFI platforms where there is no OS but we still don't want to use runtime registration and rely on linker support instead.	2024-01-07 16:07:17 -08:00
Zequan Wu	ab3430f891	[Profile] Add binary profile correlation for code coverage. (#69493 ) ## Motivation Since we don't need the metadata sections at runtime, we can somehow offload them from memory at runtime. Initially, I explored [debug info correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113), which is used for PGO with value profiling disabled. However, it currently only works with DWARF and it's be hard to add such artificial debug info for every function in to CodeView which is used on Windows. So, offloading profile metadata sections at runtime seems to be a platform independent option. ## Design The idea is to use new section names for profile name and data sections and mark them as metadata sections. Under this mode, the new sections are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime and can be stripped away as a post-linking step. After the process exits, the generated raw profiles will contains only headers + counters. llvm-profdata can be used correlate raw profiles with the unstripped binary to generate indexed profile. ## Data For chromium base_unittests with code coverage on linux, the binary size overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and the raw profile files size reduce from 128M to 68M (46.9%) ``` $ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped FILE SIZE VM SIZE -------------- -------------- +121% +30.4Mi +121% +30.4Mi .text [NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data [NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names [NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts +95% +1.75Mi +95% +1.75Mi .eh_frame +108% +400Ki +108% +400Ki .eh_frame_hdr +9.5% +211Ki +9.5% +211Ki .rela.dyn +9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro +5.0% +87.3Ki +5.0% +87.3Ki .rodata [ = ] 0 +13% +47.0Ki .bss +40% +1.78Ki +40% +1.78Ki .got +12% +1.49Ki +12% +1.49Ki .gcc_except_table [ = ] 0 +65% +1.23Ki .relro_padding +62% +1.20Ki [ = ] 0 [Unmapped] +13% +448 +19% +448 .init_array +8.8% +192 [ = ] 0 [ELF Section Headers] +0.0% +136 +0.0% +80 [7 Others] +0.1% +96 +0.1% +96 .dynsym +1.2% +96 +1.2% +96 .rela.plt +1.5% +80 +1.2% +64 .plt [ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]] +195% +64.0Mi +194% +64.0Mi TOTAL $ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped FILE SIZE VM SIZE -------------- -------------- +121% +30.4Mi +121% +30.4Mi .text [NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts +95% +1.75Mi +95% +1.75Mi .eh_frame +108% +400Ki +108% +400Ki .eh_frame_hdr +9.5% +211Ki +9.5% +211Ki .rela.dyn +9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro +5.0% +87.3Ki +5.0% +87.3Ki .rodata [ = ] 0 +13% +47.0Ki .bss +40% +1.78Ki +40% +1.78Ki .got +12% +1.49Ki +12% +1.49Ki .gcc_except_table +13% +448 +19% +448 .init_array +0.1% +96 +0.1% +96 .dynsym +1.2% +96 +1.2% +96 .rela.plt +1.2% +64 +1.2% +64 .plt +2.9% +64 [ = ] 0 [ELF Section Headers] +0.0% +40 +0.0% +40 .data +1.2% +32 +1.2% +32 .got.plt +0.0% +24 +0.0% +8 [5 Others] [ = ] 0 -22.9% -872 [LOAD #5 [RW]] -74.5% -1.44Ki [ = ] 0 [Unmapped] [ = ] 0 -76.5% -1.45Ki .relro_padding +118% +38.8Mi +117% +38.8Mi TOTAL ``` A few things to note: 1. llvm-profdata doesn't support filter raw profiles by binary id yet, so when a raw profile doesn't belongs to the binary being digested by llvm-profdata, merging will fail. Once this is implemented, llvm-profdata should be able to only merge raw profiles with the same binary id as the binary and discard the rest (with mismatched/missing binary id). The workflow I have in mind is to have scripts invoke llvm-profdata to get all binary ids for all raw profiles, and selectively choose the raw pnrofiles with matching binary id and the binary to llvm-profdata for merging. 2. Note: In COFF, currently they are still loaded into memory but not used. I didn't do it in this patch because I noticed that `.lcovmap` and `.lcovfunc` are loaded into memory. A separate patch will address it. 3. This should works with PGO when value profiling is disabled as debug info correlation currently doing, though I haven't tested this yet.	2023-12-14 14:16:38 -05:00
Mircea Trofin	a06c7d9e5f	[NFC][InstrProf] Rename internal `InstrProfiling` to `InstrLowerer` (#75139 ) Captures its responsibility a bit better.	2023-12-12 10:58:17 -08:00
Mircea Trofin	6ed1daa0c9	[NFC][InstrProf] Move `InstrProfiling` to the .cpp file (#75018 )	2023-12-11 15:42:57 -08:00
Mircea Trofin	1d608fc755	[NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970 ) Akin other passes - refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part and the transformation part to avoid needing to initialize object state during `::run`. A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.	2023-12-10 18:03:08 -08:00
Arthur Eubanks	66b919cb29	Reland [InstrProf][X86] Mark non-directly accessed globals as large (#74778 ) We'd like to make various instrprof globals large to make them not contribute to relocation pressure since there are no direct accesses to them in the module. Similar to what was done for asan_globals in #74514. This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names sections. The reland fixes platform.ll.	2023-12-08 09:54:57 -08:00
Arthur Eubanks	96a5135e56	Revert "[InstrProf][X86] Mark non-directly accessed globals as large (#74778 )" This reverts commit 5507f70cc205a7ec21d264a64c703b3d314b998c. Breaks bots, e.g. https://lab.llvm.org/buildbot/#/builders/232/builds/16374	2023-12-08 09:41:31 -08:00
Arthur Eubanks	5507f70cc2	[InstrProf][X86] Mark non-directly accessed globals as large (#74778 ) We'd like to make various instrprof globals large to make them not contribute to relocation pressure since there are no direct accesses to them in the module. Similar to what was done for asan_globals in #74514. This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names sections.	2023-12-08 09:33:40 -08:00
Mircea Trofin	284da049f5	[coro][pgo] Don't promote pgo counters in the suspend basic block (#71263 ) If a suspend happens in the resume part (this can happen in the case of chained coroutines), and that's part of a loop, the pre-split CFG has the suspend block as an exit of that loop. PGO Counter Promotion will then try to commit the temporary counter to the global in that "exit" block (it also does that in the other loop exit BBs, which also includes the "destroy" case). This interferes with symmetric transfer. We don't need to commit the counter in the suspend case - it's not a loop exit from the perspective of the behavior of the program. The regular loop exit, together with the "destroy" case, completely cover any updates that may need to happen to the global counter.	2023-11-30 11:58:26 -08:00
David Li	44c5593cd5	Fix stale comment (#73846 ) Fix stale comment.	2023-11-29 12:58:42 -08:00
Youngsuk Kim	3f225708c4	[llvm][InstrProfiling] Remove ptr-to-ptr bitcasts (NFC) Opaque ptr cleanup effort (NFC).	2023-11-17 09:57:10 -06:00
Fangrui Song	7ca135cd86	[Instrumentation] Remove unneeded pointer casts and migrate away from getInt8PtrTy. NFC After opaque pointer migration, getInt8PtrTy() is considered legacy. Replace it with getPtrTy(), and while here, remove some unneeded pointer casts.	2023-11-15 12:50:49 -08:00
Alan Phipps	78702d3ad9	[InstrProfiling] Ensure data variables are always created for inlined functions (#72069 ) Fixes a bug introduced by commit f95b2f1acf11 ("Reland [InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)") The InstrProfiling pass was refactored when introducing support for MC/DC such that the creation of the data variable was abstracted and called only once per function from ::run(). Because ::run() only iterated over functions there were not fully inlined, and because it only created the data variable for the first intrinsic that it saw, data variables corresponding to functions fully inlined into other instrumented callers would end up without a data variable, resulting in loss of coverage information. This patch does the following: 1.) Move the call of createDataVariable() to getOrCreateRegionCounters() so that the creation of the data variable will happen indirectly either from ::new() or during profile intrinsic lowering when it is needed. This effectively restores the behavior prior to the refactor and ensures that all data variables are created when needed (and not duplicated). 2.) Process all MC/DC bitmap parameter intrinsics in ::run() prior to calling getOrCreateRegionCounters(). This ensures bitmap regions are created for each function including functions that are fully inlined. It also ensures that the bitmap region is created for each function prior to the creation of the data variable because it is referenced by the data variable. Again, duplication is prevented if the same parameter intrinsic is inlined into multiple functions. 3.) No longer pass the MC/DC intrinsic to createDataVariable(). This decouples the creation of the data variable from a specific MC/DC intrinsic. Instead, with #2 above, store the number of bitmap bytes required in the PerFunctionProfileData in the ProfileDataMap along with the function's CounterRegion and BitmapRegion variables. This ties the bitmap information directly to the function to which it belongs, and the data variable created for that function can reference that.	2023-11-14 12:37:58 -06:00
JOE1994	c42d006f05	[llvm][InstrProfiling] Remove no-op ptr-to-ptr bitcasts (NFC) Opaque ptr cleanup effort (NFC).	2023-11-12 13:44:06 -05:00
Alan Phipps	d3d49bca3e	[InstrProfiling] Don't attempt to create duplicate data variables. (#71998 ) Fixes a bug introduced by commit f95b2f1acf11 ("Reland [InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)") createDataVariable() needs to check that a data variable wasn't already created before creating it. Previously, this was done inadvertantly in getOrCreateRegionCounters(), which checked that the RegionCounters was not created multiple times before creating the counter section and the data variable. When the creation of the data variable was abstracted into its own function (createDataVariable()), there was no corresponding check. This was failing on a case in which an instrumented function was being inlined into multiple functions and a duplicate data variable was created, which led to a segfault in emitNameData(). Test case added based on the repro that also ensures a single data variable was created in this case.	2023-11-11 18:34:29 -06:00
Paulo Matos	7b9d73c2f9	[NFC] Remove Type::getInt8PtrTy (#71029 ) Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.	2023-11-07 17:26:26 +01:00
Simon Pilgrim	3ca4fe80d4	[Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC. startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)	2023-11-06 16:50:18 +00:00
Zequan Wu	db7a1ed9a2	Revert "[Profile] Refactor profile correlation. (#70712 )" This reverts commit 4b383d0af93136b80841fc140da0823dfc441dd4.	2023-10-31 10:53:45 -04:00
Zequan Wu	4b383d0af9	[Profile] Refactor profile correlation. (#70712 ) Refactor some code from https://github.com/llvm/llvm-project/pull/69493. Rebase of https://github.com/llvm/llvm-project/pull/69656 on top of main as it was messed up.	2023-10-31 10:41:01 -04:00
Alan Phipps	f95b2f1acf	Reland "[InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)" Part 1 of 3. This includes the LLVM back-end processing and profile reading/writing components. compiler-rt changes are included. Differential Revision: https://reviews.llvm.org/D138846	2023-10-30 11:15:02 -05:00
Tom Stellard	e7247f1010	[profiling] Move option declarations into headers This will make it possible to add visibility attributes to these variables. This also fixes some type mismatches between the declaration and the definition. Reviewed By: bogner, huangjd Differential Revision: https://reviews.llvm.org/D156599	2023-09-30 18:51:28 -07:00
Zequan Wu	4d5d9a5390	Revert "[Coverage] Allow Clang coverage to be used with debug info correlation." This reverts commit 32db121b29f78e4c41116b2a8f1c730f9522b202 and subsequent commits. This causes time regression on llvm-cov even with debug info correlation off.	2023-09-26 20:57:09 -04:00
Youngsuk Kim	e5026f0179	[llvm] Remove uses of Type::getPointerTo() (NFC) Partial progress towards removing in-tree uses of `getPointerTo()`, by employing the following options: * Drop the call entirely if the sole purpose of it is to support a no-op bitcast (remove the no-op bitcast as well). * Replace with `PointerType::get()`/`PointerType::getUnqual()` This is a NFC cleanup effort. Reviewed By: barannikov88 Differential Revision: https://reviews.llvm.org/D155232	2023-09-22 19:44:38 -04:00
Hans Wennborg	53a2923bf6	Revert "[InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3)" This seems to cause Clang to crash, see comments on the code review. Reverting until the problem can be investigated. > Part 1 of 3. This includes the LLVM back-end processing and profile > reading/writing components. compiler-rt changes are included. > > Differential Revision: https://reviews.llvm.org/D138846 This reverts commit a50486fd736ab2fe03fcacaf8b98876db77217a7.	2023-09-21 12:20:24 +02:00
Alan Phipps	a50486fd73	[InstrProf][compiler-rt] Enable MC/DC Support in LLVM Source-based Code Coverage (1/3) Part 1 of 3. This includes the LLVM back-end processing and profile reading/writing components. compiler-rt changes are included. Differential Revision: https://reviews.llvm.org/D138846	2023-09-19 17:07:23 -05:00
Zequan Wu	32db121b29	[Coverage] Allow Clang coverage to be used with debug info correlation. Debug info correlation is an option in InstrProfiling pass, which is used by both IR instrumentation and front-end instrumentation. So, Clang coverage can also benefits the binary size saving from it. Reviewed By: ellis Differential Revision: https://reviews.llvm.org/D157913	2023-09-15 13:47:23 -04:00
Ellis Hoag	d687caae00	[InstrProf] Emit warnings when correlating lightweight profiles Emit warnings when `InstrProfCorrelator` finds problems with debug info for lightweight instrumentation profile correlation. To prevent excessive printing, only emit the first 5 warnings. In addition, remove a diagnostic about missing debug info in `InstrProfiling.cpp`. Some compiler-generated functions, e.g., `__clang_call_terminate`, does not emit debug info and will fail a build if `-Werror` is used. This warning is not actionable by the user and I have not seen non-compiler-generated functions fail this test. Reviewed By: smeenai Differential Revision: https://reviews.llvm.org/D156006	2023-08-15 15:28:16 -07:00
Ellis Hoag	244be0b0de	[InstrProf] Temporal Profiling As described in [0], this extends IRPGO to support //Temporal Profiling//. When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile. Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively. In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup. Special thanks to Julian Mestre for helping with reservoir sampling. [0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068 Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D147287	2023-04-11 08:30:52 -07:00
Ellis Hoag	167e8f8b6b	[InstrProf] Minimal Block Coverage This diff implements minimal block coverage instrumentation. When the `-pgo-block-coverage` option is used, basic blocks will be instrumented for block coverage using single byte booleans. The coverage of some basic blocks can be inferred from others, so not every basic block is instrumented. In fact, we found that only ~60% of basic blocks need to be instrumented. These differences lead to less size overhead when compared to instrumenting block counts. For example, block coverage on the clang binary has an overhead of 20 Mi (17%) compared to 56 Mi (47%) with block counts. Even though block coverage profiles have less precision than block count profiles, they can still be used to guide optimizations. In `PGOUseFunc` we use block coverage to populate edge weights such that BFI gives nonzero counts to only covered blocks. We do this by 1) setting the entry count of covered functions to a large value, i.e., 10000 and 2) populating edge weights using block coverage. In the next diff https://reviews.llvm.org/D125743 we use BFI to guide the machine outliner to avoid outlining covered blocks. This `-pgo-block-coverage` option provides a trade off of generating less precise profiles for faster and smaller instrumented binaries. The `BlockCoverageInference` class defines the algorithm to find the minimal set of basic blocks that need to be instrumented for coverage. This is different from the Kirchhoff circuit law optimization that is used for edge counts because that does not work for block coverage. The reason for this is that edge counts can be added together to find a missing count while block coverage cannot since they store boolean values. So we need a new algorithm to find which blocks must be instrumented. The details on this algorithm can be found in this paper titled "Minimum Coverage Instrumentation": https://arxiv.org/abs/2208.13907 Special thanks to Julian Mestre for creating this block coverage inference algorithm. Binary size of `clang` using `-O2`: * Base * `.text`: 65.8 Mi * Total: 119 Mi * IRPGO (`-fprofile-generate -mllvm -disable-vp -mllvm -debug-info-correlate`) * `.text`: 93.0 Mi * `__llvm_prf_cnts`: 14.5 Mi * Total: 175 Mi * Minimal Block Coverage (`-fprofile-generate -mllvm -disable-vp -mllvm -debug-info-correlate -mllvm -pgo-block-coverage`) * `.text`: 82.1 Mi * `__llvm_prf_cnts`: 1.38 Mi * Total: 139 Mi Reviewed By: spupyrev, kyulee Differential Revision: https://reviews.llvm.org/D124490	2023-03-29 16:24:20 -07:00
Qiongsi Wu	401f768061	[PGO] Setting ValueProfNode Array's Alignment `instrprof` currently does not set `__llvm_prf_vnds`'s alignment after creating it. The consequence is that the alignment is set to 16 later (`c0f3ac1d00/llvm/lib/IR/DataLayout.cpp (L1019)`). This can lead to undefined behaviour when we calculate `NumVNodes` in `lprofGetLoadModuleSignature` (`c0f3ac1d00/compiler-rt/lib/profile/InstrProfilingMerge.c (L32)`). The reason is that when the `__llvm_prf_vnds` array is 16 byte aligned, `__llvm_profile_end_vnodes() - __llvm_profile_begin_vnodes()` may not be a multiple of the size of ValueProfNode (which is 24, 20 on 32 bit targets). This patch sets `__llvm_prf_vnds`'s alignment to its ABI alignment, which always divides its size. Then `__llvm_profile_end_vnodes() - __llvm_profile_begin_vnodes()` will be a multiple of `sizeof(ValueProfNode)`. Reviewed By: w2yehia, MaskRay Differential Revision: https://reviews.llvm.org/D144302	2023-02-23 19:07:29 -05:00
Archibald Elliott	62c7f035b4	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.	2023-02-07 12:39:46 +00:00
Paul Kirth	f2916becc5	Reland [pgo] Avoid introducing relocations by using private alias In many cases, we can use an alias to avoid a symbolic relocations, instead of using the public, interposable symbol. When the instrumented function is in a COMDAT, we can use a hidden alias, and still avoid references to discarded sections. Previous versions of this patch allowed the compiler to name the generated alias, but that would only be valid when the functions were local. Since the alias may be used across TUs we use a more deterministic naming convention, and add a ".local" suffix to the alias name just as we do for relative vtables aliases. https://reviews.llvm.org/rG20894a478da224bdd69c91a22a5175b28bc08ed9 removed an incorrect assertion on Mach-O which caused assertion failures in LLD. We prevent duplicate symbols under ThinLTO + PGO + CFI by disabling alias generation when the target function has MD_type metadata used in CFI. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D137982	2023-01-25 22:19:19 +00:00
Arthur Eubanks	1f3f3c0ea7	Revert "Reland [pgo] Avoid introducing relocations by using private alias" This reverts commit da5a8d14b8cc6cea16ee0929413c0672b47c93d9. Causes more duplicate symbol errors, see https://bugs.chromium.org/p/chromium/issues/detail?id=1408161.	2023-01-19 10:20:38 -08:00

1 2 3 4 5 ...

294 Commits