llvm-project

Author	SHA1	Message	Date
Joseph Huber	ffd6a13b5f	[compiler-rt] Rework profile data handling for GPU targets (#187136 ) Summary: Currently, the GPU iterates through all of the present symbols and copies them by prefix. This is inefficient as it requires a lot of small high-latency data transfers rather than a few large ones. Additionally, we force every single profiling symbol to have protected visibility. This means potentially hundreds of unnecessary symbols in the symbol table. This PR changes the interface to move towards the start / stop section handling. AMDGPU supports this natively as an ELF target, so we need little changes. Instead of overriding visibility, we use a single table to define the bounds that we can obtain with one contiguous load. Using a table interface should also work for the in-progress HIP implementation for this, as it wraps the start / stop sections into standard void pointers which will be inside of an already mapped region of memory, so they should be accessible from the HIP API. NVPTX is more difficult as it is an ELF platform without this support. I have hooked up the 'Other' handling to work around this, but even then it's a bit of a stretch. I could remove this support here, but I wanted to demonstrate that we can share the ABI. However, NVPTX will only work if we force LTO and change the backend to emit variables in the same TL;DR, we now do this: ```c struct { start1, stop1, start2, stop2, start3, stop3, version; } device; struct host = DtoH(lookup("device")); counters = DtoH(host.stop - host.start) version = DtoH(host.version); ```	2026-03-26 10:17:43 -05:00
Joseph Huber	d18a784d41	[compiler-rt] Define GPU specific handling of profiling functions (#185763 ) Summary: The changes in https://www.github.com/llvm/llvm-project/pull/185552 allowed us to start building the standard `libclang_rt.profile.a` for GPU targets. This PR expands this by adding an optimized GPU routine for counter increment and removing the special-case handling of these functions in the OpenMP runtime. The vast majority of these functions are boilerplate, but we should be able to do more interesting things with this in the future, like value or memory profiling.	2026-03-19 10:51:48 -05:00
Fangrui Song	7d6a642161	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options. NFC (#186044 ) Similar to commit 557efc9a8b68628c2c944678c6471dac30ed9e8e (2022). cl::ZeroOrMore is the default for cl::list and is unnecessary for cl::opt since the "may only occur zero or one times!" error was removed. Also remove cl::init(false) on modified cl::opt<bool> lines.	2026-03-12 07:17:57 +00:00
Shilei Tian	70905e0afa	[RFC][IR] Remove `Constant::isZeroValue` (#181521 ) `Constant::isZeroValue` currently behaves the same as `Constant::isNullValue` for all types except floating-point, where it additionally returns true for negative zero (`-0.0`). However, in practice, almost all callers operate on integer/pointer types where the two are equivalent, and the few FP-relevant callers have no meaningful dependence on the `-0.0` behavior. This PR removes `isZeroValue` to eliminate the confusing API. All callers are changed to `isNullValue` with no test failures. `isZeroValue` will be reintroduced in a future change with clearer semantics: when null pointers may have non-zero bit patterns, `isZeroValue` will check for bitwise-all-zeros, while `isNullValue` will check for the semantic null (which may be non-zero).	2026-02-15 12:06:42 -05:00
Jameson Nash	d10b2b566a	[NFCI] replace getValueType with new getGlobalSize query (#177186 ) Returns uint64_t to simplify callers. The goal is to eventually replace getValueType with this query, which should return the known minimum reference-able size, as provided (instead of a Type) during create. Additionally, the common isSized query would be replaced with an isExactKnownSize query to test if that size is an exact definition.	2026-01-22 13:55:53 -05:00
Aiden Grossman	86b0acd35f	[InstrProf] Mark __llvm_profile_runtime_user cold (#174174 ) This function is only created to use a global so that it does not get pruned by the compiler. We could probably get rid of it (https://reviews.llvm.org/D98325), but there are some complications around certain platforms. Given that it will never be called, mark it as cold.	2026-01-05 08:06:39 -08:00
Ethan Luis McDonough	38cade7cc6	[PGO][Offload] Fix missing names bug in GPU PGO (#166444 ) After #163011 was merged, the tests in [`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1) broke because the offload plugins were no longer able to find `__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm` visible to the host on GPU targets and reverses the changes made in f7e9968a5ba99521e6e51161f789f0cc1745193f.	2025-11-10 10:11:53 -06:00
Ellis Hoag	cc1022ca0b	[InstrProf] Remove deprecated -debug-info-correlate flag (#165289 )	2025-10-30 09:03:45 -07:00
Yi-Chi Lee	964b4abe6c	[Instrumentation] Fix typos across files in Transforms/Instrumentation (#165251 ) Closes #165240.	2025-10-27 16:23:45 +01:00
Ethan Luis McDonough	67ff66e677	[PGO][Offload] Fix offload coverage mapping (#143490 ) This pull request fixes coverage mapping on GPU targets. - It adds an address space cast to the coverage mapping generation pass. - It reads the profiled function names from the ELF directly. Reading it from public globals was causing issues in cases where multiple device-code object files are linked together.	2025-06-10 20:19:38 -05:00
Andrew Rogers	b2584e0b17	[llvm] annotate interfaces in llvm/Transforms for DLL export (#143413 ) ## Purpose This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Transforms` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build. ## Background This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst). The bulk of these changes were generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed by formatting with `git clang-format`. The following manual adjustments were also applied after running IDS on Linux: - Removed a redundant `operator<<` from Attributor.h. IDS only auto-annotates the 1st declaration, and the 2nd declaration being un-annotated resulted in an "inconsistent linkage" error on Windows when building LLVM as a DLL. - `#include` the `VirtualFileSystem.h` in PGOInstrumentation.h and remove the local declaration of the `vfs::FileSystem` class. This is required because exporting the `PGOInstrumentationUse` constructor requires the class be fully defined because it is used by an argument. - Add `#include "llvm/Support/Compiler.h"` to files where it was not auto-added by IDS due to no pre-existing block of include statements. - Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported instantiated templates. ## Validation Local builds and tests to validate cross-platform compatibility. 
This included llvm, clang, and lldb on the following configurations: - Windows with MSVC - Windows with Clang - Linux with GCC - Linux with Clang - Darwin with Clang	2025-06-10 08:10:17 -07:00
Kazu Hirata	d328510f23	[Instrumentation] Remove an unused local variable (NFC) (#138383 )	2025-05-03 07:04:33 -07:00
Nikita Popov	eea1efed30	[InstrProfiling] Avoid unnecessary bitcast (NFC) Not needed with opaque pointers.	2025-04-23 15:29:49 +02:00
Mircea Trofin	a6208ce4c1	[nfc] move `isPresplitCoroSuspendExitEdge` to Analysis/CFG (#135849 )	2025-04-15 15:07:03 -07:00
Nikita Popov	979c275097	[IR] Store Triple in Module (NFC) (#129868 ) The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried. For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibility purposes, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth. The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of unknown object format. We should fix that as well.	2025-03-06 10:27:47 +01:00
Jeremy Morse	8e70273509	[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583 ) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).	2025-01-24 10:53:11 +00:00
Kazu Hirata	4d12a14357	[Instrumentation] Remove unused includes (NFC) (#115117 ) Identified with misc-include-cleaner.	2024-11-06 08:36:34 -08:00
Michael O'Farrell	b4fcaa137f	[PGO][SampledInstr] Correct off by 1s and allow 100% sampling (#113350 ) This corrects a couple of off by ones related to the sampling of instrumented counters, and enables setting 100% rates for burst sampling (burst duration = period). Off by ones: Prior to this change it was impossible to set a period of 65535 because this was converted to fast sampling, which rolls over at USHRT_MAX + 1 (65536). Similarly, the burst durations would collect burst duration + 1 counts as they used a ULE comparison. 100% sampling: Although this is not useful for a productionized use case, it does allow for more deterministic testing with the sampling checks in place. After all the off by ones are fixed, allowing for 100% sampling is a matter of letting burst duration = period.	2024-10-22 16:01:13 -07:00
Rahul Joshi	6924fc0326	[LLVM] Add `Intrinsic::getDeclarationIfExists` (#112428 ) Add `Intrinsic::getDeclarationIfExists` to lookup an existing declaration of an intrinsic in a `Module`.	2024-10-16 07:21:10 -07:00
Yuta Saito	d4efc3e097	[Coverage][WebAssembly] Add initial support for WebAssembly/WASI (#111332 ) Currently, WebAssembly/WASI target does not provide direct support for code coverage. This patch set fixes several issues to unlock the feature. The main changes are: 1. Port `compiler-rt/lib/profile` to WebAssembly/WASI. 2. Adjust profile metadata sections for Wasm object file format. - [CodeGen] Emit `__llvm_covmap` and `__llvm_covfun` as custom sections instead of data segments. - [lld] Align the interval space of custom sections at link time. - [llvm-cov] Copy misaligned custom section data if the start address is not aligned. - [llvm-cov] Read `__llvm_prf_names` from data segments 3. [clang] Link with profile runtime libraries if requested See each commit message for more details and rationale. This is part of the effort to add code coverage support in Wasm target of Swift toolchain.	2024-10-15 02:41:43 +09:00
NAKAMURA Takumi	6c331e50e4	[MC/DC] Rework tvbitmap.update to get rid of the inlined function (#110792 ) Per the discussion in #102542, it is safe to insert BBs under `lowerIntrinsics()` since #69535 has made it tolerant of modifying BBs. So, I can get rid of using the inlined function `rmw_or`, introduced in #96040.	2024-10-03 17:57:03 +09:00
Jay Foad	e03f427196	[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133 ) It is almost always simpler to use {} instead of std::nullopt to initialize an empty ArrayRef. This patch changes all occurrences I could find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor could be deprecated or removed.	2024-09-19 16:16:38 +01:00
Antonio Frighetto	2ae968a0d9	[Instrumentation] Move out to Utils (NFC) (#108532 ) Utility functions have been moved out to Utils. Minor opportunity to drop the header where not needed.	2024-09-15 21:07:40 -07:00
Ethan Luis McDonough	fde2d23ee2	[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587 ) (#102691 ) This pull request is a revised version of #76587. This pull request fixes some build issues that were present in the previous version of this change. > This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: > > - Adds blank registration functions to device RTL > - Gives PGO globals protected visibility when targeting a supported GPU > - Handles any addrspace casts for PGO calls > - Implements PGO global extraction in GPU plugins (currently only dumps info) > > These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-08-22 01:10:54 -05:00
gulfemsavrun	f5b81aa6ec	[InstrProf] Support conditional counter updates (#102542 ) This patch adds support for conditional counter updates in single byte counters mode to reduce the write contention by first checking whether the counter is set before overwriting it. --------- Co-authored-by: Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com>	2024-08-16 12:33:50 -07:00
NAKAMURA Takumi	d2f77eb8ec	[MC/DC][Coverage] Introduce "Bitmap Bias" for continuous mode (#96126 ) `counter_bias` is incompatible with Bitmap. The distance between Counters and Bitmap differs between the in-memory sections and the profraw image. A reference to `__llvm_profile_bitmap_bias` is generated only if `-fcoverage-mcdc` and `-runtime-counter-relocation` are both specified. The current implementation rejected these options: ``` Runtime counter relocation is presently not supported for MC/DC bitmaps ```	2024-07-31 10:14:12 +09:00
xur-llvm	b1ca2a9546	[PGO] Sampled instrumentation in PGO to speed up instrumentation binary (#69535 ) In comparison to non-instrumented binaries, PGO instrumentation binaries can be significantly slower. For highly threaded programs, this slowdown can reach 10x due to data races or false sharing within counters. This patch incorporates sampling into the PGO instrumentation process to enhance the speed of instrumentation binaries. The fundamental concept is similar to the one proposed in https://reviews.llvm.org/D63949. Three sampling modes are introduced: 1. Simple Sampling: When '-sampled-instr-burst-duration' is set to 1. 2. Fast Burst Sampling: When not using simple sampling, and '-sampled-instr-period' is set to 65535. This is the default mode of sampling. 3. Full Burst Sampling: When neither simple nor fast burst sampling is used. Utilizing this sampled instrumentation significantly improves the binary's execution speed. Measurements show up to 5x speedup with default settings. Fast burst sampling now results in only around 20% to 30% slowdown (compared to 8 to 10x slowdown without sampling). Our tests show that profile quality remains good with sampling, with edge counts typically showing more than 90% overlap. For applications whose behavior changes due to binary speed, sampling instrumentation can enhance performance. Observations have shown some apps experiencing up to a ~2% improvement in PGO. A potential drawback of this patch is the increased binary size and compilation time. The sampling method in this patch does not improve single-threaded program instrumentation binary speed.	2024-07-22 09:19:17 -07:00
NAKAMURA Takumi	cfc22605f6	InstrProf: Mark BiasLI as invariant. (#95588 ) Bias doesn't change after startup. The test is enhanced for optimized sequences and atomic ops.	2024-07-20 10:44:30 +09:00
Ethan Luis McDonough	2c8b912f63	Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587 )" This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.	2024-06-28 12:30:45 -05:00
Ethan Luis McDonough	5fd2af38e4	[PGO][OpenMP] Instrumentation for GPU devices (#76587 ) This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: - Adds blank registration functions to device RTL - Gives PGO globals protected visibility when targeting a supported GPU - Handles any addrspace casts for PGO calls - Implements PGO global extraction in GPU plugins (currently only dumps info) These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-06-28 10:42:19 -05:00
NAKAMURA Takumi	b347a720bf	[MC/DC][Coverage] Make tvbitmapupdate capable of atomic write (#96042 ) This also introduces "Test and conditional Read-Modify-Write". The flow to `atomicrmw or` is marked as `unlikely`.	2024-06-26 12:07:56 +09:00
NAKAMURA Takumi	a0e1b4a244	[MC/DC][Coverage] Split out Read-Modify-Write to rmw_or(ptr,i8) (#96040 ) `rmw_or` is defined as "private alwaysinline". At the moment, it has only a simple "Read, Or, and Write", which is the same as the current implementation.	2024-06-22 10:09:32 +09:00
NAKAMURA Takumi	a512854249	InstProfiling: Give the name to profc_bias. NFC. (#95587 )	2024-06-19 10:02:56 +09:00
NAKAMURA Takumi	139f896c0f	InstrProfiling: Split creating Bias offset to getOrCreateBiasVar(Name). NFC. (#95692 )	2024-06-19 08:54:08 +09:00
NAKAMURA Takumi	85a7bba7d2	Cleanup MC/DC intrinsics for #82448 (#95496 ) 3rd arg of `tvbitmap.update` was made unused. Remove 3rd arg. Sweep `condbitmap.update`, since it is no longer used.	2024-06-16 09:04:51 +09:00
NAKAMURA Takumi	71f8b441ed	Reapply: [MC/DC][Coverage] Loosen the limit of NumConds from 6 (#82448 ) By storing possible test vectors instead of combinations of conditions, the restriction is dramatically relaxed. This introduces two options to `cc1`: * `-fmcdc-max-conditions=32767` * `-fmcdc-max-test-vectors=2147483646` This change makes coverage mapping, profraw, and profdata incompatible with Clang-18. - Bitmap semantics changed. It is incompatible with previous format. - `BitmapIdx` in `Decision` points to the end of the bitmap. - Bitmap is packed per function. - `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`. RFC: https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798 -- Change(s) since llvmorg-19-init-14288-g7ead2d8c7e91 - Update compiler-rt/test/profile/ContinuousSyncMode/image-with-mcdc.c	2024-06-14 19:31:56 +09:00
Hans Wennborg	b422fa6b62	Revert "[MC/DC][Coverage] Loosen the limit of NumConds from 6 (#82448 )" This broke the lit tests on Mac: https://green.lab.llvm.org/job/llvm.org/job/clang-stage1-RA/1096/ > By storing possible test vectors instead of combinations of conditions, > the restriction is dramatically relaxed. > > This introduces two options to `cc1`: > > * `-fmcdc-max-conditions=32767` > * `-fmcdc-max-test-vectors=2147483646` > > This change makes coverage mapping, profraw, and profdata incompatible > with Clang-18. > > - Bitmap semantics changed. It is incompatible with previous format. > - `BitmapIdx` in `Decision` points to the end of the bitmap. > - Bitmap is packed per function. > - `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`. > > RFC: > https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798 This reverts commit 7ead2d8c7e9114b3f23666209a1654939987cb30.	2024-06-14 10:47:41 +02:00
NAKAMURA Takumi	7ead2d8c7e	[MC/DC][Coverage] Loosen the limit of NumConds from 6 (#82448 ) By storing possible test vectors instead of combinations of conditions, the restriction is dramatically relaxed. This introduces two options to `cc1`: * `-fmcdc-max-conditions=32767` * `-fmcdc-max-test-vectors=2147483646` This change makes coverage mapping, profraw, and profdata incompatible with Clang-18. - Bitmap semantics changed. It is incompatible with previous format. - `BitmapIdx` in `Decision` points to the end of the bitmap. - Bitmap is packed per function. - `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`. RFC: https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798	2024-06-13 20:09:02 +09:00
Mingming Liu	1351d17826	[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825 ) (The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691) * For InstrFDO value profiling, implement instrumentation and lowering for virtual table address. * This is controlled by `-enable-vtable-value-profiling` and off by default. * When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads. * Implement profile reader and writer support * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols. * Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't happen since IR is used to construct InstrProfSymtab. * Indexed profile writer collects the list of vtable names, and stores that to index profiles. * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type. * `llvm-profdata show -show-vtables <args> <profile>` is implemented. rfc in https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7	2024-04-01 08:52:35 -07:00
Mingming Liu	87e7140f72	[nfc][InstrProfiling]For comdat setting helper function, move comment closer to the code (#83757 )	2024-03-04 09:46:34 -08:00
Mingming Liu	f83858f87c	[nfc][InstrProfiling]Compute a boolean state as a constant and use it everywhere (#83756 )	2024-03-04 08:56:54 -08:00
NAKAMURA Takumi	cc53707a5c	LLVMInstrumentation: Simplify mcdc.tvbitmap.update with GEP.	2024-02-25 11:21:46 +09:00
Petr Hosek	60c4f82d3c	[InstrProfiling] No runtime registration for ELF, COFF, Mach-O and XCOFF (#77225 ) Whether runtime registration is needed is not dependent on the OS but the file format. For ELF, COFF, Mach-O or XCOFF, we can always use the linker support. This is important for baremetal platforms such as RTOS and UEFI platforms where there is no OS but we still don't want to use runtime registration and rely on linker support instead.	2024-01-07 16:07:17 -08:00
Zequan Wu	ab3430f891	[Profile] Add binary profile correlation for code coverage. (#69493 ) ## Motivation Since we don't need the metadata sections at runtime, we can somehow offload them from memory at runtime. Initially, I explored [debug info correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113), which is used for PGO with value profiling disabled. However, it currently only works with DWARF and it would be hard to add such artificial debug info for every function into CodeView, which is used on Windows. So, offloading profile metadata sections at runtime seems to be a platform independent option. ## Design The idea is to use new section names for profile name and data sections and mark them as metadata sections. Under this mode, the new sections are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime and can be stripped away as a post-linking step. After the process exits, the generated raw profiles will contain only headers + counters. llvm-profdata can be used to correlate raw profiles with the unstripped binary to generate an indexed profile. 
## Data For chromium base_unittests with code coverage on linux, the binary size overhead due to instrumentation is reduced from 64M to 38.8M (39.4%) and the raw profile file size is reduced from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL

$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
```
A few things to note: 1. llvm-profdata doesn't support filtering raw profiles by binary id yet, so when a raw profile doesn't belong to the binary being digested by llvm-profdata, merging will fail. Once this is implemented, llvm-profdata should be able to only merge raw profiles with the same binary id as the binary and discard the rest (with mismatched/missing binary id). The workflow I have in mind is to have scripts invoke llvm-profdata to get all binary ids for all raw profiles, and selectively pass the raw profiles with matching binary id and the binary to llvm-profdata for merging. 2. Note: In COFF, currently they are still loaded into memory but not used. I didn't do it in this patch because I noticed that `.lcovmap` and `.lcovfunc` are loaded into memory. A separate patch will address it. 3. This should work with PGO when value profiling is disabled, as debug info correlation currently does, though I haven't tested this yet.	2023-12-14 14:16:38 -05:00
Mircea Trofin	a06c7d9e5f	[NFC][InstrProf] Rename internal `InstrProfiling` to `InstrLowerer` (#75139 ) Captures its responsibility a bit better.	2023-12-12 10:58:17 -08:00
Mircea Trofin	6ed1daa0c9	[NFC][InstrProf] Move `InstrProfiling` to the .cpp file (#75018 )	2023-12-11 15:42:57 -08:00
Mircea Trofin	1d608fc755	[NFC][InstrProf] Refactor InstrProfiling lowering pass (#74970 ) Akin to other passes - refactored the name to `InstrProfilingLoweringPass` to better communicate what it does, and split the pass part and the transformation part to avoid needing to initialize object state during `::run`. A subsequent PR will move `InstrLowering` to the .cpp file and rename it to `InstrLowerer`.	2023-12-10 18:03:08 -08:00
Arthur Eubanks	66b919cb29	Reland [InstrProf][X86] Mark non-directly accessed globals as large (#74778 ) We'd like to make various instrprof globals large to make them not contribute to relocation pressure since there are no direct accesses to them in the module. Similar to what was done for asan_globals in #74514. This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names sections. The reland fixes platform.ll.	2023-12-08 09:54:57 -08:00
Arthur Eubanks	96a5135e56	Revert "[InstrProf][X86] Mark non-directly accessed globals as large (#74778 )" This reverts commit 5507f70cc205a7ec21d264a64c703b3d314b998c. Breaks bots, e.g. https://lab.llvm.org/buildbot/#/builders/232/builds/16374	2023-12-08 09:41:31 -08:00
Arthur Eubanks	5507f70cc2	[InstrProf][X86] Mark non-directly accessed globals as large (#74778 ) We'd like to make various instrprof globals large to make them not contribute to relocation pressure since there are no direct accesses to them in the module. Similar to what was done for asan_globals in #74514. This affects the __llvm_prf_vals, __llvm_prf_vnds, and __llvm_prf_names sections.	2023-12-08 09:33:40 -08:00
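
Note on the burst-sampling commits listed above (#69535, #113350): the off-by-one in the burst window can be illustrated with a small model. This is only a sketch under stated assumptions, not the code LLVM emits: `BurstSampler`, `shouldSample`, and `samplesPerPeriod` are hypothetical names, and the model assumes a sampling counter that runs from 0 to period - 1 and updates the profile counter only while it is inside the burst window.

```cpp
#include <cstdint>

// Hypothetical model (not LLVM's lowering): a sampling counter walks through
// one period, and profile counters are updated only inside the burst window.
struct BurstSampler {
  uint32_t BurstDuration; // updates that should be recorded per period
  uint32_t Period;        // length of one sampling period
  uint32_t Sampling;      // current position within the period

  // Returns true if this event should update the profile counter.
  // UseULE = true models the "<=" comparison that the #113350 message
  // describes as collecting burst duration + 1 counts.
  bool shouldSample(bool UseULE) {
    bool InBurst =
        UseULE ? Sampling <= BurstDuration : Sampling < BurstDuration;
    if (++Sampling >= Period)
      Sampling = 0; // wrap around at the end of the period
    return InBurst;
  }
};

// Counts how many of the first Period events are sampled.
uint32_t samplesPerPeriod(uint32_t Burst, uint32_t Period, bool UseULE) {
  BurstSampler S{Burst, Period, 0};
  uint32_t N = 0;
  for (uint32_t I = 0; I < Period; ++I)
    N += S.shouldSample(UseULE);
  return N;
}
```

In this model, `samplesPerPeriod(3, 10, true)` yields 4 while `samplesPerPeriod(3, 10, false)` yields 3, and setting burst duration equal to period (`samplesPerPeriod(10, 10, false)`) samples every event, which corresponds to the 100% sampling case mentioned in #113350.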


319 Commits