llvm-project

Author	SHA1	Message	Date
Mircea Trofin	7efceca079	[nfc][pgo] `const`-ify some APIs in `InstrProfSymtab` (#153284 ) The main reason some `const` - sounding APIs weren't const was because their state is lazily updated (ensuring ordering).	2025-08-13 18:08:08 +02:00
Kazu Hirata	cf18e5e0f8	[ProfileData] Remove an unnecessary cast (NFC) (#152087 ) new already returns ValueProfData *.	2025-08-05 07:39:14 -07:00
Kazu Hirata	b809d5e2ac	[ProfileData] Use lambdas instead of std::bind (NFC) (#146625 ) Lambdas are a lot shorter than std::bind here.	2025-07-01 22:50:04 -07:00
Mircea Trofin	82cbd68504	[NFC][PGO] Use constants rather than free strings for metadata labels (#145721 )	2025-06-25 16:20:10 -07:00
Mingming Liu	f3f28323ad	[StaticDataLayout][PGO] Add profile format for static data layout, and the classes to operate on the profiles. (#138170 ) Context: For https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744#p-336543-background-3, we propose to profile memory loads and stores via hardware events, symbolize the addresses of binary static data sections and feed the profile back into compiler for data partitioning. This change adds the profile format for static data layout, and the classes to operate on it. The profile and its format 1. Conceptually, a piece of data (call it a symbol) is represented by its symbol name or its content hash. The former applies to the majority of data whose mangled name remains relatively stable over binary releases, and the latter applies to string literals (with name patterns like `.str.<N>[.llvm.<hash>]`). - The symbols with samples are hot data. The number of hot symbols is small relative to all symbols. The profile tracks their sampled counts and locations. Sampled counts come from hardware events, and locations come from debug information in the profiled binary. The symbols without samples are cold data. The number of such cold symbols is large. The profile tracks their representation (the name or content hash). - Based on a preliminary study, debug information coverage for data symbols is partial and best-effort. In the LLVM IR, global variables with source code correspondence may or may not have debug information. Therefore the location information is optional in the profiles. 2. The profile-and-compile cycle is similar to SamplePGO. Profiles are sampled from production binaries, and used in next binary releases. Known cold symbols and new hot symbols can both have zero sampled counts, so the profile records known cold symbols to tell the two apart for the next compile. In the profile's serialization format, strings are concatenated together and compressed. Individual records store the index. A separate PR will connect this class to InstrProfReader/Writer via MemProfReader/Writer. --------- Co-authored-by: Kazu Hirata <kazu@google.com>	2025-05-15 18:31:50 -07:00
Kazu Hirata	2f3067ed69	[llvm] Remove unused local variables (NFC) (#138454 )	2025-05-04 09:38:16 -07:00
Owen Rodley	d3d856ad84	Clean up external users of GlobalValue::getGUID(StringRef) (#129644 ) See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 for context. This is a non-functional change which just changes the interface of GlobalValue, in preparation for future functional changes. This part touches a fair few users, so is split out for ease of review. Future changes to the GlobalValue implementation can then be focused purely on that class. This does the following: * Rename GlobalValue::getGUID(StringRef) to getGUIDAssumingExternalLinkage. This is simply making explicit at the callsite what is currently implicit. * Where possible, migrate users to directly calling getGUID on a GlobalValue instance. * Otherwise, where possible, have them call the newly renamed getGUIDAssumingExternalLinkage, to make the assumption explicit. There are a few cases where neither of the above are possible, as the caller saves and reconstructs the necessary information to compute the GUID themselves. We want to migrate these callers eventually, but for this first step we leave them be.	2025-04-28 11:09:43 +10:00
Mingming Liu	2f0cd0c68e	[NFCI] Move ProfOStream from InstrProfWriter.cpp to InstrProf.h/cpp (#136791 ) ProfOStream is a wrapper class for output stream, and used by InstrProfWriter.cpp to serialize various profiles, like PGO profiles and MemProf. This change proposes to move it into InstrProf.h/cpp. After this is in, InstrProfWriter can dispatch serialization of various formats into methods like `obj->serialize()`, and the serialization code could be move out of InstrProfWriter.cpp into individual classes (each in a smaller cpp file). One example is that we can gradually move writeMemprof [1] into llvm/*/ProfileData/MemProf.h/cpp, where a couple of classes already have `serialize/deserialize` methods. [1] `85b35a9077/llvm/lib/ProfileData/InstrProfWriter.cpp (L774-L791)`	2025-04-23 09:21:07 -07:00
Nick Sarnie	48b7530273	[clang][flang][Triple][llvm] Add isOffload function to LangOpts and isGPU function to Triple (#126956 ) I'm adding support for SPIR-V, so let's consolidate these checks. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-03-28 14:19:20 +00:00
Nikita Popov	979c275097	[IR] Store Triple in Module (NFC) (#129868 ) The module currently stores the target triple as a string. This means that any code that wants to actually use the triple first has to instantiate a Triple, which is somewhat expensive. The change in #121652 caused a moderate compile-time regression due to this. While it would be easy enough to work around, I think that architecturally, it makes more sense to store the parsed Triple in the module, so that it can always be directly queried. For this change, I've opted not to add any magic conversions between std::string and Triple for backwards-compatibility purposes, and instead write out needed Triple()s or str()s explicitly. This is because I think a decent number of them should be changed to work on Triple as well, to avoid unnecessary conversions back and forth. The only interesting part in this patch is that the default triple is Triple("") instead of Triple() to preserve existing behavior. The former defaults to using the ELF object format instead of unknown object format. We should fix that as well.	2025-03-06 10:27:47 +01:00
Kazu Hirata	e264b0e856	[ProfileData] Avoid repeated hash lookups (NFC) (#128829 )	2025-02-26 00:57:28 -08:00
Teresa Johnson	594e11ce42	[MemProf] Avoid incorrect ICP symtab canonicalization (#115419 ) ICP builds a symtab from the symbols in the module allowing mapping from the VP metadata GUIDs to the Function. MemProf uses this same symtab handling for its ICP during cloning. When symbols are added to the symtab, the handling adds both a GUID computed from the function name, or from the attached PGOFuncName metadata for locals, as well as a GUID computed from the "canonicalized" name, which strips all "." suffixes other than ".__uniq". This was originally meant to remove the ".llvm.*" suffix added to promoted locals (done earlier in the ThinLTO backend). In theory, it should no longer be needed as locals should have PGOFuncName metadata. However, this was causing a linker unsat, in code that used coroutines. For an original coroutine function, there were several additional functions created that had the same name, but different "." suffixes. Therefore the canonical name for these additional functions had the same GUID as that of the original function, leading to extra entries in the symtab, and to selecting the wrong function for promotion. For regular ICP this can happen, but is just a performance issue. However, for memprof the promoted direct call calls a memprof clone, and because we called the wrong function, in this case it didn't have a memprof clone and we got a linker unsat. We may be able to remove the canonical name handling for ICP in general, but for now disable it for MemProf. At worst this could lead to not finding a GUID in the symtab and not performing an ICP, so should be conservatively correct.	2024-11-07 21:00:42 -08:00
Ethan Luis McDonough	fde2d23ee2	[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587 ) (#102691 ) This pull request is a revised version of #76587. This pull request fixes some build issues that were present in the previous version of this change. > This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: > > - Adds blank registration functions to device RTL > - Gives PGO globals protected visibility when targeting a supported GPU > - Handles any addrspace casts for PGO calls > - Implements PGO global extraction in GPU plugins (currently only dumps info) > > These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-08-22 01:10:54 -05:00
Kazu Hirata	6c8ff4cbb8	[ProfileData] Take ArrayRef<InstrProfValueData> in addValueData (NFC) (#97363 ) This patch fixes another place in ProfileData where we have a pointer to an array of InstrProfValueData and its length separately. addValueData is a bit unique in that it remaps incoming values in place before adding them to ValueSites. AFAICT, no caller of addValueData uses updated incoming values. With this patch, we add value data to ValueSites first and then remap values there. This way, we can take ArrayRef<InstrProfValueData> as a parameter.	2024-07-11 16:38:44 -07:00
Mircea Trofin	afbd7d1e7c	[NFC] Coding style: drop `k` in `kGlobalIdentifierDelimiter` (#98230 )	2024-07-09 15:44:55 -07:00
Mircea Trofin	e291f31f89	[NFC] Coding style fixes in InstrProf.cpp (#98211 )	2024-07-09 13:28:35 -07:00
Kazu Hirata	b8eaa5bb10	[ProfileData] Remove the old version of getValueProfDataFromInst (#97374 ) I've migrated uses of the old version of getValueProfDataFromInst to the one that returns SmallVector<InstrProfValueData, 4>. This patch removes the old version.	2024-07-02 11:46:31 -07:00
Mingming Liu	1518b260ce	[TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (#81442 ) Clang's `-fwhole-program-vtables` is required for this optimization to take place. If `-fwhole-program-vtables` is not enabled, this change is no-op. * Function-comparison (before): ``` %vtable = load ptr, ptr %obj %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1 %func = load ptr, ptr %vfn %cond = icmp eq ptr %func, @callee br i1 %cond, label bb1, label bb2: bb1: call @callee bb2: call %func ``` * VTable-comparison (after): ``` %vtable = load ptr, ptr %obj %cond = icmp eq ptr %vtable, @vtable-address-point br i1 %cond, label bb1, label bb2: bb1: call @callee bb2: %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1 %func = load ptr, ptr %vfn call %func ``` Key changes: 1. Find out virtual calls and the vtables they come from. - The ICP relies on type intrinsic `llvm.type.test` to find out virtual calls and the compatible vtables, and relies on type metadata to find the address point for comparison. 2. ICP pass does cost-benefit analysis and compares vtable only when the number of vtables for a function candidate is within (option specified) threshold. 3. Sink the function addressing and vtable load instruction to indirect fallback. - The sink helper functions are simplified versions of `InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics are not handled. Ideally `InstCombinerImpl::tryToSinkInstructionDbgValues` and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be moved into Transforms/Utils/Local.cpp (or another util cpp file) to handle debug intrinsics when moving instructions across basic blocks. 4. Keep value profiles updated 1) Update vtable value profiles after inline 2) For either function-based comparison or vtable-based comparison, update both vtable and indirect call value profiles.	2024-06-29 23:21:33 -07:00
Ethan Luis McDonough	2c8b912f63	Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587 )" This reverts commit 5fd2af38e461445c583d7ffc2fe23858966eee76. It caused build issues and broke the buildbot.	2024-06-28 12:30:45 -05:00
Ethan Luis McDonough	5fd2af38e4	[PGO][OpenMP] Instrumentation for GPU devices (#76587 ) This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: - Adds blank registration functions to device RTL - Gives PGO globals protected visibility when targeting a supported GPU - Handles any addrspace casts for PGO calls - Implements PGO global extraction in GPU plugins (currently only dumps info) These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-06-28 10:42:19 -05:00
Kazu Hirata	b0ae923ada	[ProfileData] Add a variant of getValueProfDataFromInst (#95993 ) This patch adds a variant of getValueProfDataFromInst that returns std::vector<InstrProfValueData> instead of std::unique_ptr<InstrProfValueData[]>. The new return type carries the length with it, so we can drop the out parameter ActualNumValueData. Also, the caller can directly feed the return value into a range-based for loop as shown in the patch. I'm planning to migrate other callers of getValueProfDataFromInst to the new variant in follow-up patches.	2024-06-22 00:40:36 -07:00
Kazu Hirata	beba2e7385	[ProfileData] Teach addValueData to honor parameter Site (#96233 ) This patch teaches addValueData to honor Site for verification purposes. It does not affect the profile data in any manner.	2024-06-20 22:25:19 -07:00
Kazu Hirata	d6b0b7acf3	[ProfileData] Remove getValueProfDataFromInst (#95617 ) I've migrated all uses to the new version of getValueProfDataFromInst that returns std::unique_ptr<InstrProfValueData[]>.	2024-06-17 18:50:08 -07:00
Kazu Hirata	9ad102f03b	[ProfileData] Migrate to getValueArrayForSite (#95493 ) This patch migrates uses of getValueForSite to getValueArrayForSite. Each hunk is self-contained, meaning that each one can be applied independently of the others. In the unit test, there are cases where the array length check is performed a lot earlier than the array content check. For now, I'm leaving the length checks where they are. I'll consider moving them when I migrate uses of getNumValueDataForSite to getValueArrayForSite in a follow-up patch.	2024-06-14 06:38:48 -07:00
Kazu Hirata	31440738bd	[ProfileData] Use std::vector for ValueData (NFC) (#95194 ) This patch changes the type of ValueData to std::vector<InstrProfValueData> so that, in a follow-up patch, we can teach getValueForSite to return ArrayRef<InstrProfValueData>. Currently, a typical traversal over the value data looks like: uint32_t NV = Func.getNumValueDataForSite(VK, I); std::unique_ptr<InstrProfValueData[]> VD = Func.getValueForSite(VK, I); for (uint32_t V = 0; V < NV; V++) Do something with VD[V].Value and/or VD[V].Count; Note that we need to call getNumValueDataForSite and getValueForSite separately. If getValueForSite returns ArrayRef<InstrProfValueData> in the future, then we'll be able to do something like: for (const auto &V : Func.getValueForSite(VK, I)) Do something with V.Value and/or V.Count; If ArrayRef<InstrProfValueData> directly points to ValueData, then getValueForSite won't need to allocate memory with std::make_unique. Now, switching to std::vector requires us to update several places: - sortByTargetValues switches to llvm::sort because we don't need to worry about sort stability. - sortByCount retains sort stability because std::list::sort also performs stable sort. - merge builds another array and move it back to ValueData to avoid a potential quadratic behavior with std::vector::insert into the middle of a vector.	2024-06-12 11:22:49 -07:00
Kazu Hirata	00fa3fbfb8	[ProfileData] Compute sum in annotateValueSite (NFC) (#95199 ) getValueForSite computes the total count -- the total number of times a given value site is visited. The problem is that, excluding tests, annotateValueSite is the only place that needs the total count. This patch moves the total count computation to annotateValueSite.	2024-06-12 10:14:33 -07:00
Kazu Hirata	bfa937a487	[ProfileData] Add const to a few places (NFC) (#94803 )	2024-06-07 15:06:04 -07:00
Kazu Hirata	7476c20c48	[ProfileData] Remove swapToHostOrder (#94665 ) This patch removes swapToHostOrder in favor of llvm::support::endian::readNext as swapToHostOrder is too thin a wrapper around readNext. Note that there are two variants of readNext: - readNext<type, endian, align>(ptr) - readNext<type, align>(ptr, endian) swapToHostOrder uses the former, but this patch switches to the latter. While we are at it, this patch teaches readNext to default to unaligned just as I did in: commit 568368a43e5b4adb3c5d105a0eff3e0c13c0af8c Author: Kazu Hirata <kazu@google.com> Date: Mon Apr 15 19:05:30 2024 -0700	2024-06-06 13:25:52 -07:00
Mingming Liu	c803c29039	[nfc][InstrProf]Remove 'offsetOf' when parsing indexed profiles (#93346 ) - In `Header::readFromBuffer`, read the buffer in the forward direction by using `readNext`. - When computing the header size, spell out the constant. With the changes above, we can remove `offsetOf` in InstrProf.cpp --------- Co-authored-by: Kazu Hirata <kazu@google.com>	2024-05-30 12:44:29 -07:00
Mingming Liu	737a3018e8	[nfc][InstrFDO] Add Header::getIndexedProfileVersion and use it to decide profile version. (#93613 ) This is a split of https://github.com/llvm/llvm-project/pull/93346 as discussed.	2024-05-29 10:15:17 -07:00
Ellis Hoag	73eb9b3314	[InstrProf] Evaluate function order using test traces (#92451 ) The `llvm-profdata order` command is used to compute a function order using traces from the input profile. Add the `--num-test-traces` flag to keep aside N traces to evaluate this order. These test traces are assumed to be the actual function execution order in some experiment. The output is a number that represents how many page faults we got. Lower is better. I tested on a large profile I already had. ``` llvm-profdata order default.profdata --num-test-traces=30 # Ordered 149103 functions # Total area under the page fault curve: 2.271827e+09 ... ``` I also improved `TemporalProfTraceTy::createBPFunctionNodes()` in a few ways: * Simplified how `UN`s are computed * Change how the initial `Node` order is computed * Filter out rare and common `UN`s * Output vector is an aliased argument instead of a return These changes slightly improved the evaluation in my test. ``` llvm-profdata order default.profdata --num-test-traces=30 # Ordered 149103 functions # Total area under the page fault curve: 2.268586e+09 ... ```	2024-05-23 11:19:29 -07:00
Mingming Liu	b66779b5bf	[nfc][InstrProfReader]Store header fields in native endianness (#92947 ) - Use `Header.Version` directly and remove Header::formatVersion --------- Co-authored-by: Kazu Hirata <kazu@google.com>	2024-05-21 21:25:12 -07:00
Mingming Liu	98c1ba460a	[InstrProf] Add vtables with type metadata into symtab (#81051 ) The indirect-call-promotion pass will look up the vtable to find out the virtual function [1], and add vtable-derived information in icall candidate [2] for cost-benefit analysis. [1] https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR395-R416 [2] https://github.com/llvm/llvm-project/pull/81442/files#diff-a95d1ac8a0da69713fcb3346135d4b219f0a73920318d2549495620ea215191bR195-R199	2024-05-09 10:41:23 -07:00
Kazu Hirata	bb6df0804b	[llvm] Use StringRef::operator== instead of StringRef::equals (NFC) (#91441 ) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 70 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-08 10:33:53 -07:00
Kazu Hirata	f430e37446	[llvm] Drop unaligned from calls to readNext (NFC) (#88841 ) Now readNext defaults to unaligned accesses. This patch drops unaligned to improve readability.	2024-04-16 12:47:02 -07:00
Mingming Liu	08e210c6af	[NFC][IndirectCallProm] Refactor function-based conditional devirtualization and indirect call value profile update into one helper function (#80762 ) * The motivation is to move indirect callee profile update inside the function-based speculative indirect-call promotion, so that there are fewer diffs when the vtable-based transformation and profile update are implemented in a follow-up patch. * The parent patch is https://github.com/llvm/llvm-project/pull/79381	2024-04-11 13:28:20 -07:00
Mingming Liu	1e15371dd8	[ThinLTO][TypeProf] Implement vtable def import (#79381 ) Add annotated vtable GUID as referenced variables in per function summary, and update bitcode writer to create value-ids for these referenced vtables. - This is the part3 of type profiling work, and described in the "Virtual Table Definition Import" [1] section of the RFC. [1] https://github.com/llvm/llvm-project/pull/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	2024-04-01 15:14:49 -07:00
Mingming Liu	1351d17826	[InstrFDO][TypeProf] Implement binary instrumentation and profile read/write (#66825 ) (The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691) * For InstrFDO value profiling, implement instrumentation and lowering for virtual table address. * This is controlled by `-enable-vtable-value-profiling` and off by default. * When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads. * Implement profile reader and writer support * Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols. * Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't happen since IR is used to construct InstrProfSymtab. * Indexed profile writer collects the list of vtable names, and stores that to index profiles. * Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type. * `llvm-profdata show -show-vtables <args> <profile>` is implemented. rfc in https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7	2024-04-01 08:52:35 -07:00
wanglei	f439c71373	[InstrProf][NFC] Fix -Wimplicit-fallthrough warning in InstrProf.cpp after #82711	2024-03-06 10:20:30 +08:00
Mingming Liu	16e74fd489	Reland "[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling." (#82711 ) New change on top of [reviewed patch](https://github.com/llvm/llvm-project/pull/81691) are [in commits after this one](`d0757f46b3`). Previous commits are restored from the remote branch with timestamps. 1. Fix build breakage for non-ELF platforms, by defining the missing functions {`__llvm_profile_begin_vtables`, `__llvm_profile_end_vtables`, `__llvm_profile_begin_vtabnames `, `__llvm_profile_end_vtabnames`} everywhere. * Tested on mac laptop (for darwins) and Windows. Specifically, functions in `InstrProfilingPlatformWindows.c` returns `NULL` to make it more explicit that type prof isn't supported; see comments for the reason. * For the rest (AIX, other), mostly follow existing examples (like this [one](`f95b2f1acf`)) 2. Rename `__llvm_prf_vtabnames` -> `__llvm_prf_vns` for shorter section name, and make returned pointers [const](`a825d2a4ec (diff-4de780ce726d76b7abc9d3353aef95013e7b21e7bda01be8940cc6574fb0b5ffR120-R121)`) Original Description * Raw profile format - Header: records the byte size of compressed vtable names, and the number of profiled vtable entries (call it `VTableProfData`). Header also records padded bytes of each section. - Payload: adds a section for compressed vtable names, and a section to store `VTableProfData`. Both sections are padded so the size is a multiple of 8. * Indexed profile format - Header: records the byte offset of compressed vtable names. - Payload: adds a section to store compressed vtable names. This section is used by `llvm-profdata` to show the list of vtables profiled for an instrumented site. [The originally reviewed patch](https://github.com/llvm/llvm-project/pull/66825) will have profile reader/write change and llvm-profdata change. 
- To ensure this PR has all the necessary profile format change along with profile version bump, created a copy of the originally reviewed patch in https://github.com/llvm/llvm-project/pull/80761. The copy doesn't have profile format change, but it has the set of tests which covers type profile generation, profile read and profile merge. Tests pass there. rfc in https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600 --------- Co-authored-by: modiking <modiking213@gmail.com>	2024-02-27 11:07:40 -08:00
Mingming Liu	0e8d1877cd	Revert type profiling change as compiler-rt test break on Windows. (#82583 ) Examples https://lab.llvm.org/buildbot/#/builders/127/builds/62532/steps/8/logs/stdio	2024-02-21 21:41:33 -08:00
Mingming Liu	db7e9e6841	[TypeProf][InstrPGO] Introduce raw and instr profile format change for type profiling. (#81691 ) * Raw profile format - Header: records the byte size of compressed vtable names, and the number of profiled vtable entries (call it `VTableProfData`). Header also records padded bytes of each section. - Payload: adds a section for compressed vtable names, and a section to store `VTableProfData`. Both sections are padded so the size is a multiple of 8. * Indexed profile format - Header: records the byte offset of compressed vtable names. - Payload: adds a section to store compressed vtable names. This section is used by `llvm-profdata` to show the list of vtables profiled for an instrumented site. [The originally reviewed patch](https://github.com/llvm/llvm-project/pull/66825) will have profile reader/write change and llvm-profdata change. - To ensure this PR has all the necessary profile format change along with profile version bump, created a copy of the originally reviewed patch in https://github.com/llvm/llvm-project/pull/80761. The copy doesn't have profile format change, but it has the set of tests which covers type profile generation, profile read and profile merge. Tests pass there. rfc in https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600 --------- Co-authored-by: modiking <modiking213@gmail.com>	2024-02-21 20:59:42 -08:00
Mingming Liu	2422e969bf	[NFC][InstrProf]Factor out getCanonicalName to compute the canonical name given a pgo name. (#81547 ) - Also update the `InstrProf::addFuncWithName` to call the newly added `getCanonicalName`.	2024-02-13 10:49:35 -08:00
Mingming Liu	05091aa3ac	[NFC][InstrProf]Generalize getParsedIRPGOFuncName to getParsedIRPGOName (#81054 ) - Function getParsedIRPGOFuncName splits name by delimiter. The `[filename;]mangled-name` format could be generalized for non-function global values (e.g., vtables for type profiling). So rename the function. - Use kGlobalIdentifierDelimiter rather than semicolon directly for defragmentation.	2024-02-07 20:03:44 -08:00
spupyrev	30aa9fb4c1	Revert "[InstrProf] Adding utility weights to BalancedPartitioning (#72717 )" This reverts commit 5954b9dca21bb0c69b9e991b2ddb84c8b05ecba3 due to broken Windows build	2024-01-19 15:13:47 -08:00
spupyrev	5954b9dca2	[InstrProf] Adding utility weights to BalancedPartitioning (#72717 ) Adding weights to utility nodes in BP so that we can give more importance to certain utilities. This is useful when we optimize several objectives jointly.	2024-01-19 13:36:59 -08:00
Fangrui Song	0c6dc80531	BalancedPartitioning: minor updates (#77568 ) When LargestTraceSize is a power of two, createBPFunctionNodes does not allocate a group ID for Trace[LargestTraceSize-1] (as N is off by 1). Fix this and change floor+log2 to Log2_64. BalancedPartitioning::bisect can use unstable sort because `Nodes` contains distinct `InputOrderIndex`s. BalancedPartitioning::runIterations: use one DenseMap and simplify the node renumbering code.	2024-01-17 10:46:34 -08:00
Ellis Hoag	9a2df55f47	[InstrProf] No linkage prefixes in IRPGO names (#76994 ) Change the format of IRPGO counter names to `[<filepath>;]<mangled-name>` which is computed by `GlobalValue::getGlobalIdentifier()` to fix #74565. In fe051934cbb0aaf25d960d7d45305135635d650b (https://reviews.llvm.org/D156569) the format of IRPGO counter names was changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is basically `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O (yes, it is confusing that `<linkage-name>` is computed with `Mangler().getNameWithPrefix()` while `<mangled-name>` is just `F.getName()`). We discovered in #74565 that this causes some missed import issues on some targets and #74008 is a partial fix. Since `<mangled-name>` may not match the `<linkage-name>` on some targets like Mach-O, we will need to post-process the output of `llvm-profdata order` before passing to the linker via `-order_file`. Profiles generated after fe051934cbb0aaf25d960d7d45305135635d650b will become stale after this diff, but I think this is acceptable since that patch landed after the LLVM 18 cut which hasn't been released yet.	2024-01-04 16:13:57 -08:00
Mingming Liu	78a195e100	Reland the reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. " (#75954 ) Simplify the compiler-rt test to make it more general for different platforms, and use `DAG` matchers for lines that may be emitted out-of-order. - The compiler-rt test passed on a Windows machine. Previously name matchers don't work for MSVC mangling (https://lab.llvm.org/buildbot/#/builders/127/builds/59907) - `DAG` matchers fixed the error in https://lab.llvm.org/buildbot/#/builders/94/builds/17924 This is the second reland and fixed errors caught in first reland (https://github.com/llvm/llvm-project/pull/75860) Original commit message Commit fe05193 (phab D156569), IRPGO names uses format `[<filepath>;]<linkage-name>` while prior format is `[<filepath>:<mangled-name>`. The format change would break the use case demonstrated in (updated) `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp` This patch changes `GlobalValues::getGlobalIdentifer` to use the semicolon. To elaborate on the scenario how things break without this PR 1. IRPGO raw profiles stores (compressed) IRPGO names of functions in one section, and per-function profile data in another section. The [NameRef](`fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72)`) field in per-function profile data is the MD5 hash of IRPGO names. 2. When raw profiles are converted to indexed format profiles, the profiled address is [mapped](`fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885)`) to the MD5 hash of the callee. 3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names will be [annotated](`fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707)`) as value profiles, and used to import indirect-call-prom candidates. 
If the annotated MD5 hash is computed from the new format while import uses the prior format, the callee cannot be imported. * `compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp` is added to have an end-to-end test. * `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` is updated to have better test coverage from another aspect (as runtime tests are more sensitive to the environment and may be skipped by some contributors)	2023-12-19 12:25:56 -08:00
Mingming Liu	6ce23ea0ab	Revert "Reland "[PGO][GlobalValue][LTO]In GlobalValues::getGlobalIdentifier, use semicolon as delimiter for local-linkage varibles. "" (#75888 ) Reverts llvm/llvm-project#75860 - Mangled name mismatch on Windows (https://lab.llvm.org/buildbot/#/builders/127/builds/59907/steps/8/logs/stdio)	2023-12-18 19:31:18 -08:00

301 Commits