llvm-project

Author	SHA1	Message	Date
Snehasish Kumar	0528848def	[NFC][MemProf] Move IndexedMemProfData to its own header. (#140503 ) Part of a larger refactoring with the following goals 1. Reduce the size of MemProf.h 2. Avoid including ModuleSummaryIndex just for a couple of types	2025-05-19 16:21:51 -07:00
Snehasish Kumar	ad3c1d2091	[NFC][MemProf] Move getGUID out of IndexedMemProfRecord (#140502 ) Part of a larger refactoring with the following goals 1. Reduce the size of MemProf.h 2. Avoid including ModuleSummaryIndex just for a couple of types	2025-05-19 16:18:57 -07:00
Snehasish Kumar	a53b306c47	[NFC][MemProf] Move Radix tree methods to their own header and cpp. (#140501 ) Part of a larger refactoring with the following goals 1. Reduce the size of MemProf.h 2. Avoid including ModuleSummaryIndex just for a couple of types	2025-05-19 16:16:09 -07:00
Snehasish Kumar	099a0fa3f2	[MemProf] Add v4 which contains CalleeGuids to CallSiteInfo. (#137394 ) This patch adds CalleeGuids to the serialized format and increments the version number to 4. The unit tests are updated to include a new test for v4 and the YAML format is also updated to be able to roundtrip the v4 format.	2025-05-01 20:17:21 -07:00
Snehasish Kumar	a61048bbcd	[MemProf][NFC] Hoist size computation out of the loop for v3 (#137479 ) Similar to the suggestion in #137394. In this case apply it to the current binary format (v3).	2025-04-28 11:16:29 -07:00
Owen Rodley	d3d856ad84	Clean up external users of GlobalValue::getGUID(StringRef) (#129644 ) See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801 for context. This is a non-functional change which just changes the interface of GlobalValue, in preparation for future functional changes. This part touches a fair few users, so is split out for ease of review. Future changes to the GlobalValue implementation can then be focused purely on that class. This does the following: * Rename GlobalValue::getGUID(StringRef) to getGUIDAssumingExternalLinkage. This is simply making explicit at the callsite what is currently implicit. * Where possible, migrate users to directly calling getGUID on a GlobalValue instance. * Otherwise, where possible, have them call the newly renamed getGUIDAssumingExternalLinkage, to make the assumption explicit. There are a few cases where neither of the above are possible, as the caller saves and reconstructs the necessary information to compute the GUID themselves. We want to migrate these callers eventually, but for this first step we leave them be.	2025-04-28 11:09:43 +10:00
Snehasish Kumar	e1ac57d53a	[MemProf] Extend CallSite information to include potential callees. (#130441 ) * Added YAML traits for `CallSiteInfo` * Updated the `MemProfReader` to pass `Frames` instead of the entire `CallSiteInfo` * Updated test cases to use `testing::Field` * Add YAML sequence traits for CallSiteInfo in MemProfYAML * Also extend IndexedMemProfRecord * XFAIL the MemProfYaml round trip test until we update the profile format For now we only read and write the additional information from the YAML format. The YAML round trip test will be enabled when the serialized format is updated.	2025-03-12 09:55:56 -07:00
Kazu Hirata	6fb967ec9e	[memprof] Move Frame::hash and hashCallStack to IndexedMemProfData (NFC) (#120365 ) Now that IndexedMemProfData::{addFrame,addCallStack} are the only callers of Frame::hash and hashCallStack, respectively, this patch moves those functions into IndexedMemProfData and makes them private. With this patch, we can obtain FrameId and CallStackId only through addFrame and addCallStack, respectively.	2024-12-18 10:56:45 -08:00
Kazu Hirata	ff7b42c194	[memprof] Speed up llvm-profdata (#117446 ) CallStackRadixTreeBuilder::build takes the parameter MemProfFrameIndexes by value, involving copies: std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>> MemProfFrameIndexes Then "build" makes another copy of MemProfFrameIndexes and passes it to encodeCallStack for every call stack, which is painfully slow. This patch changes the type to a pointer so that we don't have to make a copy every time we pass the argument. Without this patch, it takes 553 seconds to run "llvm-profdata merge" on a large MemProf raw profile. This patch shortens that down to 67 seconds.	2024-11-24 21:08:54 -08:00
Kazu Hirata	9d8a11fb39	[memprof] Remove verifyIndexedMemProfRecord and verifyFunctionProfileData (#117412 ) This patch removes two functions to verify the consistency between: - IndexedAllocationInfo::CallStack - IndexedAllocationInfo::CSId Now that MemProf format Version 1 has been removed, IndexedAllocationInfo::CallStack doesn't participate in either serialization or deserialization, so we don't care about the consistency between the two fields in IndexedAllocationInfo. Subsequent patches will remove uses of the old field and eventually remove the field.	2024-11-22 21:58:01 -08:00
Teresa Johnson	776476c282	Reapply "[MemProf] Use radix tree for alloc contexts in bitcode summaries" (#117395 ) (#117404 ) This reverts commit fdb050a5024320ec29d2edf3f2bc686c3a84abaa, and restores ccb4702038900d82d1041ff610788740f5cef723, with a fix for build bot failures. Specifically, add ProfileData to the dependences of the BitWriter library, which was causing shared library builds of LLVM to fail. Reproduced the failure with a shared library build and confirmed this change fixes that build failure.	2024-11-22 16:18:30 -08:00
Teresa Johnson	fdb050a502	Revert "[MemProf] Use radix tree for alloc contexts in bitcode summaries" (#117395 ) Reverts llvm/llvm-project#117066 This is causing some build bot failures that need investigation.	2024-11-22 14:57:58 -08:00
Teresa Johnson	ccb4702038	[MemProf] Use radix tree for alloc contexts in bitcode summaries (#117066 ) Leverage the support added to represent allocation contexts in a more compact way via a radix tree in the indexed profile to similarly reduce sizes of the bitcode summaries. For a large target, this reduced the size of the per-module summaries by about 18% and in the distributed combined index files by 28%.	2024-11-22 14:49:55 -08:00
Kazu Hirata	ad2bdd8fab	[memprof] Remove MemProf format Version 1 (#117357 ) This patch removes MemProf format Version 1 now that Version 2 and 3 are working well.	2024-11-22 11:53:31 -08:00
Teresa Johnson	e14827f082	[MemProf] Templatize CallStackRadixTreeBuilder (NFC) (#117014 ) Prepare for usage in the bitcode reader/writer where we already have a LinearFrameId: - templatize input frame id type in CallStackRadixTreeBuilder - templatize input frame id type in computeFrameHistogram - make the map from FrameId to LinearFrameId optional We plan to use the same radix format in the ThinLTO summary records, where we already have a LinearFrameId.	2024-11-20 10:08:58 -08:00
Kazu Hirata	0d38f64e7d	[memprof] Remove MemProf format Version 0 (#116442 ) This patch removes MemProf format Version 0 now that version 2 and 3 seem to be working well. I'm not touching version 1 for now because some tests still rely on version 1. Note that Version 0 is identical to Version 1 except that the MemProf section of the indexed format has a MemProf version field.	2024-11-15 15:37:00 -08:00
Kazu Hirata	e2d539bbba	[memprof] Fix comment typos (NFC)	2024-06-10 16:38:24 -07:00
Kazu Hirata	4e0ff05460	[memprof] Remove extraneous memprof:: (NFC) (#94825 )	2024-06-07 18:32:58 -07:00
Kazu Hirata	dc3f8c2f58	[memprof] Improve deserialization performance in V3 (#94787 ) We call llvm::sort in a couple of places in the V3 encoding: - We sort Frames by FrameIds for stability of the output. - We sort call stacks in the dictionary order to maximize the length of the common prefix between adjacent call stacks. It turns out that we can improve the deserialization performance by modifying the comparison functions -- without changing the format at all. Both places take advantage of the histogram of Frames -- how many times each Frame occurs in the call stacks. - Frames: We serialize popular Frames in the descending order of popularity for improved cache locality. For two equally popular Frames, we break a tie by serializing one that tends to appear earlier in call stacks. Here, "earlier" means a smaller index within llvm::SmallVector<FrameId>. - Call Stacks: We sort the call stacks to reduce the number of times we follow pointers to parents during deserialization. Specifically, instead of comparing two call stacks in the strcmp style -- integer comparisons of FrameIds, we compare two FrameIds F1 and F2 with Histogram[F1] < Histogram[F2] at respective indexes. Since we encode from the end of the sorted list of call stacks, we tend to encode popular call stacks first. Since the two places use the same histogram, we compute it once and share it in the two places. Sorting the call stacks reduces the number of "jumps" by 74% when we deserialize all MemProfRecords. The cycle and instruction counts go down by 10% and 1.5%, respectively. If we sort the Frames in addition to the call stacks, then the cycle and instruction counts go down by 14% and 1.6%, respectively, relative to the same baseline (that is, without this patch).	2024-06-07 17:25:57 -07:00
Kazu Hirata	bfa937a487	[ProfileData] Add const to a few places (NFC) (#94803 )	2024-06-07 15:06:04 -07:00
Kazu Hirata	5c0df5fe22	[memprof] Add CallStackRadixTreeBuilder (#93784 ) Call stacks are a huge portion of the MemProf profile, taking up 70+% of the profile file size. This patch implements a radix tree to compress call stacks, which are known to have long common prefixes. Specifically, CallStackRadixTreeBuilder, introduced in this patch, takes call stacks in the MemProf profile, sorts them in the dictionary order to maximize the common prefix between adjacent call stacks, and then encodes a radix tree into a single array that is ready for serialization. The resulting radix array is essentially a concatenation of call stack arrays, each encoded with its length followed by the payload, except that these arrays contain "instructions" like "skip 7 elements forward" to borrow common prefixes from other call stacks. This patch does not integrate with the MemProf serialization/deserialization infrastructure yet. Once integrated, the radix tree is expected to roughly halve the file size of the MemProf profile.	2024-06-06 15:52:45 -07:00
Kazu Hirata	4a918f0710	[memprof] Use std::vector<Frame> instead of llvm::SmallVector<Frame> (NFC) (#94432 ) This patch replaces llvm::SmallVector<Frame> with std::vector<Frame>. llvm::SmallVector<Frame> sets aside one inline element. Meanwhile, when I sort all call stacks by their lengths, the length at the first percentile is already 2. That is, 99 percent of call stacks do not take advantage of the inline element. Using std::vector<Frame> reduces the cycle and instruction counts by 11% and 22%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.	2024-06-06 14:24:43 -07:00
Kazu Hirata	9a8b73c741	[memprof] Replace uint32_t with LinearCallStackId where appropriate (NFC) (#94023 ) This patch replaces uint32_t with LinearCallStackId where appropriate. I'm replacing uint64_t with LinearCallStackId in writeMemProfCallStackArray, but that's OK because it's a value to be used as LinearCallStackId anyway.	2024-05-31 14:41:05 -07:00
Kazu Hirata	37f3023487	[memprof] Use uint32_t for linear call stack IDs (#93924 ) This patch switches to uint32_t for linear call stack IDs as uint32_t is sufficient to index into the call stack array.	2024-05-31 10:29:14 -07:00
Kazu Hirata	90acfbf90d	[memprof] Use linear IDs for Frames and call stacks (#93740 ) With this patch, we stop using on-disk hash tables for Frames and call stacks. Instead, we'll write out all the Frames as a flat array while maintaining mappings from FrameIds to the indexes into the array. Then we serialize call stacks in terms of those indexes. Likewise, we'll write out all the call stacks as another flat array while maintaining mappings from CallStackIds to the indexes into the call stack array. One minor difference from Frames is that the indexes into the call stack array are not contiguous because call stacks are variable-length objects. Then we serialize IndexedMemProfRecords in terms of the indexes into the call stack array. Now, we describe each call stack with 32-bit indexes into the Frame array (as opposed to the 64-bit FrameIds in Version 2). The use of the smaller type cuts down the profile file size by about 40% relative to Version 2. The departure from the on-disk hash tables contributes a little bit to the savings, too. For now, IndexedMemProfRecords refer to call stacks with 64-bit indexes into the call stack array. As a follow-up, I'll change that to uint32_t, including necessary updates to RecordWriterTrait.	2024-05-30 14:28:22 -07:00
Kazu Hirata	9e89d107a6	[memprof] Add MemProf format Version 3 (#93608 ) This patch adds Version 3 for development purposes. For now, this patch adds V3 as a copy of V2. For the most part, this patch adds "case Version3:" wherever "case Version2:" appears. One exception is writeMemProfV3, which is copied from writeMemProfV2 but updated to write out memprof::Version3 to the MemProf header. We'll incrementally modify writeMemProfV3 in subsequent patches.	2024-05-28 13:30:00 -07:00
Kazu Hirata	d2a103e682	[memprof] Remove const from the return type of toMemProfRecord (#93415 ) "const" being removed in this patch prevents the move semantics from being used in: AI.CallStack = Callback(IndexedAI.CSId); With this patch on an indexed MemProf Version 2 profile, the cycle count and instruction count go down by 13.3% and 26.3%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.	2024-05-28 07:31:29 -07:00
Kazu Hirata	8d2258fd3b	[memprof] Call llvm::SmallVector::reserve (#93324 )	2024-05-24 10:42:46 -07:00
Kazu Hirata	300663a190	[memprof] Use std::move in toMemProfRecord (#93133 ) std::move and reserve here result in a measurable speed-up in llvm-profdata modified to deserialize all MemProfRecords. The cycle count goes down by 7.1% while the instruction count goes down by 21%.	2024-05-23 12:27:38 -07:00
Kazu Hirata	26fabdded3	[memprof] Pass FrameIdConverter and CallStackIdConverter by reference (#92327 ) CallStackIdConverter sets LastUnmappedId when a mapping failure occurs. Now, since toMemProfRecord takes an instance of CallStackIdConverter by value, namely std::function, the caller of toMemProfRecord never receives the mapping failure that occurs inside toMemProfRecord. The same problem applies to FrameIdConverter. The patch fixes the problem by passing FrameIdConverter and CallStackIdConverter by reference, namely llvm::function_ref. While I am at it, this patch deletes the copy constructor and copy assignment operator to avoid accidental copies.	2024-05-15 17:53:28 -07:00
Kazu Hirata	cb9589b227	[memprof] Move getFullSchema and getHotColdSchema outside PortableMemInfoBlock (#90103 ) These functions do not operate on PortableMemInfoBlock. This patch moves them outside the class.	2024-04-25 12:12:28 -07:00
Kazu Hirata	edf733bc32	[memprof] Take Schema into account in PortableMemInfoBlock::serializedSize (#89824 ) PortableMemInfoBlock::{serialize,deserialize} take Schema into account, allowing us to serialize/deserialize a subset of the fields. However, PortableMemInfoBlock::serializedSize does not. That is, it assumes that all fields are always serialized and deserialized. In other words, if we choose to serialize/deserialize a subset of the fields, serializedSize would claim more storage than we actually need. This patch fixes the problem by teaching serializedSize to take Schema into account. For now, this patch has no effect on the actual indexed MemProf profile because we serialize/deserialize all fields, but that might change in the future. Aside from check-llvm, I tested this patch by verifying that llvm-profdata generates bit-wise identical files for each version for a large raw MemProf file I have.	2024-04-23 13:44:31 -07:00
Kazu Hirata	f430e37446	[llvm] Drop unaligned from calls to readNext (NFC) (#88841 ) Now readNext defaults to unaligned accesses. This patch drops unaligned to improve readability.	2024-04-16 12:47:02 -07:00
Kazu Hirata	8137bd9e03	[memprof] Use CSId to construct MemProfRecord (#88362 ) We are in the process of referring to call stacks with CallStackId in IndexedMemProfRecord and IndexedAllocationInfo instead of holding call stacks inline (both in memory and the serialized format). Doing so deduplicates call stacks and reduces the MemProf profile file size. Before we can eliminate the two fields holding call stacks inline: - IndexedAllocationInfo::CallStack - IndexedMemProfRecord::CallSites we need to eliminate all the read operations on them. This patch is a step toward that direction. Specifically, we eliminate the read operations in the context of MemProfReader and RawMemProfReader. A subsequent patch will eliminate the read operations during the serialization.	2024-04-16 10:16:48 -07:00
Kazu Hirata	e09245b3b1	[memprof] Fix typos in serializedSizeV0 and serializedSizeV2 (#88629 ) The first field to serialize is the size of IndexedMemProfRecord::AllocSites. It has nothing to do with GlobalValue::GUID. This happens to work because of: using GUID = uint64_t;	2024-04-15 10:00:56 -07:00
Kazu Hirata	3f16ff4e68	[memprof] Use static instead of anonymous namespaces (#87889 ) This patch replaces anonymous namespaces with static as per LLVM Coding Standards.	2024-04-07 11:38:15 -07:00
Kazu Hirata	d89914f30b	[memprof] Add Version2 of IndexedMemProfRecord serialization (#87455 ) I'm currently developing a new version of the indexed memprof format where we deduplicate call stacks in IndexedAllocationInfo::CallStack and IndexedMemProfRecord::CallSites. We refer to call stacks with integer IDs, namely CallStackId, just as we refer to Frame with FrameId. The deduplication will cut down the profile file size by 80% in a large memprof file of mine. As a step toward the goal, this patch teaches IndexedMemProfRecord::{serialize,deserialize} to speak Version2. A subsequent patch will add Version2 support to llvm-profdata. The essence of the patch is to replace the serialization of a call stack, a vector of FrameIDs, with that of a CallStackId. That is: const IndexedAllocationInfo &N = ...; ... LE.write<uint64_t>(N.CallStack.size()); for (const FrameId &Id : N.CallStack) LE.write<FrameId>(Id); becomes: LE.write<CallStackId>(N.CSId);	2024-04-03 21:48:38 -07:00
Kazu Hirata	6646fe884c	[memprof] Compute CallStackId when deserializing IndexedAllocationInfo (#86421 ) There are two ways to create in-memory instances of IndexedAllocationInfo -- deserialization of the raw MemProf data and that of the indexed MemProf data. With: commit 74799f424063a2d751e0f9ea698db1f4efd0d8b2 Author: Kazu Hirata <kazu@google.com> Date: Sat Mar 23 19:50:15 2024 -0700 we compute CallStackId for each call stack in IndexedAllocationInfo while deserializing the raw MemProf data. This patch does the same while deserializing the indexed MemProf data. As with the patch above, this patch does not add any use of CallStackId yet.	2024-03-25 14:21:49 -07:00
Kazu Hirata	74799f4240	[memprof] Add call stack IDs to IndexedAllocationInfo (#85888 ) The indexed MemProf file has a huge amount of redundancy. In a large internal application, 82% of call stacks, stored in IndexedAllocationInfo::CallStack, are duplicates. We should work toward deduplicating call stacks by referring to them with unique IDs with actual call stacks stored in a separate data structure, much like we refer to memprof::Frame with memprof::FrameId. At the same time, we need to facilitate a graceful transition from the current version of the MemProf format to the next. We should be able to read (but not write) the current version of the MemProf file even after we move onto the next one. With those goals in mind, I propose to have an integer ID next to CallStack in IndexedAllocationInfo to refer to a call stack in a succinct manner. We'll gradually increase the areas of the compiler where IDs and call stacks have one-to-one correspondence and eventually remove the existing CallStack field. This patch adds call stack ID, named CSId, to IndexedAllocationInfo and teaches the raw profile reader to compute unique call stack IDs and store them in the new field. It does not introduce any user of the call stack IDs yet, except in verifyFunctionProfileData.	2024-03-23 19:50:15 -07:00
Teresa Johnson	749d595de9	[MemProf][NFC] Correct comment about stripping of suffixes in profile (#73840 ) The comment about the stripping of suffixes when creating the indexed MemProf profile was partially incorrect, as we do not strip ".__uniq." suffixes by default (by design). Update the comment accordingly.	2023-11-29 10:34:21 -08:00
Kazu Hirata	02f67c097d	Use llvm::endianness::{big,little,native} (NFC) Note that llvm::support::endianness has been renamed to llvm::endianness while becoming an enum class. This patch replaces {big,little,native} with llvm::endianness::{big,little,native}. This patch completes the migration to llvm::endianness and llvm::endianness::{big,little,native}. I'll post a separate patch to remove the migration helpers in llvm/Support/Endian.h: using endianness = llvm::endianness; constexpr llvm::endianness big = llvm::endianness::big; constexpr llvm::endianness little = llvm::endianness::little; constexpr llvm::endianness native = llvm::endianness::native;	2023-10-13 23:16:25 -07:00
Snehasish Kumar	0edc32fda5	[memprof] Canonicalize the function name prior to hashing. Canonicalize the function name (strip suffixes etc) to ensure that function name suffixes added by late stage passes do not cause mismatches when memprof profile data is consumed. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D159132	2023-08-29 20:45:39 +00:00
Snehasish Kumar	6dd6a6161f	[memprof] Deduplicate and outline frame storage in the memprof profile. The current implementation of memprof information in the indexed profile format stores the representation of each calling context frame inline. This patch uses an interned representation where the frame contents are stored in a separate on-disk hash table. The table is indexed via a hash of the contents of the frame. With this patch, the compressed size of a large memprof profile reduces by ~22%. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D123094	2022-04-08 09:15:20 -07:00
Snehasish Kumar	27a4f2545f	Reland "[memprof] Store callsite metadata with memprof records." This reverts commit f4b794427e8037a4e952cacdfe7201e961f31a6f. Reland with underlying msan issue fixed in D122260.	2022-03-22 14:40:02 -07:00
Mitch Phillips	f4b794427e	Revert "[memprof] Store callsite metadata with memprof records." This reverts commit 0d362c90d335509c57c0fbd01ae1829e2b9c3765. Reason: Causes the MSan buildbot to fail (see comments on https://reviews.llvm.org/D121179 for more information)	2022-03-21 15:59:13 -07:00
Snehasish Kumar	0d362c90d3	[memprof] Store callsite metadata with memprof records. To ease profile annotation, each of the callsites in a function can be annotated with profile data - "IR metadata format for MemProf" [1]. This patch extends the on-disk serialized record format to store the debug information for allocation callsites incl inline frames. This change is incompatible with the existing format i.e. indexed profiles must be regenerated, raw profiles are unaffected. [1] https://groups.google.com/g/llvm-dev/c/aWHsdMxKAfE/m/WtEmRqyhAgAJ Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D121179	2022-03-21 13:58:29 -07:00
serge-sans-paille	fc97efa409	Cleanup includes: ProfileData Estimation of the impact on preprocessor output: before: 1067349756 after: 1065940348 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120434	2022-02-24 13:25:11 +01:00
Snehasish Kumar	0a4184909a	Reland "[memprof] Extend the index prof format to include memory profiles." This patch adds support for optional memory profile information to be included with an indexed profile. The indexed profile header adds a new field which points to the offset of the memory profile section (if present) in the indexed profile. For users who do not utilize this feature the only overhead is a 64-bit offset in the header. The memory profile section contains (1) profile metadata describing the information recorded for each entry (2) an on-disk hashtable containing the profile records indexed via llvm::md5(function_name). We chose to introduce a separate hash table instead of the existing one since the indexing for the instrumented fdo hash table is based on a CFG hash which itself is perturbed by memprof instrumentation. This commit also includes the changes reviewed separately in D120093. Differential Revision: https://reviews.llvm.org/D120103	2022-02-17 22:09:52 -08:00
Snehasish Kumar	19bdf44d85	Revert "Reland "[memprof] Extend the index prof format to include memory profiles."" This reverts commit 807ba7aace188ada83ddb4477265728e97346af1.	2022-02-17 15:51:04 -08:00
Snehasish Kumar	27b7c1e3f5	Revert "[memprof] Fix frame deserialization on big endian systems." This reverts commit c74389b4b58d8db3f8262ce15b9d514d62fe265c. This broke the ml-opt-x86-64 build. https://lab.llvm.org/buildbot#builders/9/builds/4127	2022-02-17 15:51:04 -08:00
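
The linear-ID scheme described in the 90acfbf90d entry (flat arrays of Frames and call stacks, with 32-bit indexes replacing on-disk hash tables) can be sketched roughly as follows. This is a minimal illustration and not the LLVM implementation: FlatProfile, internFrame, appendCallStack and readCallStack are hypothetical names, each Frame is reduced to its 64-bit FrameId, and the length-prefixed layout is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using FrameId = uint64_t;            // 64-bit hash-style ID, as in Version 2
using LinearFrameId = uint32_t;      // index into the flat Frame array
using LinearCallStackId = uint32_t;  // offset into the flat call stack array

// Hypothetical container; the real profile stores full Frame payloads.
struct FlatProfile {
  std::vector<FrameId> Frames;        // flat Frame array
  std::vector<uint32_t> CallStacks;   // [length, frame index, ...] repeated
  std::unordered_map<FrameId, LinearFrameId> FrameIndex;  // FrameId -> index
};

// Return the index of Id in the flat Frame array, appending it on first use.
LinearFrameId internFrame(FlatProfile &P, FrameId Id) {
  auto Next = static_cast<LinearFrameId>(P.Frames.size());
  auto [It, Inserted] = P.FrameIndex.try_emplace(Id, Next);
  if (Inserted)
    P.Frames.push_back(Id);
  return It->second;
}

// Append one call stack and return its offset. Offsets are not contiguous
// because call stacks are variable-length.
LinearCallStackId appendCallStack(FlatProfile &P,
                                  const std::vector<FrameId> &Stack) {
  auto Offset = static_cast<LinearCallStackId>(P.CallStacks.size());
  P.CallStacks.push_back(static_cast<uint32_t>(Stack.size()));
  for (FrameId Id : Stack)
    P.CallStacks.push_back(internFrame(P, Id));
  return Offset;
}

// Rebuild a call stack from its offset by following the 32-bit indexes.
std::vector<FrameId> readCallStack(const FlatProfile &P,
                                   LinearCallStackId Offset) {
  assert(Offset < P.CallStacks.size() && "offset out of range");
  uint32_t Len = P.CallStacks[Offset];
  std::vector<FrameId> Out;
  Out.reserve(Len);
  for (uint32_t I = 0; I < Len; ++I)
    Out.push_back(P.Frames[P.CallStacks[Offset + 1 + I]]);
  return Out;
}
```

In this sketch each call stack element costs 4 bytes instead of 8, which is the kind of saving the entry attributes to the smaller index type. The sketch does not attempt the radix-tree prefix sharing described in the 5c0df5fe22 entry.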

56 Commits