llvm-project

Author	SHA1	Message	Date
Kazu Hirata	774b12c4a0	[memprof] Initialize AllocInfoIter and CallSitesIter (NFC) (#124972 ) This patch initializes AllocInfoIter and CallSitesIter to their respective end(). I'm doing this not because I'm worried about uninitialized iterators, but because the resulting code looks shorter and makes it clear which data structure each iterator is associated with.	2025-01-29 14:31:00 -08:00
Mats Jun Larsen	416f1c465d	[IR] Replace of PointerType::get(Type) with opaque version (NFC) (#123617 ) In accordance with https://github.com/llvm/llvm-project/issues/123569 In order to keep the patch at reasonable size, this PR only covers for the llvm subproject, unittests excluded.	2025-01-21 00:32:56 +09:00
Kazu Hirata	adf0c817f3	[memprof] Undrift MemProf profile even when some frames are missing (#120500 ) This patch makes the MemProf undrifting process a little more lenient. Consider an inlined call hierarchy: foo -> bar -> ::new If bar tail-calls ::new, the profile appears to indicate that foo directly calls ::new. This is a problem because the perceived call hierarchy in the profile looks different from what we can obtain from the inline stack in the IR. Recall that undrifting works by constructing and comparing a list of direct calls from the profile and that from the IR. This patch modifies the construction of the latter. Specifically, if foo calls bar in the IR, but bar is missing from the profile, we pretend that foo directly calls some heap allocation function. We apply this transformation only in the inline stack leading to some heap allocation function.	2024-12-20 15:40:08 -08:00
Teresa Johnson	c7451ffcb9	[MemProf] Supporting hinting mostly-cold allocations after cloning (#120633 ) Optionally unconditionally hint allocations as cold or not cold during the cloning step if the percentage of bytes allocated is at least that of the given threshold. This is similar to PR120301 which supports this during matching, but enables the same behavior during cloning, to reduce the false positives that can be addressed by cloning at the cost of carrying the additional size metadata/summary.	2024-12-20 11:27:54 -08:00
Kazu Hirata	2886576944	[memprof] clang-format MemProf-related files (NFC) (#120504 )	2024-12-19 10:25:29 -08:00
Kazu Hirata	ac8a9f8fff	[memprof] Undrift MemProfRecord (#120138 ) This patch undrifts source locations in MemProfRecord before readMemprof starts the matching process. The theory of operation is as follows: 1. Collect the lists of direct calls, one from the IR and the other from the profile. 2. Compute the correspondence (called undrift map in the patch) between the two lists with longestCommonSequence. 3. Apply the undrift map just before readMemprof consumes MemProfRecord. The new function is gated by a flag that is off by default.	2024-12-18 14:21:25 -08:00
Teresa Johnson	a15e7b11da	[MemProf] Add option to hint allocations at a given cold byte percentage (#120301 ) Optionally unconditionally hint allocations as cold or not cold during the matching step if the percentage of bytes allocated is at least that of the given threshold.	2024-12-17 15:53:56 -08:00
Kazu Hirata	7c294eb780	[memprof] Simplify readMemprof (NFC) (#119930 ) This patch essentially replaces: std::pair<const std::vector<Frame> *, unsigned> with: ArrayRef<Frame> This way, we can store and pass ArrayRef<Frame>, conceptually one item, instead of the pointer and index. The only problem is that we don't have an existing hash function for ArrayRef<Frame>, so we provide a custom one, namely CallStackHash.	2024-12-14 00:03:27 -08:00
Ellis Hoag	2e33ed9ecc	[memprof] Use -memprof-runtime-default-options to set options during compile time (#118874 ) Add the `__memprof_default_options_str` variable, initialized via the `-memprof-runtime-default-options` LLVM flag, to hold the default options string for memprof. This allows us to set these options during compile time in the clang invocation. Also update the docs to describe the various ways to set these options.	2024-12-06 09:22:16 -08:00
Kazu Hirata	51cdf1f662	[memprof] Skip MemProfUsePass on the empty module (#117210 ) This patch teaches the MemProfUsePass to return immediately on the empty module. Aside from saving time to deserialize the MemProf profile, this patch ensures that we can obtain TLI like so: TargetLibraryInfo &TLI = FAM.getResult<TargetLibraryAnalysis>(*M.begin()); when we undrift the MemProf profile in near future.	2024-11-21 11:29:48 -08:00
Kazu Hirata	a2e266b346	[memprof] Add computeUndriftMap (#116478 ) This patch adds computeUndriftMap, a function to compute mappings from source locations in the MemProf profile to source locations in the IR.	2024-11-19 19:28:33 -08:00
Teresa Johnson	9513f2fdf2	[MemProf] Print full context hash when reporting hinted bytes (#114465 ) Improve the information printed when -memprof-report-hinted-sizes is enabled. Now print the full context hash computed from the original profile, similar to what we do when reporting matching statistics. This will make it easier to correlate with the profile. Note that the full context hash must be computed at profile match time and saved in the metadata and summary, because we may trim the context during matching when it isn't needed for distinguishing hotness. Similarly, due to the context trimming, we may have more than one full context id and total size pair per MIB in the metadata and summary, which now get a list of these pairs. Remove the old aggregate size from the metadata and summary support. One other change from the prior support is that we no longer write the size information into the combined index for the LTO backends, which don't use this information, which reduces unnecessary bloat in distributed index files.	2024-11-15 08:24:44 -08:00
Kazu Hirata	95554cbd77	[memprof] Teach extractCallsFromIR to recognize heap allocation functions (#115938 ) This patch teaches extractCallsFromIR to recognize heap allocation functions. Specifically, when we encounter a callee that is known to be a heap allocation function like "new", we set the callee GUID to 0. Note that I am planning to do the same for the caller-callee pairs extracted from the profile. That is, when I encounter a frame that does not have a callee, we assume that the frame is calling some heap allocation function with GUID 0. Technically, I'm not recognizing enough functions in this patch. TCMalloc is known to drop certain frames in the call stack immediately above new. This patch is meant to lay the groundwork, setting up GetTLI, plumbing it to extractCallsFromIR, and adjusting the unit tests. I'll address remaining issues in subsequent patches.	2024-11-12 21:37:29 -08:00
Kazu Hirata	c61832444d	[memprof] Teach extractCallsFromIR to look into inline stacks (#115441 ) To undrift the profile, we need to extract as many caller-callee pairs from the IR as we can to maximize the number of call sites in the profile we can undrift. Now, since MemProfUsePass runs after early inlining, some functions have been inlined, and we may no longer have bodies for those functions in the IR. To cope with this, this patch teaches extractCallsFromIR to extract caller-callee pairs from inline stacks. The output format of extractCallsFromIR remains the same. We still return a map from caller GUIDs to lists of corresponding call sites.	2024-11-08 18:24:38 -08:00
Kazu Hirata	e189d61924	[memprof] Add extractCallsFromIR (#115218 ) This patch adds extractCallsFromIR, a function to extract calls from the IR, which will be used to undrift call site locations in the MemProf profile. In a nutshell, the MemProf undrifting works as follows: - Extract call site locations from the IR. - Extract call site locations from the MemProf profile. - Undrift the call site locations with longestCommonSequence. This patch implements the first bullet point above. Specifically, given the IR, the new function returns a map from caller GUIDs to lists of corresponding call sites. For example: Given: foo() { f1(); f2(); f3(); } extractCallsFromIR returns: Caller: foo -> {{(Line 1, Column 3), Callee: f1}, {(Line 2, Column 3), Callee: f2}, {(Line 2, Column 9), Callee: f3}} where the line numbers, relative to the beginning of the caller, and column numbers are sorted in the ascending order. The value side of the map -- the list of call sites -- can be directly passed to longestCommonSequence. To facilitate the review process, I've only implemented basic features in extractCallsFromIR in this patch. - The new function extracts calls from the LLVM "call" instructions only. It does not look into the inline stack. - It does not recognize or treat heap allocation functions in any special way. I will address these missing features in subsequent patches.	2024-11-07 14:40:00 -08:00
Kazu Hirata	77b7d9de83	[memprof] Add const to isAllocationWithHotColdVariant (NFC) (#114719 )	2024-11-03 17:59:13 -08:00
Kazu Hirata	890c4bece2	[memprof] Use SmallVector for InlinedCallStack (NFC) (#114599 ) We can stay within 8 inlined elements more than 99% of the time while building a large application.	2024-11-01 19:52:11 -07:00
Snehasish Kumar	9b00ef5261	Revert "Add unit tests for size returning new funcs in the MemProf use pass. (#105473 )" (#106114 ) This reverts commit 2e426fe8ff314c2565073e73e27fdbdf36c140a3.	2024-08-26 11:26:47 -07:00
Snehasish Kumar	2e426fe8ff	Add unit tests for size returning new funcs in the MemProf use pass. (#105473 ) We use a unit test to verify correctness since: a) we don't have a text format profile b) size returning new isn't supported natively c) a raw profile will need to be manipulated artificially The changes this test covers were made in https://github.com/llvm/llvm-project/pull/102258.	2024-08-26 09:43:03 -07:00
Snehasish Kumar	95daf1aedf	Allow optimization of __size_returning_new variants. (#102258 ) https://github.com/llvm/llvm-project/pull/101564 added support to TLI to detect variants of operator new which provide feedback on the actual size of memory allocated (http://wg21.link/P0901R5). This patch extends SimplifyLibCalls to handle hot cold hinting of these variants.	2024-08-15 08:06:41 -07:00
Matthew Weingarten	17993eb162	[Memprof] Adds instrumentation support for memprof with histograms. (#100834 ) This patch allows running `-fmemory-profile` without the flag `-memprof-use-callbacks`, meaning the `RecordAccessesHistogram` is injected into IR as a sequence of instructions. This significantly increases performance of the instrumented binary.	2024-07-29 16:09:37 -07:00
Teresa Johnson	8c1bd67dee	[MemProf] Optionally print or record the profiled sizes of allocations (#98248 ) This is the first step in being able to track the total profiled sizes of allocations successfully marked as cold. Under a new option -memprof-report-hinted-sizes: - For unambiguous (non-context-sensitive) allocations, print the profiled size and the allocation coldness, along with a hash of the allocation's location (to allow for deduplication across modules or inline instances). - For context sensitive allocations, add the size as a 3rd operand on the MIB metadata. A follow on patch will propagate this through to the thin link where the sizes will be reported for each context after cloning.	2024-07-10 09:41:36 -07:00
Nikita Popov	9df71d7673	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919 ) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.	2024-06-28 08:36:49 +02:00
Matthew Weingarten	30b93db547	[Memprof] Adds the option to collect AccessCountHistograms for memprof. (#94264 ) Adds compile time flag -mllvm -memprof-histogram and runtime flag histogram=true|false to turn Histogram collection on and off. The -memprof-histogram flag relies on -memprof-use-callbacks=true to work. Updates shadow mapping logic in histogram mode from having one 8 byte counter for 64 bytes, to 1 byte for 8 bytes, capped at 255. Only supports this granularity as of now. Updates the RawMemprofReader and serializing MemoryInfoBlocks to binary format, including changing to a new version of the raw binary format from version 3 to version 4. Updates creating MemoryInfoBlocks with and without Histograms. When two MemoryInfoBlocks are merged, AccessCounts are summed up and the shorter Histogram is removed. Adds a memprof_histogram test case. Initial commit for adding AccessCountHistograms up until RawProfile for memprof	2024-06-26 08:37:22 -07:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Kazu Hirata	e2d539bbba	[memprof] Fix comment typos (NFC)	2024-06-10 16:38:24 -07:00
Kazu Hirata	211edca559	[memprof] Fix a build error	2024-06-07 16:40:19 -07:00
Teresa Johnson	7536474ea7	[MemProf] Add matching statistics and tracing (#94814 ) To help debug or surface matching issues, add more statistics to the matching. Also add optional emission of each context seen in the function profiles along with its allocation type, size in bytes, and whether it was matched. This information is emitted along with a hash of the full stack context, to allow deduplication across modules for allocations within header files.	2024-06-07 16:26:41 -07:00
Kazu Hirata	4a918f0710	[memprof] Use std::vector<Frame> instead of llvm::SmallVector<Frame> (NFC) (#94432 ) This patch replaces llvm::SmallVector<Frame> with std::vector<Frame>. llvm::SmallVector<Frame> sets aside one inline element. Meanwhile, when I sort all call stacks by their lengths, the length at the first percentile is already 2. That is, 99 percent of call stacks do not take advantage of the inline element. Using std::vector<Frame> reduces the cycle and instruction counts by 11% and 22%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.	2024-06-06 14:24:43 -07:00
Teresa Johnson	e5cbe8fd9c	[MemProf] Optionally match profiles on to manually hinted hot/cold new (#91027 ) While we don't currently rewrite the hints on manually hot/cold hinted allocations, enable optionally matching profiles onto those allocations as a first step to being able to do this. Explicitly checking whether the library function is in the list of operator new also fixes one limitation of the prior call to isNewLikeFn. Some operator new calls (those that specify nothrow) are considered Malloc-like because they may return null. We want to be able to match and rewrite these. Therefore the new test uses a nothrow variant to test the fix for this as well.	2024-05-03 16:18:04 -07:00
Enna1	e0ade45991	[MemProf][NFC] Rename DefaultShadowGranularity to DefaultMemGranulari… (#79412 ) …ty in instrumentation code, be consistent with runtime In runtime code, the size of memory block mapped to a single shadow location is called MEM_GRANULARITY. In instrumentation code, the size of memory block mapped to a single shadow location is called DefaultShadowGranularity. Actually, the SHADOW_GRANULARITY is 8 (1 << SHADOW_SCALE), and the MEM_GRANULARITY is 64. The wording of DefaultShadowGranularity in instrumentation code is a bit misleading, this patch renames DefaultShadowGranularity to DefaultMemGranularity, be consistent with runtime.	2024-01-26 10:04:48 +08:00
Jie Fu	a6161a2524	[Instrumentation] Remove unused variable 'DL' in MemProfiler.cpp (NFC) llvm-project/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp:375:21: error: unused variable 'DL' [-Werror,-Wunused-variable] const DataLayout &DL = I->getModule()->getDataLayout(); ^ 1 error generated.	2024-01-25 10:11:17 +08:00
Enna1	f8262cae69	[MemProf][NFC] remove unneeded sized memory access callback (#79260 ) As discussed in https://github.com/llvm/llvm-project/pull/79244, the sized memory access callback is leftover stuff carried over from Asan and can be removed from the instrumentation.	2024-01-25 10:01:17 +08:00
Enna1	a7395891a7	[MemProf][NFC] remove unneeded TypeSize in InterestingMemoryAccess (#79244 ) Unlike ASan, MemProf uses the same memory access callback(inline sequence) for different size memory access, remove unneeded TypeSize stored in InterestingMemoryAccess.	2024-01-25 10:01:03 +08:00
Kazu Hirata	1daf2994de	[llvm] Use StringRef::contains (NFC)	2023-12-23 22:21:52 -08:00
lifengxiang1025	340cb19e15	[MemProf] Expand optimization scope to internal linkage function (#73236 ) Currently MemProf cannot annotate the IR correctly for local-linkage functions or for the global initialization function __cxx_global_var_init. llvm-profdata, which converts the raw memory profile to a memory profile, uses the function name from DWARF to create the GUID. But when LLVM consumes the memory profile, it uses `getIRPGOFuncName` or `getPGOFuncName`, which return a local-linkage function's name as `FileName;FunctionName` or `FileName:FunctionName`, to get the function name and create the GUID. As a result, the profile creator's GUID differs from the profile consumer's. So I think MemProf should be used with `unique-internal-linkage-names` and should not use PGOFuncName. __cxx_global_var_init is created after UniqueInternalLinkageNames runs, so I additionally add a unique suffix to __cxx_global_var_init. Co-authored-by: lifengxiang <lifengxiang.1025@bytedance.com>	2023-12-01 14:20:19 +08:00
Fangrui Song	7ca135cd86	[Instrumentation] Remove unneeded pointer casts and migrate away from getInt8PtrTy. NFC After opaque pointer migration, getInt8PtrTy() is considered legacy. Replace it with getPtrTy(), and while here, remove some unneeded pointer casts.	2023-11-15 12:50:49 -08:00
Simon Pilgrim	3ca4fe80d4	[Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC. startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)	2023-11-06 16:50:18 +00:00
Teresa Johnson	87f5e22987	[MemProf] Tolerate missing leaf debug frames (#71233 ) Loosen up the matching so that a missing leaf debug frame in the profile does not prevent matching an allocation context if we can match further up the inlined call context. This relies on the pre-inliner, which was already the default when performing normal PGO feedback along with the MemProf feedback, but to ensure matching is not affected by the presence of PGO, enable the pre-inliner for MemProf feedback as well.	2023-11-03 21:01:07 -07:00
Teresa Johnson	2446439f51	[MemProf] Handle profiles with missing column numbers (#70520 ) Detect when we are matching a memprof profile with no column numbers, and in that case treat all column numbers as 0 when matching. The profiled binary might have been built with -gno-column-info, for example.	2023-10-30 13:19:37 -07:00
Kazu Hirata	d7b18d5083	Use llvm::endianness{,::little,::native} (NFC) Now that llvm::support::endianness has been renamed to llvm::endianness, we can use the shorter form. This patch replaces llvm::support::endianness with llvm::endianness.	2023-10-09 00:54:47 -07:00
Ellis Hoag	fe051934cb	[InstrProf] Encode linkage names in IRPGO counter names Prior to this diff, names in the `__llvm_prf_names` section had the format `[<filepath>:]<function-name>`, e.g., `main.cpp:foo`, `bar`. `<filepath>` is used to discriminate between possibly identical function names when linkage is local and `<function-name>` simply comes from `F.getName()`. This has two problems: * `:` is commonly found in Objective-C functions so that names like `main.mm:-[C foo::]` and `-[C bar::]` are difficult to parse * `<function-name>` might be different from the linkage name, so it cannot be used to pass a function order to the linker via `-symbol-ordering-file` or `-order_file` (see https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068) Instead, this diff changes the format to `[<filepath>;]<linkage-name>`, e.g., `main.cpp;_foo`, `_bar`. The hope is that `;` won't realistically be found in either `<filepath>` or `<linkage-name>`. To prevent invalidating all prior IRPGO profiles, we also lookup the prior name format when a record is not found (see `InstrProfSymtab::create()`, `readMemprof()`, and `getInstrProfRecord()`). It seems that Swift and Clang FE-PGO rely on the original `getPGOFuncName()`, so we cannot simply replace it. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D156569	2023-08-07 10:15:08 -07:00
Teresa Johnson	546ec641b4	Restore "[MemProf] Use new option/pass for profile feedback and matching" This restores commit b4a82b62258c5f650a1cccf5b179933e6bae4867, reverted in 3ab7ef28eebf9019eb3d3c4efd7ebfd160106bb1 because it was thought to cause a bot failure, which ended up being unrelated to this patch set. Differential Revision: https://reviews.llvm.org/D154856	2023-07-11 13:16:20 -07:00
Teresa Johnson	95014050da	Restore "[MemProf] Refactor memory profile matching into MemProfiler (NFC)" This restores commit 29252fdd530f68d0de38a0cd26ed428bb2f5c16a, reverted in 3498cf52ba1c23cbf8acdf99d649d2fa25291eef because it was thought to cause a bot failure, which ended up being unrelated to this patch set. Differential Revision: https://reviews.llvm.org/D154872	2023-07-11 13:16:20 -07:00
JP Lehr	3498cf52ba	Revert "[MemProf] Refactor memory profile matching into MemProfiler (NFC)" This reverts commit 29252fdd530f68d0de38a0cd26ed428bb2f5c16a. This broke AMD GPU OpenMP Offload buildbot	2023-07-11 05:55:55 -04:00
JP Lehr	3ab7ef28ee	Revert "[MemProf] Use new option/pass for profile feedback and matching" This reverts commit b4a82b62258c5f650a1cccf5b179933e6bae4867. Broke AMDGPU OpenMP Offload buildbot	2023-07-11 05:44:42 -04:00
Teresa Johnson	b4a82b6225	[MemProf] Use new option/pass for profile feedback and matching Previously the MemProf profile was expected to be in the same profile file as a normal PGO profile, passed via the usual -fprofile-use= option, and was matched in the same pass. To simplify profile preparation, since the raw MemProf profile requires the binary for symbolization and may be simpler to index separately from the raw PGO profile, and also to enable providing a MemProf profile for a SamplePGO build, separate out the MemProf feedback option and matching pass. This patch adds the -fmemory-profile-use=${file} option, and the provided file is passed down to LLVM and ultimately used in a new MemProfUsePass which performs the matching of just the memory profile contents of that file. Note that a single profile file containing both normal PGO and MemProf profile data is still supported, and the relevant profile data is matched by the appropriate matching pass(es) based on which option(s) the profile is provided with (the same profile file can be supplied to both feedback options). Differential Revision: https://reviews.llvm.org/D154856	2023-07-10 16:42:56 -07:00
Teresa Johnson	29252fdd53	[MemProf] Refactor memory profile matching into MemProfiler (NFC) Split out of D154856, this prepares for the addition of a new dedicated memory profile matching pass. Differential Revision: https://reviews.llvm.org/D154872	2023-07-10 13:12:58 -07:00
Bjorn Pettersson	a20f7efbc5	Remove several no longer needed includes. NFCI Mostly removing includes of InitializePasses.h and Pass.h in passes that no longer have support for the legacy PM.	2023-04-17 13:54:19 +02:00
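
The undrifting steps described in the ac8a9f8fff and e189d61924 entries above (collect direct calls from the IR and from the profile, then align the two lists) can be illustrated with a small standalone sketch. This is not LLVM's code: `Loc`, `CallSite`, and `computeUndriftSketch` are hypothetical names, callees are compared by name rather than by GUID, and a textbook longest-common-subsequence dynamic program stands in for `longestCommonSequence`, whose actual signature and algorithm are not shown in this log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not LLVM types: a source location relative to the
// start of the caller, and a call site pairing a location with a callee name.
using Loc = std::pair<uint32_t, uint32_t>; // (line offset, column)
struct CallSite {
  Loc Location;
  std::string Callee;
};

// Align the profile's call sites with the IR's call sites by callee name and
// map each matched profile location to the corresponding IR location.
std::map<Loc, Loc> computeUndriftSketch(const std::vector<CallSite> &Profile,
                                        const std::vector<CallSite> &IR) {
  const std::size_t N = Profile.size(), M = IR.size();
  // Len[I][J] = length of the LCS of Profile[I..] and IR[J..].
  std::vector<std::vector<std::size_t>> Len(
      N + 1, std::vector<std::size_t>(M + 1, 0));
  for (std::size_t I = N; I-- > 0;)
    for (std::size_t J = M; J-- > 0;)
      Len[I][J] = Profile[I].Callee == IR[J].Callee
                      ? Len[I + 1][J + 1] + 1
                      : std::max(Len[I + 1][J], Len[I][J + 1]);

  // Walk the table from the front, recording one matched pair per LCS element.
  std::map<Loc, Loc> Undrift;
  std::size_t I = 0, J = 0;
  while (I < N && J < M) {
    if (Profile[I].Callee == IR[J].Callee) {
      Undrift[Profile[I].Location] = IR[J].Location;
      ++I;
      ++J;
    } else if (Len[I + 1][J] >= Len[I][J + 1]) {
      ++I; // Skip a profile call site with no counterpart in the IR.
    } else {
      ++J; // Skip an IR call site with no counterpart in the profile.
    }
  }
  // Sanity check: we never record more pairs than the LCS length.
  assert(Undrift.size() <= Len[0][0]);
  return Undrift;
}
```

In this sketch, a call site that appears only in the IR (for example, a call added after the profile was collected) is skipped, and the profile locations after it are mapped to their shifted IR positions.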


70 Commits