llvm-project

Author	SHA1	Message	Date
Oskar Wirga	a9d4ddd98a	[MergeFuncs/CFI] Ensure all type metadata is propogated for CFI (#88218 ) I noticed that we weren't propagating ALL type metadata that was attached to CFI functions: # BEFORE ``` ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030 ... fn merging ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !34028 ``` # AFTER ``` ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030 ... fn merging ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !type !34028 !type !34029 !type !34030 ``` This patch makes sure that the entire vector of metadata is copied over.	2024-04-10 15:37:27 -07:00
Lei Wang	1aceee7bb6	Remove unused variable (#88223 ) fix the CI	2024-04-09 19:25:08 -07:00
Teresa Johnson	a332cfc986	[MemProf] Perform cloning for each allocation separately (#87112 ) Restructures the cloning slightly to perform all cloning for each allocation separately. The prior algorithm would sometimes miss cloning opportunities in cases where trimmed cold contexts partially overlapped with longer contexts for different allocations. Most of the change is isolated to the helpers that move edges to new or existing clones, which now support moving a subset of context ids.	2024-04-09 14:12:32 -07:00
Stephen Tozer	708ce85690	[RemoveDIs][NFC] Use ScopedDbgInfoFormatSetter in more places (#87380 ) The class `ScopedDbgInfoFormatSetter` was added as a convenient way to temporarily change the debug info format of a function or module, as part of IR printing; since this process is repeated in a number of other places, this patch uses the format-setter class in those places as well.	2024-04-04 10:20:14 +01:00
Lei Wang	5bbce06ac6	[PseudoProbe] Mix block and call probe ID in lexical order (#75092 ) Before all the call probe ids are after block ids, in this change, it mixed the call probe and block probe by reordering them in lexical(line-number) order. For example: ``` main(): BB1 if(...) BB2 foo(..); else BB3 bar(...); BB4 ``` Before the profile is ``` main 1: .. 2: .. 3: ... 4: ... 5: foo ... 6: bar ... ``` Now the new order is ``` main 1: .. 2: .. 3: foo ... 4: ... 5: bar ... 6: ... ``` This can potentially make it more tolerant of profile mismatch, either from stale profile or frontend change. e.g. before if we add one block, even the block is the last one, all the call probes are shifted and mismatched. Moreover, this makes better use of call-anchor based stale profile matching. Blocks are matched based on the closest anchor, there would be more anchors used for the matching, reduce the mismatch scope.	2024-04-03 11:18:29 -07:00
Krzysztof Pszeniczny	17642c7602	[SamplePGO] Support -salvage-stale-profile without probes too (#86116 ) Currently -salvage-stale-profile is a no-op if the profile is not probe-based. We observed that it can help for regular, non-probe- based profiles too: some of our internal benchmarks show 0.2-0.3% QPS improvement. There seems to be no good reason to limit this flag to only work for probe-based profiles.	2024-04-03 09:46:00 -07:00
Lei Wang	b8cc3ba409	[PseudoProbe] Extend to skip instrumenting probe into the dests of invoke (#79919 ) As before we only skip instrumenting probe of `unwind`(`KnownColdBlock`) block, this PR extends to skip the both EH flow from `invoke`, i.e. also skip the `normal` dest. For more contexts: when doing call-to-invoke conversion, the block is split by the `invoke` and two extra blocks(`normal` and `unwind`) are added. With this PR, the instrumentation is the same as the one before the call-to-invoke conversion. One significant benefit is this can help mitigate the "unstable IR" issue(https://discourse.llvm.org/t/ipo-for-linkonce-odr-functions/69404), the two versions now are on the same probe instrumentation, expected to be the same checksum. To achieve the same checksum, some tweaks is needed: - Now it also skips incrementing the probe ID for the skipped probe. - The checksum is also computed based on the CFG that skips the EH edges. We observed this fixes ~5% mismatched samples.	2024-04-01 13:54:54 -07:00
Lei Wang	1d99d7a6f8	[SampleFDO][NFC] Refactoring SampleProfileMatcher (#86988 ) Move all the stale profile matching stuffs into new files so that it can be shared for unit testing.	2024-03-28 20:03:03 -07:00
Lei Wang	f8bab38b6d	[CSSPGO] Fix the issue of missing callee profile matches (#85715 ) Two fixes related to the callee/inlinee profile: 1. Fix the bug that the matching results are missing to distribute to the callee profiles (should be pass-by-reference). 2. Narrow imported function matching to checksum mismatched functions. More context: before we run matchings for all imported functions even checksums are matched, however, after we fix 1), we got a regression, it's likely due to the matching is not no-op for checksum matched function, so we want to make it consistent to only run matching for checksum mismatched (imported)functions. Since the metadata(pseudo_probe_desc) are dropped for imported function, we leverage the function attribute mechanism and add a new function attribute(`profile-checksum-mismatch`) to transfer the info from pre-link to post-link.	2024-03-27 22:27:22 -07:00
Lei Wang	2598aa67c8	[CSSPGO] Reject high checksum mismatched profile (#84097 ) Error out the build if the checksum mismatch is extremely high, it's better to drop the profile rather than apply the bad profile. Note that the check is on a module level, the user could make big changes to functions in one single module but those changes might not be performance significant to the whole binary, so we want to be conservative, only expect to catch big perf regression. To do this, we select a set of the "hot" functions for the check. We use two parameter(`hot-func-cutoff-for-staleness-error` and `min-functions-for-staleness-error`) to control the function selection to make sure the selected are hot enough and the num of function is not small. Tuned the parameters on our internal services, it works to catch big perf regression due to the high mismatch .	2024-03-27 11:14:21 -07:00
Teresa Johnson	082e7c480e	[MemProf] Remove empty edges once after cloning (#85320 ) Restructure the handling of edges that become empty during the cloning process. Instead of removing them as they become empty (no context ids and alloc type), do this once after all cloning is complete. This has no effect on the cloning result, but prepares for a follow on change that does improve the cloning. The structural change here reduces the diffs for the follow on change, which would be much more difficult with the previous handling.	2024-03-26 20:06:27 -07:00
Matt Arsenault	733640d29e	Attributor: Handle inferring align from use by atomics (#85762 )	2024-03-21 10:54:03 +05:30
Nikita Popov	0f46e31cfb	[IR] Change representation of getelementptr inrange (#84341 ) As part of the migration to ptradd (https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699), we need to change the representation of the `inrange` attribute, which is used for vtable splitting. Currently, inrange is specified as follows: ``` getelementptr inbounds ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, inrange i32 1, i64 2) ``` The `inrange` is placed on a GEP index, and all accesses must be "in range" of that index. The new representation is as follows: ``` getelementptr inbounds inrange(-16, 16) ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, i32 1, i64 2) ``` This specifies which offsets are "in range" of the GEP result. The new representation will continue working when canonicalizing to ptradd representation: ``` getelementptr inbounds inrange(-16, 16) (i8, ptr @vt, i64 48) ``` The inrange offsets are relative to the return value of the GEP. An alternative design could make them relative to the source pointer instead. The result-relative format was chosen on the off-chance that we want to extend support to non-constant GEPs in the future, in which case this variant is more expressive. This implementation "upgrades" the old inrange representation in bitcode by simply dropping it. This is a very niche feature, and I don't think trying to upgrade it is worthwhile. Let me know if you disagree.	2024-03-20 10:59:45 +01:00
Lei Wang	12a2bc301f	[CSSPGO] Fix the issue of preinliner import function list (#85719 ) By design, when the nested profile is pre-inliner based, we should fully honor pre-inliner decision, fix it by setting threshold to zero. We observed a perf win on one internal service, no negative impact for other big services.	2024-03-19 16:50:48 -07:00
Stephen Tozer	ffd08c7759	[RemoveDIs][NFC] Rename DPValue -> DbgVariableRecord (#85216 ) This is the major rename patch that prior patches have built towards. The DPValue class is being renamed to DbgVariableRecord, which reflects the updated terminology for the "final" implementation of the RemoveDI feature. This is a pure string substitution + clang-format patch. The only manual component of this patch was determining where to perform these string substitutions: `DPValue` and `DPV` are almost exclusively used for DbgRecords, except for: - llvm/lib/target, where 'DP' is used to mean double-precision, and so appears as part of .td files and in variable names. NB: There is a single existing use of `DPValue` here that refers to debug info, which I've manually updated. - llvm/tools/gold, where 'LDPV' is used as a prefix for symbol visibility enums. Outside of these places, I've applied several basic string substitutions, with the intent that they only affect DbgRecord-related identifiers; I've checked them as I went through to verify this, with reasonable confidence that there are no unintended changes that slipped through the cracks. The substitutions applied are all case-sensitive, and are applied in the order shown: ``` DPValue -> DbgVariableRecord DPVal -> DbgVarRec DPV -> DVR ``` Following the previous rename patches, it should be the case that there are no instances of any of these strings that are meant to refer to the general case of DbgRecords, or anything other than the DPValue class. The idea behind this patch is therefore that pure string substitution is correct in all cases as long as these assumptions hold.	2024-03-19 20:07:07 +00:00
Orlando Cazalet-Hyams	435d4c12de	Reapply [RemoveDIs] Read/write DbgRecords directly from/to bitcode (#83251 ) Reaplying after revert in #85382 (861ebe6446296c96578807363aa292c69d827773). Fixed intermittent test failure by avoiding piping output in some RUN lines. If --write-experimental-debuginfo-iterators-to-bitcode is true (default false) and --expermental-debuginfo-iterators is also true then the new debug info format (non-instruction records) is written to bitcode directly. Added the following records: FUNC_CODE_DEBUG_RECORD_LABEL FUNC_CODE_DEBUG_RECORD_VALUE FUNC_CODE_DEBUG_RECORD_DECLARE FUNC_CODE_DEBUG_RECORD_ASSIGN FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE The last one has an abbrev in FUNCTION_BLOCK BLOCK_INFO. Incidentally, this uses the last value available without widening the code-length for FUNCTION_BLOCK from 4 to 5 bits. Records are formatted as follows: All DbgRecord start with: 1. DILocation FUNC_CODE_DEBUG_RECORD_LABEL 2. DILabel DPValues then share common fields: 2. DILocalVariable 3. DIExpression FUNC_CODE_DEBUG_RECORD_VALUE 4. Location Metadata FUNC_CODE_DEBUG_RECORD_DECLARE 4. Location Metadata FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE 4. Location Value (single) FUNC_CODE_DEBUG_RECORD_ASSIGN 4. Location Metadata 5. DIAssignID 6. DIExpression (address) 7. Location Metadata (address) Encoding the DILocation metadata reference directly appeared to yield smaller bitcode files than encoding the operands seperately (as is done with instruction DILocations). FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE is by far the most common DbgRecord record in optimized code (order of 5x-10x over other kinds). Unoptimized code should only contain FUNC_CODE_DEBUG_RECORD_DECLARE.	2024-03-15 12:33:55 +00:00
Orlando Cazalet-Hyams	861ebe6446	Revert "[RemoveDIs] Read/write DbgRecords directly from/to bitcode" (#85382 ) Reverts llvm/llvm-project#83251 Buildbot: https://lab.llvm.org/buildbot/#/builders/139/builds/61485	2024-03-15 11:04:21 +00:00
Orlando Cazalet-Hyams	d6d3d96b65	[RemoveDIs] Read/write DbgRecords directly from/to bitcode (#83251 ) If --write-experimental-debuginfo-iterators-to-bitcode is true (default false) and --expermental-debuginfo-iterators is also true then the new debug info format (non-instruction records) is written to bitcode directly. Added the following records: FUNC_CODE_DEBUG_RECORD_LABEL FUNC_CODE_DEBUG_RECORD_VALUE FUNC_CODE_DEBUG_RECORD_DECLARE FUNC_CODE_DEBUG_RECORD_ASSIGN FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE The last one has an abbrev in FUNCTION_BLOCK BLOCK_INFO. Incidentally, this uses the last value available without widening the code-length for FUNCTION_BLOCK from 4 to 5 bits. Records are formatted as follows: All DbgRecord start with: 1. DILocation FUNC_CODE_DEBUG_RECORD_LABEL 2. DILabel DPValues then share common fields: 2. DILocalVariable 3. DIExpression FUNC_CODE_DEBUG_RECORD_VALUE 4. Location Metadata FUNC_CODE_DEBUG_RECORD_DECLARE 4. Location Metadata FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE 4. Location Value (single) FUNC_CODE_DEBUG_RECORD_ASSIGN 4. Location Metadata 5. DIAssignID 6. DIExpression (address) 7. Location Metadata (address) Encoding the DILocation metadata reference directly appeared to yield smaller bitcode files than encoding the operands seperately (as is done with instruction DILocations). FUNC_CODE_DEBUG_RECORD_VALUE_SIMPLE is by far the most common DbgRecord record in optimized code (order of 5x-10x over other kinds). Unoptimized code should only contain FUNC_CODE_DEBUG_RECORD_DECLARE.	2024-03-15 10:47:48 +00:00
Stephen Tozer	2e865353ed	[RemoveDIs][NFC] Move DPValue::filter -> filterDbgVars (#85208 ) This patch changes DPValue::filter to be a non-member method filterDbgVars. There are two reasons for this: firstly, the name of DPValue is about to change to DbgVariableRecord, which will result in every `for` loop that uses DPValue::filter to require a line break. This is a small thing, but it makes the rename patch more difficult to review, and is just generally more awkward for what is a fairly common loop. Secondly, the intent is to later break up the DPValue class into subclasses, at which point it would be better to have a non-member function that allows template arguments for the cases we want to filter with greater specificity.	2024-03-14 12:19:15 +00:00
lcvon007	676c495195	[Attributor][FIX] Register right new created BB. (#84929 ) CBBB will keep same after the first iteration so registerManifestAddedBasicBlock would always register the same basic block later. Co-authored-by: laichunfeng <laichunfeng@tencent.com>	2024-03-13 16:38:48 +08:00
Stephen Tozer	15f3f446c5	[RemoveDIs][NFC] Rename common interface functions for DPValues->DbgRecords (#84793 ) As part of the effort to rename the DbgRecord classes, this patch renames the widely-used functions that operate on DbgRecords but refer to DbgValues or DPValues in their names to refer to DbgRecords instead; all such functions are defined in one of `BasicBlock.h`, `Instruction.h`, and `DebugProgramInstruction.h`. This patch explicitly does not change the names of any comments or variables, except for where they use the exact name of one of the renamed functions. The reason for this is reviewability; this patch can be trivially examined to determine that the only changes are direct string substitutions and any results from clang-format responding to the changed line lengths. Future patches will cover renaming variables and comments, and then renaming the classes themselves.	2024-03-12 14:53:13 +00:00
Florian Hahn	bba4a1daff	[ArgPromotion] Remove incorrect TranspBlocks set for loads. (#84835 ) The TranspBlocks set was used to cache aliasing decision for all processed loads in the parent loop. This is incorrect, because each load can access a different location, which means one load not being modified in a block doesn't translate to another load not being modified in the same block. All loads access the same underlying object, so we could perhaps use a location without size for all loads and retain the cache, but that would mean we loose precision. For now, just drop the cache. Fixes https://github.com/llvm/llvm-project/issues/84807 PR: https://github.com/llvm/llvm-project/pull/84835	2024-03-12 09:47:42 +00:00
lifengxiang1025	e40cabfea4	[MemProf] Match function's summary and definition strictly (#83665 ) Problem description: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1933468520 Solution: https://github.com/llvm/llvm-project/pull/81008#issuecomment-1934192548 (choose plan2)	2024-03-12 11:00:02 +08:00
Jeremy Morse	2fe81edef6	[NFC][RemoveDIs] Insert instruction using iterators in Transforms/ As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. Noteworthy changes: * FindInsertedValue now takes an optional iterator rather than an instruction pointer, as we need to always insert with iterators, * I've added a few iterator-taking versions of some value-tracking and DomTree methods -- they just unwrap the iterator. These are purely convenience methods to avoid extra syntax in some passes. * A few calls to getNextNode become std::next instead (to keep in the theme of using iterators for positions), * SeparateConstOffsetFromGEP has it's insertion-position field changed. Noteworthy because it's not a purely localised spelling change. All this should be NFC.	2024-03-05 15:12:22 +00:00
lifengxiang1025	daf3079222	[ThinLTO] Add metedata 'thinlto_src_module' and 'thinlto_src_file' (#83110 ) Originally, when `EnableImportMetadata` enabled, `SourceFileName` will be recorded as `thinlto_src_module`. Now `SourceFileName` will be recorded as `thinlto_src_file` and `ModuleIdentifier` will be recorded as `thinlto_src_module`.	2024-02-29 10:42:06 +08:00
Teresa Johnson	48bc9022b4	[MemProf] Fix the stack updating handling of pruned contexts (#81322 ) Fix a bug in the handling of cases where a callsite's stack ids partially overlap with the pruned context during matching of calls to the graph contructed from the profiled contexts. This fix makes the code match the comments.	2024-02-27 07:23:51 -08:00
Benjamin Kramer	e630a451b4	[HCS] Fix unused variable warnings. NFCI.	2024-02-22 18:58:50 +01:00
Shimin Cui	a51f4afc5a	[HCS] Externd to outline overlapping sub/super cold regions (#80732 ) Currently, with hot cold splitting, when a cold region is identified, it is added to the region list of ColdBlocks. Then when another cold region (B) identified overlaps with a ColdBlocks region (A) already added to the list, the region B is not added to the list because of the overlapping with region A. The splitting analysis is performed, and the region A may not get split, for example, if it’s considered too expansive. This is to improve the handling the overlapping case when the region A is not considered good for splitting, while the region B is good for splitting. The change is to move the cold region splitting analysis earlier to allow more cold region splitting. If an identified region cannot be split, it will not be added to the candidate list of ColdBlocks for overlapping check.	2024-02-22 12:04:08 -05:00
Matt	88e31f64a0	[OpenMP][FIX] Remove unsound omp_get_thread_limit deduplication (#79524 ) The deduplication of the calls to `omp_get_thread_limit` used to be legal when originally added in <`e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)`>, as the result (thread_limit) was immutable. However, now that we have `thread_limit` clause, we no longer have immutability; therefore `omp_get_thread_limit()` is not a deduplicable runtime call. Thus, removing `omp_get_thread_limit` from the `DeduplicableRuntimeCallIDs` array. Here's a simple example: ``` #include <omp.h> #include <stdio.h> int main() { #pragma omp target thread_limit(4) { printf("\n1:target thread_limit: %d\n", omp_get_thread_limit()); } #pragma omp target thread_limit(3) { printf("\n2:target thread_limit: %d\n", omp_get_thread_limit()); } return 0; } ``` GCC-compiled binary execution: https://gcc.godbolt.org/z/Pjv3TWoTq ``` 1:target thread_limit: 4 2:target thread_limit: 3 ``` Clang/LLVM-compiled binary execution: https://clang.godbolt.org/z/zdPbrdMPn ``` 1:target thread_limit: 4 2:target thread_limit: 4 ``` By my reading of the OpenMP spec GCC does the right thing here; cf. <https://www.openmp.org/spec-html/5.2/openmpse12.html#x34-330002.4>: > If a target construct with a thread_limit clause is encountered, the thread-limit-var ICV from the data environment of the generated initial task is instead set to an implementation deﬁned value between one and the value speciﬁed in the clause. The common subexpression elimination (CSE) of the second call to `omp_get_thread_limit` by LLVM does not seem to be correct, as it's not an available expression at any program point(s) (in the scope of the clause in question) after the second target construct with a `thread_limit` clause is encountered. Compiling with `-Rpass=openmp-opt -Rpass-analysis=openmp-opt -Rpass-missed=openmp-opt` we have: https://clang.godbolt.org/z/G7dfhP7jh ``` <source>:8:42: remark: OpenMP runtime call omp_get_thread_limit deduplicated. [OMP170] [-Rpass=openmp-opt] 8 \| printf("\n1:target thread_limit: %d\n",omp_get_thread_limit()); \| ^ ``` OMP170 has the following explanation: https://openmp.llvm.org/remarks/OMP170.html > This optimization remark indicates that a call to an OpenMP runtime call was replaced with the result of an existing one. This occurs when the compiler knows that the result of a runtime call is immutable. Removing duplicate calls is done by replacing all calls to that function with the result of the first call. This cannot be done automatically by the compiler because the implementations of the OpenMP runtime calls live in a separate library the compiler cannot see. This optimization will trigger for known OpenMP runtime calls whose return value will not change. At the same time I do not believe we have an analysis checking whether this precondition holds here: "This occurs when the compiler knows that the result of a runtime call is immutable." AFAICT, such analysis doesn't appear to exist in the original patch introducing deduplication, either: - `9548b74a83` - https://reviews.llvm.org/D69930 The fix is to remove it from `DeduplicableRuntimeCallIDs`, effectively reverting the addition in this commit (noting that `omp_get_max_threads` is not present in `DeduplicableRuntimeCallIDs`, so it's possible this addition was incorrect in the first place): - [OpenMP][Opt] Annotate known runtime functions and deduplicate more, - `e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)` As a result, we're no longer unsoundly deduplicating the OpenMP runtime call `omp_get_thread_limit` as illustrated by the test case: Note the (correctly) repeated `call i32 @omp_get_thread_limit()`. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-02-22 08:13:41 -06:00
Shoaib Meenai	d2942a86d7	[MergeFunctions] Fix thunks for non-instruction debug info (#82080 ) When MergeFunctions creates new thunk functions, it needs to copy over the debug info format kind from the original function, otherwise we'll mix debug info formats and run into assertions. This was exposed by a downstream change that runs MergeFunctions before inlining, which caused assertions when inlining attempted to inline thunks created by merging, and the added test covers both scenarios where merging creates thunks.	2024-02-20 09:42:18 -08:00
Jeremy Morse	3e76e6083d	[DebugInfo][RemoveDIs] Set new-dbg-info flag from Modules correctly (#82373 ) It turns out there's a pathway for Functions to be inserted into modules without having the "New" debug-info flag set correctly, which this patch fixes. Sadly there isn't a Module::insert method to instrument out there, everyone touches the list directly. This fix exposes a path where such functions are produced in the outliner in the wrong mode; requiring a fix there to correctly drop RemoveDIs-mode debug-info. This is covered by test/DebugInfo/AArch64/ir-outliner.ll	2024-02-20 16:13:28 +00:00
Orlando Cazalet-Hyams	ababa96475	[RemoveDIs][NFC] Introduce DbgRecord base class [1/3] (#78252 ) Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The patch stack adds a new base class -> 1. Add DbgRecord base class for DPValue and the not-yet-added DPLabel class. 2. Add the DPLabel class. 3. Enable dbg.label conversion and add support to passes. Patches 1 and 2 are NFC. In the near future we also will rename DPValue to DbgVariableRecord and DPLabel to DbgLabelRecord, at which point we'll overhaul the function names too. The name DPLabel keeps things consistent for now.	2024-02-20 16:00:55 +00:00
Lei Wang	c98da372cb	[CSSPGO] Compute and report profile matching recovered callsites and samples (#79090 ) This change adds the support to compute and report the staleness metrics after stale profile matching so that we can know how effective the fuzzy matching is, i. e. how many callsites and samples are recovered by the matching. Some implementation notes: - The function checksum mismatch metrics are not applicable here as it's function-level metrics, checksum mismatch remains the same before and after matching, so we need to compute based on the callsite samples. - Added two new counters `NumRecoveredCallsites`, `RecoveredCallsiteSamples` for this and removed `TotalCallsiteSamples` as now the we can use the `TotalFuncHashSamples` as base, and renamed some counters. - In profile matching, we changed to use a state machine to represent the callsite's matching state changes. See the `MatchState` for the state, and used a new function `recordCallsiteMatchStates` to compute and record the callsite's match states changes before and after the matching, , the result is compressed and saved into a `FuncCallsiteMatchStates` map for later counting use. - Changed the counting function to run on module-level and moved it to the end of the whole process(`computeAndReportProfileStaleness`). The reason is before the callsite is only counted on top-level function, this change extends it to count(recursively) on the inlined functions and samples, which is more accurate.	2024-02-19 11:36:20 -08:00
Björn Pettersson	7677453886	[ConstantFolding] Do not consider padded-in-memory types as uniform (#81854 ) Teaching ConstantFoldLoadFromUniformValue that types that are padded in memory can't be considered as uniform. Using the big hammer to prevent optimizations when loading from a constant for which DataLayout::typeSizeEqualsStoreSize would return false. Main problem solved would be something like this: store i17 -1, ptr %p, align 4 %v = load i8, ptr %p, align 1 If for example the i17 occupies 32 bits in memory, then LLVM IR doesn't really tell where the padding goes. And even if we assume that the 15 most significant bits are padding, then they should be considered as undefined (even if LLVM backend typically would pad with zeroes). Anyway, for a big-endian target the load would read those most significant bits, which aren't guaranteed to be one's. So it would be wrong to constant fold the load as returning -1. If LLVM IR had been more explicit about the placement of padding, then we could allow the constant fold of the load in the example, but only for little-endian. Fixes: https://github.com/llvm/llvm-project/issues/81793	2024-02-15 15:40:21 +01:00
Jeremy Morse	fa77e1f546	[DebugInfo][RemoveDIs] Convert back to intrinsic form for ThinLTO As explained on discourse [0] (comment 12), to get the non-intrinsic form of debug-info records enabled and testing, we're only using it inside of the pass manager in LLVM right now. Things like the textual IR writer and bitcode writing _passes_ are instrumented to convert back to intrinsic-form when writing a module out, but it turns out we missed the ThinLTO bitcode writing pass. That causes uh, all variable location debug-info to be dropped in ThinLTO mode (oops). This patch adds that conversion; it should be low risk as it's identical to what happens in all the other passes. However should this commit turn out to cause trouble, please instead revert d759618df76 or whichever is the most recent commit to set UseNewDbgInfoFormat to default to true. That'll revert LLVM back to the definitely-correct behaviour. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939	2024-02-13 21:24:04 +00:00
yozhu	c7a0db1e20	[CFI][annotation] Leave alone function pointers in function annotations (#80173 ) Function annotation, as part of llvm.metadata, is for the function itself and doesn't apply to its corresponding jump table entry, so with CFI we shouldn't replace function pointer in function annotation with pointer to its corresponding jump table entry.	2024-02-09 13:55:08 -08:00
Teresa Johnson	9eed89908c	[MemProf] Handle empty stack context during ThinLTO cloning (#81008 ) Fix for assert after PR#78264. Handle the case where the MIB context is empty after skipping the callsite context, because the callsite context is actually longer than the MIB context. Presumably this happened as a result of inlining, but in theory the metadata should have been replaced with an attribute in that case. Need to investigate why this is occuring, but for now handle this gracefully to fix the build regression.	2024-02-07 10:46:34 -08:00
Jeremy Morse	34f61cfa66	[DebugInfo][RemoveDIs] Instrument MergeFunctions for DPValues (#80974 ) The MergeFunctions pass has a "preserve some debug-info" mode that tries to preserve parameter values. This patch generalises its decision-making so that it applies to both debug-info stored in intrinsics, and debug-info stored in DPValue objects. For the most part this involves using a generic lambda and applying it to each type of object. (Normally we avoid debug-info affecting the code generated, but this is hidden behind a command line switch, so won't usually be encountered by users). Note that this diff is messy, but that's because I'm hoisting some code into lambdas. The actual decision making processes here are identical.	2024-02-07 18:19:39 +00:00
Jon Roelofs	e976385415	[llvm][GlobalOpt] Optimize statically resolvable IFuncs (#80606 )	2024-02-06 13:58:58 -08:00
Mingming Liu	8ea858b967	[CallPromotionUtil] See through function alias when devirtualizing a virtual call on an alloca. (#80736 ) - Extract utility function from `DevirtModule::tryFindVirtualCallTargets` [1], which sees through an alias to a function. Call this utility function in the WPD callsite. - For type profiling work, this helper function will be used by indirect-call-promotion pass to find the function pointer at a specified vtable offset (an example in [2]) [1] `b99163fe8f/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp (L1069-L1082)` [2] `77a0ef12de/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (L347)`	2024-02-06 09:22:34 -08:00
lifengxiang1025	6ccb06a7ab	[MemProf] Fix assert when exists direct recursion (#78264 ) Fix assert in `MemProfContextDisambiguation::applyImport` when exists direct recursion.	2024-01-26 20:55:44 +08:00
Jeremy Morse	0065d06760	[NFC][DebugInfo] Maintain RemoveDIs flag when attributor creates functions (#79143 ) We're using this flag (IsNewDbgInfoFormat) to detect the boundaries in LLVM of what's treating debug-info as intrinsics (i.e. dbg.value), and what's using DPValue objects (the non-intrinsic replacement). The attributor tends to create new wrapper functions and doesn't insert them into Modules in the usual way, thus we have to manually update that flag to signal what debug-info mode it's using. I've added some --try-experimental-debuginfo-iterators RUN lines to tests that would otherwise crash because of this, so that they're exercised by our new-debuginfo-iterators buildbot. NB: there's an attributor test with a dbg.value in it, however attributes re-order themselves in RemoveDIs mode for various reasons, so we're going to address that in a different patch.	2024-01-24 15:20:05 +00:00
Paul Kirth	9d476e1e1a	[clang][FatLTO] Avoid UnifiedLTO until it can support WPD/CFI (#79061 ) Currently, the UnifiedLTO pipeline seems to have trouble with several LTO features, like SplitLTO units, which means we cannot use important optimizations like Whole Program Devirtualization or security hardening instrumentation like CFI. This patch reverts FatLTO to using distinct pipelines for Full LTO and ThinLTO. It still avoids module cloning, since that was error prone.	2024-01-23 14:04:52 -08:00
Jeremy Morse	4782ac8dd3	[DebugInfo][RemoveDIs] Use splice in Outliner rather than moveBefore (#79124 ) This patch replaces a utility in the outliner that moves the contents of one basic block into another basic block, with a call to splice instead. I think it's NFC, however I'd like a second pair of eyes to look at it just in case. The reason for doing this is an edge case in the handling of DPValue objects, the replacement for dbg.values. If there's a variable assignment "dangling" at the end of a block (which happens when we delete the terminator), inserting instructions at end() doesn't shift the DPValue up into the block. We could probably fix this; but it's much easier to use splice at the only call site that does this. Patch adds --try-experimental-debuginfo-iterators to a test to exercise this code path.	2024-01-23 16:23:48 +00:00
Teresa Johnson	070738ba88	[MemProf][NFC] Explicitly specify llvm version of function_ref (#77783 ) As suggested in https://github.com/llvm/llvm-project/pull/75823, to avoid confusion with std::function_ref, qualify all uses with llvm:: (we were already using the llvm version, but this avoids ambiguity).	2024-01-16 11:20:55 -08:00
Nikita Popov	6c2fbc3a68	[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582 ) This abstracts over the common pattern of creating a gep with i8 element type.	2024-01-12 14:21:21 +01:00
Kazu Hirata	7b9bc4729b	[IPO] Use a range-based for loop (NFC)	2024-01-11 22:48:22 -08:00
Teresa Johnson	c37699b9e3	[MemProf] Add missing <unordered_map> include to fix buildbot (#77788 ) Should fix buildbot failure https://lab.llvm.org/buildbot/#/builders/54/builds/8451 from #75823.	2024-01-11 07:48:55 -08:00
Teresa Johnson	26a8664ed4	[MemProf] Handle missing tail call frames (#75823 ) If tail call optimization was not disabled for the profiled binary, the call contexts will be missing frames for tail calls. Handle this by performing a limited search through tail call edges for the profiled callee when a discontinuity is detected. The search depth is adjustable but defaults to 5. If we are able to identify a short sequence of tail calls, update the graph for those calls. In the case of ThinLTO, synthesize the necessary CallsiteInfos for carrying the cloning information to the backends.	2024-01-11 06:57:48 -08:00
Bill Wendling	fc6b5666db	[NFC][ObjectSizeOffset] Use classes instead of std::pair (#76882 ) The use of std::pair makes the values it holds opaque. Using classes improves this while keeping the POD aspect of a std::pair. As a nice addition, the "known" functions held inappropriately in the Visitor classes can now properly reside in the value classes. :-)	2024-01-05 18:08:53 -08:00

1 2 3 4 5 ...

6172 Commits