llvm-project

Author	SHA1	Message	Date
Philip Reames	e6b4a21849	[IR] Add utilities for manipulating length of MemIntrinsic [nfc] (#153856 ) Goal is simply to reduce direct usage of getLength and setLength so that if we end up moving memset.pattern (whose length is in elements) there are fewer places to audit.	2025-08-20 13:50:11 -07:00
Nikita Popov	5ae749b77d	[FunctionAttr] Invalidate callers with mismatching signature (#154289 ) If FunctionAttrs infers additional attributes on a function, it also invalidates analysis on callers of that function. The way it does this right now limits this to calls with matching signature. However, the function attributes will also be used when the signatures do not match. Use getCalledOperand() to avoid a signature check. This is not a correctness fix, just improves analysis quality. I noticed this due to https://github.com/llvm/llvm-project/pull/144497#issuecomment-3199330709, where LICM ends up with a stale MemoryDef that could be a MemoryUse (which is a bug in LICM, but still non-optimal).	2025-08-20 11:38:31 +02:00
Kazu Hirata	07eb7b7692	[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068 ) This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {}; We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.	2025-08-18 07:01:29 -07:00
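The "redirection" mentioned in this commit is a partial specialization on pointer element types. The following is a minimal sketch of that pattern using standard-library containers only; `GenericSet` and `PtrSet` are hypothetical stand-ins for illustration, not the LLVM classes, and their storage choices are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>
#include <unordered_set>

// Hypothetical stand-in for a pointer-optimized set (not llvm::SmallPtrSet).
template <typename PtrT, unsigned N> class PtrSet {
  std::unordered_set<PtrT> Storage;

public:
  // Returns true if the pointer was newly inserted.
  bool insert(PtrT P) { return Storage.insert(P).second; }
  std::size_t size() const { return Storage.size(); }
};

// Hypothetical stand-in for a general-purpose set (not llvm::SmallSet).
template <typename T, unsigned N> class GenericSet {
  std::set<T> Storage;

public:
  bool insert(const T &V) { return Storage.insert(V).second; }
  std::size_t size() const { return Storage.size(); }
};

// Partial specialization: for pointer element types, silently inherit the
// pointer-optimized implementation. This is the "redirection" pattern.
template <typename PointeeType, unsigned N>
class GenericSet<PointeeType *, N> : public PtrSet<PointeeType *, N> {};

static_assert(std::is_base_of<PtrSet<int *, 4>, GenericSet<int *, 4>>::value,
              "pointer element types are redirected to PtrSet");
static_assert(!std::is_base_of<PtrSet<int, 4>, GenericSet<int, 4>>::value,
              "non-pointer element types use the primary template");

// Small self-check exercising the redirected specialization.
inline void demoPointerSet() {
  int X = 0;
  GenericSet<int *, 4> S;
  assert(S.insert(&X));
  assert(!S.insert(&X));
  assert(S.size() == 1);
}
```

Spelling the pointer-set type directly at the use site, as the commit does, makes the chosen implementation visible without having to know about the specialization.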
Kazu Hirata	cbf5af9668	[llvm] Remove unused includes (NFC) (#154051 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-08-17 23:46:35 -07:00
Owen Anderson	69e4514978	[GlobalOpt] Do not fold away addrspacecasts which may be runtime operations (#153753 ) Specifically in the context of the once-stored transformation, GlobalOpt would strip all pointer casts unconditionally, even though addrspacecasts might be runtime operations. This manifested particularly on CHERI targets. This patch was inspired by an existing change in CHERI LLVM (`91afa60f17`), but has been reimplemented with updated conventions, and a testcase constructed from scratch.	2025-08-18 02:11:51 +00:00
Kaitlin Peng	0bb1af478a	[DirectX] Add GlobalDCE pass after finalize linkage pass in DirectX backend (#151071 ) Fixes #139023. This PR essentially removes unused global variables: - Restores the `GlobalDCE` Legacy pass and adds it to the DirectX backend after the finalize linkage pass - Converts external global variables with no usage to internal linkage in the finalize linkage pass (so they can be removed by `GlobalDCE`) - Makes the `dxil-finalize-linkage` pass usable using the new pass manager flag syntax - Adds tests to `finalize_linkage.ll` that make sure unused global variables are removed - Adds a use for variable `@CBV` in `opaque-value_as_metadata.ll` so it isn't removed - Changes the `scalar-data.ll` run command to avoid removing its global variables --------- Co-authored-by: Farzon Lotfi <farzonlotfi@microsoft.com>	2025-08-15 10:45:34 -07:00
Tobias Stadler	d803a93f55	[Inliner] Report inlining decision before deleting Callee contents (#153616 ) Call `recordInliningWithCalleeDeleted` before dropping the contents of the Callee. Otherwise the handlers don't have access to e.g. the DebugLoc, so the Callee DebugLoc was missing in inlining remarks for functions with internal linkage. The test is the same as `optimization-remarks-passed-yaml.ll` except that the function `foo` has internal linkage instead of external linkage.	2025-08-15 12:00:34 +01:00
Nikita Popov	35bad229c1	[PredicateInfo] Use bitcast instead of ssa.copy (#151174 ) PredicateInfo needs some no-op to which the predicate can be attached. Currently this is an ssa.copy intrinsic. This PR replaces it with a no-op bitcast. Using a bitcast is more efficient because we don't have the overhead of an overloaded intrinsic. It also makes things slightly simpler overall.	2025-08-11 09:25:01 +02:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Teresa Johnson	dc90472532	[MemProf] Ensure node merging happens for newly created nodes (#151593 ) We weren't performing node merging on newly created nodes in some cases. Use a simple iteration over the node and its callers until no more opportunities are found. I confirmed that for several large codes the max iterations is 3 (meaning we only needed to do any work on the first 2, as expected). This can potentially be made more elegant in the future, but it is a simple and effective solution. Also fix a bug exposed by the test case, getting the function for a call instruction in the FullLTO handling, using an existing method to look through aliases if needed.	2025-08-01 12:51:12 -07:00
Kazu Hirata	228e96b28a	[llvm] Use std::make_optional (NFC) (#151627 ) std::make_optional<T> is a lot like std::make_unique<T> in that it performs perfect forwarding of arguments for T's constructor. As a result, we don't have to repeat type names twice.	2025-08-01 00:24:40 -07:00
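As an illustration of the point about not repeating type names, here is a small sketch in standard C++17. The `Range` type and both helper functions are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical value type for illustration only.
struct Range {
  int Lo;
  int Hi;
  Range(int Lo, int Hi) : Lo(Lo), Hi(Hi) {}
};

// Before: the element type is spelled twice.
inline std::optional<Range> makeRangeVerbose(int Lo, int Hi) {
  return std::optional<Range>(Range(Lo, Hi));
}

// After: std::make_optional<T> forwards its arguments to T's constructor,
// so the type appears once.
inline std::optional<Range> makeRange(int Lo, int Hi) {
  return std::make_optional<Range>(Lo, Hi);
}

// Both spellings produce an engaged optional holding the same value.
inline void checkEquivalent() {
  auto A = makeRangeVerbose(1, 4);
  auto B = makeRange(1, 4);
  assert(A.has_value() && B.has_value());
  assert(A->Lo == B->Lo && A->Hi == B->Hi);
}
```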
Peter Collingbourne	ff38981a58	LTO: Redesign the CFI !aliases metadata. With the current aliases metadata we lose information about which groups of aliases survive symbol resolution. This causes various problems such as #150075 where symbol resolution breaks the link between alias groups. In this redesign of the aliases metadata, we stop representing the individual aliases in !aliases. Instead, the individual aliases are represented in !cfi.functions in the same way as functions, and the alias groups (i.e. groups of symbols with the same address) are stored in !aliases. At symbol resolution time, we filter out all non-prevailing members of !aliases; the resulting set is used by LowerTypeTests to recreate the aliases. With this change it is now possible for a jump table entry to refer to an alias in one of the ThinLTO object files (e.g. if a function is non-prevailing but its alias is prevailing), so instead of deleting them, rename them with the ".cfi" suffix. Fixes #150070. Fixes #150075. Reviewers: teresajohnson, vitalybuka Reviewed By: vitalybuka Pull Request: https://github.com/llvm/llvm-project/pull/150690	2025-07-30 14:04:11 -07:00
Teresa Johnson	d4562a1991	[MemProf] Use DenseMap for call map (NFC) (#151161 ) There is no reason to use std::map for the call maps maintained for function clones during function clone assignment, as we don't iterate over them and don't need deterministic ordering, so use the more efficient DenseMap.	2025-07-29 08:18:31 -07:00
Nikita Popov	ef51514c38	[FunctionAttrs] Don't bail out on unknown calls (#150958 ) When inferring attributes, we should not bail out early on unknown calls (such as virtual calls), as we may still have call-site attributes that can be used for inference. Fixes https://github.com/llvm/llvm-project/issues/150817.	2025-07-29 11:45:31 +02:00
Kazu Hirata	255bba0136	[memprof] Fix a warning This patch fixes: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:4771:9: error: non-void lambda does not return a value in all control paths [-Werror,-Wreturn-type]	2025-07-28 19:35:02 -07:00
Teresa Johnson	f3761ab340	Reapply "[MemProf] Ensure all callsite clones are assigned a function clone" (#150856 ) (#151055 ) This reverts commit 314e22bcab2b0f3d208708431a14215058f0718f, reapplying PR150735 with a fix for the unstable iteration order exposed by the new tests (PR151039).	2025-07-28 17:04:45 -07:00
Teresa Johnson	ced3b90738	[MemProf] Change map to vector to avoid unstable iteration (#151039 ) We iterate over a std::map indexed by FuncInfo, which is a pair of a pointer and a clone number. In the ThinLTO case, this isn't an issue as the function pointer always points to the same FunctionSummary object. However, for regular LTO, this is a pointer to a Function object, which is different for each clone. This will lead to unstable iteration order. This was exposed in a test case added for PR150735, which added a new instance of iteration over this map. Since these function clones are added and numbered sequentially, change this to a vector indexed by clone number, which points to a structure containing the clone FuncInfo and the call map (the old map's key and value, respectively).	2025-07-28 14:20:49 -07:00
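The data-structure change described here can be sketched roughly as follows, using standard containers. All names (`Function`, `CloneEntry`, `addClone`, `cloneNamesInOrder`) and the `CallMap` placeholder are assumptions made for illustration and do not reflect the actual MemProf code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR function object.
struct Function {
  std::string Name;
};

// (function pointer, clone number), mirroring the pair described above.
using FuncInfo = std::pair<Function *, unsigned>;
// Placeholder call map; the real key and value types are not shown here.
using CallMap = std::map<unsigned, unsigned>;

// A std::map<FuncInfo, CallMap> would order entries by pointer value first,
// which can differ between runs when each clone is a distinct object.
// Instead, keep the old key and value together in a vector slot whose
// index is the clone number.
struct CloneEntry {
  FuncInfo Info;
  CallMap Calls;
};

// Clones are created and numbered sequentially, so appending keeps the
// vector index equal to the clone number.
inline unsigned addClone(std::vector<CloneEntry> &Clones, Function *F) {
  unsigned CloneNo = static_cast<unsigned>(Clones.size());
  Clones.push_back(CloneEntry{FuncInfo{F, CloneNo}, CallMap{}});
  return CloneNo;
}

// Iteration order is now the clone number, independent of addresses.
inline std::vector<std::string>
cloneNamesInOrder(const std::vector<CloneEntry> &Clones) {
  std::vector<std::string> Names;
  for (const CloneEntry &E : Clones) {
    assert(E.Info.second == Names.size() && "indexed by clone number");
    Names.push_back(E.Info.first->Name);
  }
  return Names;
}
```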
Teresa Johnson	314e22bcab	Revert "[MemProf] Ensure all callsite clones are assigned a function clone" (#150856 ) Reverts llvm/llvm-project#150735 due to bot failures that I need to investigate	2025-07-27 15:55:22 -07:00
Teresa Johnson	0f2484a740	[MemProf] Ensure all callsite clones are assigned a function clone (#150735 ) Fix a bug in function assignment where we were not assigning all callsite clones to a function clone. This led to incorrect call updates because multiple callsite clones could look like they were assigned to the same function clone. Add in a stat and debug message to help identify and debug cases where this is still happening.	2025-07-27 11:48:30 -07:00
Teresa Johnson	e4963834e4	[MemProf] Include caller clone information in dot graph nodes (#150492 ) We already included the assigned clone of the callsite node's callee in the dot graph after function assignment. This adds the same information for the enclosing caller function to aid debugging.	2025-07-25 07:29:22 -07:00
Alexandros Lamprineas	3ab64c5b29	[NFC][Clang][FMV] Make FMV priority data type future proof. (#150079 ) FMV priority is the returned value of a polymorphic function. On RISC-V and X86 targets a 32-bit value is enough. On AArch64 we currently need 64 bits and we will soon exceed that. APInt seems to be a suitable replacement for uint64_t, presumably with minimal compile time overhead. It allows bit manipulation, comparison and variable bit width.	2025-07-23 10:37:29 +01:00
Teresa Johnson	0e42c665f9	[MemProf] Update the declaration DISubprogram linkageName for clones (#149864 ) Follow up to PR145385 to also update the linkageName on any separate DISubprogram for the clone function declaration.	2025-07-21 12:27:36 -07:00
Jeremy Morse	c9d8b68676	[DebugInfo] Suppress lots of users of DbgValueInst (#149476 ) This is another prune of dead code -- we never generate debug intrinsics nowadays, therefore there's no need for these codepaths to run. --------- Co-authored-by: Nikita Popov <github@npopov.com>	2025-07-18 11:31:52 +01:00
Jeremy Morse	5b8c15c6e7	[DebugInfo] Remove getPrevNonDebugInstruction (#148859 ) With the advent of intrinsic-less debug-info, we no longer need to scatter calls to getPrevNonDebugInstruction around the codebase. Remove most of them -- there are one or two that have the "SkipPseudoOp" flag turned on, however they don't seem to be in positions where skipping anything would be reasonable.	2025-07-16 11:41:32 +01:00
Jeremy Morse	57a5f9c47e	[DebugInfo][RemoveDIs] Suppress getNextNonDebugInfoInstruction (#144383 ) There are no longer debug-info instructions, thus we don't need this skipping. Hooray!	2025-07-15 15:34:10 +01:00
Peter Collingbourne	11325fd0c9	LowerTypeTests: Remove unused variables.	2025-07-11 18:06:01 -07:00
Teresa Johnson	ac39d26dc4	[MemProf] Don't mutate the function type when calling clone (#147829 ) In rare cases the declaration of a function may not match its callsite after function importing, when the declaration was imported from a module where the function had void return type (presumably due to incomplete types). Instead of using setCalledFunction() to change a call to call its clone, which updates the call's function type as well, just call setCalledOperand directly so the only thing changed is the function target. Note this can't happen for the other places where we call setCalledFunction: FullLTO requires the cloned callee to be defined in the same FullLTO merged module; ThinLTO memprof ICP calls an ICP facility to first perform the promotion and that will be blocked if the function type doesn't match the callsite (the new test explicitly tests this latter case).	2025-07-11 11:33:43 -07:00
Teresa Johnson	838701a540	MemProf: Add minimum count threshold for inlining of promoted calls (#148001 ) Allow users to set the minimum absolute count for inlining of indirect calls promoted during cloning. This is primarily meant to enable generation of synthetic vp metadata introduced in PR141164 when profiling memprof-optimized binaries.	2025-07-10 13:48:16 -07:00
Shoreshen	181b014c06	Attributor: Infer noalias.addrspace metadata for memory instructions (#136553 ) Add noalias.addrspace metadata for store, load and atomic instruction in AMDGPU backend.	2025-07-08 09:50:31 +08:00
Andreas Jonson	0a067dc107	[Attributor] Swap range metadata to attribute for calls. (#108835 )	2025-07-05 16:47:03 +02:00
zGoldthorpe	f393211454	[Reland][IPO] Added attributor for identifying invariant loads (#146584 ) Patched and tested the `AAInvariantLoadPointer` attributor from #141800, which identifies pointers whose loads are eligible to be marked as `!invariant.load`. The bug in the attributor was due to `AAMemoryBehavior` always identifying pointers obtained from `alloca`s as having no writes. I'm not entirely sure why `AAMemoryBehavior` behaves this way, but it seems to be because it identifies the scope of an `alloca` to be limited to only that instruction (and, certainly, no memory writes occur within the `alloca` instruction). This patch just adds a check to disallow all loads from `alloca` pointers from being marked `!invariant.load` (since any well-defined program will have to write to stack pointers at some point).	2025-07-01 17:46:19 -04:00
Shivam Gupta	e44fbea0a1	[FunctionAttrs] Handle ConstantRange overflow in memset initializes inference (#145739 ) Avoid constructing invalid ConstantRange when Offset + Length in memset overflows signed 64-bit integer space. This prevents assertion failures when inferring the initializes attribute. Fixes #140345	2025-07-01 18:34:52 +05:30
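The kind of guard this fix describes can be sketched in plain C++ as below. This is not the code from the patch: `memsetRange` is a hypothetical helper, and treating a zero length as "no range" is an assumption made here for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <utility>

// Half-open byte range [first, second) in signed 64-bit space.
using ByteRange = std::pair<int64_t, int64_t>;

// Hypothetical helper: returns [Offset, Offset + Length) only when the end
// point is representable as a signed 64-bit integer, and std::nullopt
// otherwise, so the caller never builds a range from a wrapped value.
inline std::optional<ByteRange> memsetRange(int64_t Offset, uint64_t Length) {
  constexpr int64_t Max = std::numeric_limits<int64_t>::max();
  // Assumption: a zero-length memset initializes nothing, so report no range.
  if (Length == 0)
    return std::nullopt;
  // The length itself must fit in the signed space.
  if (Length > static_cast<uint64_t>(Max))
    return std::nullopt;
  int64_t Len = static_cast<int64_t>(Length);
  // Check before adding: signed overflow is undefined behavior in C++.
  if (Offset > 0 && Len > Max - Offset)
    return std::nullopt;
  return ByteRange{Offset, Offset + Len};
}

// Self-check of the boundary case right at the top of the signed range.
inline void checkBoundary() {
  constexpr int64_t Max = std::numeric_limits<int64_t>::max();
  assert(memsetRange(Max - 1, 1).has_value());
  assert(!memsetRange(Max, 1).has_value());
}
```

Performing the comparison against `Max - Offset` rather than computing `Offset + Len` first keeps the check itself free of overflow.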
Nikita Popov	183acdd279	[GlobalOpt] Revert global widening transform (#144652 ) Partially reverts e37d736def5b95a2710f92881b5fc8b0494d8a05. The transform has a number of correctness and code quality issues, and will benefit from a from-scratch re-review more than incremental fixes. The correctness issues are hinted at in https://github.com/llvm/llvm-project/pull/144641, but I think it needs a larger rework to stop working on ArrayTypes and the implementation could use some other improvements (like callInstIsMemcpy should just be `dyn_cast<MemCpyInst>`). I can comment in more detail on a resubmission of the patch.	2025-06-30 14:48:37 +02:00
Paul Kirth	23daa31341	[llvm] Don't preserve analysis results after EmbedBitcodePass (#146118 ) Expensive checks complain when we mark them as preserved. The bitcode being embedded generally doesn't change anything important in the module, but some things are modified under ThinLTO, like vtables under WPD. This became a non-issue when we cloned the module, but after we had to revert that in #145987, we need to handle this case properly.	2025-06-27 12:49:20 -07:00
Paul Kirth	9179322447	Revert "[llvm][EmbedBitcodePass] Prevent modifying the module with ThinLTO" (#145987 ) Reverts llvm/llvm-project#139999 This has a reported crash in https://github.com/llvm/llvm-project/pull/139999#issuecomment-2993622494 This PR was intended to fix an error when linking, which is unfortunately preferable to crashing clang. For now, we'll revert and investigate the problem.	2025-06-26 22:35:38 -07:00
Kazu Hirata	3d5903c4d8	[llvm] Use llvm::is_contained (NFC) (#145844 ) llvm::is_contained is shorter than llvm::all_of plus a lambda.	2025-06-26 08:41:18 -07:00
Kazu Hirata	2a35414e98	[Transforms] Use range-based for loops (NFC) (#145252 ) Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-06-25 10:08:26 -07:00
Teresa Johnson	90a6819cfe	[MemProf] Update the DISubprogram linkageName for clones (#145385 ) This corrects the debug information for the cloned functions so that it contains the correct linkage name.	2025-06-23 18:57:22 -07:00
Nikita Popov	879a55793a	[ExpandVariadics] Clean up intrinsic declaration lookup (NFC) The comment was outdated, as getDeclarationIfExists has been introduced in the meantime. We also only use this in one place where neither the Tys.empty() case nor the FT is relevant, so just include the call to getDeclarationIfExists().	2025-06-23 15:31:17 +02:00
Shilei Tian	be7e4113c8	[NFC] Add comment to describe the intended use of newly added `avail-extern-gv-in-addrspace-to-local` (#144911 )	2025-06-20 18:55:41 -04:00
Tianle Liu	6001a8bb94	[WholeProgramDevirt] Add check for AvailableExternal and give up icall.branch.funnel (#143468 ) When a customer class inherits from a libc++ class, and is built with "-flto -fwhole-program-vtables -static-libstdc++ \ -Wl,-plugin-opt=-whole-program-visibility", the libc++ class's vtable is available_externally, while the customer class's vtable is private. Both of them have !vcall_visibility == Linkage Unit. In this case, icall.branch.funnel might be generated. But the icall.branch.funnel would cause a crash in LowerTypeTests, because an available_externally Global_Object's GlobalTypeMember is not saved, which eventually leads to a NULL GlobalTypeMember and a crash. Even if the available_externally GO's GlobalTypeMember is saved so that it is not NULL, avoiding the crash in LowerTypeTests, it will still crash in SelectionDAGBuilder or the Verifier, because the operand linkage type consistency check of icall.branch.funnel cannot pass. So if any one of the vtables is available_externally, we stop generating icall.branch.funnel. This patch fixes FullLTO mode and split-LTO-unit ThinLTO mode.	2025-06-20 08:01:32 +08:00
Nikita Popov	f4db14229c	[SCCP] Move logic for removing ssa.copy into Solver (NFC) So it can be reused between IPSCCP and SCCP. Make the implementation a bit more efficient by only looking up the PredicateInfo once.	2025-06-19 17:28:39 +02:00
zGoldthorpe	00ae89a1cb	Revert "[IPO] Added attributor for identifying invariant loads" (#144808 ) Reverts llvm/llvm-project#141800 The implementation critically misunderstands the `AAMemoryBehavior` attributor, which it relies on heavily. @shiltian, since I do not have commit permissions.	2025-06-18 18:35:01 -04:00
Craig Topper	255b55c602	[GlobalOpt] Use cast instead of dyn_cast. NFC (#144634 ) The dyn_cast was not checked for null, and the cast is guaranteed to succeed by an earlier check.	2025-06-18 01:35:56 -07:00
Peter Collingbourne	9265b1f0cf	LowerTypeTests: Use jump table entry type as value type of jump table alias. The motivation for this is that it causes the jump table entry's symbol to have an st_size equal to the jump table entry size, instead of being equal to the size of the entire jump table, which is incorrect and can lead to unexpected behavior in binary analysis tools that rely on the size field such as Bloaty. Reviewers: fmayer Reviewed By: fmayer Pull Request: https://github.com/llvm/llvm-project/pull/144462	2025-06-17 18:15:06 -07:00
Shilei Tian	15482c83aa	[ElimAvailExtern] Add an option to allow to convert global variables in a specified address space to local (#144287 ) Currently, the `EliminateAvailableExternallyPass` only converts certain available externally functions to local if `avail-extern-to-local` is set or in contextual profiling mode. For global variables, it only drops their initializers. This PR adds an option to allow the pass to convert global variables in a specified address space to local. The motivation for this change is to correctly support lowering of LDS variables (`__shared__` variables, in more generic terminology) when ThinLTO is enabled for AMDGPU. A `__shared__` variable is lowered to a hidden global variable in a particular address space by the frontend, which is roughly the same as a `static` local variable. To properly lower it in the backend, the compiler needs to check all its uses. Enabling ThinLTO currently breaks this when a function containing a `__shared__` variable is imported from another module. Even though the global variable is imported along with its associated function, and the function is privatized by the `EliminateAvailableExternallyPass`, the global variable itself is not. It's safe to privatize such global variables, because they're _local_ to their associated functions. If the function itself is privatized, its associated global variables should also be privatized accordingly.	2025-06-17 19:58:24 -04:00
Slava Zakharin	bec9ac2daf	[llvm] Lower latency bonus threshold in function specialization. (#143954 ) Related to #143219. Function specialization does not kick in if flang sets `noalias` attributes on the function arguments of `digits_2`, because PRE optimizes several `srem` instructions and other memory accesses from the inner loops causing the latency bonus to be lower than the current 40% threshold. While looking at this, I did not really get why we compute the latency bonus as a ratio of the latency of the "eliminated" instructions and the code-size of the whole function. It did not make much sense to me. I tried computing the total latency as a sum of latencies of the instructions that belong to non-dead code (including the instructions that would be executed had they not been "eliminated" due to the constant propagation). This total latency should identify the total cost of executing the function with the given argument being dynamically equal to the tried constant value. Then the latency bonus would be computed as the ratio between the latency of the "eliminated" instructions and the total latency. Unfortunately, this did not give me a good heuristic either. The bonus was close to 0% on some targets, and as big as 3-5% on other targets. This does match very well with the performance gain achieved by function specialization for exchange2, so it seemed like another artificial heuristic not better than the current one. It seems that GCC uses a set of different heuristics for function specialization, but I am not an expert here and I cannot say if we can match them in LLVM. With all that said, I decided to try to lower the threshold to avoid the regression and be able to re-enable the generally good change for `noalias` attribute. With this patch, I was able to reduce the effect of `noalias`, so that `-force-no-alias=true` is only ~10% slower than `-force-no-alias=false` code on neoverse-v1 and neoverse-v2.
On neoverse-n1, `-force-no-alias=true` is >2x faster than `-force-no-alias=false` regardless of this patch. This threshold has been changed before also due to improved alias information: `2fb51fba8c (diff-066363256b7b4164e66b28a3028b2cb9e405c9136241baa33db76ebd2edb87cd)` Please let me know what testing I should run to make sure this change is safe. As I understand, it may affect the compilation time performance, and I will appreciate it if someone points out which benchmarks need to be checked before merging this.	2025-06-17 16:13:42 -07:00
Jeremy Morse	14286244f1	Follow up to 9eb0020555, squelch unused variable warning It turns out that this now-deleted debug-intrinsic code was the only use of CI.	2025-06-17 16:24:12 +01:00
Jeremy Morse	9eb0020555	[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389 ) Seeing how we can't generate any debug intrinsics any more: delete a variety of codepaths where they're handled. For the most part these are plain deletions, in others I've tweaked comments to remain coherent, or added a type to (what was) type-generic-lambdas. This isn't all the DbgInfoIntrinsic call sites but it's most of the simple scenarios. Co-authored-by: Nikita Popov <github@npopov.com>	2025-06-17 15:55:14 +01:00
PiJoules	964888d01f	[llvm][CFI] Ensure COFF comdat renaming applies for imported functions (#143421 ) I ran into the same issue as https://github.com/llvm/llvm-project/pull/139962 regarding the comdat corresponding to a renamed key function but for thinlto. My last patch had not considered the thinlto case, so this applies the same fix for imported functions.	2025-06-16 16:24:45 -07:00


6703 Commits