6696 Commits

Author SHA1 Message Date
Nikita Popov
35bad229c1
[PredicateInfo] Use bitcast instead of ssa.copy (#151174)
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
    
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
2025-08-11 09:25:01 +02:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Teresa Johnson
dc90472532
[MemProf] Ensure node merging happens for newly created nodes (#151593)
We weren't performing node merging on newly created nodes in some cases.
Use a simple iteration over the node and its callers until no more
opportunities are found. I confirmed that for several large codes the
max iterations is 3 (meaning we only needed to do any work on the first
2, as expected). This can potentially be made more elegant in the
future, but it is a simple and effective solution.

Also fix a bug exposed by the test case, getting the function for a call
instruction in the FullLTO handling, using an existing method to look
through aliases if needed.
2025-08-01 12:51:12 -07:00
Kazu Hirata
228e96b28a
[llvm] Use std::make_optional (NFC) (#151627)
std::make_optional<T> is a lot like std::make_unique<T> in that it
performs perfect forwarding of arguments for T's constructor. As a
result, we don't have to repeat type names.
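
A minimal sketch of the pattern, assuming a hypothetical `Range` type and helper names that do not come from the patch itself:

```cpp
#include <cassert>
#include <optional>

// Hypothetical type used only for illustration; not part of LLVM.
struct Range {
  int Lo;
  int Hi;
  Range(int Lo, int Hi) : Lo(Lo), Hi(Hi) {}
};

// Before: the type name is spelled in the return type and again in the
// explicit constructor calls.
std::optional<Range> makeRangeVerbose(int Lo, int Hi) {
  return std::optional<Range>(Range(Lo, Hi));
}

// After: std::make_optional<T> forwards its arguments to T's constructor,
// so the type is named once at the call.
std::optional<Range> makeRange(int Lo, int Hi) {
  assert(Lo <= Hi && "illustrative precondition only");
  return std::make_optional<Range>(Lo, Hi);
}
```

Both helpers produce the same engaged optional; the second simply avoids restating `Range` for the temporary.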
2025-08-01 00:24:40 -07:00
Peter Collingbourne
ff38981a58
LTO: Redesign the CFI !aliases metadata.
With the current aliases metadata we lose information about which groups
of aliases survive symbol resolution. This causes various problems such
as #150075 where symbol resolution breaks the link between alias groups.

In this redesign of the aliases metadata, we stop representing the
individual aliases in !aliases. Instead, the individual aliases are
represented in !cfi.functions in the same way as functions, and the
alias groups (i.e. groups of symbols with the same address) are stored
in !aliases. At symbol resolution time, we filter out all non-prevailing
members of !aliases; the resulting set is used by LowerTypeTests to
recreate the aliases.

With this change it is now possible for a jump table entry to refer
to an alias in one of the ThinLTO object files (e.g. if a function is
non-prevailing but its alias is prevailing), so instead of deleting them,
rename them with the ".cfi" suffix.

Fixes #150070.

Fixes #150075.

Reviewers: teresajohnson, vitalybuka

Reviewed By: vitalybuka

Pull Request: https://github.com/llvm/llvm-project/pull/150690
2025-07-30 14:04:11 -07:00
Teresa Johnson
d4562a1991
[MemProf] Use DenseMap for call map (NFC) (#151161)
There is no reason to use std::map for the call maps maintained for
function clones during function clone assignment, as we don't iterate
over them and don't need deterministic ordering, so use the more
efficient DenseMap.
2025-07-29 08:18:31 -07:00
Nikita Popov
ef51514c38
[FunctionAttrs] Don't bail out on unknown calls (#150958)
When inferring attributes, we should not bail out early on unknown calls
(such as virtual calls), as we may still have call-site attributes that
can be used for inference.

Fixes https://github.com/llvm/llvm-project/issues/150817.
2025-07-29 11:45:31 +02:00
Kazu Hirata
255bba0136 [memprof] Fix a warning
This patch fixes:

  llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:4771:9:
  error: non-void lambda does not return a value in all control paths
  [-Werror,-Wreturn-type]
2025-07-28 19:35:02 -07:00
Teresa Johnson
f3761ab340
Reapply "[MemProf] Ensure all callsite clones are assigned a function clone" (#150856) (#151055)
This reverts commit 314e22bcab2b0f3d208708431a14215058f0718f, reapplying
PR150735 with a fix for the unstable iteration order exposed by the new
tests (PR151039).
2025-07-28 17:04:45 -07:00
Teresa Johnson
ced3b90738
[MemProf] Change map to vector to avoid unstable iteration (#151039)
We iterate over a std::map indexed by FuncInfo, which is a pair of a
pointer and a clone number. In the ThinLTO case, this isn't an issue as
the function pointer always points to the same FunctionSummary object.
However, for regular LTO, this is a pointer to a Function object, which
is different for each clone. This will lead to unstable iteration order.

This was exposed in a test case added for PR150735, which added a new
instance of iteration over this map.

Since these function clones are added and numbered sequentially, change
this to a vector indexed by clone number, which points to a structure
containing the clone FuncInfo and the call map (the old map's key and
value, respectively).
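
The data-structure change described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: `Function`, `CallMap`, `CloneEntry` and `CloneTable` are hypothetical stand-ins, and only the idea (index by sequential clone number instead of ordering by pointer value) is taken from the message.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real types.
struct Function {
  std::string Name;
};

// (function pointer, clone number), as described in the message.
using FuncInfo = std::pair<const Function *, unsigned>;
using CallMap = std::map<std::string, std::string>;

// One entry per function clone: the old map's key and value side by side.
struct CloneEntry {
  FuncInfo Info;
  CallMap Calls;
};

// Clones are added and numbered sequentially, so the clone number doubles
// as the vector index. Iteration follows creation order and never depends
// on pointer values.
class CloneTable {
  std::vector<CloneEntry> Entries;

public:
  unsigned add(const Function *F) {
    unsigned CloneNo = static_cast<unsigned>(Entries.size());
    Entries.push_back(CloneEntry{FuncInfo{F, CloneNo}, CallMap{}});
    return CloneNo;
  }

  CloneEntry &operator[](unsigned CloneNo) {
    assert(CloneNo < Entries.size() && "unknown clone number");
    return Entries[CloneNo];
  }

  // Collect function names in iteration (creation) order.
  std::vector<std::string> namesInOrder() const {
    std::vector<std::string> Names;
    for (const CloneEntry &E : Entries)
      Names.push_back(E.Info.first->Name);
    return Names;
  }
};
```

With a `std::map<FuncInfo, CallMap>`, the visiting order would instead follow the comparison of the `Function *` values, which can differ from run to run when each clone is a separately allocated object.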
2025-07-28 14:20:49 -07:00
Teresa Johnson
314e22bcab
Revert "[MemProf] Ensure all callsite clones are assigned a function clone" (#150856)
Reverts llvm/llvm-project#150735 due to bot failures that I need to
investigate
2025-07-27 15:55:22 -07:00
Teresa Johnson
0f2484a740
[MemProf] Ensure all callsite clones are assigned a function clone (#150735)
Fix a bug in function assignment where we were not assigning all
callsite clones to a function clone. This led to incorrect call updates
because multiple callsite clones could look like they were assigned to
the same function clone.

Add in a stat and debug message to help identify and debug cases where
this is still happening.
2025-07-27 11:48:30 -07:00
Teresa Johnson
e4963834e4
[MemProf] Include caller clone information in dot graph nodes (#150492)
We already included the assigned clone of the callsite node's callee in
the dot graph after function assignment. This adds the same information
for the enclosing caller function to aid debugging.
2025-07-25 07:29:22 -07:00
Alexandros Lamprineas
3ab64c5b29
[NFC][Clang][FMV] Make FMV priority data type future proof. (#150079)
FMV priority is the returned value of a polymorphic function. On RISC-V
and X86 targets a 32-bit value is enough. On AArch64 we currently need
64 bits and we will soon exceed that. APInt seems to be a suitable
replacement for uint64_t, presumably with minimal compile time overhead.
It allows bit manipulation, comparison and variable bit width.
2025-07-23 10:37:29 +01:00
Teresa Johnson
0e42c665f9
[MemProf] Update the declaration DISubprogram linkageName for clones (#149864)
Follow up to PR145385 to also update the linkageName on any separate
DISubprogram for the clone function declaration.
2025-07-21 12:27:36 -07:00
Jeremy Morse
c9d8b68676
[DebugInfo] Suppress lots of users of DbgValueInst (#149476)
This is another prune of dead code -- we never generate debug intrinsics
nowadays, therefore there's no need for these codepaths to run.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-07-18 11:31:52 +01:00
Jeremy Morse
5b8c15c6e7
[DebugInfo] Remove getPrevNonDebugInstruction (#148859)
With the advent of intrinsic-less debug-info, we no longer need to
scatter calls to getPrevNonDebugInstruction around the codebase. Remove
most of them -- there are one or two that have the "SkipPseudoOp" flag
turned on, however they don't seem to be in positions where skipping
anything would be reasonable.
2025-07-16 11:41:32 +01:00
Jeremy Morse
57a5f9c47e
[DebugInfo][RemoveDIs] Suppress getNextNonDebugInfoInstruction (#144383)
There are no longer debug-info instructions, thus we don't need this
skipping. Hooray!
2025-07-15 15:34:10 +01:00
Peter Collingbourne
11325fd0c9 LowerTypeTests: Remove unused variables. 2025-07-11 18:06:01 -07:00
Teresa Johnson
ac39d26dc4
[MemProf] Don't mutate the function type when calling clone (#147829)
In rare cases the declaration of a function may not match its callsite
after function importing, when the declaration was imported from a
module where the function had void return type (presumably due to
incomplete types). Instead of using setCalledFunction() to change a call
to call its clone, which updates the call's function type as well, just
call setCalledOperand directly so the only thing changed is the function
target.

Note this can't happen for the other places where we call
setCalledFunction: FullLTO requires the cloned callee to be defined in
the same FullLTO merged module; ThinLTO memprof ICP calls an ICP
facility to first perform the promotion and that will be blocked if the
function type doesn't match the callsite (the new test explicitly tests
this latter case).
2025-07-11 11:33:43 -07:00
Teresa Johnson
838701a540
MemProf: Add minimum count threshold for inlining of promoted calls (#148001)
Allow users to set the minimum absolute count for inlining of indirect
calls promoted during cloning. This is primarily meant to enable
generation of synthetic vp metadata introduced in PR141164 when
profiling memprof-optimized binaries.
2025-07-10 13:48:16 -07:00
Shoreshen
181b014c06
Attributor: Infer noalias.addrspace metadata for memory instructions (#136553)
Add noalias.addrspace metadata for store, load and atomic instructions in
the AMDGPU backend.
2025-07-08 09:50:31 +08:00
Andreas Jonson
0a067dc107
[Attributor] Swap range metadata to attribute for calls. (#108835) 2025-07-05 16:47:03 +02:00
zGoldthorpe
f393211454
[Reland][IPO] Added attributor for identifying invariant loads (#146584)
Patched and tested the `AAInvariantLoadPointer` attributor from #141800,
which identifies pointers whose loads are eligible to be marked as
`!invariant.load`.

The bug in the attributor was due to `AAMemoryBehavior` always
identifying pointers obtained from `alloca`s as having no writes. I'm
not entirely sure why `AAMemoryBehavior` behaves this way, but it seems
to be because it identifies the scope of an `alloca` to be limited to
only that instruction (and, certainly, no memory writes occur within the
`alloca` instruction). This patch just adds a check to disallow all loads
from `alloca` pointers from being marked `!invariant.load` (since any
well-defined program will have to write to stack pointers at some
point).
2025-07-01 17:46:19 -04:00
Shivam Gupta
e44fbea0a1
[FunctionAttrs] Handle ConstantRange overflow in memset initializes inference (#145739)
Avoid constructing invalid ConstantRange when Offset + Length in memset
overflows signed 64-bit integer space. This prevents assertion failures
when inferring the initializes attribute.
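
The kind of guard described here can be sketched in plain C++. This is not the LLVM implementation and does not use `ConstantRange`; the function name and the `std::pair` return type are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <utility>

// Returns the half-open byte range [Offset, Offset + Length), or
// std::nullopt when the range is empty or when Offset + Length is not
// representable as a signed 64-bit integer.
std::optional<std::pair<int64_t, int64_t>>
getInitializedRange(int64_t Offset, int64_t Length) {
  if (Length <= 0)
    return std::nullopt;
  // With Length > 0, Offset + Length overflows exactly when
  // Offset > INT64_MAX - Length. The subtraction itself cannot overflow.
  if (Offset > std::numeric_limits<int64_t>::max() - Length)
    return std::nullopt;
  return std::make_pair(Offset, Offset + Length);
}
```

Checking before adding matters because signed overflow is undefined behavior in C++, so the sum cannot be computed first and inspected afterwards.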

Fixes #140345
2025-07-01 18:34:52 +05:30
Nikita Popov
183acdd279
[GlobalOpt] Revert global widening transform (#144652)
Partially reverts e37d736def5b95a2710f92881b5fc8b0494d8a05.

The transform has a number of correctness and code quality issues, and
will benefit from a from-scratch re-review more than incremental fixes.

The correctness issues are hinted at in
https://github.com/llvm/llvm-project/pull/144641, but I think it needs a
larger rework to stop working on ArrayTypes and the implementation could
use some other improvements (like callInstIsMemcpy should just be
`dyn_cast<MemCpyInst>`). I can comment in more detail on a resubmission
of the patch.
2025-06-30 14:48:37 +02:00
Paul Kirth
23daa31341
[llvm] Don't preserve analysis results after EmbedBitcodePass (#146118)
Expensive checks complain when we mark them as preserved. The bitcode
being embedded generally doesn't change anything important in the
module, but some things are modified under ThinLTO, like vtables under
WPD. This became a non-issue when we cloned the module, but after we had
to revert that in #145987, we need to handle this case properly.
2025-06-27 12:49:20 -07:00
Paul Kirth
9179322447
Revert "[llvm][EmbedBitcodePass] Prevent modifying the module with ThinLTO" (#145987)
Reverts llvm/llvm-project#139999

This has a reported crash in
https://github.com/llvm/llvm-project/pull/139999#issuecomment-2993622494

This PR was intended to fix an error when linking, which is
unfortunately preferable to crashing clang. For now, we'll revert and
investigate the problem.
2025-06-26 22:35:38 -07:00
Kazu Hirata
3d5903c4d8
[llvm] Use llvm::is_contained (NFC) (#145844)
llvm::is_contained is shorter than llvm::all_of plus a lambda.
2025-06-26 08:41:18 -07:00
Kazu Hirata
2a35414e98
[Transforms] Use range-based for loops (NFC) (#145252)
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-25 10:08:26 -07:00
Teresa Johnson
90a6819cfe
[MemProf] Update the DISubprogram linkageName for clones (#145385)
This corrects the debug information for the cloned functions so that it
contains the correct linkage name.
2025-06-23 18:57:22 -07:00
Nikita Popov
879a55793a [ExpandVariadics] Clean up intrinsic declaration lookup (NFC)
The comment was outdated, as getDeclarationIfExists has been
introduced in the meantime.

We also only use this in one place where neither the Tys.empty()
case nor the FT is relevant, so just include the call to
getDeclarationIfExists().
2025-06-23 15:31:17 +02:00
Shilei Tian
be7e4113c8
[NFC] Add comment to describe the intention use of newly added avail-extern-gv-in-addrspace-to-local (#144911) 2025-06-20 18:55:41 -04:00
Tianle Liu
6001a8bb94
[WholeProgramDevirt] Add check for AvailableExternal and give up icall.branch.funnel (#143468)
When a customer class inherits from a libc++ class, and is built with
"-flto  -fwhole-program-vtables -static-libstdc++ \
-Wl,-plugin-opt=-whole-program-visibility", the libc++ class's vtable is
available_externally, while the customer class's vtable is private.
Both of them have !vcall_visibility == Linkage Unit.
In this case, icall.branch.funnel might be generated.

But the icall.branch.funnel would cause a crash in LowerTypeTests, because
the GlobalTypeMember of an available_externally GlobalObject would not be
saved, which finally leads to a NULL GlobalTypeMember and a crash.
Even if the available_externally GO's GlobalTypeMember were saved so that
it is not NULL, avoiding the crash in LowerTypeTests, the compiler would
still crash in SelectionDAGBuilder or the Verifier, because the operand
linkage type consistency check of icall.branch.funnel cannot pass.

So if any of the vtables is available_externally, we stop generating
icall.branch.funnel.
This patch fixes FullLTO mode and split-LTO-unit ThinLTO mode.
2025-06-20 08:01:32 +08:00
Nikita Popov
f4db14229c [SCCP] Move logic for removing ssa.copy into Solver (NFC)
So it can be reused between IPSCCP and SCCP.

Make the implementation a bit more efficient by only looking up the
PredicateInfo once.
2025-06-19 17:28:39 +02:00
zGoldthorpe
00ae89a1cb
Revert "[IPO] Added attributor for identifying invariant loads" (#144808)
Reverts llvm/llvm-project#141800

The implementation critically misunderstands the `AAMemoryBehavior`
attributor, which it relies on heavily.

@shiltian, since I do not have commit permissions.
2025-06-18 18:35:01 -04:00
Craig Topper
255b55c602
[GlobalOpt] Use cast instead of dyn_cast. NFC (#144634)
The dyn_cast was not checked for null, and the cast is guaranteed to
succeed by an earlier check.
2025-06-18 01:35:56 -07:00
Peter Collingbourne
9265b1f0cf
LowerTypeTests: Use jump table entry type as value type of jump table alias.
The motivation for this is that it causes the jump table entry's symbol
to have an st_size equal to the jump table entry size, instead of being
equal to the size of the entire jump table, which is incorrect and can
lead to unexpected behavior in binary analysis tools that rely on the
size field such as Bloaty.

Reviewers: fmayer

Reviewed By: fmayer

Pull Request: https://github.com/llvm/llvm-project/pull/144462
2025-06-17 18:15:06 -07:00
Shilei Tian
15482c83aa
[ElimAvailExtern] Add an option to allow to convert global variables in a specified address space to local (#144287)
Currently, the `EliminateAvailableExternallyPass` only converts certain
available externally functions to local if `avail-extern-to-local` is
set or in contextual profiling mode. For global variables, it only drops
their initializers.

This PR adds an option to allow the pass to convert global variables in
a specified address space to local. The motivation for this change is to
correctly support lowering of LDS variables (`__shared__` variables, in
more generic terminology) when ThinLTO is enabled for AMDGPU.

A `__shared__` variable is lowered to a hidden global variable in a
particular address space by the frontend, which is roughly the same as a
`static` local variable. To properly lower it in the backend, the
compiler needs to check all its uses. Enabling ThinLTO currently breaks
this when a function containing a `__shared__` variable is imported from
another module. Even though the global variable is imported along with
its associated function, and the function is privatized by the
`EliminateAvailableExternallyPass`, the global variable itself is not.

It's safe to privatize such global variables, because they're _local_ to
their associated functions. If the function itself is privatized, its
associated global variables should also be privatized accordingly.
2025-06-17 19:58:24 -04:00
Slava Zakharin
bec9ac2daf
[llvm] Lower latency bonus threshold in function specialization. (#143954)
Related to #143219.

Function specialization does not kick in if flang sets `noalias`
attributes on the function arguments of `digits_2`, because PRE
optimizes several `srem` instructions and other memory accesses
from the inner loops causing the latency bonus to be lower than
the current 40% threshold.

While looking at this, I did not really get why we compute the latency
bonus as a ratio of the latency of the "eliminated" instructions
and the code-size of the whole function. It did not make much sense
to me.

I tried computing the total latency as a sum of latencies
of the instructions that belong to non-dead code (including
the instructions that would be executed had they not been
"eliminated" due to the constant propagation). This total
latency should identify the total cost of executing the function
with the given argument being dynamically equal to the tried
constant value. Then the latency bonus would be computed
as the ratio between the latency of the "eliminated" instructions
and the total latency. Unfortunately, this did not give me a good
heuristic either. The bonus was close to 0% on some targets,
and as big as 3-5% on other targets. This does not match very well
with the performance gain achieved by function specialization
for exchange2, so it seemed like another artificial heuristic
not better than the current one.
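
The two ratios discussed above differ only in their denominator. A rough arithmetic sketch, assuming hypothetical helper names, integer percentages, and an inclusive threshold comparison (none of which are taken from the FunctionSpecialization source):

```cpp
#include <cassert>
#include <cstdint>

// Bonus as a percentage of Denominator accounted for by the latency of the
// "eliminated" instructions. Denominator is either the code size of the
// whole function (the existing heuristic described in the message) or the
// total latency of the non-dead code (the alternative that was tried).
unsigned bonusPercent(uint64_t EliminatedLatency, uint64_t Denominator) {
  assert(Denominator != 0 && "denominator must be non-zero");
  return static_cast<unsigned>(100 * EliminatedLatency / Denominator);
}

// Assumed comparison: specialization is considered only when the bonus
// reaches the minimum percentage.
bool meetsThreshold(unsigned BonusPercent, unsigned MinPercent) {
  return BonusPercent >= MinPercent;
}
```

For instance, an eliminated latency of 30 against a denominator of 100 gives a 30% bonus, which would fall short of the 40% threshold mentioned in the message.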

It seems that GCC uses a set of different heuristics for function
specialization, but I am not an expert here and I cannot say
if we can match them in LLVM.

With all that said, I decided to try to lower the threshold
to avoid the regression and be able to re-enable the generally
good change for `noalias` attribute.

With this patch, I was able to reduce the effect of `noalias`,
so that `-force-no-alias=true` is only ~10% slower than
`-force-no-alias=false` code on neoverse-v1 and neoverse-v2.
On neoverse-n1, `-force-no-alias=true` is >2x faster than
`-force-no-alias=false` regardless of this patch.

This threshold has been changed before also due to improved
alias information:
2fb51fba8c (diff-066363256b7b4164e66b28a3028b2cb9e405c9136241baa33db76ebd2edb87cd)

Please let me know what testing I should run to make sure this change
is safe. As I understand, it may affect the compilation time
performance, and I will appreciate it if someone points out which
benchmarks need to be checked before merging this.
2025-06-17 16:13:42 -07:00
Jeremy Morse
14286244f1 Follow up to 9eb0020555, squelch unused variable warning
It turns out that this now-deleted debug-intrinsic code was the only use of
CI.
2025-06-17 16:24:12 +01:00
Jeremy Morse
9eb0020555
[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389)
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.

This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.

Co-authored-by: Nikita Popov <github@npopov.com>
2025-06-17 15:55:14 +01:00
PiJoules
964888d01f
[llvm][CFI] Ensure COFF comdat renaming applies for imported functions (#143421)
I ran into the same issue as
https://github.com/llvm/llvm-project/pull/139962 regarding the comdat
corresponding to a renamed key function but for thinlto. My last patch
had not considered the thinlto case, so this applies the same fix for
imported functions.
2025-06-16 16:24:45 -07:00
zGoldthorpe
25dcd231bf
[IPO] Added attributor for identifying invariant loads (#141800)
The attributor conservatively marks pointers whose loads are eligible to
be marked as `!invariant.load`.
It does so by identifying:
1. Pointers marked `noalias` and `readonly`
2. Pointers whose underlying objects are all eligible for invariant
loads.

The attributor then manifests this attribute at non-atomic non-volatile
load instructions.
2025-06-16 11:16:47 -05:00
Nikita Popov
3824a2dbce [MemoryBuiltins] Support allocas in getInitialValueOfAllocation (NFC) 2025-06-16 11:52:16 +02:00
Peter Collingbourne
a89df72ec0
WholeProgramDevirt: Fix importing in llvm.type.checked.load case.
We were clearing SummaryTypeCheckedLoadUsers to prevent devirtualized
llvm.type.checked.load calls from being converted to llvm.type.test,
which meant that AddCalls would not see them in the list of
callsites and they would not get imported. Fix that by not clearing
SummaryTypeCheckedLoadUsers so that the list survives to AddCalls and
using AllCallSitesDevirted to control whether to convert them instead.

Reviewers: teresajohnson

Reviewed By: teresajohnson

Pull Request: https://github.com/llvm/llvm-project/pull/144019
2025-06-13 13:30:18 -07:00
David Spickett
36ac72f4e3 [llvm][MemProf] Fix unused variable warning in release build
g++-13 warned that:
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1645:8: warning: variable ‘PrevIterCreatedNode’ set but not used [-Wunused-but-set-variable]
 1645 |   bool PrevIterCreatedNode = false;
      |        ^~~~~~~~~~~~~~~~~~~

When asserts were not enabled.
2025-06-12 12:52:54 +00:00
Jeremy Morse
97ac6483aa
[DebugInfo][RemoveDIs] Delete debug-info-format flag (#143746)
This flag was used to let us incrementally introduce debug records
into LLVM, however everything is now using records. It serves no
purpose now, so delete it.
2025-06-12 11:51:58 +01:00
Stephen Tozer
aa8a1fa6f5
[DLCov][NFC] Annotate intentionally-blank DebugLocs in existing code (#136192)
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.

Some notes for reviewers:

- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
2025-06-11 17:42:10 +01:00
Jeremy Morse
354cfba520
[DebugInfo][RemoveDIs] Remove scoped-dbg-format-setter (#143450)
This was a utility for flipping between intrinsic and debug record mode
-- we don't need it any more. The "IsNewDbgInfoFormat" should be true
everywhere.
2025-06-11 11:23:24 +01:00