239 Commits

Author SHA1 Message Date
Hans Wennborg
3b81be803f
WholeProgramDevirt: Import/export the CVP byte directly in the summary (#188979)
rather than using absolute symbol constants on ELF/x86.

This leads to better codegen as the absolute symbol constants were not
resolved until link time (see bug for example).

Fixes #188470
2026-04-02 09:28:32 +02:00
Alexis Engelke
efcd3b6108
[IPO][InstCombine][Vectorize][NFCI] Drop uses of BranchInst (#186596)
Refactor remaining parts of Transforms apart from Scalar and Utils.
2026-03-14 17:49:00 +00:00
yonghong-song
3e05ab6322
[ThinLTO] Reduce the number of renaming due to promotions (#183793)
Currently for thin-lto, the imported static global values (functions,
variables, etc) will be promoted/renamed from e.g., foo() to
foo.llvm.(). Such a renaming caused difficulties in live patching
since function name is changed ([1]).

It is possible that some global value names have to be promoted to avoid
name collision and linker failure. But in practice, majority of name
promotions can be avoided.

In [2], the suggestion is that thin-lto pre-link decides whether
a particular global value needs name promotion or not. If yes, later on
in thinBackend() the name will be promoted.

I compiled a particular linux kernel version (latest bpf-next tree)
and found 1216 global values with suffix .llvm.. With this patch,
the number of promoted functions is 2, 98% reduction from the
original kernel build.

If some native objects are not participating with LTO, name promotions
have to be done to avoid potential linker issues. So the current
implementation cannot be on by default. But in certain cases, e.g., linux kernel
build, people can enable lld flag --lto-whole-program-visibility to reduce the
number of functions like foo.llvm.().

For ThinLTOCodeGenerator.cpp which is used by llvm-lto tool and a
few other rare cases, reducing the number of renaming due to promotion,
is not implemented as lld flag '-lto-whole-program-visibility' is not
supported in ThinLTOCodeGenerator.cpp for now. In summary, this pull
request only supports llvm-lto2 style workflow.

The feature is off by default. To enable the future, lld flag
'-lto-whole-program-visibility'  and llvm flag
'-always-rename-promoted-locals=false' are needed.

The link [3] has more context for the pull request discussions.

[1] https://lpc.events/event/19/contributions/2212
[2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400
[3] https://github.com/llvm/llvm-project/pull/178587
2026-02-28 12:44:25 -08:00
yonghong-song
cd50a3074b
Revert "[ThinLTO] Reduce the number of renaming due to promotions (#178587)" (#183782)
There is a conflict with existing code. See
  https://github.com/llvm/llvm-project/pull/178587
Revert and resolve the conflict and then will submit later.
2026-02-27 10:04:30 -08:00
yonghong-song
975dba2863
[ThinLTO] Reduce the number of renaming due to promotions (#178587)
Currently for thin-lto, the imported static global values (functions,
variables, etc) will be promoted/renamed from e.g., foo() to
foo.llvm.<hash>(). Such a renaming caused difficulties in live patching
since function name is changed ([1]).

It is possible that some global value names have to be promoted to avoid
name collision and linker failure. But in practice, majority of name
promotions can be avoided.

In [2], the suggestion is that thin-lto pre-link decides whether
a particular global value needs name promotion or not. If yes, later on
in thinBackend() the name will be promoted.

I compiled a particular linux kernel version (latest bpf-next tree)
and found 1216 global values with suffix .llvm.<hash>. With this patch,
the number of promoted functions is 2, 98% reduction from the
original kernel build.

If some native objects are not participating with LTO, name promotions
have to be done to avoid potential linker issues. So the current
implementation cannot be on by default. But in certain cases, e.g., linux kernel
build, people can enable lld flag --lto-whole-program-visibility to reduce the
number of functions like foo.llvm.<hash>().

For ThinLTOCodeGenerator.cpp which is used by llvm-lto tool and a
few other rare cases, reducing the number of renaming due to promotion,
is not implemented as lld flag '-lto-whole-program-visibility' is not supported
in ThinLTOCodeGenerator.cpp for now. In summary, this pull request
only supports llvm-lto2 style workflow.

  [1] https://lpc.events/event/19/contributions/2212
  [2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400
2026-02-27 09:09:54 -08:00
Nikita Popov
b205396086
[IR] Add ConstantExpr::getPtrAdd() (#181365)
Add a ConstantExpr::getPtrAdd() API that creates a getelementptr i8
constant expression, similar to IRBuilder::CreatePtrAdd(). In the future
this will create a ptradd expression.
2026-02-16 11:10:15 +01:00
Teresa Johnson
b30971c4bb
[ThinLTO] Remove unused relative block frequency support (#177215)
This removes most of the handling of the relative block frequency
support added in 2018 in c73cec84c99e5a63dca961fef67998a677c53a3c, which
was disabled by default and never utilized in the thin link as expected.

Support for reading old Bitcode containing the record is maintained as
required for backwards compatibility requirements, as is the support for
parsing old LLVM assembly containing that information. Tests ensure that
this backwards compatibility is maintained.

This came up in the context of redundant BFI/DT computations which
existed largely for the purpose of computing this information
and are being addressed in PR176646.
2026-01-21 11:39:57 -08:00
Nikita Popov
42a47bf18a [WPD] Avoid implicit truncation when creating full set
Use the bit mask for the type instead of `~0`, so that we don't
rely on implicit truncation of the top bits.
2025-12-15 15:44:03 +01:00
Nikita Popov
89c37fee25 [WPD] Use getSigned() for offset
This offset is a signed int64_t which can take negative values.
2025-12-12 11:15:44 +01:00
Hassnaa Hamdi
9e7ce77573
[Clang]: Support opt-in speculative devirtualization (#159685)
This patch adds Clang support for speculative devirtualization and
integrates the related pass into the pass pipeline.
It's building on the LLVM backend implementation from PR #159048.
Speculative devirtualization transforms an indirect call (the virtual
function) to a guarded direct call.
It is guarded by a comparison of the virtual function pointer to the
expected target.
This optimization is still safe without LTO because it doesn't do direct
calls, it's conditional according to the function ptr.
This optimization:
- Opt-in: Disabled by default, enabled via `-fdevirtualize-speculatively`
- Works in non-LTO mode
- Handles publicly-visible objects.
- Uses guarded devirtualization with fallback to indirect calls when the
speculation is incorrect.

For this C++ example:
```
class Base {
public:
    __attribute__((noinline))
    virtual void virtual_function1() { asm volatile("NOP"); }
    virtual void virtual_function2() { asm volatile("NOP"); }
};
class Derived : public Base {
public:
    void virtual_function2() override { asm volatile("NOP"); }
};
__attribute__((noinline))
void foo(Base *BV) {
    BV->virtual_function1();
}
void bar() {
    Base *b = new Derived();
    foo(b);
}
```
Here is the IR without enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !6
  %0 = load ptr, ptr %vtable, align 8
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  ret void
}
```
IR after enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !12
  %0 = load ptr, ptr %vtable, align 8
  %1 = icmp eq ptr %0, @_ZN4Base17virtual_function1Ev
  br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !15

if.true.direct_targ:                              ; preds = %entry
  tail call void @_ZN4Base17virtual_function1Ev(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.false.orig_indirect:                           ; preds = %entry
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.end.icp:                                       ; preds = %if.false.orig_indirect, %if.true.direct_targ
  ret void
}
```
2025-12-07 14:24:30 +00:00
Aiden Grossman
979a987d3a
[WPD] Change Devirt Cutoff to use DebugCounter (#170009)
This removes the presence of global state from within the pass which is
blocking some efforts around test daemonization and is not good design
practice in general for LLVM. See

https://discourse.llvm.org/t/rfc-reducing-process-creation-overhead-in-llvm-regression-tests/88612/11
for more discussion.

This patch replaces the usage of global state with a DebugCounter, which
helps fix the global state problem and also increases the flexibility of
the option as now an explicit range can be passed.

Co-authored-by: Mingming Liu <mingmingl@google.com>
2025-12-01 15:34:54 +00:00
Teresa Johnson
a909ec64dc
[ThinLTO][WPD] LICM a loop invariant check (#164862)
Move a loop invariant check out of the innermost loop. I measured a
small but consistent thin link speedup from this change for a large
target (0.75%).
2025-10-23 12:02:27 -07:00
Teresa Johnson
f11899f647
[ThinLTO][WPD] Simplify check for local summary for efficiency (NFCI) (#164859)
Use the new HasLocal flag to avoid looking through all summaries to see
if there is a local copy.
2025-10-23 12:02:05 -07:00
Teresa Johnson
9c7b3047c4
[WPD] Reduce ThinLTO link time by avoiding unnecessary summary analysis (#164046)
We are scanning through every single definition of a vtable across all
translation units which is unnecessary in most cases.

If this is a local, we want to make sure there isn't another local with
the same GUID due to it having the same relative path. However, we were
always scanning through every single summary in all cases.

We can now check the new HasLocal flag added in PR164647 ahead of the
loop,
instead of checking on every iteration.

This cut down a large thin link by around 6%, which was over half the
time it spent in WPD.

Note that we previously took the last conforming vtable summary, and now
we use the first. This caused a test difference in one somewhat
contrived test for vtables in comdats.
2025-10-23 07:16:45 -07:00
Hassnaa Hamdi
8b2aba2e20
[WPD]: Enable speculative devirtualizatoin. (#159048)
This patch implements the speculative devirtualization feature in the
LLVM backend.
It handles the case of single implementation devirtualization where
there is a single possible callee of a virtual function.
- Add cl::opt 'devirtualize-speculatively' to enable it.
- Flag is disabled by default.
- It works regardless of the visibility of the object.
- Not enabled for LTO for now.
2025-10-22 12:16:11 +01:00
Teresa Johnson
683e2bf059
[ThinLTO] Make SummaryList private (NFC) (#164355)
In preparation for a follow on change that will require checking every
time a new summary is added to the SummaryList for a GUID, make the
SummaryList private and require all accesses to go through one of two
new interfaces. Most changes are to access the list via the read only
getSummaryList() method, and the few that add new summaries (e.g. while
building the combined summary) use the new addSummary() method.
2025-10-21 06:53:40 -07:00
Rahul Joshi
bea786d128
[NFC][LLVM] Use namespace qualifier to define DenseMapInfo specializations (#162683)
Use `llvm::DenseMapInfo` to define `DenseMapInfo` specializations
instead of surrounding it with `namespace llvm {}`.
2025-10-10 05:23:44 -07:00
Nicolai Hähnle
11a4b2d950
Cleanup the LLVM exported symbols namespace (#161240)
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.

While doing this, I noticed some other variables in the global namespace
and moved them as well.
2025-10-01 15:32:07 -07:00
Mircea Trofin
f2d827c444
[profcheck] Require unknown metadata have an origin parameter (#157594)
Rather than passes using `!prof = !{!”unknown”}`​for cases where don’t have enough information to emit profile values, this patch captures the pass (or some other information) that can help diagnostics - i.e. `!{!”unknown”, !”some-pass-name”}`​.

For example, suppose we emitted a `select`​ with the unknown metadata, and, later, end up needing to lower that to a conditional branch. If we observe (via sample profiling, for example) that the branch is biased and would have benefitted from a valid profile, the extra information can help speed up debugging.

We can also (in a subsequent pass) generate optimization remarks about such lowered selects, with a similar aim - identify patterns lowering to `select`​ that may be worth some extra investment in extracting a more precise profile.
2025-09-10 15:34:35 -07:00
Mircea Trofin
c55708c65c
[WPD] set the branch funnel function entry count (#155657)
We can compute the entry count of branch funnel functions, and potentially avoid them being deemed cold (also, keeping profile information coherent is always good for performance)

Issue #147390
2025-09-08 16:41:33 -07:00
Alexandre Ganea
5cda2424c8
[LLD][COFF] Add more --time-trace tags for ThinLTO linking (#156471)
In order to better see what's going on during ThinLTO linking, this PR
adds more profile tags when using `--time-trace` on a `lld-link.exe`
invocation.

After PR, linking `clang.exe`:

<img width="3839" height="2026" alt="Capture d’écran 2025-09-02 082021"
src="https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910"
/>

Linking a custom (Unreal Engine game) binary gives a completly
different picture, probably because of using Unity files, and the sheer
amount of input files (here, providing over 60 GB of .OBJs/.LIBs).

<img width="1940" height="1008" alt="Capture d’écran 2025-09-02 102048"
src="https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0"
/>
2025-09-05 15:28:19 -04:00
Mircea Trofin
329e706f8a
[NFC][WPD] Pass the module analysis manager instead of lambdas (#155338)
Easier to evolve - if we need more analyses, it becomes clumsy to keep passing around lambdas.
2025-08-26 17:01:02 -07:00
Mircea Trofin
9dfb04d225
[NFC][WPD] code style fixes (#155454) 2025-08-26 12:37:11 -07:00
Tianle Liu
6001a8bb94
[WholeProgramDevirt] Add check for AvailableExternal and give up icall.branch.funnel (#143468)
When a customer class inherits from a libc++ class, and is built with
"-flto  -fwhole-program-vtables -static-libstdc++ \
-Wl,-plugin-opt=-whole-program-visibility", the libc++ class's vtable is
available_externally, meanwhile the customer class vtable is private.
And
both of them are !vcall_visibility == Linkage Unit.
In this case, icall.branch.funnel might be generated.

But the icall.branch.funnel would cause crash in LowerTypeTests because
available_externally Global_Object's GlobalTypeMember would not be
saved and finally leads to a NULL GlobalTypeMember which causes a crash.
Even saving the available_externally GO's GlobalTypeMember so that it is
not NULL to avoid the crash in LowerTypeTests, it still will crash in
SelectionDAGBuilder or Verifier, because operands linkage type
consistency
check of icall.branch.funnel can not pass.

So any one of available externally vtable would stop to generate
icall.branch.funnel.
This patch fixes FullLTO mode and split-LTO-unit ThinLTO mode.
2025-06-20 08:01:32 +08:00
Peter Collingbourne
a89df72ec0
WholeProgramDevirt: Fix importing in llvm.type.checked.load case.
We were clearing SummaryTypeCheckedLoadUsers to prevent devirtualized
llvm.type.checked.load calls from being converted to llvm.type.test,
which meant that AddCalls would not see them in the list of
callsites and they would not get imported. Fix that by not clearing
SummaryTypeCheckedLoadUsers so that the list survives to AddCalls and
using AllCallSitesDevirted to control whether to convert them instead.

Reviewers: teresajohnson

Reviewed By: teresajohnson

Pull Request: https://github.com/llvm/llvm-project/pull/144019
2025-06-13 13:30:18 -07:00
PiJoules
f32f048719
[llvm] Use ABI instead of preferred alignment for const prop checks (#142500)
We'd hit an assertion checking proper alignment for an i8 when building
chromium because we used the prefered alignment (which is 4 bytes)
instead of the ABI alignment (which is 1 byte). The ABI alignment should
be used because that's the actual alignment needed to load a constant
from the vtable.

This also updates the two `virtual-const-prop-small-alignment-*` to
explicitly give ABI alignments for i64s.
2025-06-04 10:57:59 -07:00
Peter Collingbourne
645f0e6723
IR: Make Module::getOrInsertGlobal() return a GlobalVariable.
After pointer element types were removed this function can only return
a GlobalVariable, so reflect that in the type and comments and clean
up callers.

Reviewers: nikic

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/141323
2025-05-27 12:23:12 -07:00
PiJoules
2661e995ce
[llvm] Ensure propagated constants in the vtable are aligned (#136630)
It's possible for virtual constant propagation in whole program
devirtualization to create unaligned loads. We originally saw this with
4-byte aligned relative vtables where we could store 8-byte values
before/after the vtable. But since the vtable is 4-byte aligned and we
unconditionally do an 8-byte load, we can't guarantee that the stored
constant will always be aligned to 8 bytes. We can also see this with
normal vtables whenever a 1-byte char is stored in the vtable because
the offset calculation for the GEP doesn't take into account the
original vtable alignment.

This patch introduces two changes to virtual constant propagation:
1. Do not propagate constants whose preferred alignment is larger than
the vtable alignment. This is required because if the constants are
stored in the vtable, we can only guarantee the constant will be stored
at an address at most aligned to the vtable's alignment.
2. Round up the offset used in the GEP before the load to ensure it's at
an address suitably aligned such that we can load from it.

This patch updates tests to reflect this alignment change and adds some
cases for relative vtables.
2025-05-15 11:52:25 -07:00
Owen Rodley
d3d856ad84
Clean up external users of GlobalValue::getGUID(StringRef) (#129644)
See https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
for context.

This is a non-functional change which just changes the interface of
GlobalValue, in preparation for future functional changes. This part
touches a fair few users, so is split out for ease of review. Future
changes to the GlobalValue implementation can then be focused purely on
that class.

This does the following:

* Rename GlobalValue::getGUID(StringRef) to
  getGUIDAssumingExternalLinkage. This is simply making explicit at the
  callsite what is currently implicit.
* Where possible, migrate users to directly calling getGUID on a
  GlobalValue instance.
* Otherwise, where possible, have them call the newly renamed
  getGUIDAssumingExternalLinkage, to make the assumption explicit.


There are a few cases where neither of the above are possible, as the
caller saves and reconstructs the necessary information to compute the
GUID themselves. We want to migrate these callers eventually, but for
this first step we leave them be.
2025-04-28 11:09:43 +10:00
PiJoules
0f8c075f7c
[llvm] Match llvm.type.checked.load.relative semantics to llvm.load.r… (#129583)
…elative

The semantics of `llvm.type.checked.load.relative` seem to be a little
different from that of `llvm.load.relative`. It looks like the semantics
for `llvm.type.checked.load.relative` is `ptr + offset + *(ptr +
offset)` whereas the semantics for `llvm.load.relative` is `ptr + *(ptr
+ offset)`. That is, the offset for the former is added to the offset
address whereas the later has the offset added to the original pointer.

It really feels like the checked intrinsic was meant to match the
semantics of the non-checked intrinsic, but I think for all cases the
checked intrinsic is used (swift being the only use I know of), the
calculation just happens to be the same because swift always uses an
offset of zero. Likewise, all llvm tests for this intrinsic happen to
use an offset of zero.

Relative vtables in clang happens to be the first time where we're using
this intrinsic and using it with non-zero values. This updates the
semantics of the checked intrinsic to match the non-checked one.
Effectively this shouldn't change any codegen by any users of this since
all current users seem to use a zero offset.

This PR also updates some tests with non-zero offsets.
2025-03-13 14:50:41 -07:00
Mingming Liu
e1aa1e43de
[WPD]Provide branch weight for checking mode. (#124084)
Checking mode aims to help diagnose and confirm undefined behavior. In
most cases, source code don't cast pointers between unrelated types for
virtual calls, so we expect direct calls in the frequent branch and
debug trap in the unlikely branch.

This way, the overhead of checking mode is not higher than an indirect
call promotion for a hot callsite as long as the callsite doesn't run the debug trap
branch.
2025-01-23 07:52:30 -08:00
Mats Jun Larsen
d7c14c8f97
[IR] Replace of PointerType::getUnqual(Type) with opaque version (NFC) (#123909)
Follow up to https://github.com/llvm/llvm-project/issues/123569
2025-01-23 18:23:05 +09:00
Mingming Liu
47cc9db797
[WPD]Regard unreachable function as a possible devirtualizable target (#115668)
https://reviews.llvm.org/D115492 skips unreachable functions and
potentially allows more static de-virtualizations. The motivation is to
ignore virtual deleting destructor of abstract class (e.g.,
`Base::~Base()` in https://gcc.godbolt.org/z/dWMsdT9Kz).
* Note WPD already handles most pure virtual functions (like `Base::x()`
in the godbolt example above), which becomes a `__cxa_pure_virtual` in
the vtable slot.

This PR proposes to undo the change, because it turns out there are
other unreachable functions that a general program wants to run and fail
intentionally, with `LOG(FATAL)` or `CHECK` [1] for example. While many
real-world applications are encouraged to check-fail sparingly, they are
allowed to do so on critical errors (e.g., misconfiguration or bug is
detected during server startup).
* Implementation-wise, this PR keeps the one-bit 'unreachable' state in
bitcode and updates WPD analysis.
 
https://gcc.godbolt.org/z/T1aMhczYr is a minimum reproducible example
extracted from unit test. `Base::func` is a one-liner of `LOG(FATAL) <<
"message"`, and lowered to one basic block ending with `unreachable`. A
real-world program is _allowed_ to invoke Base::func to terminate the
program as a way to report errors (in server initialization stage for
example), even if errors on the serving path should be handled more
gracefully.

[1] https://abseil.io/docs/cpp/guides/logging#CHECK and
https://abseil.io/docs/cpp/guides/logging#configuration-and-flags
2024-11-13 11:28:36 -08:00
Kazu Hirata
98ea1a81a2
[IPO] Remove unused includes (NFC) (#114716)
Identified with misc-include-cleaner.
2024-11-03 13:48:55 -08:00
Mingming Liu
60944177b8
[WPD][ThinLTO]Add cutoff option for WPD (#113383)
This option applies for _import_ WPD (i.e., when `DevirtModule` pass
de-virtualizes according to an imported summary, in ThinLTO backend
pipeline). It's meant for debugging (e.g., bisection).
2024-10-23 23:47:27 -07:00
Rahul Joshi
6924fc0326
[LLVM] Add Intrinsic::getDeclarationIfExists (#112428)
Add `Intrinsic::getDeclarationIfExists` to lookup an existing
declaration of an intrinsic in a `Module`.
2024-10-16 07:21:10 -07:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Jay Foad
e03f427196
[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133)
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
2024-09-19 16:16:38 +01:00
Youngsuk Kim
2051736f7b [llvm][Transforms] Avoid 'raw_string_ostream::str' (NFC)
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.

Work towards TODO comment to remove `raw_string_ostream::str()`.
2024-06-30 09:03:29 -05:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Nikita Popov
fba84ecc15 [WPD] Directly create geteleementptr inbounds (NFCI)
We know that this GEP is inbounds, so make it explicit. NFCI
because constant expression construction already infers this.
2024-05-29 15:43:11 +02:00
Vitaly Buka
c60aa430dc
[NFCI][sanitizers][metadata] Exctract create{Unlikely,Likely}BranchWeights (#89464)
We have a lot of repeated code with random constants.
Particular values are not important, the one just needs to be
bigger then another.

UR_NONTAKEN_WEIGHT is selected as it's the most common one.
2024-04-19 17:03:23 -07:00
Jeremy Morse
2fe81edef6 [NFC][RemoveDIs] Insert instruction using iterators in Transforms/
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.

There are two general flavours of update:
 * Almost all call-sites just call getIterator on an instruction
 * Several make use of an existing iterator (scenarios where the code is
   actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.

Noteworthy changes:
 * FindInsertedValue now takes an optional iterator rather than an
   instruction pointer, as we need to always insert with iterators,
 * I've added a few iterator-taking versions of some value-tracking and
   DomTree methods -- they just unwrap the iterator. These are purely
   convenience methods to avoid extra syntax in some passes.
 * A few calls to getNextNode become std::next instead (to keep in the
   theme of using iterators for positions),
 * SeparateConstOffsetFromGEP has it's insertion-position field changed.
   Noteworthy because it's not a purely localised spelling change.

All this should be NFC.
2024-03-05 15:12:22 +00:00
Mingming Liu
8ea858b967
[CallPromotionUtil] See through function alias when devirtualizing a virtual call on an alloca. (#80736)
- Extract utility function from
`DevirtModule::tryFindVirtualCallTargets` [1], which sees through an alias to a function. Call this utility function in
the WPD callsite.
- For type profiling work, this helper function will be used by indirect-call-promotion pass to find the function pointer at a specified vtable offset (an example in [2])

[1] b99163fe8f/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp (L1069-L1082)
[2] 77a0ef12de/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (L347)
2024-02-06 09:22:34 -08:00
Nikita Popov
6c2fbc3a68
[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582)
This abstracts over the common pattern of creating a gep with i8 element
type.
2024-01-12 14:21:21 +01:00
Teresa Johnson
88fbc4d3df
[ThinLTO] Add tail call flag to call edges in summary (#74043)
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended for use in aiding discovery of missing
frames from tail calls in profiled call stacks for MemProf of profiled
binaries that did not disable tail call elimination. A follow on change
will add the use of this new flag during MemProf context disambiguation.

The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we now will always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.

I have added explicit testing of generation of the new tail call flag
into the bitcode and IR assembly format as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
2023-12-06 08:41:44 -08:00
Simon Pilgrim
3ca4fe80d4 [Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-06 16:50:18 +00:00
Nikita Popov
c4c0ac10f1 [IPO] Remove unnecessary bitcasts (NFC) 2023-11-06 16:49:45 +01:00
Kazu Hirata
9c5a5a421d [llvm] Stop including llvm/ADT/iterator_range.h (NFC)
Identified with misc-include-cleaner.
2023-10-22 15:41:18 -07:00
modimo
272bd6f9cc [WPD][LLD] Add option to validate RTTI is enabled on all native types and prevent devirtualization on types with native RTTI
Discussion about this approach: https://discourse.llvm.org/t/rfc-safer-whole-program-class-hierarchy-analysis/65144/18

When enabling WPD in an environment where native binaries are present, types we want to optimize can be derived from inside these native files and devirtualizing them can lead to correctness issues. RTTI can be used as a way to determine all such types in native files and exclude them from WPD providing a safe checked way to enable WPD.

The approach is:
1. In the linker, identify if RTTI is available for all native types. If not, under `--lto-validate-all-vtables-have-type-infos` `--lto-whole-program-visibility` is automatically disabled. This is done by examining all .symtab symbols in object files and .dynsym symbols in DSOs for vtable (_ZTV) and typeinfo (_ZTI) symbols and ensuring there's always a match for every vtable symbol.
2. During thinlink, if `--lto-validate-all-vtables-have-type-infos` is set and RTTI is available for all native types, identify all typename (_ZTS) symbols via their corresponding typeinfo (_ZTI) symbols that are used natively or outside of our summary and exclude them from WPD.

Testing:
ninja check-all
large Meta service that uses boost, glog and libstdc++.so runs successfully with WPD via --lto-whole-program-visibility. Previously, native types in boost caused incorrect devirtualization that led to crashes.

Reviewed By: MaskRay, tejohnson

Differential Revision: https://reviews.llvm.org/D155659
2023-09-18 15:51:49 -07:00