https://reviews.llvm.org/D115492 skips unreachable functions and
potentially allows more static de-virtualizations. The motivation is to
ignore virtual deleting destructor of abstract class (e.g.,
`Base::~Base()` in https://gcc.godbolt.org/z/dWMsdT9Kz).
* Note WPD already handles most pure virtual functions (like `Base::x()`
in the godbolt example above), which becomes a `__cxa_pure_virtual` in
the vtable slot.
This PR proposes to undo the change, because it turns out there are
other unreachable functions that a general program wants to run and fail
intentionally, with `LOG(FATAL)` or `CHECK` [1] for example. While many
real-world applications are encouraged to check-fail sparingly, they are
allowed to do so on critical errors (e.g., misconfiguration or bug is
detected during server startup).
* Implementation-wise, this PR keeps the one-bit 'unreachable' state in
bitcode and updates WPD analysis.
https://gcc.godbolt.org/z/T1aMhczYr is a minimum reproducible example
extracted from unit test. `Base::func` is a one-liner of `LOG(FATAL) <<
"message"`, and lowered to one basic block ending with `unreachable`. A
real-world program is _allowed_ to invoke Base::func to terminate the
program as a way to report errors (in server initialization stage for
example), even if errors on the serving path should be handled more
gracefully.
[1] https://abseil.io/docs/cpp/guides/logging#CHECK and
https://abseil.io/docs/cpp/guides/logging#configuration-and-flags
This option applies for _import_ WPD (i.e., when `DevirtModule` pass
de-virtualizes according to an imported summary, in ThinLTO backend
pipeline). It's meant for debugging (e.g., bisection).
Also reapply "[llvm] Teach whole program devirtualization about
relative vtables"
This reverts commit 1c604a9780fcfe92a99d539913553f0835b81de3 and
474f5efebed24547e76d022f0c5ffcc9db97ce6f.
The motivating use case is to support import the function declaration
across modules to construct call graph edges for indirect calls [1]
when importing the function definition costs too much compile time
(e.g., the function is too large has no `noinline` attribute).
1. Currently, when the compiled IR module doesn't have a function
definition but its postlink combined summary contains the function
summary or a global alias summary with this function as aliasee, the
function definition will be imported from source module by IRMover. The
implementation is in FunctionImporter::importFunctions [2]
2. In order for FunctionImporter to import a declaration of a function,
both function summary and alias summary need to carry the def / decl
state. Specifically, all existing summary fields doesn't differ across
import modules, but the def / decl state of is decided by
`<ImportModule, Function>`.
This change encodes the def/decl state in `GlobalValueSummary::GVFlags`.
In the subsequent changes
1. The indexing step `computeImportForModule` [3]
will compute the set of definitions and the set of declarations for each
module, and passing on the information to bitcode writer.
2. Bitcode writer will look up the def/decl state and sets the state
when it writes out the flag value. This is demonstrated in
https://github.com/llvm/llvm-project/pull/87600
3. Function importer will read the def/decl state when reading the
combined summary to figure out two sets of global values, and IRMover
will be updated to import the declaration (aka linkGlobalValuePrototype [4])
into the destination module.
- The next change is https://github.com/llvm/llvm-project/pull/87600
[1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)
[3] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)
[4] 3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)
This adds a type_checked_load_relative intrinsic whose semantics it is to
load a relative function pointer.
A relative function pointer is a pointer to a 32bit value that when
added to its address yields the address of the function.
Differential Revision: https://reviews.llvm.org/D143204
We can't have a call with a constant target with a ptrauth bundle. Remove the
ptrauth bundle operand in such a case
rdar://105696396
Differential Revision: https://reviews.llvm.org/D144581
D63932 added a module flag to indicate that we are executing the regular
LTO post merge pipeline, so that GlobalDCE could perform more aggressive
optimization for Dead Virtual Function Elimination. This caused issues
trying to reuse bitcode that had already been through the LTO pipeline
(see context in D139816).
Instead support this by passing down a parameter flag to the
GlobalDCEPass constructor, which is the more usual way for indicating
this information.
Most test changes are to remove incidental uses of this flag. Of the 2
real uses, llvm/test/LTO/ARM/lto-linking-metadata.ll is now obsolete and
removed in this patch, and the virtual-functions-visibility-post-lto.ll
test is updated to use the regular LTO default pipeline where this
parameter is set to true.
Differential Revision: https://reviews.llvm.org/D153655
It's possible to segfault in `DevirtModule::applyICallBranchFunnel` when
attempting to call `getCaller` on a call base that was erased in a prior
iteration. This can occur when attempting to find devirtualizable calls
via `findDevirtualizableCallsForTypeTest` if the vtable passed to
llvm.type.test is a global and not a local. The function works by taking
the first argument of the llvm.type.test call (which is a vtable),
iterating through all uses of it, and adding any relevant all uses that
are calls associated with that intrinsic call to a vector. For most
cases where the vtable is actually a *local*, this wouldn't be an issue.
Take for example:
```
define i32 @fn(ptr %obj) #0 {
%vtable = load ptr, ptr %obj
%p = call i1 @llvm.type.test(ptr %vtable, metadata !"typeid2")
call void @llvm.assume(i1 %p)
%fptr = load ptr, ptr %vtable
%result = call i32 %fptr(ptr %obj, i32 1)
ret i32 %result
}
```
`findDevirtualizableCallsForTypeTest` will check the call base ` %result
= call i32 %fptr(ptr %obj, i32 1)`, find that it is associated with a
virtualizable call from `%vtable`, find all loads for `%vtable`, and add
any instances those load results are called into a vector. Now consider
the case where instead `%vtable` was the global itself rather than a
local:
```
define i32 @fn(ptr %obj) #0 {
%p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
call void @llvm.assume(i1 %p)
%fptr = load ptr, ptr @vtable
%result = call i32 %fptr(ptr %obj, i32 1)
ret i32 %result
}
```
`findDevirtualizableCallsForTypeTest` should work normally and add one
unique call instance to a vector. However, if there are multiple
instances where this same global is used for llvm.type.test, like with:
```
define i32 @fn(ptr %obj) #0 {
%p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
call void @llvm.assume(i1 %p)
%fptr = load ptr, ptr @vtable
%result = call i32 %fptr(ptr %obj, i32 1)
ret i32 %result
}
define i32 @fn2(ptr %obj) #0 {
%p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
call void @llvm.assume(i1 %p)
%fptr = load ptr, ptr @vtable
%result = call i32 %fptr(ptr %obj, i32 1)
ret i32 %result
}
```
Then each call base `%result = call i32 %fptr(ptr %obj, i32 1)` will be
added to the vector twice. This is because for either call base `%result
= call i32 %fptr(ptr %obj, i32 1) `, we determine it is associated with
a virtualizable call from `@vtable`, and then we iterate through all the
uses of `@vtable`, which is used across multiple functions. So when
scanning the first `%result = call i32 %fptr(ptr %obj, i32 1)`, then
both call bases will be added to the vector, but when scanning the
second one, both call bases are added again, resulting in duplicate call
bases in the CSInfo.CallSites vector.
Note this is actually accounted for in every other instance WPD iterates
over CallSites. What everything else does is actually add the call base
to the `OptimizedCalls` set and just check if it's already in the set.
We can't reuse that particular set since it serves a different purpose
marking which calls where devirtualized which `applyICallBranchFunnel`
explicitly says it doesn't. For this fix, we can just account for
duplicates with a map and do the actual replacements afterwards by
iterating over the map.
Differential Revision: https://reviews.llvm.org/D146267
Prior to this patch, WPD was not acting on relative-vtables in C++. This
involves teaching WPD about these things:
- llvm.load.relative which is how relative-vtables are indexed (instead of GEP)
- dso_local_equivalent which is used in the vtable itself when taking the
offset between a virtual function and vtable
- Update llvm/test/ThinLTO/X86/devirt.ll to use opaque pointers and add
equivalent tests for RV
Differential Revision: https://reviews.llvm.org/D134320
Add statistics to count overall devirtualized targets as well as the
various types of devirtualizations applied at callsites.
Differential Revision: https://reviews.llvm.org/D123152
In regular LTO, analyze IR and discard unreachable functions when finding virtual call targets.
Differential Revision: https://reviews.llvm.org/D116056
This updates transform test cases for
ADCE
AddDiscriminators
AggressiveInstCombine
AlignmentFromAssumptions
ArgumentPromotion
BDCE
CalledValuePropagation
DCE
Reg2Mem
WholeProgramDevirt
to use the -passes syntax when specifying the pipeline.
Given that LLVM_ENABLE_NEW_PASS_MANAGER isn't set to off (which is
a deprecated feature) the updated test cases already used the new
pass manager, but they were using the legacy syntax when specifying
the passes to run. This patch can be seen as a step toward deprecating
that interface.
This patch also removes some redundant RUN lines. Here I am
referring to test cases that had multiple RUN lines verifying both
the legacy "-passname" syntax and the new "-passes=passname" syntax.
Since we switched the default pass manager to "new PM" both RUN lines
have verified the new PM version of the pass (more or less wasting
time running the same test twice), unless LLVM_ENABLE_NEW_PASS_MANAGER
is set to "off". It is assumed that it is enough to run these tests
with the new pass manager now.
Differential Revision: https://reviews.llvm.org/D108472
Currently, LLParser will create a Function/GlobalVariable forward
reference based on the desired pointer type and then modify it when
it is declared. With opaque pointers, we generally do not know the
correct type to use until we see the declaration.
Solve this by creating the forward reference with a dummy type, and
then performing a RAUW with the correct Function/GlobalVariable when
it is declared. The approach is adopted from
b5b55963f6.
This results in a change to the use list order, which is why we see
test changes on some module passes that are not stable under use list
reordering.
Differential Revision: https://reviews.llvm.org/D104950
WPD currently assumes that there is a one to one correspondence between
type test assume sequences and virtual calls. However, with
-fstrict-vtable-pointers this may not be true. This ends up causing
crashes when we try to optimize a virtual call more than once (
applyUniformRetValOpt()/applyUniqueRetValOpt()/applyVirtualConstProp()/applySingleImplDevirt()).
applySingleImplDevirt() actually didn't previous crash because it would
replace the devirtualized call with the same direct call. Adding an
assert that the call is indirect causes the corresponding test to crash
with the rest of the patch.
This makes Chrome successfully build with -fstrict-vtable-pointers + WPD.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D104798
Using $ breaks demangling of the symbols. For example,
$ c++filt _Z3foov\$123
_Z3foov$123
This causes problems for developers who would like to see nice stack traces
etc., but also for automatic crash tracking systems which try to organize
crashes based on the stack traces.
Instead, use the period as suffix separator, since Itanium demanglers normally
ignore such suffixes:
$ c++filt _Z3foov.123
foo() [clone .123]
This is already done in some places; try to do it everywhere.
Differential revision: https://reviews.llvm.org/D97484
Imported functions and variable get the visibility from the module supplying the
definition. However, non-imported definitions do not get the visibility from
(ELF) the most constraining visibility among all modules (Mach-O) the visibility
of the prevailing definition.
This patch
* adds visibility bits to GlobalValueSummary::GVFlags
* computes the result visibility and propagates it to all definitions
Protected/hidden can imply dso_local which can enable some optimizations (this
is stronger than GVFlags::DSOLocal because the implied dso_local can be
leveraged for ELF -shared while default visibility dso_local has to be cleared
for ELF -shared).
Note: we don't have summaries for declarations, so for ELF if a declaration has
the most constraining visibility, the result visibility may not be that one.
Differential Revision: https://reviews.llvm.org/D92900
The legacy pass's default constructor sets UseCommandLine = true and
goes down a separate testing route. Match that in the NPM pass.
This fixes all tests in llvm/test/Transforms/WholeProgramDevirt under NPM.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D88588
This restores commit 80d0a137a5aba6998fadb764f1e11cb901aae233, and the
follow on fix in 873c0d0786dcf22f4af39f65df824917f70f2170, with a new
fix for test failures after a 2-stage clang bootstrap, and a more robust
fix for the Chromium build failure that an earlier version partially
fixed. See also discussion on D75201.
Reviewers: evgeny777
Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73242
Summary:
Changes the type of the @__typeid_.*_unique_member imports we generate
for unique return value optimization from i8 to [0 x i8]. This
prevents assuming that these imports do not alias, such as when
two unique return values occur in the same vtable.
Fixes PR45393.
Reviewers: tejohnson, pcc
Reviewed By: pcc
Subscribers: aganea, hiraditya, rnk, george.burgess.iv, dblaikie, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77421
This reverts commit 80d0a137a5aba6998fadb764f1e11cb901aae233, and the
follow on fix in 873c0d0786dcf22f4af39f65df824917f70f2170. It is
causing test failures after a multi-stage clang bootstrap. See
discussion on D73242 and D75201.
This restores commit 748bb5a0f1964d20dfb3891b0948ab6c66236c70, along
with a fix for a Chromium test suite build issue (and a new test for
that case).
Differential Revision: https://reviews.llvm.org/D73242
Summary:
Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally to associated type summary info
computed during WPD.
This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).
This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.
Depends on D71913.
Reviewers: pcc, evgeny777
Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73242
This restores 59733525d37cf9ad88b5021b33ecdbaf2e18911c (D71913), along
with bot fix 19c76989bb505c3117730c47df85fd3800ea2767.
The bot failure should be fixed by D73418, committed as
af954e441a5170a75687699d91d85e0692929d43.
I also added a fix for non-x86 bot failures by requiring x86 in new test
lld/test/ELF/lto/devirt_vcall_vis_public.ll.
Summary:
Third part in series to support Safe Whole Program Devirtualization
Enablement, see RFC here:
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html
This patch adds type test metadata under -fwhole-program-vtables,
even for classes without hidden visibility. It then changes WPD to skip
devirtualization for a virtual function call when any of the compatible
vtables has public vcall visibility.
Additionally, internal LLVM options as well as lld and gold-plugin
options are added which enable upgrading all public vcall visibility
to linkage unit (hidden) visibility during LTO. This enables the more
aggressive WPD to kick in based on LTO time knowledge of the visibility
guarantees.
Support was added to all flavors of LTO WPD (regular, hybrid and
index-only), and to both the new and old LTO APIs.
Unfortunately it was not simple to split the first and second parts of
this part of the change (the unconditional emission of type tests and
the upgrading of the vcall visiblity) as I needed a way to upgrade the
public visibility on legacy WPD llvm assembly tests that don't include
linkage unit vcall visibility specifiers, to avoid a lot of test churn.
I also added a mechanism to LowerTypeTests that allows dropping type
test assume sequences we now aggressively insert when we invoke
distributed ThinLTO backends with null indexes, which is used in testing
mode, and which doesn't invoke the normal ThinLTO backend pipeline.
Depends on D71907 and D71911.
Reviewers: pcc, evgeny777, steven_wu, espindola
Subscribers: emaste, Prazek, inglorion, arichardson, hiraditya, MaskRay, dexonsmith, dang, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71913
For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definitions.
Also modifies the parser to accept IR in that form for obvious reasons.
llvm-svn: 367755
The bytes inserted before an overaligned global need to be padded according
to the alignment set on the original global in order for the initializer
to meet the global's alignment requirements. The previous implementation
that padded to the pointer width happened to be correct for vtables on most
platforms but may do the wrong thing if the vtable has a larger alignment.
This issue is visible with a prototype implementation of HWASAN for globals,
which will overalign all globals including vtables to 16 bytes.
There is also no padding requirement for the bytes inserted after the global
because they are never read from nor are they significant for alignment
purposes, so stop inserting padding there.
Differential Revision: https://reviews.llvm.org/D65031
llvm-svn: 366725
Summary:
We hit undefined references building with ThinLTO when one source file
contained explicit instantiations of a template method (weak_odr) but
there were also implicit instantiations in another file (linkonce_odr),
and the latter was the prevailing copy. In this case the symbol was
marked hidden when the prevailing linkonce_odr copy was promoted to
weak_odr. It led to unsats when the resulting shared library was linked
with other code that contained a reference (expecting to be resolved due
to the explicit instantiation).
Add a CanAutoHide flag to the GV summary to allow the thin link to
identify when all copies are eligible for auto-hiding (because they were
all originally linkonce_odr global unnamed addr), and only do the
auto-hide in that case.
Most of the changes here are due to plumbing the new flag through the
bitcode and llvm assembly, and resulting test changes. I augmented the
existing auto-hide test to check for this situation.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, arphaman, dang, llvm-commits, steven_wu, wmi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59709
llvm-svn: 360466
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
Summary:
In D49565/r337503, the type id record writing was fixed so that only
referenced type ids were emitted into each per-module index for ThinLTO
distributed builds. However, this still left an efficiency issue: each
per-module index checked all type ids for membership in the referenced
set, yielding O(M*N) performance (M indexes and N type ids).
Change the TypeIdMap in the summary to be indexed by GUID, to facilitate
correlating with type identifier GUIDs referenced in the function
summary TypeIdInfo structures. This allowed simplifying other
places where a map from type id GUID to type id map entry was previously
being used to aid this correlation.
Also fix AsmWriter code to handle the rare case of type id GUID
collision.
For a large internal application, this reduced the thin link time by
almost 15%.
Reviewers: pcc, vitalybuka
Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51330
llvm-svn: 343021
Summary: @llvm.icall.branch.funnel is musttail with variable number of
arguments. After inlining current backend can't separate call targets from call
arguments.
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45116
llvm-svn: 329235
The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the
branch predictor, and as a result it can lead to a measurable loss of
performance. We can reduce the performance impact of retpolined virtual
calls by replacing them with a special construct known as a branch
funnel, which is an instruction sequence that implements virtual calls
to a set of known targets using a binary tree of direct branches. This
allows the processor to speculately execute valid implementations of the
virtual function without allowing for speculative execution of of calls
to arbitrary addresses.
This patch extends the whole-program devirtualization pass to replace
certain virtual calls with calls to branch funnels, which are
represented using a new llvm.icall.jumptable intrinsic. It also extends
the LowerTypeTests pass to recognize the new intrinsic, generate code
for the branch funnels (x86_64 only for now) and lay out virtual tables
as required for each branch funnel.
The implementation supports full LTO as well as ThinLTO, and extends the
ThinLTO summary format used for whole-program devirtualization to
support branch funnels.
For more details see RFC:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html
Differential Revision: https://reviews.llvm.org/D42453
llvm-svn: 327163
This is similar to r322317, but for visibility. It is not as neat
because we have to special case extern_weak.
The idea is the same as the previous change, make the transition to
explicit dso_local easier for the frontends. With this they only have
to add dso_local to symbols where we need some external information to
decide if it is dso_local (like it being part of an ELF executable).
llvm-svn: 322806
While updating clang tests for having clang set dso_local I noticed
that:
- There are *a lot* of tests to update.
- Many of the updates are redundant.
They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.
llvm-svn: 322317
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Originally commited as r317374, but reverted in r317395 to update some missed
tests.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317408
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317374