If we run LTO optimization we migth end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow
up will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. Only if the kernel init function is
defined, rather then declared, we will emit a custom state machine. SPMD-zation
can happen eagerly though. Tests are adjusted via a weak definition. The LTO
test was added to verify this works as expected.
Differential Revision: https://reviews.llvm.org/D136740
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting` ) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function
void f(int x, int y)
the call like `f(1, 2)` could be matched by either
void f.1(int x /* int y == 2 */);
or
void f.2(/* int x == 1, int y == 2 */);
The `FunctionSpecialisation` pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.
This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136180
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` . There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
Is also changes the criteria for picking up an actual argument to
specialise if the argument is:
* a LLVM IR constant
* has `constant` lattice value
has `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
When collecting the possible constant arguments to
specialise a function the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
different set of specialisations.
With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135867
Small functions with size under a given threshold are not
considered for specialisaion on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135862
Reapplying after the fix for volatile modelling in D135863.
-----
Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.
Per LangRef, volatile operations are allowed to access the location
of their pointer argument, plus inaccessible memory:
> Any volatile operation can have side effects, and any volatile
> operation can read and/or modify state which is not accessible
> via a regular load or store in this module.
> [...]
> The allowed side-effects for volatile accesses are limited. If
> a non-volatile store to a given address would be legal, a volatile
> operation may modify the memory at that address. A volatile
> operation may not modify any other memory accessible by the
> module being compiled. A volatile operation may not call any
> code in the current module.
FunctionAttrs currently does not model this and ends up marking
functions with volatile accesses on arguments as argmemonly,
even though they should be inaccessiblemem_or_argmemonly.
Differential Revision: https://reviews.llvm.org/D135863
This reverts commit 8ef3fd8d59ba0100bc6e83350ab1e978536aa531.
I mentioned that GlobalAlias was not handled. It turns out GlobalAlias has to be handled in the same patch (as opposed to in a follow-up),
as otherwise clang codegen of C5/D5 constructor/destructor would regress (https://reviews.llvm.org/D135427#3869003).
Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variables names FMRB/MRB to ME, to match the
new type name.
The sanitizer bots turned green again after another change went in, i.e.
revert 26dd64ba9cfabe5474bb207f3b7099965f81fed7, so I don't think this
patch was causing the problems.
This reverts commit 233659c7ae9b83b64a9f739d340736bca39c3d2e.
I see some sanitizer build bot failures. Not sure if it is change
causing it, but let's see if a revert returns the bots to green...
LoopFlatten has been in the code base off by default for years, but this
enables it to run by default. Downstream this has been running for
years, so it has been exposed to quite some code. Then around the time
we switched to the NPM, several fixes went in related to updating the
MemorySSA state and we moved it to a loop pass manager, which both
helped preventing rerunning certain analysis passes, and thus helped a
bit with compile-times.
About compile-times, adding a pass isn't free, but this should see only
very minor increases. The pass is relatively simple and there shouldn't
be anything algorithmically expensive because all it does is looking at
inner/outer loops and it checks assumptions on loop increments and
indices. If we see increases, I expect this to mainly come from
invalidation of analysis info, and perhaps subsequent passes to trigger
and do more. Despite its simplicity/restrictions, it triggers in most
code-bases, which makes it worth to enable this by default.
Differential Revision: https://reviews.llvm.org/D109958
This reverts commit b05f5b90a12098660a4fc16da0b4d421ddfe14e2.
There are thread sanitizer buildbot failures in simple_stack.c.
I think that's because this ended up affecting the handling of
volatile accesses to allocas. Reverting for now.
Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.
We have to account for accesses to argument memory via captures.
I don't think there's any way to make this produce incorrect
results right now (because as soon as "other" is set, we lose the
ability to infer argmemonly), but this avoids incorrect results
once we have more precise representation.
The code for inferring memory attributes on arguments claims that
inalloca/preallocated arguments are always clobbered:
d71ad41080/llvm/lib/Transforms/IPO/FunctionAttrs.cpp (L640-L642)
However, we would still infer memory attributes for the whole
function without taking this into account, so we could still end
up inferring readnone for the function. This adds an argument
clobber if there are any inalloca/preallocated arguments.
Differential Revision: https://reviews.llvm.org/D135783
For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)
This fixes two problems.
(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```
(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
This reverts commit 4fbe33593c8132fdc48647c06f4d1455bfff1c88. It causes linking errors, with details provided internally. (Hopefully the author/reviewers will be able to upstream the internal repro).
When determining the initial value of the object, use the constant
folding API to load a given type at a given offset in the global
initializer. This makes it work for cases where the load doesn't
directly correspond to an aggregate member.
Differential Revision: https://reviews.llvm.org/D135435
See the updated linkonce_resolution_comdat.ll. For a local linkage GV in a
non-prevailing COMDAT, it remains defined while its leader has been made
available_externally. This violates the COMDAT rule that its members must be
retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)
Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
```
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
If a call base use will not capture a pointer we can approximate the
effects. This is important especially for readnone/only uses. Even
may-write uses are not too bad with reachability in place. Capturing
is the problem as we loose track of update sides.
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
The previous version of the patch would incorrect convert an
existing argmemonly attribute into an inaccessiblemem_or_argmemonly
attribute.
-----
This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.
Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.
Differential Revision: https://reviews.llvm.org/D134527
A User like the PHINode may be visited multiple times for the same pointer along
different def-use edges. The uninitialized state of OffsetInfo at the first
visit needs to be distinct from the Unknown value that may be assigned after
processing the PHINode. Without that, a PHINode with all inputs Unknown is never
followed to its uses. This results in incorrect optimization because some
interfering accessess are missed.
Differential Revision: https://reviews.llvm.org/D134704
This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.
Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.
Differential Revision: https://reviews.llvm.org/D134527
Second patch in the series to remove legacy PM and
associated -enable-new-pm=0 flag targets pass that
has not been ported to new PM - PruneEH.
Discussion about this can be found in D44415.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134686
MemoryLocation::getOrNone() already has the necessary logic to
handle different instruction types. Use it, rather than repeating
a subset of the logic. This adds support for previously unhandled
instructions like atomicrmw.
With the recent addition of new parameter MergeAttributes (D134117),
callers need to specify several default parameters before getting to
specify the new parameter.
This patch reorders the parameters so that callers do not have to
specify as many default parameters.
Differential Revision: https://reviews.llvm.org/D134125
InlineOrder::front is a remnant from the era when we had a nested
"while" loops in the module inliner, with the inner one grouping the
call sites with the same caller.
Now that we have a simple "while" loop draining the priority queue, we
can just use InlineOrder::pop.
Differential Revision: https://reviews.llvm.org/D134121
In the past, we've had a bug resulting in a compiler crash after
forgetting to merge function attributes (D105729).
This patch teaches InlineFunction to merge function attributes. This
way, we minimize the "time" when the IR is valid, but the function
attributes are not.
Differential Revision: https://reviews.llvm.org/D134117
UseInlinePriority specifies the priority function. This patch
simplifies the code by moving UseInlinePriority closer to the actual
consumer -- the switch statement inside getInlineOrder.
Differential Revision: https://reviews.llvm.org/D134100
DefaultInlineOrder was largely an exercise in generalizing the
traversal order of call sites within the inliner.
Now that the module inliner is starting to form its shape, there is no
point in sharing DefaultInlineOrder between the module inliner and the
CGSCC inliner. DefaultInlineOrder and all the other inline orders are
mutually exclusive in the following sense:
- The use of DefaultInlineOrder doesn't make sense in the module
inliner because there is no priority inherent in the order in which
call sites are added to the list of call sites -- SmallVector.
- The use of any other inline order doesn't make sense in the CGSCC
inliner because little prioritization can be done within one CGSCC.
This patch essentially reverts the addition of DefaultInlineOrder so
that the loop structure of Inliner.cpp looks like the state just
before we started working on the module inliner (circa June 2021).
At the same time, ww remove the choice of DefaultInlineOrder from
UseInlinePriority.
Differential Revision: https://reviews.llvm.org/D134080
This fixes https://github.com/llvm/llvm-project/issues/57616.
Type test lowering in ThinLTO modules relies on having type id
summaries set up for the referenced types, which provide the type
test resolution. If there is no summary, the type tests are lowered
to false. At the very least, a default type id summary gives the
type tests a resolution of Unknown, which is handled correctly (ignored
by the first invocation of LTT, and lowered to true by the second).
WPD sets up the type id summaries (with a default type test resolution)
as it is processing the type tests, but only does this for the patterns
handled by WPD, which is a type test directly feeding an assume. In the
case of type tests feeding an assume via a phi, the type id summary was
not being set up, leading to the type tests being lowered to false
incorrectly.
Fix this by adding the default type id summary entries for all type ids
used on globals during index-only WPD.
This is not an issue for hybrid (split-lto-unit) LTO, as in that case
the type test resolution is determined and set up during LTT, since the
type definitions are in the regular LTO split module, and exported via
the summary to the ThinLTO split module.
Differential Revision: https://reviews.llvm.org/D134012
These comments refer to the nested loop in the module inliner where
the inner loop grouped call sites from the same caller. We don't
group call sites anymore, so the comment has become stale.
In the CGSCC inliner, DidInline was used as an indicator to update the call graph.
In the module inliner, DidInline is always true at the end of the
"while" loop, so can just drop it.