5465 Commits

Author SHA1 Message Date
Johannes Rudolf Doerfert
41a278f56a [OpenMP][FIX] Do not add custom state machine eagerly in LTO runs
If we run LTO optimization we migth end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow
up will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. Only if the kernel init function is
defined, rather then declared, we will emit a custom state machine. SPMD-zation
can happen eagerly though. Tests are adjusted via a weak definition. The LTO
test was added to verify this works as expected.

Differential Revision: https://reviews.llvm.org/D136740
2022-10-26 10:40:11 -07:00
Momchil Velikov
9901583968 Revert "[FuncSpec] Fix specialisation based on literals"
This reverts commit a8b0f580170089fcd555ade5565ceff0ec60f609 because
of "reverse-iteration" buildbot failure.
2022-10-26 13:54:12 +01:00
Momchil Velikov
2c8a4c6e62 Revert "[FuncSpec][NFC] Refactor finding specialisation opportunities"
This reverts commit a8853924bd3c50deebfbf993c037257ccf9805f4 due to dependency
on a8b0f5801700
2022-10-26 13:54:12 +01:00
Momchil Velikov
a8853924bd [FuncSpec][NFC] Refactor finding specialisation opportunities
This patch reorders the traversal of function call sites and function
formal parameters to:

* do various argument feasibility checks (`isArgumentInteresting` ) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.

* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-call x N-args lookups/inserts.

Reviewed By: ChuanqiXu, labrinea

Differential Revision: https://reviews.llvm.org/D135968
2022-10-26 10:18:35 +01:00
Momchil Velikov
606d25e545 [FuncSpec] Compute specialisation gain even when forcing specialisation
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g.  for a function

    void f(int x, int y)

the call like `f(1, 2)` could be matched by either

    void f.1(int x /* int y == 2 */);

or

    void f.2(/* int x == 1, int y == 2 */);

The `FunctionSpecialisation` pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.

This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.

Reviewed By: ChuanqiXu, labrinea

Differential Revision: https://reviews.llvm.org/D136180
2022-10-26 10:08:03 +01:00
Momchil Velikov
a8b0f58017 [FuncSpec] Fix specialisation based on literals
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` .  There are a few
issues with the implementation, though:

* even with the default, the pass will still specialise based on
   floating-point literals

* even when it's enabled, the pass will specialise only for the `i1`
    type (or `i2` if all of the possible 4 values occur, or `i3` if all
    of the possible 8 values occur, etc)

The reason for this is incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.

This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:

* unknown or undef
* a constant
* a constant range with a single element

on the basis that specialisation is pointless for those cases.

Is also changes the criteria for picking up an actual argument to
specialise if the argument is:

* a LLVM IR constant
* has `constant` lattice value
 has `constantrange` lattice value with a single element.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135893
2022-10-26 09:55:33 +01:00
Momchil Velikov
1a525dec7f [FuncSpec] Fix missed opportunities for function specialisation
When collecting the possible constant arguments to
specialise a function the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
different set of specialisations.

With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135867
2022-10-25 23:19:48 +01:00
Momchil Velikov
c47739b45c [FuncSpec] Consider small noinline functions for specialisation
Small functions with size under a given threshold are not
considered for specialisaion on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135862
2022-10-25 19:49:04 +01:00
Nikita Popov
7c32c7e777 Reapply [FunctionAttrs] Make location classification more precise
Reapplying after the fix for volatile modelling in D135863.

-----

Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.
2022-10-20 15:13:20 +02:00
Nikita Popov
bed31153b7 [FuncAttrs] Extract code for adding a location access (NFC)
This code is the same for accesses from call arguments and for
accesses from other (single-location) instructions. Extract i
into a common function.
2022-10-20 12:01:27 +02:00
Nikita Popov
f2fe289374 [FunctionAttrs] Volatile operations can access inaccessible memory
Per LangRef, volatile operations are allowed to access the location
of their pointer argument, plus inaccessible memory:

> Any volatile operation can have side effects, and any volatile
> operation can read and/or modify state which is not accessible
> via a regular load or store in this module.
> [...]
> The allowed side-effects for volatile accesses are limited. If
> a non-volatile store to a given address would be legal, a volatile
> operation may modify the memory at that address. A volatile
> operation may not modify any other memory accessible by the
> module being compiled. A volatile operation may not call any
> code in the current module.

FunctionAttrs currently does not model this and ends up marking
functions with volatile accesses on arguments as argmemonly,
even though they should be inaccessiblemem_or_argmemonly.

Differential Revision: https://reviews.llvm.org/D135863
2022-10-20 11:57:10 +02:00
Fangrui Song
c80b12d352 Revert D135427 "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally"
This reverts commit 8ef3fd8d59ba0100bc6e83350ab1e978536aa531.

I mentioned that GlobalAlias was not handled. It turns out GlobalAlias has to be handled in the same patch (as opposed to in a follow-up),
as otherwise clang codegen of C5/D5 constructor/destructor would regress (https://reviews.llvm.org/D135427#3869003).
2022-10-19 11:24:12 -07:00
Nikita Popov
747f27d97d [AA] Rename getModRefBehavior() to getMemoryEffects() (NFC)
Follow up on D135962, renaming the method name to match the new
type name.
2022-10-19 11:03:54 +02:00
Nikita Popov
1a9d9823c5 [AA] Rename uses of FunctionModRefBehavior (NFC)
Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variables names FMRB/MRB to ME, to match the
new type name.
2022-10-19 10:54:47 +02:00
Sjoerd Meijer
f7c42a278b Revert "Recommit "[LoopFlatten] Enable it by default""
This reverts commit 5b9597f59a445523bd59b5251ab1c2865e74919f.

A miscompilation was reported:
https://github.com/llvm/llvm-project/issues/58441

Reverting this while I look at that.
2022-10-18 23:36:36 +05:30
Sjoerd Meijer
5b9597f59a Recommit "[LoopFlatten] Enable it by default"
The sanitizer bots turned green again after another change went in, i.e.
revert 26dd64ba9cfabe5474bb207f3b7099965f81fed7, so I don't think this
patch was causing the problems.
2022-10-17 23:27:19 +05:30
Sjoerd Meijer
a71c4e4fbb Revert "[LoopFlatten] Enable it by default"
This reverts commit 233659c7ae9b83b64a9f739d340736bca39c3d2e.

I see some sanitizer build bot failures. Not sure if it is change
causing it, but let's see if a revert returns the bots to green...
2022-10-17 22:14:20 +05:30
Sjoerd Meijer
233659c7ae [LoopFlatten] Enable it by default
LoopFlatten has been in the code base off by default for years, but this
enables it to run by default. Downstream this has been running for
years, so it has been exposed to quite some code. Then around the time
we switched to the NPM, several fixes went in related to updating the
MemorySSA state and we moved it to a loop pass manager, which both
helped preventing rerunning certain analysis passes, and thus helped a
bit with compile-times.

About compile-times, adding a pass isn't free, but this should see only
very minor increases. The pass is relatively simple and there shouldn't
be anything algorithmically expensive because all it does is looking at
inner/outer loops and it checks assumptions on loop increments and
indices. If we see increases, I expect this to mainly come from
invalidation of analysis info, and perhaps subsequent passes to trigger
and do more. Despite its simplicity/restrictions, it triggers in most
code-bases, which makes it worth to enable this by default.

Differential Revision: https://reviews.llvm.org/D109958
2022-10-17 17:11:39 +05:30
Nikita Popov
d44cd1bbeb Revert "[FunctionAttrs] Make location classification more precise"
This reverts commit b05f5b90a12098660a4fc16da0b4d421ddfe14e2.

There are thread sanitizer buildbot failures in simple_stack.c.
I think that's because this ended up affecting the handling of
volatile accesses to allocas. Reverting for now.
2022-10-13 12:11:04 +02:00
Nikita Popov
b05f5b90a1 [FunctionAttrs] Make location classification more precise
Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.
2022-10-13 11:24:23 +02:00
Nikita Popov
440ce05fbf [FunctionAttrs] Handle potential access of captured argument
We have to account for accesses to argument memory via captures.
I don't think there's any way to make this produce incorrect
results right now (because as soon as "other" is set, we lose the
ability to infer argmemonly), but this avoids incorrect results
once we have more precise representation.
2022-10-13 11:15:36 +02:00
Nikita Popov
5b3776842f [FunctionAttrs] Account for memory effects of inalloca/preallocated
The code for inferring memory attributes on arguments claims that
inalloca/preallocated arguments are always clobbered:
d71ad41080/llvm/lib/Transforms/IPO/FunctionAttrs.cpp (L640-L642)

However, we would still infer memory attributes for the whole
function without taking this into account, so we could still end
up inferring readnone for the function. This adds an argument
clobber if there are any inalloca/preallocated arguments.

Differential Revision: https://reviews.llvm.org/D135783
2022-10-13 10:20:17 +02:00
Fangrui Song
8ef3fd8d59 [LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally
For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.

To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)

This fixes two problems.

(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```

(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.

```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
int main() {
  bar();
  foo();
}
eof
cat > m2.cc <<'eof'
__attribute__((noinline)) void bar() {
  foo();
}
eof

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D135427
2022-10-11 15:30:07 -07:00
Jordan Rupprecht
fb27fd5f88 Revert "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally"
This reverts commit 4fbe33593c8132fdc48647c06f4d1455bfff1c88. It causes linking errors, with details provided internally. (Hopefully the author/reviewers will be able to upstream the internal repro).
2022-10-10 11:40:45 -07:00
Nikita Popov
874c0327e7 [Attributor] Use ConstantFoldLoadFromConst()
When determining the initial value of the object, use the constant
folding API to load a given type at a given offset in the global
initializer. This makes it work for cases where the load doesn't
directly correspond to an aggregate member.

Differential Revision: https://reviews.llvm.org/D135435
2022-10-10 10:17:37 +02:00
Fangrui Song
4fbe33593c [LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally
See the updated linkonce_resolution_comdat.ll. For a local linkage GV in a
non-prevailing COMDAT, it remains defined while its leader has been made
available_externally. This violates the COMDAT rule that its members must be
retained or discarded as a unit.

To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)

Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.

```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
  bar();
  foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
  foo();
}
eof

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
# one _Z3foov
```

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D135427
2022-10-08 11:09:43 -07:00
Johannes Doerfert
e18736149c [Attributor] Teach AAPointerInfo about atomic cmxchg and rmw
The atomic operations behave similar to a store except that we don't
know the new value and we read the result first.
2022-10-05 06:48:00 -07:00
Johannes Doerfert
93e51fa444 [Attributor] AAPointerInfo can model non-escaping call uses
If a call base use will not capture a pointer we can approximate the
effects. This is important especially for readnone/only uses. Even
may-write uses are not too bad with reachability in place. Capturing
is the problem as we loose track of update sides.
2022-10-05 06:29:14 -07:00
Johannes Doerfert
477e8e10f0 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-10-05 06:19:47 -07:00
Johannes Doerfert
a9557115b4 [Attributor] Qualify variables to avoid clashes in the future 2022-10-04 19:43:04 -07:00
Nikita Popov
412141663c Reapply [FunctionAttrs] Infer precise FMRB
The previous version of the patch would incorrect convert an
existing argmemonly attribute into an inaccessiblemem_or_argmemonly
attribute.

-----

This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.

Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.

Differential Revision: https://reviews.llvm.org/D134527
2022-09-29 14:02:15 +02:00
Sameer Sahasrabuddhe
3f078b308b [AAPointerInfo] OffsetInfo: Unassigned is distinct from Unknown
A User like the PHINode may be visited multiple times for the same pointer along
different def-use edges. The uninitialized state of OffsetInfo at the first
visit needs to be distinct from the Unknown value that may be assigned after
processing the PHINode. Without that, a PHINode with all inputs Unknown is never
followed to its uses. This results in incorrect optimization because some
interfering accessess are missed.

Differential Revision: https://reviews.llvm.org/D134704
2022-09-28 20:31:36 +05:30
Benjamin Kramer
0fb2676c24 Revert "[FunctionAttrs] Infer precise FMRB"
This reverts commit 97dfa536260c434e68913129d79d863b26c1c179.

It can make DSE crash. Reduced test case at
https://reviews.llvm.org/P8291
2022-09-28 16:57:43 +02:00
Doru Bercea
c9adeca501 Move allocas converted from __kmpc_alloc_shared to entry block. 2022-09-27 17:16:58 +00:00
Nikita Popov
97dfa53626 [FunctionAttrs] Infer precise FMRB
This updates checkFunctionMemoryAccess() to infer a precise
FunctionModRefBehavior, rather than an approximation split into
read/write and argmemonly.

Afterwards, we still map this back to imprecise function attributes.
This still allows us to infer some cases that we previously did not
handle, namely inaccessiblememonly and inaccessiblemem_or_argmemonly.
In practice, this means we get better memory attributes in the
presence of intrinsics like @llvm.assume.

Differential Revision: https://reviews.llvm.org/D134527
2022-09-27 10:14:35 +02:00
Sebastian Peryt
46fc75ab28 [NFC][2/n] Remove PrunePH pass
Second patch in the series to remove legacy PM and
associated -enable-new-pm=0 flag targets pass that
has not been ported to new PM - PruneEH.
Discussion about this can be found in D44415.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134686
2022-09-26 18:38:04 -07:00
Nikita Popov
fe196380cc [FunctionAttrs] Use MemoryLocation::getOrNone() when infering memory attrs
MemoryLocation::getOrNone() already has the necessary logic to
handle different instruction types. Use it, rather than repeating
a subset of the logic. This adds support for previously unhandled
instructions like atomicrmw.
2022-09-23 13:56:55 +02:00
Kazu Hirata
00874c48ea [IPO] Reorder parameters of InlineFunction (NFC)
With the recent addition of new parameter MergeAttributes (D134117),
callers need to specify several default parameters before getting to
specify the new parameter.

This patch reorders the parameters so that callers do not have to
specify as many default parameters.

Differential Revision: https://reviews.llvm.org/D134125
2022-09-20 09:09:38 -07:00
Kazu Hirata
d3b95ecc98 [ModuleInliner] Remove InlineOrder::front (NFC)
InlineOrder::front is a remnant from the era when we had a nested
"while" loops in the module inliner, with the inner one grouping the
call sites with the same caller.

Now that we have a simple "while" loop draining the priority queue, we
can just use InlineOrder::pop.

Differential Revision: https://reviews.llvm.org/D134121
2022-09-18 08:49:44 -07:00
Benjamin Kramer
b987fe4972 Silence unused variable warning in release builds. NFC 2022-09-18 09:15:32 +02:00
Kazu Hirata
284f0397e2 [Transforms] Merge function attributes within InlineFunction (NFC)
In the past, we've had a bug resulting in a compiler crash after
forgetting to merge function attributes (D105729).

This patch teaches InlineFunction to merge function attributes.  This
way, we minimize the "time" when the IR is valid, but the function
attributes are not.

Differential Revision: https://reviews.llvm.org/D134117
2022-09-17 23:10:23 -07:00
Kazu Hirata
6e4fbd2f51 [ModuleInliner] Set Changed earlier (NFC)
It makes more sense to set Changed to true immediately after a
successful inlining.
2022-09-17 14:16:32 -07:00
Kazu Hirata
31b91356bc [ModuleInliner] Don't include SetVector.h (NFC)
We don't use SetVector in the module inliner.
2022-09-17 12:17:52 -07:00
Kazu Hirata
5faf4bf195 [ModuleInliner] Move UseInlinePriority to InlineOrder.cpp (NFC)
UseInlinePriority specifies the priority function.  This patch
simplifies the code by moving UseInlinePriority closer to the actual
consumer -- the switch statement inside getInlineOrder.

Differential Revision: https://reviews.llvm.org/D134100
2022-09-17 11:41:28 -07:00
Kazu Hirata
6e30a9cc08 [Inliner] Retire DefaultInlineOrder (NFC)
DefaultInlineOrder was largely an exercise in generalizing the
traversal order of call sites within the inliner.

Now that the module inliner is starting to form its shape, there is no
point in sharing DefaultInlineOrder between the module inliner and the
CGSCC inliner.  DefaultInlineOrder and all the other inline orders are
mutually exclusive in the following sense:

- The use of DefaultInlineOrder doesn't make sense in the module
  inliner because there is no priority inherent in the order in which
  call sites are added to the list of call sites -- SmallVector.

- The use of any other inline order doesn't make sense in the CGSCC
  inliner because little prioritization can be done within one CGSCC.

This patch essentially reverts the addition of DefaultInlineOrder so
that the loop structure of Inliner.cpp looks like the state just
before we started working on the module inliner (circa June 2021).

At the same time, ww remove the choice of DefaultInlineOrder from
UseInlinePriority.

Differential Revision: https://reviews.llvm.org/D134080
2022-09-16 15:36:40 -07:00
Teresa Johnson
c2cf93c1a9 [WPD/LTT] Lower type test feeding assumes via phi correctly
This fixes https://github.com/llvm/llvm-project/issues/57616.

Type test lowering in ThinLTO modules relies on having type id
summaries set up for the referenced types, which provide the type
test resolution. If there is no summary, the type tests are lowered
to false. At the very least, a default type id summary gives the
type tests a resolution of Unknown, which is handled correctly (ignored
by the first invocation of LTT, and lowered to true by the second).

WPD sets up the type id summaries (with a default type test resolution)
as it is processing the type tests, but only does this for the patterns
handled by WPD, which is a type test directly feeding an assume. In the
case of type tests feeding an assume via a phi, the type id summary was
not being set up, leading to the type tests being lowered to false
incorrectly.

Fix this by adding the default type id summary entries for all type ids
used on globals during index-only WPD.

This is not an issue for hybrid (split-lto-unit) LTO, as in that case
the type test resolution is determined and set up during LTT, since the
type definitions are in the regular LTO split module, and exported via
the summary to the ThinLTO split module.

Differential Revision: https://reviews.llvm.org/D134012
2022-09-16 13:50:01 -07:00
Kazu Hirata
9111920af8 [ModuleInliner] clang-format ModuleInliner.cpp (NFC) 2022-09-16 09:41:42 -07:00
Kazu Hirata
4475470529 [ModuleInliner] Remove a stale comment (NFC)
These comments refer to the nested loop in the module inliner where
the inner loop grouped call sites from the same caller.  We don't
group call sites anymore, so the comment has become stale.
2022-09-16 09:37:43 -07:00
Kazu Hirata
42a90e6017 [ModuleInliner] Remove a redundaunt variable (NFC)
In the CGSCC inliner, DidInline was used as an indicator to update the call graph.

In the module inliner, DidInline is always true at the end of the
"while" loop, so can just drop it.
2022-09-16 09:32:02 -07:00
Kazu Hirata
513717ddd0 [ModuleInliner] Remove a write-only variable (NFC)
InlinedCallees is a remnant from the CGSCC inliner.  We don't use it
in the module inliner.
2022-09-16 09:15:53 -07:00