1014 Commits

Author SHA1 Message Date
Min-Yih Hsu
64314dedeb
[InlineCost] Print inline cost for invoke call sites as well (#114476)
Previously, InlineCostAnnotationPrinter only printed the inline cost for call
instructions. I don't think there is any reason not to analyze invoke
instructions and their callees, and this patch adds such support.
2024-11-01 09:55:17 -07:00
Steven Perron
f405c683ba
[OPT] Search whole BB for convergence token. (#112728)
The spec for llvm.experimental.convergence.entry says that it must be in
the entry block for a function, and must precede any other convergent
operation. It does not have to be the first instruction in the entry
block.

Inlining assumes that the call to llvm.experimental.convergence.entry
will be the first instruction after any phi instructions. This commit
modifies inlining to search the entire block for the call.
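
A minimal sketch of the difference, written in Python rather than LLVM's C++. The instruction model (a list of opcode strings) and both function names are hypothetical; they only illustrate the search strategy described above, not the actual inliner code.

```python
ENTRY = "llvm.experimental.convergence.entry"


def find_entry_token_first_only(block):
    """Old assumption: the entry call is the first non-phi instruction."""
    for inst in block:
        if inst == "phi":
            continue
        # Only the first non-phi instruction is ever inspected.
        return inst if inst == ENTRY else None
    return None


def find_entry_token_whole_block(block):
    """New behaviour: scan every instruction in the block."""
    for inst in block:
        if inst == ENTRY:
            return inst
    return None


# Hypothetical entry block where the entry call is not the first instruction.
block = ["phi", "alloca", ENTRY, "call"]
```

With this block, the first-only search misses the token while the whole-block search finds it.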
2024-10-30 11:19:23 -04:00
goldsteinn
69a798a996
Reapply "[Inliner] Propagate more attributes to params when inlining (#91101)" (2nd Attempt) (#112749)
Root cause of the bug was code hanging onto `range` attr after
changing BitWidth. This was fixed in PR #112633.
2024-10-17 20:28:47 -05:00
Arthur Eubanks
9e6d24f61f Revert "[Inliner] Propagate more attributes to params when inlining (#91101)"
This reverts commit ae778ae7ce72219270c30d5c8b3d88c9a4803f81.

Creates broken IR, see comments in #91101.
2024-10-16 21:21:34 +00:00
goldsteinn
ae778ae7ce
[Inliner] Propagate more attributes to params when inlining (#91101)
- **[Inliner] Add tests for propagating more parameter attributes; NFC**
- **[Inliner] Propagate more attributes to params when inlining**

Add support for propagating:
        - `dereferenceable`
        - `dereferenceable_or_null`
        - `align`
        - `nonnull`
        - `range`
    
These are only propagated if the parameter to the to-be-inlined callsite
matches the exact parameter used in the to-be-inlined function.
2024-10-16 11:53:21 -05:00
goldsteinn
3c777f04f0
[Inliner] Don't propagate access attr to byval params (#112256)
- **[Inliner] Add tests for bad propagation of access attr for `byval`
param; NFC**
- **[Inliner] Don't propagate access attr to `byval` params**

We previously only handled the case where the `byval` attr was in the
callbase's param attr list. This PR also handles the case where
`byval` is a param attr on the function's param attr list.
2024-10-15 09:25:16 -05:00
Shilei Tian
e34e27f198
[TTI][AMDGPU] Allow targets to adjust LastCallToStaticBonus via getInliningLastCallToStaticBonus (#111311)
Currently we will not be able to inline a large function even if it only
has one live use because the inline cost is still very high after
applying `LastCallToStaticBonus`, which is a constant. This could
significantly impact the performance because CSR spill is very
expensive.

This PR adds a new function `getInliningLastCallToStaticBonus` to TTI to
allow targets to customize this value.
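
A rough sketch of the idea, in Python rather than LLVM's C++. The class names, the snake_case method name, the `should_inline` helper, and all numeric values are assumptions made for illustration; only the notion of a target hook that replaces a constant bonus comes from the description above.

```python
class TargetTransformInfo:
    """Hypothetical stand-in for the default TTI implementation."""

    def get_inlining_last_call_to_static_bonus(self):
        return 15000  # illustrative default bonus


class LargeBonusTarget(TargetTransformInfo):
    """Hypothetical target where calls are costly, so it raises the bonus."""

    def get_inlining_last_call_to_static_bonus(self):
        return 160000  # illustrative target-specific bonus


def should_inline(cost, threshold, is_last_call_to_static, tti):
    """Apply the bonus for the last call to a static function, then compare."""
    if is_last_call_to_static:
        cost -= tti.get_inlining_last_call_to_static_bonus()
    return cost < threshold
```

In this model a callee with cost 20000 and threshold 225 stays out of line with the default bonus, but is inlined once the target raises the bonus.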

Fixes SWDEV-471398.
2024-10-11 10:19:54 -04:00
Teresa Johnson
79b32bcda6
[MemProf] Strip callsite metadata when inlining an unprofiled callsite (#110998)
We weren't flagging inlined callee functions with callsite but not
memprof metadata correctly, leading to the callsite metadata not being
stripped when that function was inlined into a callsite that didn't
itself have callsite metadata.

In practice, this meant that we went into the LTO link with many more
calls than necessary having callsite metadata / summary records, which
in turn made the graph larger than necessary.

Fixing this oversight resulted in huge reductions in the thin link of a
large target:
99% fewer duplicated context ids (recall we have to duplicate when
callsites containing the same stack ids are in different functions)
71% fewer graph edges
17% fewer graph nodes
13% fewer functions cloned
44% smaller peak memory
47% less time
2024-10-03 08:06:56 -07:00
goldsteinn
a9352a0d31
[Inliner] Fix bug where attributes are propagated incorrectly (#109347)
- **[Inliner] Add tests for incorrect propagation of return attrs; NFC**
- **[Inliner] Fix bug where attributes are propagated incorrectly**

The bug stems from the fact that we assume the new (inlined) callsite
is calling the same function as the original (callee) callsite. While
this is typically the case, since `VMap` simplifies the new
instructions, callee intrinsics callsites can end up not corresponding
with the same function.

This can lead to buggy propagation.
2024-09-20 19:57:35 -05:00
Simon Pilgrim
b065ec0af5 [Inline][X86] Regenerate inline-target-cpu-* tests 2024-08-30 12:06:24 +01:00
Aiden Grossman
085587e1a9 Reland "[MLGO] Remove Python <3.8 from unsupported config (#106132)"
This reverts commit c3776c11c26e5c0e27b772e6694e6c76f73ac9e8.

This relands commit a959d70eb5b6d47c0b32eb34fc409e50c01d722d.

This was originally causing bot failures on Python version 3.8.
This relanding fixes that by adjusting the relevant type annotations
that are not supported in earlier versions.
2024-08-26 18:45:34 -07:00
Aiden Grossman
c3776c11c2 Revert "[MLGO] Remove Python <3.8 from unsupported config (#106132)"
This reverts commit a959d70eb5b6d47c0b32eb34fc409e50c01d722d.

This was causing bot failures.

https://lab.llvm.org/buildbot/#/builders/174/builds/3975
2024-08-26 23:36:56 +00:00
Aiden Grossman
a959d70eb5
[MLGO] Remove Python <3.8 from unsupported config (#106132)
Now that Python 3.8 is the minimum version supported by LLVM, we don't
need to explicitly check that the python version we are using is greater
than 3.8 in the MLGO tests.
2024-08-26 13:57:43 -07:00
David Green
83a5c7cb62 [ConstantFolding] Ensure TLI is valid when simplifying fp128 intrinsics.
TLI might not be valid in all contexts in which constant folding is performed. Add
a quick guard that it is not null.
2024-08-24 14:39:20 +01:00
Matt Arsenault
edded8d7b5
AMDGPU: Stop handling legacy amdgpu-unsafe-fp-atomics attribute (#101699)
This is now autoupgraded to annotate atomicrmw instructions in
old bitcode.
2024-08-13 22:02:25 +04:00
Andreas Jonson
04da77308f
Allow empty range attribute and add assert for full range (#100601)
Fixes #99619.
2024-08-08 18:07:09 +02:00
Sander de Smalen
fb470db7b3
[AArch64] Avoid inlining if ZT0 needs preserving. (#101343)
Inlining may result in different behaviour when the callee clobbers ZT0,
because normally the call-site will have code to preserve ZT0. When
inlining the function this code to preserve ZT0 will no longer be
emitted, and so the resulting behaviour of the program is changed.
2024-08-02 10:29:08 +01:00
Daniel Kiss
1782810b84 [Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819)
So far, branch protection, sign return address, and guarded control stack
attributes have only been emitted as module flags to indicate that
functions need to be generated with those features.
The problem is that in an LTO build the module flags are merged with the
`min` rule, which means that if one of the modules is not built with sign
return address, the feature is turned off for all functions, because the
functions take the branch-protection and sign-return-address features from
the module flags. The sign-return-address is a function-level option, so
functions from files that are compiled with -mbranch-protection=pac-ret
are expected to be protected.
The inliner might inline functions with different sets of flags, as it
doesn't consider the module flags.

This patch adds the attributes to all functions and drops the checking
of the module flags for code generation.
The module flag is still used for generating the ELF markers.
It also drops the "true"/"false" values from the
branch-protection-enforcement, branch-protection-pauth-lr, and
guarded-control-stack attributes, as presence of the attribute means it
is on, absence means off, and there is no other option.
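
The `min` merge problem described above can be illustrated with a small Python model. The file names, function names, and helper functions are hypothetical; this is a sketch of the reasoning, not of Clang's or the LTO linker's code.

```python
def merge_module_flag_min(flag_values):
    """Model of merging one module flag across modules with the `min` rule."""
    return min(flag_values)


# Hypothetical per-module values of a sign-return-address style module flag.
module_flags = {"protected.c": 1, "legacy.c": 0}

# Hypothetical functions, tagged with the module they were compiled in.
functions = {"auth": "protected.c", "helper": "legacy.c"}


def protected_via_module_flag(functions, module_flags):
    """Before: every function consults the single merged module flag."""
    merged = merge_module_flag_min(module_flags.values())
    return {fn: bool(merged) for fn in functions}


def protected_via_function_attr(functions, module_flags):
    """After: each function carries its own attribute from compile time."""
    return {fn: bool(module_flags[mod]) for fn, mod in functions.items()}
```

Under the merged flag, `auth` loses its protection because `legacy.c` was built without it; with per-function attributes, `auth` stays protected.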

Reland with test fixes.
2024-07-10 11:32:41 +02:00
Daniel Kiss
4b2daeccc7
Revert "[Clang][ARM][AArch64] Alway emit protection attributes for functions." (#98284)
Reverts llvm/llvm-project#82819
2024-07-10 10:22:38 +02:00
Daniel Kiss
e15d67cfc2
[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819)
So far, branch protection, sign return address, and guarded control stack
attributes have only been emitted as module flags to indicate that
functions need to be generated with those features.
The problem is that in an LTO build the module flags are merged with the
`min` rule, which means that if one of the modules is not built with sign
return address, the feature is turned off for all functions, because the
functions take the branch-protection and sign-return-address features from
the module flags. The sign-return-address is a function-level option, so
functions from files that are compiled with -mbranch-protection=pac-ret
are expected to be protected.
The inliner might inline functions with different sets of flags, as it
doesn't consider the module flags.

This patch adds the attributes to all functions and drops the checking
of the module flags for code generation.
The module flag is still used for generating the ELF markers.
It also drops the "true"/"false" values from the
branch-protection-enforcement, branch-protection-pauth-lr, and
guarded-control-stack attributes, as presence of the attribute means it
is on, absence means off, and there is no other option.
2024-07-10 10:06:14 +02:00
Yingwei Zheng
be7239e5a6
[Inline] Remove bitcast handling in CallAnalyzer::stripAndComputeInBoundsConstantOffsets (#97988)
As we are now using opaque pointers, bitcast handling is no longer
needed.

Closes https://github.com/llvm/llvm-project/issues/97590.
2024-07-09 15:08:04 +08:00
Arthur Eubanks
94471e6d23
[MLInliner] Handle CGSCC changes from #94815 (#96274)
With #94815, the nodes belonging to dead functions are no longer
invalidated, but kept around to batch delete at the end of the call
graph walk.

The ML inliner needs to be updated to handle this. This fixes some
asserts getting hit, e.g. https://crbug.com/348376263.
2024-07-03 10:14:49 -07:00
Daniil Fukalov
12c1156207
[NFC][AlwaysInliner] Reduce AlwaysInliner memory consumption. (#96958)
Refactored AlwaysInliner to remove some inlined functions earlier.

Before the change, AlwaysInliner walked through all functions in the
module and inlined them into calls where appropriate. The dead inlined
functions were removed only after all inlining had finished. For
the test case from the issue
[59126](https://github.com/llvm/llvm-project/issues/59126), the compiler
consumes all of the memory on a 64GB machine and is killed.

The change checks whether a just-inlined function can be removed from the
module and, if so, removes it.
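
A toy Python model of why eager removal lowers peak memory. The size accounting, the `run_always_inliner` name, and the assumption that every callee is removable once it has no remaining call sites are all hypothetical simplifications.

```python
def run_always_inliner(sizes, call_sites, eager_removal):
    """Return the peak total module size while inlining.

    sizes: {function name: body size}
    call_sites: (caller, callee) pairs, inlined in the given order
    eager_removal: delete a callee as soon as its last call site is inlined
    """
    sizes = dict(sizes)
    remaining = {}
    for _, callee in call_sites:
        remaining[callee] = remaining.get(callee, 0) + 1

    deferred = []
    peak = sum(sizes.values())
    for caller, callee in call_sites:
        sizes[caller] += sizes[callee]  # inlining copies the callee body
        remaining[callee] -= 1
        if remaining[callee] == 0:
            if eager_removal:
                del sizes[callee]       # new behaviour: remove right away
            else:
                deferred.append(callee) # old behaviour: remove at the end
        peak = max(peak, sum(sizes.values()))

    for callee in deferred:
        del sizes[callee]
    return peak


sizes = {"main": 1, "a": 10, "b": 10, "c": 10}
call_sites = [("main", "a"), ("main", "b"), ("main", "c")]
```

For this input, deferred removal peaks at 61 size units while eager removal never exceeds the initial 31.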
2024-07-02 10:43:49 +02:00
Matt Arsenault
e47359a925
Inline: Fix handling of byval using non-alloca addrspace (#97306)
Use the address space of the original pointer argument instead
of querying the datalayout. This avoids producing a verifier error
since this was changing the address space for the user instructions.

Fixes #97086
2024-07-01 21:09:41 +02:00
Mingming Liu
1518b260ce
[TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. (#81442)
Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
a no-op.
    
* Function-comparison (before):

```
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label %bb1, label %bb2

bb1:
   call @callee

bb2:
   call %func
```

* VTable-comparison (after):

```
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label %bb1, label %bb2

bb1:
   call @callee

bb2:
  %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
  %func = load ptr, ptr %vfn
  call %func
```
    
Key changes:
1. Find virtual calls and the vtables they come from.
- The ICP relies on the type intrinsic `llvm.type.test` to find virtual
calls and the compatible vtables, and relies on type metadata to find the
address point for comparison.
2. The ICP pass does a cost-benefit analysis and compares vtables only when
the number of vtables for a function candidate is within an
(option-specified) threshold.
3. Sink the function addressing and vtable load instructions to the indirect
fallback.
- The sink helper functions are simplified versions of
`InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics are
not handled. Ideally `InstCombinerImpl::tryToSinkInstructionDbgValues`
and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be
moved into Transforms/Utils/Local.cpp (or another util cpp file) to
handle debug intrinsics when moving instructions across basic blocks.
4. Keep value profiles updated
     1) Update vtable value profiles after inlining
     2) For either function-based comparison or vtable-based comparison,
          update both vtable and indirect call value profiles.
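
The two comparison sequences above can be modelled in Python to show where the saving comes from. The object layout (a dict with a `vtable` list), the load counter, and all names are hypothetical; this is a sketch of the control flow, not of the generated code.

```python
def hot_callee():
    return "hot"


def cold_callee():
    return "cold"


HOT_VTABLE = [None, hot_callee]    # slot 1 holds the virtual function
COLD_VTABLE = [None, cold_callee]


def call_with_function_compare(obj, promoted, slot=1):
    """Before: load the vtable and the function pointer, then compare."""
    loads = 0
    vtable = obj["vtable"]; loads += 1
    func = vtable[slot]; loads += 1
    if func is promoted:
        return promoted(), loads   # direct call
    return func(), loads           # indirect fallback


def call_with_vtable_compare(obj, promoted_vtable, promoted, slot=1):
    """After: compare the vtable; the function load sinks to the fallback."""
    loads = 0
    vtable = obj["vtable"]; loads += 1
    if vtable is promoted_vtable:
        return promoted(), loads   # direct call, one load only
    func = vtable[slot]; loads += 1
    return func(), loads           # indirect fallback
```

On the promoted path the vtable comparison needs one load instead of two; the fallback path costs the same in both versions.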
2024-06-29 23:21:33 -07:00
Mircea Trofin
600ff28772
[mlgo] add 2 new features whether caller/callee is available_externally (#96585)
AvailableExternally linkage is interesting because, in ThinLTO cases, it
means the function may get elided if it survives inlining - see
`elim-avail-extern` pass.
2024-06-25 12:36:40 -07:00
Noah Goldstein
db03d9d33a Recommit "[Inliner] Propagate callee argument memory access attributes before inlining" (2nd Try)
The re-commit just drops the propagation of `writeonly`, as that
is the only attribute that can play poorly with call slot optimization
(see issue #95152 for more details).

Closes #95888
2024-06-21 16:14:28 +08:00
Mircea Trofin
6037a698b9
[mlgo] inline for size: add bypass mechanism for perserving performance (#95616)
This allows shrinking the cold part of the code for size without sacrificing performance.
2024-06-17 14:18:55 -07:00
Stephen Tozer
094572701d
[RemoveDIs] Print IR with debug records by default (#91724)
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records. 

If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical enough that
they can be done by script; see the migration guide for how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.

For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
2024-06-14 15:07:27 +01:00
Nikita Popov
5f99a7a51a Revert "[Inliner] Propagate callee argument memory access attributes before inlining"
This exposes a miscompile reported in
https://github.com/llvm/llvm-project/issues/95152.

Whether the new inference or MemCpyOpt is at fault depends on
the precise semantics of writeonly attributes. Revert the patch
while this is being pinned down.

This reverts commit 285dbed147e243f416b003e150d67ffb0922ff16.
This reverts commit cda5790e38af5da3ad455eddab36ef16bf3e8104.
2024-06-12 12:32:50 +02:00
Arthur Eubanks
71497cc7a4
[CGSCC] Fix compile time blowup with large RefSCCs (#94815)
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC
and the call graph updates done as a result of the inliner take a long
time. This is due to RefSCC::removeInternalRefEdges() getting called
many times, each time removing one function from the RefSCC, but each
call to removeInternalRefEdges() is proportional to the size of the
RefSCC.

There are two places that call removeInternalRefEdges(), in
updateCGAndAnalysisManagerForPass() and
LazyCallGraph::removeDeadFunction().

1) Since LazyCallGraph can deal with spurious ref edges (edges that exist
in the graph but not in the IR), we can simply not call
removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of
compile time with the above change for the original reason. So instead
we batch all the dead function removals so we can call
removeInternalRefEdges() just once. This requires some changes to
callers of removeDeadFunction() to not actually erase the function from
the module, but defer it to when we batch delete dead functions at the
end of the CGSCC run, leaving the function body as "unreachable" in the
meantime. We still need to ensure that call edges are accurate. I had
also tried deleting dead functions after visiting a RefSCC, but deleting
them all at once at the end was simpler.
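
A back-of-the-envelope Python model of the batching argument. It assumes, as the description states, that each removeInternalRefEdges() call costs work proportional to the current RefSCC size; the function names and the unit of "work" are hypothetical.

```python
def work_removing_one_at_a_time(refscc_size, dead_count):
    """Each removal walks the whole (shrinking) RefSCC."""
    work = 0
    size = refscc_size
    for _ in range(dead_count):
        work += size
        size -= 1
    return work


def work_removing_in_one_batch(refscc_size, dead_count):
    """All dead functions are removed in a single walk of the RefSCC."""
    return refscc_size
```

For a RefSCC of 1000 functions with 500 dead ones, the one-at-a-time model does 375250 units of work against 1000 for the batch.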

Many test changes are due to not performing unnecessary revisits of an
SCC (the CGSCC infrastructure deems ref edge refinements as unimportant
when it comes to revisiting SCCs, although that seems to not be
consistently true given these changes) because we don't remove some ref
edges. Specifically for devirt-invalidated.ll this seems to expose an
inlining order issue with the inliner. Probably unimportant for this
type of intentionally weird call graph.

Compile time:
https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u
2024-06-11 09:50:13 -07:00
Min-Yih Hsu
1fe4f2d1a4
[Inliner][test] Fix incorrect REQUIRE line in inline-switch-default.ll (NFC) (#95009)
It should be `x86-registered-target` because we only need the X86 target
in this case. `x86_64-linux` will be too strict here as it puts a
prerequisite on the default target triple.
2024-06-10 15:32:35 -07:00
Nikita Popov
deab451e7a
[IR] Remove support for icmp and fcmp constant expressions (#93038)
Remove support for the icmp and fcmp constant expressions.

This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179

As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
2024-06-04 08:31:03 +02:00
Andreas Jonson
5c214eb0c6
[Inline] Clone return range attribute on the callsite into inlined call (#92666) 2024-05-29 12:05:05 +02:00
Krzysztof Pszeniczny
cda5790e38
[Inliner] Don't propagate memory attributes to byval params (#93381)
Memory restrictions for params to the inlined function do not apply to
the copies logically made when that function further passes its own
params as byval.

In other words, imagine that `@foo()` calls `@bar(ptr readonly %p)`
which in turn calls `@baz(ptr byval("...") %p)` (passing the same `%p`).
This is fully legal - `baz` is allowed to modify its copy of the object
referenced by `%p` because the argument is passed by value. However,
when inlining `@bar` into `@foo`, we can't say that the callsite is now
`@baz(ptr readonly byval("...") %p)`, as this would mean that `@baz` is
not allowed to modify its copy of the object pointed to by `%p`.
LangRef says: "The copy is considered to belong to the caller not the
callee (for example, readonly functions should not write to byval
parameters)".
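
The rule can be sketched in Python. Attribute sets are modelled as plain sets of strings, and `propagate_memory_attrs` and the `MEMORY_ATTRS` set are hypothetical names; the real inliner works on LLVM attribute lists.

```python
# Memory access attributes considered for propagation in this sketch.
MEMORY_ATTRS = {"readonly", "readnone"}


def propagate_memory_attrs(callee_param_attrs, inner_arg_attrs):
    """Attributes for an inner call argument after inlining.

    callee_param_attrs: attrs on the inlined function's parameter (`@bar`)
    inner_arg_attrs: attrs already on the inner call's argument (`@baz`)
    """
    inner = set(inner_arg_attrs)
    if "byval" in inner:
        # The inner callee gets its own copy, so restrictions on the
        # original pointer must not be applied to it.
        return inner
    return inner | (set(callee_param_attrs) & MEMORY_ATTRS)
```

With a `readonly` parameter, a `byval` argument stays untouched, while a plain pointer argument picks up `readonly`.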

This fixes a miscompile introduced by PR #89024 in a program in the
Google codebase.
2024-05-26 18:05:13 +02:00
Alex Voicu
10edb4991c
[Clang][CodeGen] Start migrating away from assuming the Default AS is 0 (#88182)
At the moment, Clang is rather liberal in assuming that 0 (and by extension unqualified) is always a safe default. This does not work for targets that actually use a different value for the default / generic AS (for example, the SPIR-V obtained from HIPSPV or SYCL). This patch is a first, fairly safe step towards clearing things up by querying a module's default AS from the target, rather than assuming it's 0, alongside fixing a few places where things break or where we encode the 0 == DefaultAS assumption. A number of existing tests are extended to check for non-zero default AS usage.
2024-05-19 14:59:03 +01:00
David Green
220756f1f9 [AArch64][Inline] Regenerate Inline/AArch64/binop.ll test check lines. NFC
Should hopefully help with #91854
2024-05-13 09:49:09 +01:00
DianQK
d48bf8aef2
Reapply "[InlineCost] Correct the default branch cost for the switch statement (#85160)"
This reverts commit c6e4f6309184814dfc4bb855ddbdb5375cc971e0.
2024-05-10 21:18:53 +08:00
Mingming Liu
64f4ceb09e
[Inline][PGO] After inline, update InvokeInst profile counts in caller and cloned callee (#83809)
A related change is https://reviews.llvm.org/D133121, which correctly
preserves both branch weights and value profiles for invoke instruction.
* If the branch weight of the `invokeinst` specifies taken / not-taken branches, there is no scale.
2024-05-08 15:48:40 -07:00
DianQK
c6e4f63091
Revert "[InlineCost] Correct the default branch cost for the switch statement (#85160)"
This reverts commit 882814edd33cab853859f07b1dd4c4fa1393e0ea.
2024-05-05 21:54:30 +08:00
Quentin Dian
882814edd3
[InlineCost] Correct the default branch cost for the switch statement (#85160)
Fixes #81723.

The earliest commit of the related code is:
919f9e8d65.
I tried to understand the following code with
https://github.com/llvm/llvm-project/pull/77856#issuecomment-1993499085.

5932fcc478/llvm/lib/Analysis/InlineCost.cpp (L709-L720)

I think only scenarios where there is a default branch were considered.
2024-05-05 21:28:31 +08:00
Noah Goldstein
285dbed147 [Inliner] Propagate callee argument memory access attributes before inlining
To avoid losing information, we can propagate some access attribute
from the to-be-inlined callee to its callsites.

We can propagate argument memory access attributes to callsite
parameters if they are from the same underlying object.

Closes #89024
2024-05-03 14:10:24 -05:00
Noah Goldstein
f8ff51e1b0 [Inliner] Add tests for not propagating writable if readonly is present; NFC 2024-05-03 14:10:24 -05:00
Matt Arsenault
9f9856d623 AMDGPU: Update name for amdgpu.no.remote.memory metadata 2024-05-03 11:50:59 +02:00
Antonio Frighetto
1bb929833b [Inline][Cloning] Drop incompatible attributes from NewFunc
Performing `instSimplify` while cloning is unsafe due to incomplete
remapping (as reported in #87534). Ideally, `instSimplify` ought to
reason on the updated newly-cloned function, after returns have been
rewritten and callee entry basic block / call-site have been fixed up.
This is in contrast to `CloneAndPruneIntoFromInst` behaviour, which
is inherently expected to clone basic blocks, with pruning on top
(if any), and not actually fix up returns / CFG, which should be
up to the Inliner. We may solve this by letting `instSimplify` work on
the newly-cloned function, while maintaining old function attributes,
so as to avoid inconsistencies between the yet-to-be-solved return
type, and new function ret type attributes.
2024-05-02 16:29:09 +02:00
Antonio Frighetto
42c7cb6969 Reapply "[Inline][Cloning] Defer simplification after phi-nodes resolution"
Original commit: a61f9fe31750cee65c726fb51f1b14e31e177258

Multiple 2-stage buildbots were reporting failures. These issues have been
addressed separately.

Fixes: https://github.com/llvm/llvm-project/issues/87534.
2024-05-02 16:29:09 +02:00
Vitaly Buka
29c98e59cd Revert "[Inline][Cloning] Defer simplification after phi-nodes resolution" #87963
Reopens #87534.

Breaks multiple bots:
https://lab.llvm.org/buildbot/#/builders/168/builds/20028
https://lab.llvm.org/buildbot/#/builders/74/builds/27773

And reproducer in a61f9fe31750cee65c726fb51f1b14e31e177258.

This reverts commit a61f9fe31750cee65c726fb51f1b14e31e177258.
2024-04-24 14:51:54 -07:00
Antonio Frighetto
a61f9fe317 [Inline][Cloning] Defer simplification after phi-nodes resolution
A logic issue arose when inlining via `CloneAndPruneFunctionInto`,
which, besides cloning, performs instruction simplification as well.
By the time a new cloned instruction is being simplified, phi-nodes
are not remapped yet as the whole CFG needs to be processed first.
As `VMap` state at this stage is incomplete, `threadCmpOverPHI` and
variants could lead to unsound optimizations. This issue has been
addressed by performing basic constant folding while cloning, and
postponing instruction simplification until phi-nodes are revisited.

Fixes: https://github.com/llvm/llvm-project/issues/87534.
2024-04-24 16:55:33 +02:00
Antonio Frighetto
c1d00510ab [Inline][Cloning] Introduce test for PR87963 (NFC) 2024-04-24 16:55:33 +02:00
Matt Arsenault
f433c3b380
AMDGPU: Add tests for atomicrmw handling of new metadata (#89248)
Add baseline tests which should comprehensively test the new atomic
metadata. Test codegen / expansion, and preservation in a few
transforms.

New metadata defined in #85052
2024-04-20 00:43:36 +02:00