Previously InlineCostAnnotationPrinter only prints inline cost for call
instructions. I don't think there is any reason not to analyze invoke
and its callee, and this patch adds such support.
The spec for llvm.experimental.convergence.entry says that is must be in
the entry block for a function, and must preceed any other convergent
operation. It does not have to be the first instruction in the entry
block.
Inlining assumes that the call to llvm.experimental.convergence.entry
will be the first instruction after any phi instructions. This commit
modifies inlining to search the entire block for the call.
- **[Inliner] Add tests for propagating more parameter attributes; NFC**
- **[Inliner] Propagate more attributes to params when inlining**
Add support for propagating:
- `derefereancable`
- `derefereancable_or_null`
- `align`
- `nonnull`
- `range`
These are only propagated if the parameter to the to-be-inlined callsite
match the exact parameter used in the to-be-inlined function.
- **[Inliner] Add tests for bad propagationg of access attr for `byval`
param; NFC**
- **[Inliner] Don't propagate access attr to `byval` params**
We previously only handled the case where the `byval` attr was in the
callbase's param attr list. This PR also handles the case if the
`ByVal` was a param attr on the function's param attr list.
Currently we will not be able to inline a large function even if it only
has one live use because the inline cost is still very high after
applying `LastCallToStaticBonus`, which is a constant. This could
significantly impact the performance because CSR spill is very
expensive.
This PR adds a new function `getInliningLastCallToStaticBonus` to TTI to
allow targets to customize this value.
Fixes SWDEV-471398.
We weren't flagging inlined callee functions with callsite but not
memprof metadata correctly, leading to the callsite metadata not being
stripped when that function was inlined into a callsite that didn't
itself have callsite metadata.
In practice, this meant that we went into the LTO link with many more
calls than necessary having callsite metadata / summary records, which
in turn made the graph larger than necessary.
Fixing this oversight resulted in huge reductions in the thin link of a
large target:
99% fewer duplicated context ids (recall we have to duplicate when
callsites containing the same stack ids are in different functions)
71% fewer graph edges
17% fewer graph nodes
13% fewer functions cloned
44% smaller peak memory
47% smaller time
- **[Inliner] Add tests for incorrect propagation of return attrs; NFC**
- **[Inliner] Fix bug where attributes are propagated incorrectly**
The bug stems from the fact that we assume the new (inlined) callsite
is calling the same function as the original (callee) callsite. While
this is typically the case, since `VMap` simplifies the new
instructions, callee intrinsics callsites can end up not corresponding
with the same function.
This can lead to buggy propagation.
This reverts commit c3776c11c26e5c0e27b772e6694e6c76f73ac9e8.
This relands commit a959d70eb5b6d47c0b32eb34fc409e50c01d722d.
This was originally causing bot failures on Python version 3.8.
This relanding fixes that by adjusting the relevant type annotations
that are not supported in earlier versions.
Now that Python 3.8 is the minimum version supported by LLVM, we don't
need to explicitly check that the python version we are using is greater
than 3.8 in the MLGO tests.
Inlining may result in different behaviour when the callee clobbers ZT0,
because normally the call-site will have code to preserve ZT0. When
inlining the function this code to preserve ZT0 will no longer be
emitted, and so the resulting behaviour of the program is changed.
So far branch protection, sign return address, guarded control stack
attributes are
only emitted as module flags to indicate the functions need to be
generated with
those features.
The problem is in case of an LTO build the module flags are merged with
the `min`
rule which means if one of the module is not build with sign return
address then the features
will be turned off for all functions. Due to the functions take the
branch-protection and
sign-return-address features from the module flags. The
sign-return-address is
function level option therefore it is expected functions from files that
is
compiled with -mbranch-protection=pac-ret to be protected.
The inliner might inline functions with different set of flags as it
doesn't consider
the module flags.
This patch adds the attributes to all functions and drops the checking
of the module flags
for the code generation.
Module flag is still used for generating the ELF markers.
Also drops the "true"/"false" values from the
branch-protection-enforcement,
branch-protection-pauth-lr, guarded-control-stack attributes as presence
of the
attribute means it is on absence means off and no other option.
Releand with test fixes.
So far branch protection, sign return address, guarded control stack
attributes are
only emitted as module flags to indicate the functions need to be
generated with
those features.
The problem is in case of an LTO build the module flags are merged with
the `min`
rule which means if one of the module is not build with sign return
address then the features
will be turned off for all functions. Due to the functions take the
branch-protection and
sign-return-address features from the module flags. The
sign-return-address is
function level option therefore it is expected functions from files that
is
compiled with -mbranch-protection=pac-ret to be protected.
The inliner might inline functions with different set of flags as it
doesn't consider
the module flags.
This patch adds the attributes to all functions and drops the checking
of the module flags
for the code generation.
Module flag is still used for generating the ELF markers.
Also drops the "true"/"false" values from the
branch-protection-enforcement,
branch-protection-pauth-lr, guarded-control-stack attributes as presence
of the
attribute means it is on absence means off and no other option.
With #94815, the nodes belonging to dead functions are no longer
invalidated, but kept around to batch delete at the end of the call
graph walk.
The ML inliner needs to be updated to handle this. This fixes some
asserts getting hit, e.g. https://crbug.com/348376263.
Refactored AlwaysInliner to remove some of inlined functions earlier.
Before the change AlwaysInliner walked through all functions in the
module and inlined them into calls where it is appropriate. Removing of
the dead inlined functions was performed only after all of inlining. For
the test case from the issue
[59126](https://github.com/llvm/llvm-project/issues/59126) compiler
consumes all of the memory on 64GB machine, so is killed.
The change checks if just inlined function can be removed from the
module and removes it.
Use the address space of the original pointer argument instead
of querying the datalayout. This avoids producing a verifier error
since this was changing the address space for the user instructions.
Fixes#97086
Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
no-op.
* Function-comparison (before):
```
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label bb1, label bb2:
bb1:
call @callee
bb2:
call %func
```
* VTable-comparison (after):
```
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label bb1, label bb2:
bb1:
call @callee
bb2:
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
call %func
```
Key changes:
1. Find out virtual calls and the vtables they come from.
- The ICP relies on type intrinsic `llvm.type.test` to find out virtual
calls and the
compatible vtables, and relies on type metadata to find the address
point for comparison.
2. ICP pass does cost-benefit analysis and compares vtable only when the
number of vtables for a function candidate is within (option specified)
threshold.
3. Sink the function addressing and vtable load instruction to indirect
fallback.
- The sink helper functions are simplified versions of
`InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics are
not handled. Ideally `InstCombinerImpl::tryToSinkInstructionDbgValues`
and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be
moved into Transforms/Utils/Local.cpp (or another util cpp file) to
handle debug intrinsics when moving instructions across basic blocks.
4. Keep value profiles updated
1) Update vtable value profiles after inline
2) For either function-based comparison or vtable-based comparison,
update both vtable and indirect call value profiles.
AvailableExternally linkage is interesting because, in ThinLTO cases, it
means the function may get elided if it survives inlining - see
`elim-avail-extern` pass.
In the re-commit, just dropping the propagation of `writeonly` as that
is the only attribute that can play poorly with call slot optimization
(see issue: #95152 for more details).
Closes#95888
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
This exposes a miscompile reported in
https://github.com/llvm/llvm-project/issues/95152.
Whether the new inference or MemCpyOpt is at fault depends on
the precise semantics of writeonly attributes. Revert the patch
while this is being pinned down.
This reverts commit 285dbed147e243f416b003e150d67ffb0922ff16.
This reverts commit cda5790e38af5da3ad455eddab36ef16bf3e8104.
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC
and the call graph updates done as a result of the inliner take a long
time. This is due to RefSCC::removeInternalRefEdges() getting called
many times, each time removing one function from the RefSCC, but each
call to removeInternalRefEdges() is proportional to the size of the
RefSCC.
There are two places that call removeInternalRefEdges(), in
updateCGAndAnalysisManagerForPass() and
LazyCallGraph::removeDeadFunction().
1) Since LazyCallGraph can deal with spurious (edges that exist in the
graph but not in the IR) ref edges, we can simply not call
removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().
2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of
compile time with the above change for the original reason. So instead
we batch all the dead function removals so we can call
removeInternalRefEdges() just once. This requires some changes to
callers of removeDeadFunction() to not actually erase the function from
the module, but defer it to when we batch delete dead functions at the
end of the CGSCC run, leaving the function body as "unreachable" in the
meantime. We still need to ensure that call edges are accurate. I had
also tried deleting dead functions after visiting a RefSCC, but deleting
them all at once at the end was simpler.
Many test changes are due to not performing unnecessary revisits of an
SCC (the CGSCC infrastructure deems ref edge refinements as unimportant
when it comes to revisiting SCCs, although that seems to not be
consistently true given these changes) because we don't remove some ref
edges. Specifically for devirt-invalidated.ll this seems to expose an
inlining order issue with the inliner. Probably unimportant for this
type of intentionally weird call graph.
Compile time:
https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u
It should be `x86-registered-target` because we only need the X86 target
in this case. `x86_64-linux` will be too strict here as it puts a
prerequisite on the default target triple.
Remove support for the icmp and fcmp constant expressions.
This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
Memory restrictions for params to the inlined function do not apply to
the copies logically made when that function further passes its own
params as byval.
In other words, imagine that `@foo()` calls `@bar(ptr readonly %p)`
which in turn calls `@baz(ptr byval("...") %p)` (passing the same `%p`).
This is fully legal - `baz` is allowed to modify its copy of the object
referenced by `%p` because the argument is passed by value. However,
when inlining `@bar` into `@foo`, we can't say that the callsite is now
`@baz(ptr readonly byval("...") %p)`, as this would mean that `@baz` is
not allowed to modify it's copy of the object pointed to by `%p`.
LangRef says: "The copy is considered to belong to the caller not the
callee (for example, readonly functions should not write to byval
parameters)".
This fixes a miscompile introduced by PR #89024 in a program in the
Google codebase.
At the moment, Clang is rather liberal in assuming that 0 (and by extension unqualified) is always a safe default. This does not work for targets that actually use a different value for the default / generic AS (for example, the SPIRV that obtains from HIPSPV or SYCL). This patch is a first, fairly safe step towards trying to clear things up by querying a modules' default AS from the target, rather than assuming it's 0, alongside fixing a few places where things break / we encode the 0 == DefaultAS assumption. A bunch of existing tests are extended to check for non-zero default AS usage.
A related change is https://reviews.llvm.org/D133121, which correctly
preserves both branch weights and value profiles for invoke instruction.
* If the branch weight of the `invokeinst` specifies taken / not-taken branches, there is no scale.
To avoid losing information, we can propagate some access attribute
from the to-be-inlined callee to its callsites.
We can propagate argument memory access attributes to callsite
parameters if they are from the same underlying object.
Closes#89024
Performing `instSimplify` while cloning is unsafe due to incomplete
remapping (as reported in #87534). Ideally, `instSimplify` ought to
reason on the updated newly-cloned function, after returns have been
rewritten and callee entry basic block / call-site have been fixed up.
This is in contrast to `CloneAndPruneIntoFromInst` behaviour, which
is inherently expected to clone basic blocks, with pruning on top of
– if any –, and not actually fixing up returns / CFG, which should be
up to the Inliner. We may solve this by letting `instSimplify` work on
the newly-cloned function, while maintaining old function attributes,
so as to avoid inconsistencies between the yet-to-be-solved return
type, and new function ret type attributes.
Original commit: a61f9fe31750cee65c726fb51f1b14e31e177258
Multiple 2-stage buildbots were reporting failures. These issues have been
addressed separately.
Fixes: https://github.com/llvm/llvm-project/issues/87534.
A logic issue arose when inlining via `CloneAndPruneFunctionInto`,
which, besides cloning, performs instruction simplification as well.
By the time a new cloned instruction is being simplified, phi-nodes
are not remapped yet as the whole CFG needs to be processed first.
As `VMap` state at this stage is incomplete, `threadCmpOverPHI` and
variants could lead to unsound optimizations. This issue has been
addressed by performing basic constant folding while cloning, and
postponing instruction simplification once phi-nodes are revisited.
Fixes: https://github.com/llvm/llvm-project/issues/87534.
Add baseline tests which should comprehensively test the new atomic
metadata. Test codegen / expansion, and preservation in a few
transforms.
New metadata defined in #85052