(Reland of #190092 with verifier change to look through GlobalAliases)
So that it's preserved across all inline invocations rather than just
one inliner pass run.
This prevents cases where devirtualization in the simplification
pipeline uncovers inlining opportunities that should be discarded due to
inline history, but we dropped the inline history between inliner pass
runs, causing code size to blow up, sometimes exponentially.
For compile time reasons, we want to limit this to only call sites that
have the potential to inline through SCCs, potentially with the help of
devirtualization. This means that the callee is in a non-trivial
(Ref)SCC, or the call site was previously an indirect call, which can
potentially be devirtualized to call any function.
The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant
even with this change, as monster_scc.ll blows up if I remove that code.
http://llvm-compile-time-tracker.com/compare.php?from=e830d88e8ae5f44a97cc76136a0a4e83aa9157c0&to=ed535e732fc41b79ab8efda2417886cbd0812f7f&stat=instructions:uFixes#186926.
So that it's preserved across all inline invocations rather than just
one inliner pass run.
This prevents cases where devirtualization in the simplification
pipeline uncovers inlining opportunities that should be discarded due to
inline history, but we dropped the inline history between inliner pass
runs, causing code size to blow up, sometimes exponentially.
For compile time reasons, we want to limit this to only call sites that
have the potential to inline through SCCs, potentially with the help of
devirtualization. This means that the callee is in a non-trivial
(Ref)SCC, or the call site was previously an indirect call, which can
potentially be devirtualized to call any function.
The CGSCCUpdater::InlinedInternalEdges logic still seems to be relevant
even with this change, as monster_scc.ll blows up if I remove that code.
http://llvm-compile-time-tracker.com/compare.php?from=e830d88e8ae5f44a97cc76136a0a4e83aa9157c0&to=ed535e732fc41b79ab8efda2417886cbd0812f7f&stat=instructions:uFixes#186926.
The main change is to eliminate the use of "Argument" terminology when
dealing with overloaded types since overloaded types can be either
argument or return values, and some additional renaming for clarity.
1. Rename `Tys` argument to various intrinsic APIs to `OverloadTys` to
better reflect its meaning.
2. Rename `IITDescriptorKind::Argument` to
`IITDescriptorKind::Overloaded` to better convey that it's an overloaded
type. Removed "Argument" suffix for other kinds for dependent types.
3. Rename `ArgKind` to `AnyKind`, `getArgumentNumber` to
`getOverloadIndex`, `getArgumentKind` to `getOverloadKind`,
`getRefArgNumber` to `getRefOverloadIndex`, and `IIT_ARG` to `IIT_ANY`.
4. Rename `IIT_ANYPTR` (used to represent a pointer qualified with
address space) to `IIT_PTR_AS` to clearly distinguish it from
`llvm_anyptr_ty`
5. Change the packing of [ref overload index & overload index] for
`VecOfAnyPtrsToElt` to pack the overload index into the lower bits, so
we can use the `getOverloadIndex` function to get the overload index.
This is an attempt to merge https://reviews.llvm.org/D144006 with LTO
fix.
The last merge attempt was
https://github.com/llvm/llvm-project/pull/75385.
The issue with it was investigated in
https://github.com/llvm/llvm-project/pull/75385#issuecomment-2386684121.
The problem happens when
1. Several modules are being linked.
2. There are several DISubprograms that initially belong to different
modules but represent the same source code function (for example, a
function included from the same source code file).
3. Some of such DISubprograms survive IR linking. It may happen if one
of them is inlined somewhere or if the functions that have these
DISubprograms attached have internal linkage.
4. Each of these DISubprograms has a local type that corresponds to the
same source code type. These types are initially from different modules,
but have the same ODR identifier.
If the same (in the sense of ODR identifier/ODR uniquing rules) local
type is present in two modules, and these modules are linked together,
the type gets uniqued. A DIType, that happens to be loaded first,
survives linking, and the references on other types with the same ODR
identifier from the modules loaded later are replaced with the
references on the DIType loaded first. Since defintion subprograms, in
scope of which these types are located, are not deduplicated, the linker
output may contain multiple DISubprogram's having the same (uniqued)
type in their retainedNodes lists.
Further compilation of such modules causes crashes.
To tackle that,
* previous solution to handle LTO linking with local types in
retainedNodes is removed (cloneLocalTypes() function),
* for each loaded distinct (definition) DISubprogram, its retainedNodes
list is scanned after loading, and DITypes with a scope of another
subprogram are removed. If something from a Function corresponding to
the DISubprogram references uniqued type, we rely on cross-CU links.
Additionally:
* a check is added to Verifier to report about local types located in a
wrong retainedNodes list,
Original commit message follows.
---------
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Similar to imported declarations, the patch tracks function-local types in
DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
the aforementioned metadata change and provided a support of function-local
types scoped within a lexical block.
The patch assumes that DICompileUnit's 'enums field' no longer tracks local
types and DwarfDebug would assert if any locally-scoped types get placed there.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>
Previously, we use `std::map<BasicBlock *, unsigned> PredCount` to track
excess incoming blocks and removed them one by one using
`removeIncomingValue`.
Since `PredCount` use `BasicBlock *` as key, the iteration order depends
on the memory addresses of the blocks. With
`PHINode::removeIncomingValue()` changed to use the swapping strategy,
the order in which operands are removed affects the final order of the
remaining operands in the PHI node. This will cause non-determinism in
compiles.
This patch uses `PHINode::removeIncomingValueIf()` to remove invalid
incoming blocks that no longer go to `NewBB` block, fixes the
non-determinism.
- Use explicit types instead of auto when type is not obvious.
- Move local function out of anonymous namespace and make them static.
- Use structured bindings in some range for loops.
- Simplify PHI handling code a bit.
SROA and a few other facilities use generic-lambdas and some overloaded
functions to deal with both intrinsics and debug-records at the same time.
As part of stripping out intrinsic support, delete a swathe of this code
from things in the Utils directory.
This is a large diff, but is mostly about removing functions that were
duplicated during the migration to debug records. I've taken a few
opportunities to replace comments about "intrinsics" with "records",
and replace generic lambdas with plain lambdas (I believe this makes
it more readable).
All of this is chipping away at intrinsic-specific code until we get to
removing parts of findDbgUsers, which is the final boss -- we can't
remove that until almost everything else is gone.
This flag was used to let us incrementally introduce debug records
into LLVM, however everything is now using records. It serves no
purpose now, so delete it.
This change was separated from
https://github.com/llvm/llvm-project/pull/119001.
When cloning functions, use IdentityMDPredicate to ensure that if
DISubprogram is not cloned, then its DILocalVariables are not cloned
either.
This is currently expected to be an NFC, as DILocalVariables only
reference their subprograms (via DILocalScopes) and types, and inlined
DISubprograms and DITypes are not cloned. Thus, DILocalVariables are
mapped to self in ValueMapper (in mapTopLevelUniquedNode).
However, it will be needed for the original PR
https://github.com/llvm/llvm-project/pull/119001, where a
DILocalVariable may refer to a local type of a DISubprogram that is
being cloned. In that case, ValueMapper will clone DILocalVariable even
if it belongs to an inlined DISubprogram that is not cloned, which
should be avoided.
I'm making this change into a separate PR to make the original PR a bit
smaller, and because this has more to do with variables than with types.
Add:
mapAtomInstance - map the atom group number to a new group.
RemapSourceAtom - apply the mapped atom group number to this instruction.
Modify:
CloneBasicBlock - Call mapAtomInstance on cloned instruction's DebugLocs
if MapAtoms is true (default). Setting to false could
lead to a degraded debugging experience. See code comment.
Optimisations like loop unroll that duplicate instructions need to remap source
atom groups so that each duplicated source construct instance is considered
distinct when determining is_stmt locations.
This commit adds the remapping functionality and a unittest.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
During inlining, we may opportunistically simplify conditional branches
(incl. switches) to unconditional branches if, after inlining, their
destination is fixed. While we do this, we should propagate any
DILocation attached to the original branch to the simplified branch,
which this patch enables.
Found using https://github.com/llvm/llvm-project/pull/107279.
If a block with a single predecessor also had its address taken,
it was getting deleted in this post-inline cleanup step. This would
result in the blockaddress in the resulting function getting deleted
and replaced with inttoptr 1.
This fixes one bug required to permit inlining of functions with blockaddress
uses.
At the moment this is not testable (at least without an annoyingly complex
unit test), and is a pre-bug fix for future patches. Functions with
blockaddress uses are rejected in isInlineViable, so we don't get this far
with the current InlineFunction uses (some of the existing cases seem to
reproduce this part of the rejection logic, like PartialInliner). This
will be tested in a pending llvm-reduce change.
Prerequisite for #38908
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
Summary:
The new code should be functionally identical to the old one (but
faster). The reasoning is as follows.
In the old code when cloning within the module:
1. DIFinder traverses and collects *all* debug info reachable from a
function, its instructions, and its owning compile unit.
2. Then "compile units, types, other subprograms, and lexical blocks of
other subprograms" are saved in a set.
3. Then when we MapMetadata, we traverse the function's debug info
_again_ and those nodes that are in the set from p.2 are identity
mapped.
This looks equivalent to just doing step 3 with identity mapping based
on a predicate that says to identity map "compile units, types, other
subprograms, and lexical blocks of other subprograms" (same as in step
2). This is what the new code does.
Test Plan:
ninja check-all
There's a bunch of tests around cloning and all of them pass.
Summary:
We used the set only to check if it contains certain metadata nodes.
Replacing the set with a predicate makes the intention clearer and the
API more general.
Test Plan:
ninja check-all
Summary:
This should be behaviorally equivalent. DIFinder is only used when
cloning into a DifferentModule as part of llvm.dbg.cu update in
CloneFunctionInto.
Test Plan:
ninja check-llvm-unit check-llvm
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
Summary:
To avoid cloning module-level debug info (owned by the module rather
than the function), CloneFunction implementation used to eagerly
identity map such debug info into ValueMap's MD map. In larger modules
with meaningful volume of debug info this gets very expensive.
By passing such debug info metadata via an IdentityMD set for the
ValueMapper to map on first use, we get several benefits:
1. Mapping metadata is not cheap, particularly because of tracking. When
cloning a Function we identity map lots of global module-level
metadata to avoid cloning it, while only a fraction of it is actually
used by the function. Mapping on first use is a lot faster for
modules with meaningful amount of debug info.
2. Eagerly identity mapping metadata makes it harder to cache
module-level data (e.g. a set of metadata nodes in a \a DICompileUnit).
With this patch we can cache certain module-level metadata
calculations to speed things up further.
Anecdata from compiling a sample cpp file with full debug info shows that this moderately speeds up
CoroSplitPass which is one of the heavier users of cloning:
| | Baseline | IdentityMD set |
|-----------------|----------|----------------|
| CoroSplitPass | 306ms | 221ms |
| CoroCloner | 101ms | 72ms |
| Speed up | 1x | 1.4x |
Test Plan:
ninja check-llvm-unit
ninja check-llvm
Summary:
Previously, we'd add all SPs distinct from the cloned one into a set.
Then when cloning a local scope we'd check if it's from one of those
'distinct' SPs by checking if it's in the set. We don't need to do that.
We can just check against the cloned SP directly and drop the set.
Test Plan:
ninja check-llvm-unit check-llvm
Summary:
The new API expects the caller to populate the VMap. We need it this way
for a subsequent change around coroutine cloning.
Test Plan:
ninja check-llvm-unit check-llvm
Summary:
Moving the cloning of BBs after the metadata makes the flow of the
function a bit more straightforward and makes it easier to extract more
into helper functions.
Test Plan:
ninja check-llvm-unit check-llvm
Summary:
There was a single usage of CloneBasicBlock with non-default
DebugInfoFinder inside CloneFunctionInto which has been refactored in
more focused.
Test Plan:
ninja check-llvm-unit check-llvm
Summary:
Consolidate the logic in a single function. We do an extra pass over
Instructions but this is necessary to untangle things and extract
metadata cloning in a future diff.
Test Plan:
```
$ ninja check-llvm-unit check-llvm
[211/213] Running the LLVM regression tests
Testing Time: 106.06s
Total Discovered Tests: 62601
Skipped : 17 (0.03%)
Unsupported : 2518 (4.02%)
Passed : 59911 (95.70%)
Expectedly Failed: 155 (0.25%)
[212/213] Running lit suite
Testing Time: 12.47s
Total Discovered Tests: 8474
Skipped: 17 (0.20%)
Passed : 8457 (99.80%)
```
Extracted from #109032 (commit 3) (there are more refactors and cleanups
in subsequent commits)
This patch is a part of step-by-step refactoring of CloneFunctionInto.
The goal is to extract reusable pieces out of it that will be later used
to optimize function cloning e.g. in coroutine processing.
Extracted from #109032 (commit 2)
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.
The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr assosiated it will cause an
error.
Fixes#112633
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
We weren't flagging inlined callee functions with callsite but not
memprof metadata correctly, leading to the callsite metadata not being
stripped when that function was inlined into a callsite that didn't
itself have callsite metadata.
In practice, this meant that we went into the LTO link with many more
calls than necessary having callsite metadata / summary records, which
in turn made the graph larger than necessary.
Fixing this oversight resulted in huge reductions in the thin link of a
large target:
99% fewer duplicated context ids (recall we have to duplicate when
callsites containing the same stack ids are in different functions)
71% fewer graph edges
17% fewer graph nodes
13% fewer functions cloned
44% smaller peak memory
47% smaller time
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
Performing `instSimplify` while cloning is unsafe due to incomplete
remapping (as reported in #87534). Ideally, `instSimplify` ought to
reason on the updated newly-cloned function, after returns have been
rewritten and callee entry basic block / call-site have been fixed up.
This is in contrast to `CloneAndPruneIntoFromInst` behaviour, which
is inherently expected to clone basic blocks, with pruning on top of
– if any –, and not actually fixing up returns / CFG, which should be
up to the Inliner. We may solve this by letting `instSimplify` work on
the newly-cloned function, while maintaining old function attributes,
so as to avoid inconsistencies between the yet-to-be-solved return
type, and new function ret type attributes.