lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.
However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.
* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.
I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.
This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)
In practice, I've only found a few places that currently produce
lifetimes on non-allocas:
* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
This is another prune of dead code -- we never generate debug intrinsics
nowadays, therefore there's no need for these codepaths to run.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
There are a few duplicate paths/facilities in the coroutine code to deal
with both intrinsics and debug-records; we can now delete the intrinsic
version.
https://github.com/llvm/llvm-project/pull/142551 started always dropping
lifetime markers after moving allocas on the frame, as these are not
useful on non-allocas but can cause issues.
However, this was not done for other ABIs (retcon, retcononce, async)
that go through a different code path. We should treat them the same
way.
Since the recent commit de3c8410d8, the `CoroSplit` pass adds `DILabel`s
to help find the location of a suspension point from its suspension
point id. Those labels are added in both `DebugEmissionKind::FullDebug`
and `DebugEmissionKind::LineTableOnly` mode. The idea was that this
information is necessary to reconstruct async stack traces and should
hence also be available for LineTableOnly.
Unfortunately, it turns out that the DWARF backend does not expect to
find any DILabel debug metadata if the emission kind is set to
LineTableOnly. The Dwarf backend simply runs into an assertion in that
case.
This commit fixes the issue by only adding `DILabel` for FullDebug.
When generating debug info for coroutine frames, nested struct types
were incorrectly inheriting the top-level function scope instead of
having their parent struct as scope. This caused assertion failures in
DebugInfoMetadata.h during member list replacement for complex nested
struct hierarchies.
Fix by passing the parent DIStruct as scope when recursively calling
solveDIType for nested struct fields, ensuring proper debug info scoping
hierarchy.
Add regression test that validates proper nested struct scoping
hierarchy and
prevents future regressions.
RFC on discourse:
https://discourse.llvm.org/t/rfc-debug-info-for-coroutine-suspension-locations-take-2/86606
With this commit, we add `DILabel` debug infos to the resume points of a
coroutine. Those labels can be used by debugging scripts to figure out
the exact line and column at which a coroutine was suspended by looking
up current `__coro_index` value inside the coroutines frame, and then
searching for the corresponding label inside the coroutine's resume
function.
The DWARF information generated for such a label looks like:
```
0x00000f71: DW_TAG_label
DW_AT_name ("__coro_resume_1")
DW_AT_decl_file ("generator-example.cpp")
DW_AT_decl_line (5)
DW_AT_decl_column (3)
DW_AT_artificial (true)
DW_AT_LLVM_coro_suspend_idx (0x01)
DW_AT_low_pc (0x00000000000019be)
```
The labels can be mapped to their corresponding `__coro_idx` values
either via their naming convention `__coro_resume_<N>` or using the new
`DW_AT_LLVM_coro_suspend_idx` attribute. In gdb, those line numebrs can
be looked up using `info line -function my_coroutine -label
__coro_resume_1`. LLDB unfortunately does not understand DW_TAG_label
debug information, yet.
Given this is an artificial compiler-generated label, I did apply the
DW_AT_artificial tag to it. The DWARFv5 standard only allows that tag on
type and variable definitions, but this is a natural extension and was
also blessed in the RFC on discourse.
Also, this commit adds `DW_AT_decl_column` to labels, not only for
coroutines but also for normal C and C++ labels. While not strictly
necessary, I am doing so now because it would be harder to do so later
without breaking the binary LLVM-IR format
Drive-by fixes: While reading the existing test cases to understand how
to write my own test case, I did a couple of small typo fixes and
comment improvements
For the checks whether certain intrinsics are used, work with intrinsic
IDs instead of intrinsic names.
This also exposes that some of the checks were incorrect, because the
intrinsics were overloaded. There is no efficient way to determine
whether these are used. This change explicitly documents which
intrinsics are not checked for this reason.
So far, the `DW_AT_linkage_name` of the coroutine `resume`, `destroy`,
`cleanup` and `noalloc` function clones were incorrectly set to the
original function name instead of the updated function names.
With this commit, we now update the `DW_AT_linkage_name` to the correct
name. This has multiple benefits:
1. it's easier for me (and other toolchain developers) to understand the
output of `llvm-dwarf-dump` when coroutines are involved.
2. When hitting a breakpoint, both LLDB and GDB now tell you which clone
of the function you are in. E.g., GDB now prints "Breakpoint 1.2,
coro_func(int) [clone .resume] (v=43) at ..." instead of "Breakpoint
1.2, coro_func(int) (v=43) at ...".
3. GDB's `info line coro_func` command now allows you to distinguish the
multiple different clones of the function.
In Swift, the linkage names of the clones were already updated. The
comment right above the relevant code in `CoroSplit.cpp` already hinted
that the linkage name should probably also be updated in C++. This
comment was added in commit 6ce76ff7eb7640, and back then the
corresponding `DW_AT_specification` (i.e., `SP->getDeclaration()`) was
not updated, yet, which led to problems for C++. In the meantime, commit
ca1a5b37c7236d added code to also update `SP->getDeclaration`, as such
there is no reason anymore to not update the linkage name for C++.
Note that most test cases used inconsistent function names for the LLVM
function vs. the DISubprogram linkage name. clang would never emit such
LLVM IR. This confused me initially, and hence I fixed it while updating
the test case.
Drive-by fix: The change in `CGVTables.cpp` is purely stylistic, NFC.
When looking for other usages of `replaceWithDistinct`, I got initially
confused because `CGVTables.cpp` was calling a static function via an
object instance.
This makes the code flow when reading the LLVM IR of a split coroutine a
bit more natural. It does not change anything from an end-user
perspective but makes debugging the CoroSplit pass slightly easier.
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Transforms`
library. These annotations currently have no meaningful impact on the
LLVM build; however, they are a prerequisite to support an LLVM Windows
DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS on
Linux:
- Removed a redundant `operator<<` from Attributor.h. IDS only
auto-annotates the 1st declaration, and the 2nd declaration being
un-annotated resulted in an "inconsistent linkage" error on Windows when
building LLVM as a DLL.
- `#include` the `VirtualFileSystem.h` in PGOInstrumentation.h and
remove the local declaration of the `vfs::FileSystem` class. This is
required because exporting the `PGOInstrumentationUse` constructor
requires the class be fully defined because it is used by an argument.
- Add #include "llvm/Support/Compiler.h" to files where it was not
auto-added by IDS due to no pre-existing block of include statements.
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Start removing debug intrinsics support -- starting with the flag that
controls production of their replacement, debug records. This patch
removes the command-line-flag and with it the ability to switch back to
intrinsics. The module / function / block level "IsNewDbgInfoFormat"
flags get hardcoded to true, I'll to incrementally remove things that
depend on those flags.
If the control flow between `lifetime.start` and `lifetime.end` is too
complex, it is acceptable to give up the optimization opportunity and
collect the alloca to the frame. However, storing to the frame will
lengthen the lifetime of the alloca, and the sanitizer will complain. I
propose we always erase lifetime intrinsics of spilled allocas.
Fix#124612
---------
Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
This change makes the C++ noop coroutine to be built with attributes
based on the module flags. Among other things, this adds function
attributes related to the Pointer Authentication and Branch Target
Enforcement module flags, which are set by command-line options.
Before this patch, C++ programs built with either PAC-RET or BTI that
had the noop coroutine emitted could have a runtime error because this
function wasn't compatible with the rest of the program.
Currently coroutine promises are modeled as allocas. This is problematic
because other middle-end passes will assume promise dead after coroutine
suspend, leading to misoptimizations.
I propose the front ends remain free to emit and use allocas to model
coro promise. At CoroEarly, we will replace all uses of promise alloca
with `coro.promise`. Non coroutine passes should only access promise
through `coro.promise`. Then at CoroSplit, we will lower `coro.promise`
back to promise alloca again. So that it will be correctly collected
into coro frame. Note that we do not have to bother maintainers of other
middle-end passes.
Fix#105595
It was excluded from spilling in a263a60, possibly by accident.
In the linked bug, we hit a situation like this:
```
%s = call @llvm.coro.suspend(...)
|
switch (%s)
case v1: / \ case v2:
... ...
| suspend point
| ...
\ /
%x = phi [v1] [v2]
|
...
|
use(%x)
```
Instcombine will notice that %x correlates exactly with %s, and so
use(%x) becomes use(%s).
However, corosplit would substitute different values for %s when
splitting the function, so even though %s had a particular value when
control actually passed through the switch, it could have a *different*
value when reaching use(%s).
This illustrates that while IR from the frontend typically does not use
these suspend return values across suspend points, mid-level
optimizations on the presplit coroutine may introduce new uses of
suspend values, so they must be considered eligible to spill to the
coroutine frame.
Fixes: #130326
Summary:
This cleans up the now unnecessary debug info collection in CoroSplit.
This makes CoroSplit pass almost as fast with -g2 as it is with -g1 on
the sample cpp file used with other parts of this stack:
| | Baseline | IdentityMD set | Prebuilt CommonDI | MetadataPred (cur) |
|-----------------|----------|----------------|-------------------|--------------------|
| CoroSplitPass | 306ms | 221ms | 68ms | 3.8ms |
| CoroCloner | 101ms | 72ms | 0.5ms | 0.5ms |
| CollectCommonDI | - | - | 63ms | - |
| Speed up | 1x | 1.4x | 4.5x | 80x |
Test Plan:
ninja check-all
Summary:
CloneFunctionInto now is fast on its own and we don't need to use
CloneFunctionAttributes/Metadata/Body separately.
CommonDebugInfo in CoroClone is now unused and is cleaned up separately
in the next diff in the stack.
Test Plan:
ninja check-all
Summary:
We used the set only to check if it contains certain metadata nodes.
Replacing the set with a predicate makes the intention clearer and the
API more general.
Test Plan:
ninja check-all
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or with a deprecation warning an Instruction *, or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
As a special exception, DIBuilder::insertDeclare() keeps a separate
overload accepting a BasicBlock *InsertAtEnd. This is because despite
the name, this method does not insert at the end of the block, therefore
this cannot be handled implicitly by using InsertPosition.
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or with a deprecation warning an Instruction *, or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. The
call-sites updated in this patch are those where the dyn_cast_or_null cast
utility doesn't compose well with iterator insertion. It can distinguish
between nullptr and a "present" (non-null) Instruction pointer, but not
between a legal and illegal instruction iterator. This can lead to
end-iterator dereferences and thus crashes.
We can improve this in the future (as parent-pointers can now be accessed
from ilist nodes), but for the moment, add explicit tests for end()
iterators at the five call sites affected by this.
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.
Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
Summary:
CoroCloner, by calling into CloneFunctionInto, does a lot of repeated
work priming DIFinder and building a list of common module-level debug
info metadata. For programs compiled with full debug info this can get
very expensive.
This diff builds the data once and shares it between all clones.
Anecdata for a sample cpp source file compiled with full debug info:
| | Baseline | IdentityMD set | Prebuilt CommonDI (cur.) |
|-----------------|----------|----------------|--------------------------|
| CoroSplitPass | 306ms | 221ms | 68ms |
| CoroCloner | 101ms | 72ms | 0.5ms |
| CollectCommonDI | - | - | 63ms |
| Speed up | 1x | 1.4x | 4.5x |
Note that CollectCommonDebugInfo happens once *per coroutine* rather than per clone.
Test Plan:
ninja check-llvm-unit
ninja check-llvm
Compiled a sample internal source file, checked time trace output for scope timings.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
Removes `llvm.coro.async.store_resume` from the list of coroutine
intrinsics. This is not a valid intrinsic name, and was likely added by
mistake with [this](https://reviews.llvm.org/D90612) change.
* Move CoroCloner to its own header. For now, the header is located in llvm/lib/Transforms/Coroutines
* Change private to protected to allow inheritance
* Create CoroSwitchCloner and move some of the switch specific code into this cloner. More code will follow in later commits.
The previous fix
c6414970d7
failed to consider the fact that the call graph update doesn't make any
sense if the caller node hasn't been populated in the LazyCallGraph yet.
This patch changes to skip this CG update step when that happens.
Use helpers such as llvm::enumerate and llvm::zip in places to avoid
using loop counters. Also refactored AnyRetconABI::splitCoroutine to
avoid some awkward indexing that came about by putting ContinuationPhi
into the ReturnPHIs vector and mistaking i with I and e with E.