This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of await elidable task
gives developers and library authors a certainty that coroutine heap
elision happens in a predictable way.
Originally, after we lower a coroutine to LLVM IR, CoroElide is
responsible for analysis of whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.
`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is very ineffective for even the most trivial C++ Task types. Improving
CoroElide by implementing more powerful analyses is possible, however it
doesn't give us the predictability when we expect elision to happen.
The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types has the control over the structured concurrency guarantees we
demand for elision to happen. That is, the lifetime for the callee's
frame is shorter to that of the caller.
The ``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.
When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
the following conditions are all met:
- callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``.
- In caller coroutine, the return value of the callee is a prvalue that
is immediately `co_await`ed.
From the C++ perspective, it makes sense because we can ensure the
lifetime of elided callee cannot exceed that of the caller if we can
guarantee that the caller coroutine is never destroyed earlier than the
callee coroutine. This is not generally true for any C++ programs.
However, the library that implements `Task` types and executors may
provide this guarantee to the compiler, providing the user with
certainty that HALO will work on their programs.
After this patch, when compiling coroutines that return a type with such
attribute, the frontend checks that the type of the operand of
`co_await` expressions (not `operator co_await`). If it's also
attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata
on the call or invoke instruction as a hint for a later middle end pass
to elide the elision.
The original patch version is
https://github.com/llvm/llvm-project/pull/94693 and as suggested, the
patch is split into frontend and middle end solutions into stacked PRs.
The middle end CoroSplit patch can be found at
https://github.com/llvm/llvm-project/pull/99283
The middle end transformation that performs the elide can be found at
https://github.com/llvm/llvm-project/pull/99285
This patch tries to salvage the debug information for the coroutine
frames within optimizations by creating the help alloca varaibles with
optimizations too. We didn't do this when I implement it initially. I
roughtly remember the reason was, we feel the additional help alloca
variable may pessimize the performance, which is almost the most
important thing under optimizations. But now, it looks like the new
inserted help alloca variables can be optimized out by the following
optimizations. So it looks like the time to make it available within
optimizations.
And also, it looks like the following optimizations will convert the
generated dbg.declare instrinsic into dbg.value intrinsic within
optimizations.
In LLVM's test, there is a slightly regression
that a dbg.declare for the promise object failed to be remained after
this change. But it looks like we won't have a chance to see dbg.declare
for the promise object when we split the coroutine as that dbg.declare
will be converted into a dbg.value in early stage.
So everything looks fine.
This patch tries to salvage the debug information for the coroutine
frames within optimizations by creating the help alloca varaibles with
optimizations too. We didn't do this when I implement it initially. I
roughtly remember the reason was, we feel the additional help alloca
variable may pessimize the performance, which is almost the most
important thing under optimizations. But now, it looks like the new
inserted help alloca variables can be optimized out by the following
optimizations. So it looks like the time to make it available within
optimizations.
And also, it looks like the following optimizations will convert the
generated dbg.declare instrinsic into dbg.value intrinsic within
optimizations.
In LLVM's test, there is a slightly regression
that a dbg.declare for the promise object failed to be remained after
this change. But it looks like we won't have a chance to see dbg.declare
for the promise object when we split the coroutine as that dbg.declare
will be converted into a dbg.value in early stage.
So everything looks fine.
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.
Regression tests are updated using update scripts where possible, and by
find + replace where not.
This is re-land of #90310 after making asan skip pre-split coroutines in
#99415.
Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that
CoroElide can happen after callee coroutine is imported into caller's
module in ThinLTO.
This makes codegen for array initialization simpler in two ways:
1. Drop the zero-index GEP at the start, which is no longer needed with
opaque pointers.
2. Emit GEPs directly to the correct element, instead of having a long
chain of +1 GEPs. This is more canonical, and also avoids regressions in
unoptimized builds from #93823.
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.
This is another try of https://reviews.llvm.org/D112921.
The original commit cf5a8b4 was reverted by 2e5035a due to some
failures (see #83774).
Fixes#60061
The C++ standard requires that symmetric transfer from one coroutine to
another is performed via a tail call. Failure to do so is a miscompile
and often breaks programs by quickly overflowing the stack.
Until now, the coro split pass tried to ensure this in the
`addMustTailToCoroResumes()` function by searching for
`llvm.coro.resume` calls to lower as tail calls if the conditions were
right: the right function arguments, attributes, calling convention
etc., and if a `ret void` was sure to be reached after traversal with
some ad-hoc constant folding following the call.
This was brittle, as the kind of implicit variants required for a tail
call to happen could easily be broken by other passes (e.g. if some
instruction got in between the `resume` and `ret`), see for example
9d1cb18d19862fc0627e4a56e1e491a498e84c71 and
284da049f5feb62b40f5abc41dda7895e3d81d72.
Also the logic seemed backwards: instead of searching for possible tail
call candidates and doing them if the circumstances are right, it seems
better to start with the intention of making the tail calls we need, and
forcing the circumstances to be right.
Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since
f78688134026686288a8d310b493d9327753a022) which corresponds exactly to
symmetric transfer, change the lowering of that to also include the
`resume` part, always lowered as a tail call.
This change is incorrect when thinlto and asan are enabled, and this can
be observed by adding `-fsanitize=address` to the provided
coro-elide-thinlto.cpp test. It results in the error "Coroutines cannot
handle non static allocas yet", and ASan introduces a dynamic alloca.
In other words, we must preserve the invariant that CoroSplit runs
before ASan. If we move CoroSplit to the post post-link compile stage,
ASan has to be moved to the post-link compile stage first. It would
also be correct to make CoroSplit handle dynamic allocas so the pass
ordering doesn't matter, but sanitizer instrumentation really ought to
be last, after coroutine splitting.
This reverts commit bafc5f42c0132171287d7cba7f5c14459be1f7b7.
This reverts commit b1b1bfa7bea0ce489b5ea9134e17a43c695df5ec.
This reverts commit 0232b77e145577ab78e3ed1fdbb7eacc5a7381ab.
This reverts commit fb2d3056618e3d03ba9a695627c7b002458e59f0.
This reverts commit 1cb33713910501c6352d0eb2a15b7a15e6e18695.
This reverts commit cd68d7b3c0ebf6da5e235cfabd5e6381737eb7fe.
Debug-info metadata does not have a strictly defined order. Check that
elements are linked to each other correctly, not that metadata appears
in a particular order.
Previous fix#90549 didn't completely address the Buildbot failures.
Some target may not recognize the target triple. This time, only run the
test under x86_64-linux.
Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that
CoroElide can happen after callee coroutine is imported into caller's
module in ThinLTO.
Latest diff:
f1ab4c2677..adf9bc902b
We address two additional bugs here:
### Problem 1: Deactivated normal cleanup still runs, leading to
double-free
Consider the following:
```cpp
struct A { };
struct B { B(const A&); };
struct S {
A a;
B b;
};
int AcceptS(S s);
void Accept2(int x, int y);
void Test() {
Accept2(AcceptS({.a = A{}, .b = A{}}), ({ return; 0; }));
}
```
We add cleanups as follows:
1. push dtor for field `S::a`
2. push dtor for temp `A{}` (used by ` B(const A&)` in `.b = A{}`)
3. push dtor for field `S::b`
4. Deactivate 3 `S::b`-> This pops the cleanup.
5. Deactivate 1 `S::a` -> Does not pop the cleanup as *2* is top. Should
create _active flag_!!
6. push dtor for `~S()`.
7. ...
It is important to deactivate **5** using active flags. Without the
active flags, the `return` will fallthrough it and would run both `~S()`
and dtor `S::a` leading to **double free** of `~A()`.
In this patch, we unconditionally emit active flags while deactivating
normal cleanups. These flags are deleted later by the `AllocaTracker` if
the cleanup is not emitted.
### Problem 2: Missing cleanup for conditional lifetime extension
We push 2 cleanups for lifetime-extended cleanup. The first cleanup is
useful if we exit from the middle of the expression (stmt-expr/coro
suspensions). This is deactivated after full-expr, and a new cleanup is
pushed, extending the lifetime of the temporaries (to the scope of the
reference being initialized).
If this lifetime extension happens to be conditional, then we use active
flags to remember whether the branch was taken and if the object was
initialized.
Previously, we used a **single** active flag, which was used by both
cleanups. This is wrong because the first cleanup will be forced to
deactivate after the full-expr and therefore this **active** flag will
always be **inactive**. The dtor for the lifetime extended entity would
not run as it always sees an **inactive** flag.
In this patch, we solve this using two separate active flags for both
cleanups. Both of them are activated if the conditional branch is taken,
but only one of them is deactivated after the full-expr.
---
Fixes https://github.com/llvm/llvm-project/issues/63818
Fixes https://github.com/llvm/llvm-project/issues/88478
---
Previous PR logs:
1. https://github.com/llvm/llvm-project/pull/85398
2. https://github.com/llvm/llvm-project/pull/88670
3. https://github.com/llvm/llvm-project/pull/88751
4. https://github.com/llvm/llvm-project/pull/88884
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.
This is another try of https://reviews.llvm.org/D112921.
Fixes#60061
The original change caused widespread breakages in msan/ubsan tests and
causes `use-after-free`. Most likely we are adding more cleanups than
necessary.
Fixes https://github.com/llvm/llvm-project/issues/63818 for control flow
out of an expressions.
#### Background
A control flow could happen in the middle of an expression due to
stmt-expr and coroutine suspensions.
Due to branch-in-expr, we missed running cleanups for the temporaries
constructed in the expression before the branch.
Previously, these cleanups were only added as `EHCleanup` during the
expression and as normal expression after the full expression.
Examples of such deferred cleanups include:
`ParenList/InitList`: Cleanups for fields are performed by the
destructor of the object being constructed.
`Array init`: Cleanup for elements of an array is included in the array
cleanup.
`Lifetime-extended temporaries`: reference-binding temporaries in
braced-init are lifetime extended to the parent scope.
`Lambda capture init`: init in the lambda capture list is destroyed by
the lambda object.
---
#### In this PR
In this PR, we change some of the `EHCleanups` cleanups to
`NormalAndEHCleanups` to make sure these are emitted when we see a
branch inside an expression (through statement expressions or coroutine
suspensions).
These are supposed to be deactivated after full expression and destroyed
later as part of the destructor of the aggregate or array being
constructed. To simplify deactivating cleanups, we add two utilities as
well:
* `DeferredDeactivationCleanupStack`: A stack to remember cleanups with
deferred deactivation.
* `CleanupDeactivationScope`: RAII for deactivating cleanups added to
the above stack.
---
#### Deactivating normal cleanups
These were previously `EHCleanups` and not `Normal` and **deactivation**
of **required** `Normal` cleanups had some bugs. These specifically
include deactivating `Normal` cleanups which are not the top of
`EHStack`
[source1](92b56011e6/clang/lib/CodeGen/CGCleanup.cpp (L1319)),
[2](92b56011e6/clang/lib/CodeGen/CGCleanup.cpp (L722-L746)).
This has not been part of our test suite (maybe it was never required
before statement expressions). In this PR, we also fix the emission of
required-deactivated-normal cleanups.
The old code used isInstructionTriviallyDead() and removed instructions
when walking the path from a resume call to function return to check if
the call is in tail position.
However, since the code was walking forwards it was not able to get past
instructions such as:
%gep = getelementptr inbounds i64, ptr %alloc.var, i32 0
%foo = ptrtoint ptr %gep to i64
This patch instead ignores such instructions as long as their values are
not needed. This enables the code to emit tail calls in more situations.
Implement `llvm.coro.await.suspend` intrinsics, to deal with performance
regression after prohibiting `.await_suspend` inlining, as suggested in
#64945.
Actually, there are three new intrinsics, which directly correspond to
each of three forms of `await_suspend`:
```
void llvm.coro.await.suspend.void(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
i1 llvm.coro.await.suspend.bool(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
ptr llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
```
There are three different versions instead of one, because in `bool`
case it's result is used for resuming via a branch, and in
`coroutine_handle` case exceptions from `await_suspend` are handled in
the coroutine, and exceptions from the subsequent `.resume()` are
propagated to the caller.
Await-suspend block is simplified down to intrinsic calls only, for
example for symmetric transfer:
```
%id = call token @llvm.coro.save(ptr null)
%handle = call ptr @llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
call void @llvm.coro.resume(%handle)
%result = call i8 @llvm.coro.suspend(token %id, i1 false)
switch i8 %result, ...
```
All await-suspend logic is moved out into a wrapper function, generated
for each suspension point.
The signature of the function is `<type> wrapperFunction(ptr %awaiter,
ptr %frame)` where `<type>` is one of `void` `i1` or `ptr`, depending on
the return type of `await_suspend`.
Intrinsic calls are lowered during `CoroSplit` pass, right after the
split.
Because I'm new to LLVM, I'm not sure if the helper function generation,
calls to them and lowering are implemented in the right way, especially
with regard to various metadata and attributes, i. e. for TBAA. All
things that seemed questionable are marked with `FIXME` comments.
There is another detail: in case of symmetric transfer raw pointer to
the frame of coroutine, that should be resumed, is returned from the
helper function and a direct call to `@llvm.coro.resume` is generated.
C++ standard demands, that `.resume()` method is evaluated. Not sure how
important is this, because code has been generated in the same way
before, sans helper function.
Set the writable and dead_on_unwind attributes for sret arguments. These
indicate that the argument points to writable memory (and it's legal to
introduce spurious writes to it on entry to the function) and that the
argument memory will not be used if the call unwinds.
This enables additional MemCpyOpt/DSE/LICM optimizations.
Previously we were not properly skipping the generation of the `try { }`
block around the `init_suspend.await_resume()` if the `await_resume` is
not returning void. The reason being that the resume expression was
wrapped in a `CXXBindTemporaryExpr` and the first dyn_cast failed,
silently ignoring the noexcept. This only mattered for `init_suspend`
because it had its own try block.
This patch changes to first extract the sub expression when we see a
`CXXBindTemporaryExpr`. Then perform the same logic to check for
`noexcept`.
Another version of this patch also wanted to assert the second step by
`cast<CXXMemberCallExpr>` and as far as I understand it should be a
valid assumption. I can change to that if upstream prefers.
This change aims to fix an ICE in issue
https://github.com/llvm/llvm-project/issues/63803
The crash happens in `ExitCXXTryStmt` because `EmitAnyExpr()` adds
additional cleanup to the `EHScopeStack`. This messes up the assumption
in `ExitCXXTryStmt` that the top of the stack should be a
`EHCatchScope`.
However, since we never read a value returned from `await_resume()` of
an init suspend, we can skip the part that builds this `RValue`.
The code here may not be in the best shape. There's another bug that
`memberCallExpressionCanThrow` doesn't work on the current Expr due to
type mismatch. I am preparing a separate PR to address it plus some
refactoring might be beneficial.
Close https://github.com/llvm/llvm-project/issues/56980.
This patch tries to introduce a light-weight optimization attribute for
coroutines which are guaranteed to only be destroyed after it reached
the final suspend.
The rationale behind the patch is simple. See the example:
```C++
A foo() {
dtor d;
co_await something();
dtor d1;
co_await something();
dtor d2;
co_return 43;
}
```
Generally the generated .destroy function may be:
```C++
void foo.destroy(foo.Frame *frame) {
switch(frame->suspend_index()) {
case 1:
frame->d.~dtor();
break;
case 2:
frame->d.~dtor();
frame->d1.~dtor();
break;
case 3:
frame->d.~dtor();
frame->d1.~dtor();
frame->d2.~dtor();
break;
default: // coroutine completed or haven't started
break;
}
frame->promise.~promise_type();
delete frame;
}
```
Since the compiler need to be ready for all the cases that the coroutine
may be destroyed in a valid state.
However, from the user's perspective, we can understand that certain
coroutine types may only be destroyed after it reached to the final
suspend point. And we need a method to teach the compiler about this.
Then this is the patch. After the compiler recognized that the
coroutines can only be destroyed after complete, it can optimize the
above example to:
```C++
void foo.destroy(foo.Frame *frame) {
frame->promise.~promise_type();
delete frame;
}
```
I spent a lot of time experimenting and experiencing this in the
downstream. The numbers are really good. In a real-world coroutine-heavy
workload, the size of the build dir (including .o files) reduces 14%.
And the size of final libraries (excluding the .o files) reduces 8% in
Debug mode and 1% in Release mode.
Link: https://lists.llvm.org/pipermail/cfe-dev/2021-August/068740.html ("[Exception Handling] Could we mark __cxa_end_catch as nounwind conditionally?"
Link: https://github.com/llvm/llvm-project/issues/57375
A catch handler calls `__cxa_begin_catch` and `__cxa_end_catch`. For a catch-all
clause or a catch clause matching a record type, we:
* assume that the exception object may have a throwing destructor
* emit `invoke void @__cxa_end_catch` (as the call is not marked as the `nounwind` attribute).
* emit a landing pad to destroy local variables and call `_Unwind_Resume`
```
struct A { ~A(); };
struct B { int x; };
void opaque();
void foo() {
A a;
try { opaque(); } catch (...) { } // the exception object has an unknown type and may throw
try { opaque(); } catch (B b) { } // B::~B is nothrow, but we do not utilize this
}
```
Per C++ [dcl.fct.def.coroutine], a coroutine's function body implies a `catch (...)`.
Our code generation pessimizes even simple code, like:
```
UserFacing foo() {
A a;
opaque();
co_return;
// For `invoke void @__cxa_end_catch()`, the landing pad destroys the
// promise_type and deletes the coro frame.
}
```
Throwing destructors are typically discouraged. In many environments, the
destructors of exception objects are guaranteed to never throw, making our
conservative code generation approach seem wasteful.
Furthermore, throwing destructors tend not to work well in practice:
* GCC does not emit call site records for the region containing `__cxa_end_catch`. This has been a long time, since 2000.
* If a catch-all clause catches an exception object that throws, both GCC and Clang using libstdc++ leak the allocated exception object.
To avoid code generation pessimization, add an opt-in driver option
-fassume-nothrow-exception-dtor to assume that `__cxa_end_catch` calls have the
`nounwind` attribute. This implies that thrown exception objects' destructors
will never throw.
To detect misuses, diagnose throw expressions with a potentially-throwing
destructor. Technically, it is possible that a potentially-throwing destructor
never throws when called transitively by `__cxa_end_catch`, but these cases seem
rare enough to justify a relaxed mode.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D108905
When dealing with short-circuiting coroutines (e.g. expected), the
deferred calls that resolve the get_return_object are currently being
emitted after we delete the coroutine frame.
This was caught by ASAN when using optimizations -O1 and above:
optimizations after inlining would place the __coro_gro in the heap, and
subsequent delete of the coroframe followed by the conversion -> BOOM.
This patch forbids the GRO to be placed in the coroutine frame, by
adding a new metadata node that can be attached to `alloca`
instructions.
Fix#49843
One of the main user of these kind of coroutines is swift. There yield-once (`retcon.once`) coroutines are used to temporary "expose" pointers to internal fields of various objects creating borrow scopes.
However, in some cases it might be useful also to allow these coroutines to produce a normal result, but there is no convenient way to represent this (as compared to switched-resume kind of coroutines where C++ `co_return`
is transformed to a member / callback call on promise object).
The extension is simple: we allow continuation function to have a non-void result and accept optional extra arguments via a special `llvm.coro.end.result` intrinsic that would essentially forward them as normal results.
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.
Reviewed By: rnk, aaron.ballman, #libc, ldionne, Mordante, MaskRay
Differential Revision: https://reviews.llvm.org/D112921
Close https://github.com/llvm/llvm-project/issues/65054
The direct issue is still the call to coroutine_handle<>::address()
after await_suspend(). Without optimizations, the current logic will put
the temporary result of await_suspend() to the coroutine frame since the
middle end feel the temporary is escaped from
coroutine_handle<>::address. To fix this fundamentally, we should wrap
the whole logic about await-suspend into a standalone function. See
https://github.com/llvm/llvm-project/issues/64945
And as a short-term workaround, we probably can mark
coroutine_handle<>::address() as always-inline so that the temporary
result may not be thought to be escaped then it won't be put on the
coroutine frame. Although it looks dirty, it is probably do-able since
the compiler are allowed to do special tricks to standard library
components.
The original patch is incorrect since it marks too many calls to be
noinline. It shows that it is bad to do analysis in the frontend again.
This patch tries to mark the await_suspend function as noinlne only.
---
Close https://github.com/llvm/llvm-project/issues/56301
Close https://github.com/llvm/llvm-project/issues/64151
Close https://github.com/llvm/llvm-project/issues/65018
See the summary and the discussion of
https://reviews.llvm.org/D157070
to get the full context.
As @rjmccall pointed out, the key point of the root cause is that
currently we didn't implement the semantics for '@llvm.coro.save'
well ("after the await-ready returns false, the coroutine is considered
to be suspended ") well.
Since the semantics implies that we (the compiler) shouldn't write
the spills into the coroutine frame in the await_suspend. But now it is
possible due to some combinations of the optimizations so the semantics are
broken. And the inlining is the root optimization of such optimizations.
So in this patch, we tried to add the `noinline` attribute to the
await_suspend function.
This looks slightly problematic since the users are able to call the
await_suspend function standalone. This is limited by the
implementation. On the one hand, we don't want the workaround solution
(See the proposed solution later) to be too complex. On the other hand,
it is rare to call await_suspend standalone. Also it is not semantically
incorrect to do so since the inlining is not part of the C++ standard.
Also as an optimization, we don't add the `noinline` attribute to
the await_suspend function if the awaiter is an empty class. This should be
correct since the programmers can't access the local variables in
await_suspend if the awaiter is empty. I think this is necessary for
the performance since it is pretty common.
The long term solution is:
call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
ptr @awaitSuspendFn)
Then it is much easier to perform the safety analysis in the middle
end. If it is safe to inline the call to awaitSuspend, we can replace it
in the CoroEarly pass. Otherwise we could replace it in the CoroSplit
pass.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D157833
This reverts commit 9d9c25f81456aace2bec4b58498a420e650007d9.
This reverts commit 19ab2664ad3182ffa8fe3a95bb19765e4ae84653.
This reverts commit c4672454743e942f148a1aff1e809dae73e464f6.
As the issue https://github.com/llvm/llvm-project/issues/65018 shows,
the previous fix introduce a regression actually. So this commit reverts
the fix by our policies.
Address https://github.com/llvm/llvm-project/issues/64933 and partially
https://github.com/llvm/llvm-project/issues/64945.
After c467245, we will add a noinline attribute to the await_suspend
member function of an awaiter if the awaiter has any non static member
functions.
Obviously, this decision will bring some performance regressions. And
people may complain about this while the long term solution may not be
available soon. In such cases, it is better to provide a solution for
the users who met the regression surprisingly.
Also it is natural to not prevent the inlining if the function is marked
as always_inline by the users already.
Close https://github.com/llvm/llvm-project/issues/59723.
The fundamental cause of the above issue is that we assumed the memory
of coroutine frame can be released by stack unwinding automatically
if the allocation of the coroutine frame is elided. But we missed one
point: the stack unwinding has different semantics with the explicit
coroutine_handle<>::destroy(). Since the latter is explicit so it shows
the intention of the user. So we can blame the user to destroy the
coroutine frame incorrectly in case of use-after-free happens. But we
can't do so with stack unwinding.
So after this patch, we won't think the exceptional terminator don't
leak the coroutine handle unconditionally. Instead, we think the
exceptional terminator will leak the coroutine handle too if the
coroutine is leaked somewhere along the search path.
Concretely for C++, we can think the exceptional terminator is not
special any more. Maybe this may cause some performance regressions.
But I've tested the motivating example (std::generator). And on the
other side, the coroutine elision is a middle end opitmization and not
a language feature. So we don't think we should blame such regressions
especially we are correcting the miscompilations.
Close https://github.com/llvm/llvm-project/issues/56301
Close https://github.com/llvm/llvm-project/issues/64151
See the summary and the discussion of https://reviews.llvm.org/D157070
to get the full context.
As @rjmccall pointed out, the key point of the root cause is that
currently we didn't implement the semantics for '@llvm.coro.save' well
("after the await-ready returns false, the coroutine is considered to be
suspended ") well.
Since the semantics implies that we (the compiler) shouldn't write the
spills into the coroutine frame in the await_suspend. But now it is possible
due to some combinations of the optimizations so the semantics are
broken. And the inlining is the root optimization of such optimizations.
So in this patch, we tried to add the `noinline` attribute to the
await_suspend call.
Also as an optimization, we don't add the `noinline` attribute to the
await_suspend call if the awaiter is an empty class. This should be
correct since the programmers can't access the local variables in
await_suspend if the awaiter is empty. I think this is necessary for the
performance since it is pretty common.
Another potential optimization is:
call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
ptr @awaitSuspendFn)
Then it is much easier to perform the safety analysis in the middle
end.
If it is safe to inline the call to awaitSuspend, we can replace it
in the CoroEarly pass. Otherwise we could replace it in the CoroSplit
pass.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D157833
In case of 'get_return_object_on_allocation_failure' get declared, the
compiler is required to call 'operator new(size_t, nothrow_t)' and the
handle the failure case by calling
'get_return_object_on_allocation_failure()'. But the failure case should
be rare and we can assume the allocation is successful and pass the
information to the optimizer.
The insertSpills() code will currently skip lifetime intrinsic users
when replacing the alloca with a frame reference. Rather than
leaving behind the dead lifetime intrinsics working on the old
alloca, directly remove them. This makes sure the alloca can be
dropped as well.
I noticed this as a regression when converting tests to opaque
pointers. Without opaque pointers, this code didn't really do
anything, because there would usually be a bitcast in between.
The lifetimes would get rewritten to the frame pointer. With
opaque pointers, this code now triggers and leaves behind users
of the old allocas.
Differential Revision: https://reviews.llvm.org/D148240