195 Commits

Author SHA1 Message Date
Yuxuan Chen
e1b40dc063
[Clang] Propagate elide safe context through [[clang::coro_await_elidable_argument]] (#108474) 2024-09-17 22:58:21 -07:00
Yuxuan Chen
e17a39bc31
[Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] (#99282)
This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of await elidable task
gives developers and library authors a certainty that coroutine heap
elision happens in a predictable way.

Originally, after we lower a coroutine to LLVM IR, CoroElide is
responsible for analysis of whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
  co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.

`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is very ineffective for even the most trivial C++ Task types. Improving
CoroElide by implementing more powerful analyses is possible, however it
doesn't give us the predictability when we expect elision to happen.

The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types has the control over the structured concurrency guarantees we
demand for elision to happen. That is, the lifetime for the callee's
frame is shorter to that of the caller.

The ``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.

When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
the following conditions are all met:
- callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``.
- In caller coroutine, the return value of the callee is a prvalue that
is immediately `co_await`ed.

From the C++ perspective, it makes sense because we can ensure the
lifetime of elided callee cannot exceed that of the caller if we can
guarantee that the caller coroutine is never destroyed earlier than the
callee coroutine. This is not generally true for any C++ programs.
However, the library that implements `Task` types and executors may
provide this guarantee to the compiler, providing the user with
certainty that HALO will work on their programs.

After this patch, when compiling coroutines that return a type with such
attribute, the frontend checks that the type of the operand of
`co_await` expressions (not `operator co_await`). If it's also
attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata
on the call or invoke instruction as a hint for a later middle end pass
to elide the elision.

The original patch version is
https://github.com/llvm/llvm-project/pull/94693 and as suggested, the
patch is split into frontend and middle end solutions into stacked PRs.

The middle end CoroSplit patch can be found at
https://github.com/llvm/llvm-project/pull/99283
The middle end transformation that performs the elide can be found at
https://github.com/llvm/llvm-project/pull/99285
2024-09-08 23:08:58 -07:00
Chuanqi Xu
07514fa9b6 [Coroutines] Salvage the debug information for coroutine frames within optimizations
This patch tries to salvage the debug information for the coroutine
frames within optimizations by creating the help alloca varaibles with
optimizations too. We didn't do this when I implement it initially. I
roughtly remember the reason was, we feel the additional help alloca
variable may pessimize the performance, which is almost the most
important thing under optimizations. But now, it looks like the new
inserted help alloca variables can be optimized out by the following
optimizations. So it looks like the time to make it available within
optimizations.

And also, it looks like the following optimizations will convert the
generated dbg.declare instrinsic into dbg.value intrinsic within
optimizations.

In LLVM's test, there is a slightly regression
that a dbg.declare for the promise object failed to be remained after
this change. But it looks like we won't have a chance to see dbg.declare
for the promise object when we split the coroutine as that dbg.declare
will be converted into a dbg.value in early stage.

So everything looks fine.
2024-08-28 17:02:12 +08:00
Dmitri Gribenko
f709cd5add Revert "[Coroutines] Salvage the debug information for coroutine frames within optimizations"
This reverts commit 522c253f47ea27d8eeb759e06f8749092b1de71e.

This series of commits causes Clang crashes. The reproducer is posted on
08a0dece2b.
2024-08-21 23:49:45 +02:00
Chuanqi Xu
522c253f47 [Coroutines] Salvage the debug information for coroutine frames within optimizations
This patch tries to salvage the debug information for the coroutine
frames within optimizations by creating the help alloca varaibles with
optimizations too. We didn't do this when I implement it initially. I
roughtly remember the reason was, we feel the additional help alloca
variable may pessimize the performance, which is almost the most
important thing under optimizations. But now, it looks like the new
inserted help alloca variables can be optimized out by the following
optimizations. So it looks like the time to make it available within
optimizations.

And also, it looks like the following optimizations will convert the
generated dbg.declare instrinsic into dbg.value intrinsic within
optimizations.

In LLVM's test, there is a slightly regression
that a dbg.declare for the promise object failed to be remained after
this change. But it looks like we won't have a chance to see dbg.declare
for the promise object when we split the coroutine as that dbg.declare
will be converted into a dbg.value in early stage.

So everything looks fine.
2024-08-20 17:21:43 +08:00
Hari Limaye
94473f4db6
[IRBuilder] Generate nuw GEPs for struct member accesses (#99538)
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.

Regression tests are updated using update scripts where possible, and by
find + replace where not.
2024-08-09 13:25:04 +01:00
Dmitri Gribenko
533a22941e [clang][test] Write temporary files to %t
The issue was introduced in
3a9ef4e69a.
2024-07-30 10:26:31 +02:00
Wei Wang
3a9ef4e69a
[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#100205)
This is re-land of #90310 after making asan skip pre-split coroutines in
#99415.

Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that
CoroElide can happen after callee coroutine is imported into caller's
module in ThinLTO.
2024-07-29 17:42:01 -07:00
Nikita Popov
12d24e0c56
[CodeGen] Simplify codegen for array initialization (#93956)
This makes codegen for array initialization simpler in two ways:
1. Drop the zero-index GEP at the start, which is no longer needed with
opaque pointers.
2. Emit GEPs directly to the correct element, instead of having a long
chain of +1 GEPs. This is more canonical, and also avoids regressions in
unoptimized builds from #93823.
2024-06-10 09:19:55 +02:00
Pengcheng Wang
130e93cc26
Reland "[clang] Enable sized deallocation by default in C++14 onwards" (#90373)
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.

This is another try of https://reviews.llvm.org/D112921.

The original commit cf5a8b4 was reverted by 2e5035a due to some
failures (see #83774).

Fixes #60061
2024-05-22 12:37:27 +08:00
Hans
3bb39690d7
[coro] Lower llvm.coro.await.suspend.handle to resume with tail call (#89751)
The C++ standard requires that symmetric transfer from one coroutine to
another is performed via a tail call. Failure to do so is a miscompile
and often breaks programs by quickly overflowing the stack.

Until now, the coro split pass tried to ensure this in the
`addMustTailToCoroResumes()` function by searching for
`llvm.coro.resume` calls to lower as tail calls if the conditions were
right: the right function arguments, attributes, calling convention
etc., and if a `ret void` was sure to be reached after traversal with
some ad-hoc constant folding following the call.

This was brittle, as the kind of implicit variants required for a tail
call to happen could easily be broken by other passes (e.g. if some
instruction got in between the `resume` and `ret`), see for example
9d1cb18d19862fc0627e4a56e1e491a498e84c71 and
284da049f5feb62b40f5abc41dda7895e3d81d72.

Also the logic seemed backwards: instead of searching for possible tail
call candidates and doing them if the circumstances are right, it seems
better to start with the intention of making the tail calls we need, and
forcing the circumstances to be right.

Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since
f78688134026686288a8d310b493d9327753a022) which corresponds exactly to
symmetric transfer, change the lowering of that to also include the
`resume` part, always lowered as a tail call.
2024-05-15 15:29:08 +02:00
Reid Kleckner
aa0776de46 Revert "[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310)" and related patches
This change is incorrect when thinlto and asan are enabled, and this can
be observed by adding `-fsanitize=address` to the provided
coro-elide-thinlto.cpp test. It results in the error "Coroutines cannot
handle non static allocas yet", and ASan introduces a dynamic alloca.

In other words, we must preserve the invariant that CoroSplit runs
before ASan. If we move CoroSplit to the post post-link compile stage,
ASan has to be moved to the post-link compile stage first.  It would
also be correct to make CoroSplit handle dynamic allocas so the pass
ordering doesn't matter, but sanitizer instrumentation really ought to
be last, after coroutine splitting.

This reverts commit bafc5f42c0132171287d7cba7f5c14459be1f7b7.
This reverts commit b1b1bfa7bea0ce489b5ea9134e17a43c695df5ec.
This reverts commit 0232b77e145577ab78e3ed1fdbb7eacc5a7381ab.
This reverts commit fb2d3056618e3d03ba9a695627c7b002458e59f0.
This reverts commit 1cb33713910501c6352d0eb2a15b7a15e6e18695.
This reverts commit cd68d7b3c0ebf6da5e235cfabd5e6381737eb7fe.
2024-05-10 21:28:13 +00:00
Paul T Robinson
3ceacd8b95
[Coro] Relax a debug-info test (#91401)
Debug-info metadata does not have a strictly defined order. Check that
elements are linked to each other correctly, not that metadata appears
in a particular order.
2024-05-08 06:37:24 -07:00
Fangrui Song
7c1d9b15ee [test] %clang_cc1: remove redundant actions 2024-05-04 23:08:11 -07:00
Fangrui Song
0d501f38f3 [test] %clang_cc1 -emit-llvm: remove redundant -S
Also replace aarch64-none-linux-gnu (none can indicate an OS as well) with aarch64
2024-05-04 17:15:51 -07:00
Fangrui Song
c5de4dd1ea [test] %clang_cc1 -emit-llvm: remove redundant -S
And replace -emit-llvm -o - with -emit-llvm-only
2024-05-04 17:00:29 -07:00
Wei Wang
b1b1bfa7be
[Coroutines][Test] Only run coro-elide-thinlto under x86_64-linux (#90672)
Previous fix #90549 didn't completely address the Buildbot failures.
Some target may not recognize the target triple. This time, only run the
test under x86_64-linux.
2024-04-30 18:08:40 -07:00
Wei Wang
0232b77e14
[Coroutines][Test] Specify target triple in coro-elide-thinlto (#90549)
Resolve test failure on non-x86 linux host
2024-04-30 14:31:31 -07:00
Danial Klimkin
fb2d305661
Fix output in coro-elide-thinlto.cpp (#90579)
Current dir can be read-only. Use a temp path instead.
2024-04-30 11:42:13 +02:00
David Blaikie
1cb3371391 Ensure test writes objects to test temp dir 2024-04-29 23:50:18 +00:00
Wei Wang
cd68d7b3c0
[Pipelines] Do not run CoroSplit and CoroCleanup in LTO pre-link pipeline (#90310)
Skip CoroSplit and CoroCleanup in LTO pre-link pipeline so that
CoroElide can happen after callee coroutine is imported into caller's
module in ThinLTO.
2024-04-29 10:24:53 -07:00
Utkarsh Saxena
d72146f471
Re-apply "Emit missing cleanups for stmt-expr" and other commits (#89154)
Latest diff:
f1ab4c2677..adf9bc902b

We address two additional bugs here: 

### Problem 1: Deactivated normal cleanup still runs, leading to
double-free
Consider the following:
```cpp

struct A { };

struct B { B(const A&); };

struct S {
  A a;
  B b;
};

int AcceptS(S s);

void Accept2(int x, int y);

void Test() {
  Accept2(AcceptS({.a = A{}, .b = A{}}), ({ return; 0; }));
}
```
We add cleanups as follows:
1. push dtor for field `S::a`
2. push dtor for temp `A{}` (used by ` B(const A&)` in `.b = A{}`)
3. push dtor for field `S::b`
4. Deactivate 3 `S::b`-> This pops the cleanup.
5. Deactivate 1 `S::a` -> Does not pop the cleanup as *2* is top. Should
create _active flag_!!
6. push dtor for `~S()`.
7. ...

It is important to deactivate **5** using active flags. Without the
active flags, the `return` will fallthrough it and would run both `~S()`
and dtor `S::a` leading to **double free** of `~A()`.
In this patch, we unconditionally emit active flags while deactivating
normal cleanups. These flags are deleted later by the `AllocaTracker` if
the cleanup is not emitted.

### Problem 2: Missing cleanup for conditional lifetime extension
We push 2 cleanups for lifetime-extended cleanup. The first cleanup is
useful if we exit from the middle of the expression (stmt-expr/coro
suspensions). This is deactivated after full-expr, and a new cleanup is
pushed, extending the lifetime of the temporaries (to the scope of the
reference being initialized).
If this lifetime extension happens to be conditional, then we use active
flags to remember whether the branch was taken and if the object was
initialized.
Previously, we used a **single** active flag, which was used by both
cleanups. This is wrong because the first cleanup will be forced to
deactivate after the full-expr and therefore this **active** flag will
always be **inactive**. The dtor for the lifetime extended entity would
not run as it always sees an **inactive** flag.

In this patch, we solve this using two separate active flags for both
cleanups. Both of them are activated if the conditional branch is taken,
but only one of them is deactivated after the full-expr.

---

Fixes https://github.com/llvm/llvm-project/issues/63818
Fixes https://github.com/llvm/llvm-project/issues/88478

---

Previous PR logs:
1. https://github.com/llvm/llvm-project/pull/85398
2. https://github.com/llvm/llvm-project/pull/88670
3. https://github.com/llvm/llvm-project/pull/88751
4. https://github.com/llvm/llvm-project/pull/88884
2024-04-29 12:33:46 +02:00
Vitaly Buka
2e5035aeed
Revert "[clang] Enable sized deallocation by default in C++14 onwards (#83774)" (#90299)
https://lab.llvm.org/buildbot/#/builders/168/builds/20063 (should be
fixed with #90292)

More details in #83774

This reverts commit cf5a8b489464d09dfdd7a48ce7c8b41d3c9bf819.
2024-04-26 17:14:43 -07:00
Pengcheng Wang
cf5a8b4894
[clang] Enable sized deallocation by default in C++14 onwards (#83774)
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.

This is another try of https://reviews.llvm.org/D112921.

Fixes #60061
2024-04-26 16:59:12 +08:00
Haojian Wu
dc8f6a8cda
[clang] coroutine: generate valid mangled name in CodeGenFunction::generateAwaitSuspendWrapper (#89731)
Fixes https://github.com/llvm/llvm-project/issues/89723
2024-04-23 21:09:36 +02:00
Utkarsh Saxena
9d8be24087
Revert "[codegen] Emit missing cleanups for stmt-expr and coro suspensions" and related commits (#88884)
The original change caused widespread breakages in msan/ubsan tests and
causes `use-after-free`. Most likely we are adding more cleanups than
necessary.
2024-04-16 15:30:32 +02:00
Utkarsh Saxena
89ba7e183e
[codegen] Emit missing cleanups for stmt-expr and coro suspensions [take-2] (#85398)
Fixes https://github.com/llvm/llvm-project/issues/63818 for control flow
out of an expressions.

#### Background

A control flow could happen in the middle of an expression due to
stmt-expr and coroutine suspensions.

Due to branch-in-expr, we missed running cleanups for the temporaries
constructed in the expression before the branch.
Previously, these cleanups were only added as `EHCleanup` during the
expression and as normal expression after the full expression.

Examples of such deferred cleanups include:

`ParenList/InitList`: Cleanups for fields are performed by the
destructor of the object being constructed.
`Array init`: Cleanup for elements of an array is included in the array
cleanup.
`Lifetime-extended temporaries`: reference-binding temporaries in
braced-init are lifetime extended to the parent scope.
`Lambda capture init`: init in the lambda capture list is destroyed by
the lambda object.

---

#### In this PR

In this PR, we change some of the `EHCleanups` cleanups to
`NormalAndEHCleanups` to make sure these are emitted when we see a
branch inside an expression (through statement expressions or coroutine
suspensions).

These are supposed to be deactivated after full expression and destroyed
later as part of the destructor of the aggregate or array being
constructed. To simplify deactivating cleanups, we add two utilities as
well:
* `DeferredDeactivationCleanupStack`: A stack to remember cleanups with
deferred deactivation.
* `CleanupDeactivationScope`: RAII for deactivating cleanups added to
the above stack.

---

#### Deactivating normal cleanups
These were previously `EHCleanups` and not `Normal` and **deactivation**
of **required** `Normal` cleanups had some bugs. These specifically
include deactivating `Normal` cleanups which are not the top of
`EHStack`
[source1](92b56011e6/clang/lib/CodeGen/CGCleanup.cpp (L1319)),
[2](92b56011e6/clang/lib/CodeGen/CGCleanup.cpp (L722-L746)).
This has not been part of our test suite (maybe it was never required
before statement expressions). In this PR, we also fix the emission of
required-deactivated-normal cleanups.
2024-04-10 12:59:24 +02:00
Hans
9d1cb18d19
[Coroutines] Ignore instructions more aggressively in addMustTailToCoroResumes() (#85271)
The old code used isInstructionTriviallyDead() and removed instructions
when walking the path from a resume call to function return to check if
the call is in tail position.

However, since the code was walking forwards it was not able to get past
instructions such as:

  %gep = getelementptr inbounds i64, ptr %alloc.var, i32 0
  %foo = ptrtoint ptr %gep to i64

This patch instead ignores such instructions as long as their values are
not needed. This enables the code to emit tail calls in more situations.
2024-03-20 14:51:45 +01:00
fpasserby
f786881340
[coroutine] Implement llvm.coro.await.suspend intrinsic (#79712)
Implement `llvm.coro.await.suspend` intrinsics, to deal with performance
regression after prohibiting `.await_suspend` inlining, as suggested in
#64945.
Actually, there are three new intrinsics, which directly correspond to
each of three forms of `await_suspend`:
```
void llvm.coro.await.suspend.void(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
i1 llvm.coro.await.suspend.bool(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
ptr llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
```
There are three different versions instead of one, because in `bool`
case it's result is used for resuming via a branch, and in
`coroutine_handle` case exceptions from `await_suspend` are handled in
the coroutine, and exceptions from the subsequent `.resume()` are
propagated to the caller.

Await-suspend block is simplified down to intrinsic calls only, for
example for symmetric transfer:
```
%id = call token @llvm.coro.save(ptr null)
%handle = call ptr @llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
call void @llvm.coro.resume(%handle)
%result = call i8 @llvm.coro.suspend(token %id, i1 false)
switch i8 %result, ...
```
All await-suspend logic is moved out into a wrapper function, generated
for each suspension point.
The signature of the function is `<type> wrapperFunction(ptr %awaiter,
ptr %frame)` where `<type>` is one of `void` `i1` or `ptr`, depending on
the return type of `await_suspend`.
Intrinsic calls are lowered during `CoroSplit` pass, right after the
split.

Because I'm new to LLVM, I'm not sure if the helper function generation,
calls to them and lowering are implemented in the right way, especially
with regard to various metadata and attributes, i. e. for TBAA. All
things that seemed questionable are marked with `FIXME` comments.

There is another detail: in case of symmetric transfer raw pointer to
the frame of coroutine, that should be resumed, is returned from the
helper function and a direct call to `@llvm.coro.resume` is generated.
C++ standard demands, that `.resume()` method is evaluated. Not sure how
important is this, because code has been generated in the same way
before, sans helper function.
2024-03-11 10:00:00 +08:00
Nikita Popov
158d72d728
[Clang] Set writable and dead_on_unwind attributes on sret arguments (#77116)
Set the writable and dead_on_unwind attributes for sret arguments. These
indicate that the argument points to writable memory (and it's legal to
introduce spurious writes to it on entry to the function) and that the
argument memory will not be used if the call unwinds.

This enables additional MemCpyOpt/DSE/LICM optimizations.
2024-01-11 09:46:54 +01:00
Yuxuan Chen
4a294b5806
[Clang] CGCoroutines skip emitting try block for value returning noexcept init await_resume calls (#73160)
Previously we were not properly skipping the generation of the `try { }`
block around the `init_suspend.await_resume()` if the `await_resume` is
not returning void. The reason being that the resume expression was
wrapped in a `CXXBindTemporaryExpr` and the first dyn_cast failed,
silently ignoring the noexcept. This only mattered for `init_suspend`
because it had its own try block.

This patch changes to first extract the sub expression when we see a
`CXXBindTemporaryExpr`. Then perform the same logic to check for
`noexcept`.

Another version of this patch also wanted to assert the second step by
`cast<CXXMemberCallExpr>` and as far as I understand it should be a
valid assumption. I can change to that if upstream prefers.
2023-11-28 19:04:29 -08:00
Yuxuan Chen
1fad78b123
[Clang][Coroutines] Properly emit EH code for initial suspend await_resume (#73073)
This change aims to fix an ICE in issue
https://github.com/llvm/llvm-project/issues/63803

The crash happens in `ExitCXXTryStmt` because `EmitAnyExpr()` adds
additional cleanup to the `EHScopeStack`. This messes up the assumption
in `ExitCXXTryStmt` that the top of the stack should be a
`EHCatchScope`.

However, since we never read a value returned from `await_resume()` of
an init suspend, we can skip the part that builds this `RValue`.

The code here may not be in the best shape. There's another bug that
`memberCallExpressionCanThrow` doesn't work on the current Expr due to
type mismatch. I am preparing a separate PR to address it plus some
refactoring might be beneficial.
2023-11-21 21:21:27 -08:00
Chuanqi Xu
b7b5907b56
[Coroutines] Introduce [[clang::coro_only_destroy_when_complete]] (#71014)
Close https://github.com/llvm/llvm-project/issues/56980.

This patch tries to introduce a light-weight optimization attribute for
coroutines which are guaranteed to only be destroyed after it reached
the final suspend.

The rationale behind the patch is simple. See the example:

```C++
A foo() {
  dtor d;
  co_await something();
  dtor d1;
  co_await something();
  dtor d2;
  co_return 43;
}
```

Generally the generated .destroy function may be:

```C++
void foo.destroy(foo.Frame *frame) {
  switch(frame->suspend_index()) {
    case 1:
      frame->d.~dtor();
      break;
    case 2:
      frame->d.~dtor();
      frame->d1.~dtor();
      break;
    case 3:
      frame->d.~dtor();
      frame->d1.~dtor();
      frame->d2.~dtor();
      break;
    default: // coroutine completed or haven't started
      break;
  }

  frame->promise.~promise_type();
  delete frame;
}
```

Since the compiler need to be ready for all the cases that the coroutine
may be destroyed in a valid state.

However, from the user's perspective, we can understand that certain
coroutine types may only be destroyed after it reached to the final
suspend point. And we need a method to teach the compiler about this.
Then this is the patch. After the compiler recognized that the
coroutines can only be destroyed after complete, it can optimize the
above example to:

```C++
void foo.destroy(foo.Frame *frame) {
  frame->promise.~promise_type();
  delete frame;
}
```

I spent a lot of time experimenting and experiencing this in the
downstream. The numbers are really good. In a real-world coroutine-heavy
workload, the size of the build dir (including .o files) reduces 14%.
And the size of final libraries (excluding the .o files) reduces 8% in
Debug mode and 1% in Release mode.
2023-11-09 14:42:07 +08:00
Fangrui Song
c0a73918bf [ItaniumCXXABI] Add -fassume-nothrow-exception-dtor to assume that all exception objects' destructors are non-throwing
Link: https://lists.llvm.org/pipermail/cfe-dev/2021-August/068740.html ("[Exception Handling] Could we mark __cxa_end_catch as nounwind conditionally?"
Link: https://github.com/llvm/llvm-project/issues/57375

A catch handler calls `__cxa_begin_catch` and `__cxa_end_catch`. For a catch-all
clause or a catch clause matching a record type, we:

* assume that the exception object may have a throwing destructor
* emit `invoke void @__cxa_end_catch` (as the call is not marked as the `nounwind` attribute).
* emit a landing pad to destroy local variables and call `_Unwind_Resume`

```
struct A { ~A(); };
struct B { int x; };
void opaque();
void foo() {
  A a;
  try { opaque(); } catch (...) { } // the exception object has an unknown type and may throw
  try { opaque(); } catch (B b) { } // B::~B is nothrow, but we do not utilize this
}
```

Per C++ [dcl.fct.def.coroutine], a coroutine's function body implies a `catch (...)`.
Our code generation pessimizes even simple code, like:
```
UserFacing foo() {
  A a;
  opaque();
  co_return;
  // For `invoke void @__cxa_end_catch()`, the landing pad destroys the
  // promise_type and deletes the coro frame.
}
```

Throwing destructors are typically discouraged. In many environments, the
destructors of exception objects are guaranteed to never throw, making our
conservative code generation approach seem wasteful.

Furthermore, throwing destructors tend not to work well in practice:

* GCC does not emit call site records for the region containing `__cxa_end_catch`. This has been a long time, since 2000.
* If a catch-all clause catches an exception object that throws, both GCC and Clang using libstdc++ leak the allocated exception object.

To avoid code generation pessimization, add an opt-in driver option
-fassume-nothrow-exception-dtor to assume that `__cxa_end_catch` calls have the
`nounwind` attribute. This implies that thrown exception objects' destructors
will never throw.

To detect misuses, diagnose throw expressions with a potentially-throwing
destructor. Technically, it is possible that a potentially-throwing destructor
never throws when called transitively by `__cxa_end_catch`, but these cases seem
rare enough to justify a relaxed mode.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D108905
2023-11-05 00:39:38 -07:00
Bruno Cardoso Lopes
34415fd611
[Clang][LLVM][Coroutines] Prevent __coro_gro from outliving __promise (#66706)
When dealing with short-circuiting coroutines (e.g. expected), the
deferred calls that resolve the get_return_object are currently being
emitted after we delete the coroutine frame.

This was caught by ASAN when using optimizations -O1 and above:
optimizations after inlining would place the __coro_gro in the heap, and
subsequent delete of the coroframe followed by the conversion -> BOOM.

This patch forbids the GRO to be placed in the coroutine frame, by
adding a new metadata node that can be attached to `alloca`
instructions.

Fix #49843
2023-09-21 22:52:05 -07:00
Anton Korobeynikov
51d5d7bbae
Extend retcon.once coroutines lowering to optionally produce a normal result (#66333)
One of the main user of these kind of coroutines is swift. There yield-once (`retcon.once`) coroutines are used to temporary "expose" pointers to internal fields of various objects creating borrow scopes.

However, in some cases it might be useful also to allow these coroutines to produce a normal result, but there is no convenient way to represent this (as compared to switched-resume kind of coroutines where C++ `co_return`
is transformed to a member / callback call on promise object).

The extension is simple: we allow continuation function to have a non-void result and accept optional extra arguments via a special `llvm.coro.end.result` intrinsic that would essentially forward them as normal results.
2023-09-15 09:54:38 -07:00
Alexander Kornienko
b7f4915644 Revert "Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas"
This reverts commit e698695fbbf62e6676f8907665187f2d2c4d814b. The commit caused
invalid AddressSanitizer: stack-use-after-scope errors.

See https://reviews.llvm.org/D74094#4633785 for details.

Differential Revision: https://reviews.llvm.org/D159346
2023-09-01 12:53:24 +02:00
Aaron Ballman
a02f9a7756 Revert "[clang] Enable sized deallocation by default in C++14 onwards"
This reverts commit 2916b125f686115deab2ba573dcaff3847566ab9.

Reverting due to failures on:
https://lab.llvm.org/buildbot/#/builders/216/builds/26407
https://lab.llvm.org/staging/#/builders/247/builds/5659
http://45.33.8.238/win/83485/step_7.txt
2023-08-29 09:36:59 -04:00
wangpc
2916b125f6 [clang] Enable sized deallocation by default in C++14 onwards
Since C++14 has been released for about nine years and most standard
libraries have implemented sized deallocation functions, it's time to
make this feature default again.

Reviewed By: rnk, aaron.ballman, #libc, ldionne, Mordante, MaskRay

Differential Revision: https://reviews.llvm.org/D112921
2023-08-29 15:42:50 +08:00
Chuanqi Xu
20e6515d5c [Coroutines] Mark 'coroutine_handle<>::address' as always-inline
Close https://github.com/llvm/llvm-project/issues/65054

The direct issue is still the call to coroutine_handle<>::address()
after await_suspend(). Without optimizations, the current logic will put
the temporary result of await_suspend() to the coroutine frame since the
middle end feel the temporary is escaped from
coroutine_handle<>::address. To fix this fundamentally, we should wrap
the whole logic about await-suspend into a standalone function. See
https://github.com/llvm/llvm-project/issues/64945

And as a short-term workaround, we probably can mark
coroutine_handle<>::address() as always-inline so that the temporary
result may not be thought to be escaped then it won't be put on the
coroutine frame. Although it looks dirty, it is probably do-able since
the compiler are allowed to do special tricks to standard library
components.
2023-08-29 14:35:27 +08:00
Chuanqi Xu
6f1b2e4e97 [NFC] Correct the test code in pr65018
The test code in pr65018 is actually incorrect since the optimizier are
free to optimize the whole coroutine body away. This patch corrected
this.
2023-08-29 11:30:47 +08:00
Chuanqi Xu
b32aa72afc Recommit [C++20] [Coroutines] Mark await_suspend as noinline if the awaiter is not empty
The original patch is incorrect since it marks too many calls to be
noinline. It shows that it is bad to do analysis in the frontend again.
This patch tries to mark the await_suspend function as noinlne only.

---

Close https://github.com/llvm/llvm-project/issues/56301
Close https://github.com/llvm/llvm-project/issues/64151
Close https://github.com/llvm/llvm-project/issues/65018

See the summary and the discussion of
https://reviews.llvm.org/D157070
to get the full context.

As @rjmccall pointed out, the key point of the root cause is that
currently we didn't implement the semantics for '@llvm.coro.save'
well ("after the await-ready returns false, the coroutine is considered
to be suspended ") well.
Since the semantics implies that we (the compiler) shouldn't write
the spills into the coroutine frame in the await_suspend. But now it is
possible due to some combinations of the optimizations so the semantics are
broken. And the inlining is the root optimization of such optimizations.
So in this patch, we tried to add the `noinline` attribute to the
await_suspend function.

This looks slightly problematic since the users are able to call the
await_suspend function standalone. This is limited by the
implementation. On the one hand, we don't want the workaround solution
(See the proposed solution later) to be too complex. On the other hand,
it is rare to call await_suspend standalone. Also it is not semantically
incorrect to do so since the inlining is not part of the C++ standard.

Also as an optimization, we don't add the `noinline` attribute to
the await_suspend function if the awaiter is an empty class. This should be
correct since the programmers can't access the local variables in
await_suspend if the awaiter is empty. I think this is necessary for
the performance since it is pretty common.

The long term solution is:

    call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
                                  ptr @awaitSuspendFn)

Then it is much easier to perform the safety analysis in the middle
end. If it is safe to inline the call to awaitSuspend, we can replace it
in the CoroEarly pass. Otherwise we could replace it in the CoroSplit
pass.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D157833
2023-08-28 17:07:30 +08:00
Chuanqi Xu
572cc8d38f Revert "[C++20] [Coroutines] Mark await_suspend as noinline if the awaiter is not empty"
This reverts commit 9d9c25f81456aace2bec4b58498a420e650007d9.
This reverts commit 19ab2664ad3182ffa8fe3a95bb19765e4ae84653.
This reverts commit c4672454743e942f148a1aff1e809dae73e464f6.

As the issue https://github.com/llvm/llvm-project/issues/65018 shows,
the previous fix introduce a regression actually. So this commit reverts
the fix by our policies.
2023-08-28 13:21:17 +08:00
Chuanqi Xu
9d9c25f814 [C++20] [Coroutines] Don't mark await_suspend as noinline if it is specified as always_inline already
Address https://github.com/llvm/llvm-project/issues/64933 and partially
https://github.com/llvm/llvm-project/issues/64945.

After c467245, we will add a noinline attribute to the await_suspend
member function of an awaiter if the awaiter has any non static member
functions.

Obviously, this decision will bring some performance regressions. And
people may complain about this while the long term solution may not be
available soon. In such cases, it is better to provide a solution for
the users who met the regression surprisingly.

Also it is natural to not prevent the inlining if the function is marked
as always_inline by the users already.
2023-08-28 11:43:33 +08:00
Chuanqi Xu
7037331a2f [Coroutines] [CoroElide] Don't think exceptional terminator don't leak coro handle unconditionally any more
Close https://github.com/llvm/llvm-project/issues/59723.

The fundamental cause of the above issue is that we assumed the memory
of coroutine frame can be released by stack unwinding automatically
if the allocation of the coroutine frame is elided. But we missed one
point: the stack unwinding has different semantics with the explicit
coroutine_handle<>::destroy(). Since the latter is explicit so it shows
the intention of the user. So we can blame the user to destroy the
coroutine frame incorrectly in case of use-after-free happens. But we
can't do so with stack unwinding.

So after this patch, we won't think the exceptional terminator don't
leak the coroutine handle unconditionally. Instead, we think the
exceptional terminator will leak the coroutine handle too if the
coroutine is leaked somewhere along the search path.

Concretely for C++, we can think the exceptional terminator is not
special any more. Maybe this may cause some performance regressions.
But I've tested the motivating example (std::generator). And on the
other side, the coroutine elision is a middle end opitmization and not
a language feature. So we don't think we should blame such regressions
especially we are correcting the miscompilations.
2023-08-23 16:51:53 +08:00
Chuanqi Xu
c467245474 [C++20] [Coroutines] Mark await_suspend as noinline if the awaiter is not empty
Close https://github.com/llvm/llvm-project/issues/56301
Close https://github.com/llvm/llvm-project/issues/64151

See the summary and the discussion of https://reviews.llvm.org/D157070
to get the full context.

As @rjmccall pointed out, the key point of the root cause is that
currently we didn't implement the semantics for '@llvm.coro.save' well
("after the await-ready returns false, the coroutine is considered to be
suspended ") well.
Since the semantics implies that we (the compiler) shouldn't write the
spills into the coroutine frame in the await_suspend. But now it is possible
due to some combinations of the optimizations so the semantics are
broken. And the inlining is the root optimization of such optimizations.
So in this patch, we tried to add the `noinline` attribute to the
await_suspend call.

Also as an optimization, we don't add the `noinline` attribute to the
await_suspend call if the awaiter is an empty class. This should be
correct since the programmers can't access the local variables in
await_suspend if the awaiter is empty. I think this is necessary for the
performance since it is pretty common.

Another potential optimization is:

    call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
                                  ptr @awaitSuspendFn)

Then it is much easier to perform the safety analysis in the middle
end.
If it is safe to inline the call to awaitSuspend, we can replace it
in the CoroEarly pass. Otherwise we could replace it in the CoroSplit
pass.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D157833
2023-08-22 09:56:44 +08:00
Erik Pilkington
e698695fbb Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas
This reverts commit e26c24b849211f35a988d001753e0cd15e4a9d7b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Link: https://github.com/llvm/llvm-project/issues/38157
Link: https://github.com/llvm/llvm-project/issues/41896
Link: https://github.com/llvm/llvm-project/issues/43598
Link: https://github.com/ClangBuiltLinux/linux/issues/39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D74094
2023-08-16 15:21:46 -07:00
Chuanqi Xu
21765af763 [C++] [Coroutines] Assume the allocation doesn't return nullptr
In case of 'get_return_object_on_allocation_failure' get declared, the
compiler is required to call 'operator new(size_t, nothrow_t)' and the
handle the failure case by calling
'get_return_object_on_allocation_failure()'. But the failure case should
be rare and we can assume the allocation is successful and pass the
information to the optimizer.
2023-06-26 14:37:25 +08:00
Sergei Barannikov
f46b0e6d75 [clang] Convert a few tests to opaque pointers
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D150520
2023-05-14 21:00:15 +03:00
Nikita Popov
243e62b9d8 [Coroutines] Directly remove unnecessary lifetime intrinsics
The insertSpills() code will currently skip lifetime intrinsic users
when replacing the alloca with a frame reference. Rather than
leaving behind the dead lifetime intrinsics working on the old
alloca, directly remove them. This makes sure the alloca can be
dropped as well.

I noticed this as a regression when converting tests to opaque
pointers. Without opaque pointers, this code didn't really do
anything, because there would usually be a bitcast in between.
The lifetimes would get rewritten to the frame pointer. With
opaque pointers, this code now triggers and leaves behind users
of the old allocas.

Differential Revision: https://reviews.llvm.org/D148240
2023-04-14 10:22:30 +02:00