When fetching allocation sizes, we almost always want to have the
size in bytes, but we were only providing an InBits API. Also add
the corresponding byte-based conjugate to save some *8 and /8
juggling everywhere.
Assuming that collecting frame allocas doesn't modify CFG we can safely
move DominatorTree construction outside loop and avoid expensive computations.
Differential Revision: https://reviews.llvm.org/D140818
The llvm.lifetime.start intrinsic guarantees that the address for a
given alloca is always the same. So variables with escaped addresses
reaching reaching a lifetime start/end block before and after a suspend
must be placed onto the coroutine frame even if the variable itself
is not alive across the suspend point.
This computes a new `LoopKill` flag in the suspend crossing data flow
anaysis to catch the case where a lifetime marker can reach itself
via suspend-crossing path.
This fixes https://llvm.org/PR52501
Differential Revision: https://reviews.llvm.org/D140231
Close https://github.com/llvm/llvm-project/issues/59221.
The root cause for the problem is that we marked the parameter of the
resume/destroy functions as noalias previously. But this is not true.
See https://github.com/llvm/llvm-project/issues/59221 for the details.
Long story short, for this C++ program
(https://compiler-explorer.com/z/6qGcozG93), the optimized frame will be
something like:
```
struct test_frame {
void (*__resume_)(), // a function pointer points to the
`test.resume` function, which can be imaged as the test() function in
the example.
....
struct a_frame {
...
void **caller; // may points to test_frame at runtime.
};
};
```
And the function a and function test looks just like:
```
define i32 @a(ptr noalias %alloc_8) {
%alloc_8_16 = getelementptr ptr, ptr %alloc_8, i64 16
store i32 42, ptr %alloc_8_16, align 8
%alloc_8_8 = getelementptr ptr, ptr %alloc_8, i64 8
%alloc = load ptr, ptr %alloc_8_8, align 8
%p = load ptr, ptr %alloc, align 8
%r = call i32 %p(ptr %alloc)
ret i32 %r
}
define i32 @b(ptr %p) {
entry:
%alloc = alloca [128 x i8], align 8
%alloc_8 = getelementptr ptr, ptr %alloc, i64 8
%alloc_8_8 = getelementptr ptr, ptr %alloc_8, i64 8
store ptr %alloc, ptr %alloc_8_8, align 8
store ptr %p, ptr %alloc, align 8
%r = call i32 @a(ptr nonnull %alloc_8)
ret i32 %r
}
```
Here inside the function `a`, we can access the parameter `%alloc_8` by
`%alloc` and we pass `%alloc` to an unknown function. So it breaks the
assumption of `noalias` parameter.
Note that although only CoroElide optimization can put a frame inside
another frame directly, the following case is not valid too:
```
struct test_frame {
....
void **a_frame; // may points to a_frame at runtime.
};
struct a_frame {
void **caller; // may points to test_frame at runtime.
};
```
Since the C++ language allows the programmer to get the address of
coroutine frames, we can't assume the above case wouldn't happen in the
source codes. So we can't set the parameter as noalias no matter if
CoroElide applies or not. And for other languages, it may be safe if
they don't allow the programmers to get the address of coroutine frames.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D139295
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This has been found while trying to remove the last few places relying on `unsigned` to convey alignment operations.
This seems to be untested.
Differential Revision: https://reviews.llvm.org/D138784
llvm.coro.begin
Previously we've taken care of the writes to allocas prior to llvm.coro.begin.
However, since the promise alloca is special so that we never handled it
before. For the long time, since the programmers can't access the
promise_type due to the c++ language specification, we still failed to
recognize the problem until a recent report:
https://github.com/llvm/llvm-project/issues/57861
And we've tested many codes that the problem gone away after we handle
the writes to the promise alloca prior to @llvm.coro.begin()
prope until a recent report:
https://github.com/llvm/llvm-project/issues/57861
And we've tested many codes that the problem gone away after we handle
the writes to the promise alloca prior to @llvm.coro.begin() properly.
Closes https://github.com/llvm/llvm-project/issues/57861
Same as for async-style lowering, if there are no resume points in a
function, the coroutine frame pointer will be replaced by an undef,
making all accesses to the frame undefinde behavior.
Fix this by not adding allocas to the coroutine frame if there are no
resume points.
Differential Revision: https://reviews.llvm.org/D137866
Add a check (can be disabled via a flag) that the pipeline we generate is actually parsable.
Can be disabled because we don't expect to handle every pass in -print-pipeline-passes.
Fixes#58280.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135703
Transforms occasionally want to insert an instruction directly
after the definition point of a value. This involves quite a few
different edge cases, e.g. for phi nodes the next insertion point
is not the next instruction, and for invokes and callbrs its not
even in the same block. Additionally, the insertion point may not
exist at all if catchswitch is involved.
This adds a general Instruction::getInsertionPointAfterDef() API to
implement the necessary logic. For now it is used in two places
where this should be mostly NFC. I will follow up with additional
uses where this fixes specific bugs in the existing implementations.
Differential Revision: https://reviews.llvm.org/D129660
With this commit, we now attach an `DISubprogram` to the LLVM-generated
`_NoopCoro_ResumeDestroy` function. Thereby, lldb can show a
`std::coroutine_handle` to a `std::noop_coroutine` as
```
continuation = coro frame = 0x555555560d98 {
resume = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy)
destroy = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy)
}
```
instead of
```
continuation = coro frame = 0x555555560d98 {
resume = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211)
destroy = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211)
}
```
I renamed the function from `NoopCoro.ResumeDestroy` to
`_NoopCoro_ResumeDestroy` because:
* the leading `_` makes sure this is a reserved name and should not
clash with any user-provided names
* the `.` was replaced by a `_`, so the name is now a valid identifier
in C, making it allows me to type its name in the debugger
Differential Revision: https://reviews.llvm.org/D132580
Closing https://github.com/llvm/llvm-project/issues/57339
The root cause for this issue is an pre-mature optimization to eliminate
the index for the final suspend point since we feel like we can judge
if a coroutine is suspended at the final suspend by if resume_fn_addr is
null. However this is not true if the coroutine exists via an exception
in promise.unhandled_exception(). According to
[dcl.fct.def.coroutine]p14:
> If the evaluation of the expression promise.unhandled_exception()
> exits via an exception, the coroutine is considered suspended at the
> final suspend point.
But from the perspective of the implementation, we can't set the coro
index to the final suspend point directly since it breaks the states.
To fix the issue, we block the optimization if we find there is any
unwind coro end, which indicates that it is possible that the coroutine
exists via an exception from promise.unhandled_exception().
Test Plan: folly
Closing https://github.com/llvm/llvm-project/issues/56329
The problem happens when we try to simplify the suspend points. We might
break the assumption that the final suspend lives in the last slot of
Shape.CoroSuspends. This patch tries to main the assumption and fixes
the problem.
Closing https://github.com/llvm/llvm-project/issues/56919
It is meaningless to preserve the lifetime markers for the spilled
allocas in the coroutine frames and it would block some optimizations
too.
Previously the scope of debug type of __coro_frame is limited in the
current function. It looked good at the first sight. But it prevent us
to print the type in splitted functions and other functions. Also the
debug type is different for different coroutine functions. So it makes
sense to rename the debug type to make it related to the function name.
After this patch, we could access the coroutine frame type in a function
by `function_name.coro_frame_ty`.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D127623
enabled
The C++20 Coroutines couldn't be compiled to WebAssembly due to an
optimization named symmetric transfer requires the support for musttail
calls but WebAssembly doesn't support it yet.
This patch tries to fix the problem by adding a supportsTailCalls
method to TargetTransformImpl to skip the symmetric transfer when
tail-call feature is not supported.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D128794
This removes the extractvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
extractvalue is already not supported in bitcode, so we do not need
to worry about bitcode auto-upgrade.
Uses of ConstantExpr::getExtractValue() should be replaced with
IRBuilder::CreateExtractValue() (if the fact that the result is
constant is not important) or ConstantFoldExtractValueInstruction()
(if it is). Though for this particular case, it is also possible
and usually preferable to use getAggregateElement() instead.
The C API function LLVMConstExtractValue() is removed, as the
underlying constant expression no longer exists. Instead,
LLVMBuildExtractValue() should be used (which will constant fold
or create an instruction). Depending on the use-case,
LLVMGetAggregateElement() may also be used instead.
Differential Revision: https://reviews.llvm.org/D125795
Symmetric transfer is not a part of C++ standards. So the vendors is not
forced to implement it any way. Given the symmetric transfer nowadays is
an optimization. It makes more sense to enable it only if the
optimization is enabled. It is also helpful for the compilation speed in
O0.
Background:
When we construct coroutine frame, we would insert a dbg.declare
intrinsic for it:
```
%hdl = call void @llvm.coro.begin() ; would return coroutine handle
call void @llvm.dbg.declare(metadata ptr %hdl, metadata
![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression())
```
And in the splitted coroutine, it looks like:
```
define void @coro_func.resume(ptr *hdl) {
entry.resume:
call void @llvm.dbg.declare(metadata ptr %hdl, metadata
![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression())
}
```
And we would salvage the debug info by inserting a new alloca here:
```
define void @coro_func.resume(ptr %hdl) {
entry.resume:
%frame.debug = alloca ptr
call void @llvm.dbg.declare(metadata ptr %frame.debug, metadata
![[DEBUG_VARIABLE: __coro_frame]], metadata !DIExpression())
store ptr %hdl, %frame.debug
}
```
But now, the problem comes since the `dbg.declare` refers to the address
of that alloca instead of actual coroutine handle. I saw there are codes
to solve the problem but it only applies to complex expression only. I
feel if it is OK to relax the condition to make it work for
`__coro_frame`.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D126277
Async context frames are allocated with a maximum alignment. If a type
requests an alignment bigger than that dynamically align the address
in the frame.
Differential Revision: https://reviews.llvm.org/D126715
Running iwyu-diff on LLVM codebase since 7030654296a0416bd9402a0278 detected a few
regressions, fixing them.
Differential Revision: https://reviews.llvm.org/D126417
The EnableReuseStorageInFrame option is designed for testing only.
But it is better to use *_PASS_WITH_PARAMS macro to keep consistent with
other passes.
This is a following cleanup for the previous work D123918. I missed
serveral places which still use legacy pass managers. This patch tries
to remove them.
Re-materialize for debug instructions would cause a different code
generated if we enabled `-g`. This is bad. So we disable to
re-materialize for debug instructions.