This reverts commit 2e414799d0ad511cd7999895014a2cae2ea5e3e3.
Reason: buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/51/builds/4105)
(This was the only new CL.)
/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Transforms/Coroutines/ABI.h:71:31: error: 'llvm::coro::AsyncABI' has virtual functions but non-virtual destructor [-Werror,-Wnon-virtual-dtor]
71 | class LLVM_LIBRARY_VISIBILITY AsyncABI : public BaseABI {
etc.
* Adds an ABI object class hierarchy to implement the coroutine ABIs
(Switch, Asyc, and Retcon{Once})
* The ABI object improves the separation of the code related to users,
ABIs and utilities.
* No code changes are required by any existing users.
* Each ABI overrides delegate methods for initialization, building the
coroutine frame and splitting the coroutine, other methods may be added
later.
* CoroSplit invokes a generator lambda to instantiate the ABI object and
calls the ABI object to carry out its primary operations.
See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
This patch fixes:
llvm/lib/Transforms/Coroutines/Coroutines.cpp:475:3: error: default
label in switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]
* Refactor buildFrom to separate the analysis, abi related operations,
cleaning and invalidating.
* In a follow-up PR the code in initABI will be moved to an ABI object
init method.
* In a follow-up PR the OptimizeFrame flag will also be removed from the
Shape and instead will be passed directly to buildCoroutineFrame
(although it would be nice to find another way to trigger this
optimization). This is the only thing that Shape cannot determine from
the Function/Coroutine, but it is only needed within
buildCoroutineFrame.
See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
CoroSplit has a heuristic where the scope line for funclets is adjusted
to match the line of the suspend intrinsic that caused the split. This
is useful as it avoids a jump on the line table from the original
function declaration to the line where the split happens.
However, very often using the line of the split is not ideal: if we can
avoid it, we should not have a line entry for the split location, as
this would cause breakpoints by line to match against two functions: the
funclet before and the funclet after the split.
This patch adjusts the heuristics to look for the first instruction with
a non-zero line number after the split. In other words, this patch makes
breakpoints on `await foo()` lines behave much more like a regular
function call.
* To create custom ABIs plugin libraries need access to CoroShape.
* As a step in enabling plugin libraries, move Shape into its own header
* The header will eventually be moved into include/llvm/Transforms/Coroutines
See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
* Move materialization out of CoroFrame to MaterializationUtils.h
* Move spill related utilities that were used by materialization to
SpillUtils
* Move isSuspendBlock (needed by materialization) to CoroInternal
See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
* Add asserts to verify normalization of the coroutine happened.
* This will be important when normalization becomes part of an ABI
object.
--- From a previous discussion here
https://github.com/llvm/llvm-project/pull/108076
Normalization performs these important steps:
split around each suspend, this adds BBs before/after each suspend so
each suspend now exists in its own block.
split around coro.end (similar to above)
break critical edges and add single-edge phis in the new blocks (also
removing other single-edge phis).
Each of these things can individually be tested
A) Check that each suspend is the only inst in its BB
B) Check that coro.end is the only inst in its BB
C) Check that each edge of a multi-edge phis is preceded by single-edge
phi in an immediate pred
For 1) and 2) I believe the purpose of the transform is in part for
suspend crossing info's analysis so it can specifically 'mark' the
suspend blocks and identify the end of the coroutine. There are some
existing places within suspend crossing info that visit the CoroSuspends
and CoroEnds so we could check A) and B) there.
For 3) I believe the purpose of this transform is for insertSpills to
work properly. Infact there is already a check for the result of this
transform!
assert(PN->getNumIncomingValues() == 1 &&
"unexpected number of incoming "
"values in the PHINode");
I think to verify the result of normalization we just need to add checks
A) and B) to suspend crossing info.
Fixes https://github.com/llvm/llvm-project/issues/107139.
We weren't updating the call graph properly in CoroSplit. This crash is
due to the await_suspend() function calling the coroutine, forming a
multi-node SCC. The issue bisected to
https://github.com/llvm/llvm-project/pull/79712 but I think this is red
herring. We haven't been properly updating the call graph.
Added an example of such code as a test case.
* Split buildCoroutineFrame into code related to normalization and code
related to actually building the coroutine frame.
* This will enable future specialization of buildCoroutineFrame for
different ABIs while the normalization can be done by splitCoroutine
prior to calling buildCoroutineFrame.
See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
After landing https://github.com/llvm/llvm-project/pull/99285 we found
that the call graph update was causing the following crash when
expensive checks are turned on
```
llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:982: LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass(LazyCallGraph &, LazyCallGraph::SCC &, LazyCallGraph::Node &, CGSCCAnalysisManager &, CGSCCUpdateResult &, FunctionAnalysisManager &, bool): Assertion `(RC == &TargetRC || RC->isAncestorOf(Targe
tRC)) && "New call edge is not trivial!"' failed.
```
I have to admit I believe that the call graph update process I did for
that patch could be wrong.
After reading the code in `CGSCCToFunctionPassAdaptor`, I am convinced
that `CoroAnnotationElidePass` can be a FunctionPass and rely on the
adaptor to update the call graph for us, so long as we properly
invalidate the caller's analyses.
After this patch,
`llvm/test/Transforms/Coroutines/coro-transform-must-elide.ll` no longer
fails under expensive checks.
This patch is episode three of the middle end implementation for the
coroutine HALO improvement project published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
After we attribute the calls to some coroutines as "coro_elide_safe" in
the C++ FE and creating a `noalloc` ramp function, we use a new middle
end pass to move the call to coroutines to the noalloc variant.
This pass should be run after CoroSplit. For each node we process in
CoroSplit, we look for its callers and replace the attributed ones in
presplit coroutines to the noalloc one. The transformed `noalloc` ramp
function will also require a frame pointer to a block of memory it can
use as an activation frame. We allocate this on the caller's frame with
an alloca.
Please note that we cannot safely transform such attributed calls in
post-split coroutines due to memory lifetime reasons. The CoroSplit pass
is responsible for creating the coroutine frame spills for all the
allocas in the coroutine. Therefore it will be unsafe to create new
allocas like this one in post-split coroutines. This happens relatively
rarely because CGSCC performs the passes on the callees before the
caller. However, if multiple coroutines coexist in one SCC, this
situation does happen (and prevents us from having potentially unbound
frame size due to recursion.)
You can find episode 1: Clang FE of this patch series at
https://github.com/llvm/llvm-project/pull/99282
Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283
This patch is episode two of the coroutine HALO improvement project
published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
Previously CoroElide depends on inlining, and its analysis does not work
very well with code generated by the C++ frontend due the existence of
many customization points. There has been issue reported to upstream how
ineffective the original CoroElide was in real world applications.
For C++ users, this set of patches aim to fix this problem by providing
library authors and users deterministic HALO behaviour for some
well-behaved coroutine `Task` types. The stack begins with a library
side attribute on the `Task` class that guarantees no unstructured
concurrency when coroutines are awaited directly with `co_await`ed as a
prvalue. This attribute on Task types gives us lifetime guarantees and
makes C++ FE capable to telling the ME which coroutine calls are
elidable. We convey such information from FE through the attribute
`coro_elide_safe`.
This patch modifies CoroSplit to create a variant of the coroutine ramp
function that 1) does not use heap allocated frame, instead take an
additional parameter as the pointer to the frame. Such parameter is
attributed with `dereferenceble` and `align` to convey size and align
requirements for the frame. 2) always stores cleanup instead of destroy
address for `coro.destroy()` actions.
In a later patch, we will have a new pass that runs right after
CoroSplit to find usages of the callee coroutine attributed
`coro_elide_safe` in presplit coroutine callers, allocates the frame on
its "stack", transform those usages to call the `noalloc` ramp function
variant.
(note I put quotes on the word "stack" here, because for presplit
coroutine, any alloca will be spilled into the frame when it's being
split)
The C++ Frontend attribute implementation that works with this change
can be found at https://github.com/llvm/llvm-project/pull/99282
The pass that makes use of the new `noalloc` split can be found at
https://github.com/llvm/llvm-project/pull/99285
This patch tries to salvage the debug information for the coroutine
frames within optimizations by creating the help alloca varaibles with
optimizations too. We didn't do this when I implement it initially. I
roughtly remember the reason was, we feel the additional help alloca
variable may pessimize the performance, which is almost the most
important thing under optimizations. But now, it looks like the new
inserted help alloca variables can be optimized out by the following
optimizations. So it looks like the time to make it available within
optimizations.
And also, it looks like the following optimizations will convert the
generated dbg.declare instrinsic into dbg.value intrinsic within
optimizations.
In LLVM's test, there is a slightly regression
that a dbg.declare for the promise object failed to be remained after
this change. But it looks like we won't have a chance to see dbg.declare
for the promise object when we split the coroutine as that dbg.declare
will be converted into a dbg.value in early stage.
So everything looks fine.
No Functional Changes
* Fix comments in several places
* Instead of using BB-getName() (in dump methods) use
getBasicBlockLabel. This fixes the poor output of the dumped info that
resulted in missing BB labels.
* Use RPO when dumping SuspendCrossingInfo. Without this the dump order
is determined by the ptr addresses and so is not consistent from run to
run making IR diffs difficult to read.
* Inference -> Interference
* Pull the logic that determines insertion location out of insertSpills
and into getSpillInsertionPt, to differentiate between these two
operations.
* Use Shape getters for CoroId instead of getting it manually.
---------
Co-authored-by: tnowicki <tnowicki.nowicki@amd.com>
As the title mentioned, do not search for the DILocalVariable for
__promise when constructing the debug variable for __coro_frame.
This should make sense because the debug variable of `__coro_frame`
shouldn't dependent on the debug variable of `__promise`. And actually,
it is not. Currently, we search the debug variable for `__promise` only
because we want to get the debug location and the debug scope for the
`__promise`. However, we can construct the debug location directly from
the debug scope of the being compiled function. Then it is not necessary
any more to search the `__promise` variable.
And this patch makes the codes to construct the debug variable for
`__coro_frame` to be more stable. Now we will always be able to
construct the debug variable for the coroutine frame no matter if we
found the debug variable for the __promise or not.
This patch is not strictly NFC but it is intended to not affect any end
users.
This patch tries to salvage the debug information for the coroutine
frames within optimizations by creating the help alloca varaibles with
optimizations too. We didn't do this when I implement it initially. I
roughtly remember the reason was, we feel the additional help alloca
variable may pessimize the performance, which is almost the most
important thing under optimizations. But now, it looks like the new
inserted help alloca variables can be optimized out by the following
optimizations. So it looks like the time to make it available within
optimizations.
And also, it looks like the following optimizations will convert the
generated dbg.declare instrinsic into dbg.value intrinsic within
optimizations.
In LLVM's test, there is a slightly regression
that a dbg.declare for the promise object failed to be remained after
this change. But it looks like we won't have a chance to see dbg.declare
for the promise object when we split the coroutine as that dbg.declare
will be converted into a dbg.value in early stage.
So everything looks fine.
As the title mentioned, do not search for the DILocalVariable for
__promise when constructing the debug variable for __coro_frame.
This should make sense because the debug variable of `__coro_frame`
shouldn't dependent on the debug variable of `__promise`. And actually,
it is not. Currently, we search the debug variable for `__promise` only
because we want to get the debug location and the debug scope for the
`__promise`. However, we can construct the debug location directly from
the debug scope of the being compiled function. Then it is not necessary
any more to search the `__promise` variable.
And this patch makes the codes to construct the debug variable for
`__coro_frame` to be more stable. Now we will always be able to
construct the debug variable for the coroutine frame no matter if we
found the debug variable for the __promise or not.
This patch is not strictly NFC but it is intended to not affect any end
users.
CoroSplit expects the second parameter of `llvm.coro.id` to be the
promise alloca. Applying Asan on a pre-split coroutine breaks this
assumption and causes split to fail. This should be NFC because asan
pass happens late in the pipeline where all coroutines are split. This
is to prevent crash in case the order of passes are switched.
Also `NoopCoro.Frame.Const` is a special coroutine frame that does
nothing when resumed or destroyed. There is no point to do
instrumentation on it.
If `Coroutines.empty()` the following loop is going to be skipped
entirely and same goes for `PrepareFns.empty()`. These two conditions
here aren't useful and adds to complexity.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
SmallPtrSet.h and TimeProfiler.h are unused. CommandLine.h is only
needed for the UseNewDbgInfoFormat declare, which can be moved to the
places that need it.
See the post commit message in
https://github.com/llvm/llvm-project/pull/89751
We met a regression due to a change of calling convention of this patch.
Previously, the calling convention of indirect resume calls is always
fast.
And in this patch, although we tried to take care of it in the cloner,
we forget the case that we have to update the resuming calls in the ramp
functions. So this is the root cause of the downstream failure.
This patch tries to mark the generated resuming calls as fast
immediately after they got created to make sure the calling convention
is correct.
See the post commit message in
https://github.com/llvm/llvm-project/pull/89751
We met a regression due to a change of calling convention of this patch.
Previously, the calling convention of indirect resume calls is always
fast.
And in this patch, although we tried to take care of it in the cloner,
we forget the case that we have to update the resuming calls in the
ramp functions. So this is the root cause of the downstream failure.
This patch tries to mark the generated resuming calls as fast
immediately after they got created to make sure the calling convention
is correct.
This reverts commit 924384161ffceda08099536dd07a953299a69b53.
This reland addresses the performance regressions seen in #90265 by
retaining the original definition of
`isPotentiallyReachableFromMany(...)` instead of reimplementing it with
`isManyPotentiallyReachableFromMany(...)`.
Fixes#86580
The C++ standard requires that symmetric transfer from one coroutine to
another is performed via a tail call. Failure to do so is a miscompile
and often breaks programs by quickly overflowing the stack.
Until now, the coro split pass tried to ensure this in the
`addMustTailToCoroResumes()` function by searching for
`llvm.coro.resume` calls to lower as tail calls if the conditions were
right: the right function arguments, attributes, calling convention
etc., and if a `ret void` was sure to be reached after traversal with
some ad-hoc constant folding following the call.
This was brittle, as the kind of implicit variants required for a tail
call to happen could easily be broken by other passes (e.g. if some
instruction got in between the `resume` and `ret`), see for example
9d1cb18d19862fc0627e4a56e1e491a498e84c71 and
284da049f5feb62b40f5abc41dda7895e3d81d72.
Also the logic seemed backwards: instead of searching for possible tail
call candidates and doing them if the circumstances are right, it seems
better to start with the intention of making the tail calls we need, and
forcing the circumstances to be right.
Now that we have the `llvm.coro.await.suspend.handle` intrinsic (since
f78688134026686288a8d310b493d9327753a022) which corresponds exactly to
symmetric transfer, change the lowering of that to also include the
`resume` part, always lowered as a tail call.
This patch contains no functional changes.
The main goal of this patch is to get better clarity out of the code, to
make intentions and assumptions clear.
One major design problem I had in the past were `Lowerer`. It previously
inherited from `coro::LowererBase` but it doesn't use any of the fields
or methods from `LowererBase`. It might be an artifact leftover from
previous designs of this code.
Furthermore, we should clarify that although one such instance is bound
to the function, `Lowerer` was dedicated to one `CoroId` instruction at
a time. We rely on a sequence of fragile constructs like
`CoroBegins.clear(); DestroyAddr.clear()`. This doesn't help understand
the code.
What's worse is that we have confusing calls like
`elideHeapAllocations(CoroId->getFunction(), ...` and it might get
confused with `CoroId->getCoroutine()`.
The new structure intends to make it clear that we always operate on one
`CoroId` at a time, which may have multiple `CoroBegin`s. Such structure
doesn't rely on frequent `.clear()` that's prone to miss.
The current logic for using lifetime intrinsics to determine whether a
coroutine alloca should live on the coroutine frame or stack doesn't
consider `llvm.lifetime.end`. As a result, some allocas are incorrectly
placed on the stack even though their lifetimes may outlive the stack.
For example, SimplifyCFG may generate code that drops the corresponding
`llvm.lifetime.end` of an `llvm.lifetime.start`, and that code is
incorrectly handled by the existing logic.
To fix this, new logic is introduced where if an alloca's address is
escaped, and there is a path from an `llvm.lifetime.start` to a
coroutine suspend point (e.g. `llvm.coro.suspend`) without an
`llvm.lifetime.end`, then we know the object lives beyond the suspension
point and therefore must go on the coroutine frame.
Fixes https://github.com/llvm/llvm-project/issues/86580
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
31 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".