Because the early coro pass, which triggers the follow-up passes, did not
look for the llvm.coro.id.async intrinsic, we relied on the
llvm.coro.end intrinsic being present. That intrinsic might be absent in
functions that end in unreachable code.
Differential Revision: https://reviews.llvm.org/D95144
Summary: This addresses bug48712.
The solution in this patch is to merge a variable a into the frame
storage of a variable b only if the alignment of b is a multiple of the
alignment of a, so that b's slot is sufficiently aligned for a.
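A minimal sketch of such a check, assuming the intent is that the slot owner's alignment must be a multiple of the alignment of the variable placed into it. `Var` and `canMergeInto` are hypothetical names for illustration, not LLVM types or functions, and the size condition is an added assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a frame slot candidate; not an LLVM type.
struct Var {
  std::uint64_t size;  // size in bytes
  std::uint64_t align; // alignment in bytes, assumed to be a power of two
};

// Returns true when `a` may reuse the slot laid out for `b`: the slot must
// be large enough (an assumption of this sketch), and every address that
// satisfies b's alignment must also satisfy a's.
bool canMergeInto(const Var &a, const Var &b) {
  assert(a.align != 0 && b.align != 0);
  return a.size <= b.size && b.align % a.align == 0;
}
```

With this predicate, a 4-byte-aligned variable may share an 8-byte-aligned slot, while a 16-byte-aligned variable may not.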
Other strategies are possible, but for now they seem hard to handle
and of little benefit. They can be implemented in the future.
Test-plan: check-llvm
Reviewers: jmorse, lxfind, junparser
Differential Revision: https://reviews.llvm.org/D94891
This is to address https://bugs.llvm.org/show_bug.cgi?id=48626.
When there are musttail calls that use parameters aliasing the newly created coroutine frame, the existing implementation hits a fatal error.
We simply cannot perform CoroElide in such cases. In theory a precise analysis can be done to check whether the parameters of the musttail call
actually alias the frame, but it's very hard to do it before the transformation happens. Also, in most cases the musttail call is
generated for symmetric transfers, and in those cases alias analysis won't be able to tell that they don't alias anyway.
Differential Revision: https://reviews.llvm.org/D94834
When building with GCC 10, the following warning is reported:
```
/llvm-project/llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1527:28: warning: unused variable ‘CS’ [-Wunused-variable]
1527 | if (CatchSwitchInst *CS =
```
This change adds a cast to `void` to avoid the warning.
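The pattern can be sketched in isolation as follows. `Base`, `Derived`, and `countDerived` are hypothetical stand-ins for illustration and are not taken from CoroFrame.cpp; the sketch only shows a condition variable that is tested for null but otherwise unused.

```cpp
#include <cassert>

// Hypothetical class hierarchy used only for this illustration.
struct Base {
  virtual ~Base() = default;
};
struct Derived : Base {};

// Counts how many of the two pointers refer to a Derived object.
int countDerived(Base *x, Base *y) {
  int n = 0;
  if (Derived *d = dynamic_cast<Derived *>(x)) {
    (void)d; // only the null test matters; the cast marks `d` as used
    ++n;
  }
  if (Derived *d = dynamic_cast<Derived *>(y)) {
    (void)d;
    ++n;
  }
  return n;
}
```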
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D94456
The promise is a header field, but because of `performOptimizedStructLayout`
it is not guaranteed to be the third field of the frame.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D94137
Apparently there can be no clones, as happens in
coro-retcon-unreachable.ll.
The alternative is to allow no split functions in
addSplitRefRecursiveFunctions(), but it seems better to have the caller
make sure it's not accidentally splitting no functions out.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D94258
Previously when trying to support CoroSplit's function splitting, we
added in a hack that simply added the new function's node into the
original function's SCC (https://reviews.llvm.org/D87798). This is
incorrect since it might be in its own SCC.
Now, more similar to the previous design, we have callers explicitly
notify the LazyCallGraph that a function has been split out from another
one.
In order to properly support CoroSplit, there are two ways functions can
be split out.
One is the normal expected "outlining" of one function into a new one.
The new function may only contain references to other functions that the
original did. The original function must reference the new function. The
new function may reference the original function, which can result in
the new function being in the same SCC as the original function. The
weird case is when the original function indirectly references the new
function, but the new function directly calls the original function,
resulting in the new SCC being a parent of the original function's SCC.
This form of function splitting works with CoroSplit's Switch ABI.
The second way of splitting is more specific to CoroSplit. CoroSplit's
Retcon and Async ABIs split the original function into multiple
functions that all reference each other and are referenced by the
original function. In order to keep the LazyCallGraph in a valid state,
all new functions must be processed together, else some nodes won't be
populated. To keep things simple, this only supports the case where all
new edges are ref edges, and every new function references every other
new function. There can be a reference back from any new function to the
original function, putting all functions in the same RefSCC.
This also adds asserts that all nodes in a (Ref)SCC can reach all other
nodes to prevent future incorrect hacks.
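The invariant behind those asserts can be sketched on a plain adjacency list. This is a simplified model with hypothetical names (`Graph`, `reachableFrom`, `allMutuallyReachable`); it does not use the LazyCallGraph API and does not distinguish call edges from ref edges.

```cpp
#include <cassert>
#include <vector>

// Adjacency list over node ids 0..n-1 (a stand-in for a call graph).
using Graph = std::vector<std::vector<int>>;

// Marks every node reachable from `start`, including `start` itself.
static std::vector<bool> reachableFrom(const Graph &g, int start) {
  std::vector<bool> seen(g.size(), false);
  std::vector<int> stack{start};
  seen[start] = true;
  while (!stack.empty()) {
    int n = stack.back();
    stack.pop_back();
    for (int succ : g[n]) {
      if (!seen[succ]) {
        seen[succ] = true;
        stack.push_back(succ);
      }
    }
  }
  return seen;
}

// True if every node in `scc` can reach every other node in `scc`.
bool allMutuallyReachable(const Graph &g, const std::vector<int> &scc) {
  for (int from : scc) {
    std::vector<bool> seen = reachableFrom(g, from);
    for (int to : scc)
      if (!seen[to])
        return false;
  }
  return true;
}
```

For example, if node 0 is the original function and node 1 the new one, the edges 0 -> 1 and 1 -> 0 put both in one SCC. With only 0 -> 1, grouping them together would violate the invariant.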
The original hacks in https://reviews.llvm.org/D87798 are no longer
necessary since all new functions should have been registered before
calling updateCGAndAnalysisManagerForPass.
This fixes all coroutine tests when opt's -enable-new-pm is true by
default. This also fixes PR48190, which was likely due to the previous
hack breaking SCC invariants.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D93828
We're immediately dereferencing the cast pointer, so use cast<>, which will assert, instead of dyn_cast<>, which can return null.
Fixes static analyzer warning.
The llvm.coro.end.async intrinsic allows specifying a function that is
to be called as the last action before returning. This function will be
inlined after coroutine splitting.
This function can contain a 'musttail' call to allow for guaranteed tail
calling as the last action.
Differential Revision: https://reviews.llvm.org/D93568
In the existing logic, a given alloca is considered escaped as soon as its pointer value is stored into another location.
This is a bit too conservative. Specifically, in non-optimized build mode, it's common to have patterns of code that first store an alloca somewhere and then load it right away.
These uses should be handled without conservatively marking them escaped.
This patch tracks how the memory location into which an alloca pointer is stored is being used. As long as we only load from that location and do nothing else with it, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.
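The rule can be sketched with a deliberately simplified model. `SlotUse` and `allocaStaysOnStack` are hypothetical names, and the sketch only classifies direct uses of the slot holding the alloca's address; a real analysis would presumably also have to follow the loaded pointers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified kinds of use of the memory location that an
// alloca's address was stored into.
enum class SlotUse { Load, Store, PassedToCall, AddressTaken };

// The alloca is treated as non-escaping only if every use of the slot
// holding its address is a plain load.
bool allocaStaysOnStack(const std::vector<SlotUse> &slotUses) {
  for (SlotUse use : slotUses)
    if (use != SlotUse::Load)
      return false;
  return true;
}
```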
Differential Revision: https://reviews.llvm.org/D91305
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` CMake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
We need to be able to call function pointers. Inline the dispatch
function.
Also inline the context projection function.
Transfer debug locations from the suspend point to the inlined functions.
Use the function argument index instead of the function argument in
coro.id.async. This solves any spurious use issues.
Coerce the arguments of the tail call function at a suspend point. The LLVM
optimizer seems to drop casts leading to a vararg intrinsic.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D91098
Tracking local variables across suspend points is still somewhat incomplete.
Consider this coroutine snippet:
```
resumable foo() {
int x[10] = {};
int a = 3;
co_await std::experimental::suspend_always();
a++;
x[0] = 1;
a += 2;
x[1] = 2;
a += 3;
x[2] = 3;
}
```
The debugger cannot print `a` or `x` if they turn out to be allocas during
CoroSplit (which happens if you build this code with `-O0` prior to this
commit):
```
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000100003729 main-noprint`foo() at main-noprint.cpp:43:5
40 co_await std::experimental::suspend_always();
41 a++;
42 x[0] = 1;
-> 43 a += 2;
44 x[1] = 2;
45 a += 3;
46 x[2] = 3;
(lldb) p x
error: <user expression 21>:1:1: use of undeclared identifier 'x'
x
^
```
The generated IR contains a `llvm.dbg.declare` for `x` in its initialization
basic block. After CoroSplit, the `llvm.dbg.declare` might not dominate all of
`x`'s uses and we lose debugging quality.
Add `llvm.dbg.value`s to all relevant basic blocks such that, if later
transformations break the dominance, reliable debug info is already in
place. For instance, this BB:
```
await.ready:
...
%arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
...
%arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
...
%arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```
becomes:
```
await.ready:
...
call void @llvm.dbg.value(metadata [10 x i32]* %x.reload.addr, metadata !751, metadata !DIExpression()), !dbg !753
...
%arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
...
%arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
...
%arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```
Differential Revision: https://reviews.llvm.org/D90772
Prior to D89768, any alloca used after a suspension point was put on the coroutine frame, and hence was always reloaded in the resume function.
However, D89768 introduced a more precise way to determine whether an alloca should live on the frame. Allocas that are only used within one suspension region (and hence do not need to live across suspension points) are not put on the frame. They remain local to the resume function.
When creating the new entry for the .resume function, the existing logic only moved the allocas from the old entry to the new entry. This covers every alloca from the old entry. However, allocas that are defined after coro.begin are put into a separate basic block during CoroSplit (the PostSpill basic block). We need to make sure these allocas are moved to the new entry as well if they are used.
This patch walks through all allocas and checks whether they are still used but not reachable from the new entry; if so, it moves them to the new entry.
Differential Revision: https://reviews.llvm.org/D90977
The `llvm.coro.suspend.async` intrinsic takes a function pointer as its
argument that describes how to restore the current continuation's
context from the context argument of the continuation function. Before,
we assumed that the current context can be restored by loading from the
context argument's first pointer field (`first_arg->caller_context`).
This allows for defining suspension points that reuse the current
context, for example.
Also:
llvm.coro.id.async lowering: Add the llvm.coro.prepare.async intrinsic.
It blocks inlining until after the async coroutine has been split.
Also, change the position of the context size in the async function pointer:
    struct async_function_pointer {
      uint32_t relative_function_pointer_to_async_impl;
      uint32_t context_size;
    };
And make the position of the `async context` argument configurable. The
position is specified by the `llvm.coro.id.async` intrinsic.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90783
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Reapply with fix for memory sanitizer failure and sphinx failure.
Differential Revision: https://reviews.llvm.org/D90612
The existing logic for determining whether an alloca should live on the frame only looks at explicit def-use relationships. However, a value defined by an alloca may be implicitly needed across suspension points, either because an alias has a def-use relationship that crosses a suspension point, or because it escaped through store/call/memory intrinsics. To properly handle all these cases, we have to visit the alloca pointer up-front. This patch extends the existing alloca use visitor to determine whether an alloca should live on the frame.
Differential Revision: https://reviews.llvm.org/D89768
This patch is a refactoring of how we process spills and allocas during CoroSplit.
In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas.
And the way to identify a Spill is to check whether there exists a use-def relationship that crosses suspension points.
This approach is fundamentally confusing, and unfortunately, incorrect.
First of all, allocas are always processed differently from spills, hence it's quite confusing to put them together. It's much cleaner to separate them and process them separately.
Doing so simplifies lots of code and makes the logic clearer and easier to reason about.
Secondly, a use-def relationship is insufficient to decide whether a value defined by an AllocaInst needs to go to the heap.
There are many cases where a value defined by an AllocaInst can implicitly be used across suspension points without a direct use-def relationship.
For example, you can store the address of an alloca into the heap, and load that address after suspension. Or you can escape the address into an object through a function call.
Or you can have a PHINode that takes two allocas, and this PHINode is used across a suspension point (when this happens, the existing implementation will spill the PHINode, i.e. a stack address, to the heap!).
All these issues suggest that we need to separate spills and allocas in order to properly implement this.
This patch does not yet fix these bugs; however, it puts the code in better shape so that we can start fixing them in the next patch.
The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap).
Spills and Allocas are identified, stored and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index into their final layout index.
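A rough sketch of the shape described above, using strings in place of IR values. `FrameDataInfoSketch` and its member functions are hypothetical names chosen for illustration; the real struct in CoroFrame.cpp may differ in both fields and interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for the FrameDataInfo described above. Values are
// identified by name instead of IR pointers, purely for illustration.
struct FrameDataInfoSketch {
  std::map<std::string, std::vector<std::string>> Spills; // def -> users
  std::vector<std::string> Allocas;
  std::map<std::string, unsigned> FieldIndexMap; // def -> field index

  // Records the index a definition received when first added to the frame.
  void setFieldIndex(const std::string &def, unsigned index) {
    FieldIndexMap[def] = index;
  }

  unsigned getFieldIndex(const std::string &def) const {
    auto it = FieldIndexMap.find(def);
    assert(it != FieldIndexMap.end() && "definition has no frame field");
    return it->second;
  }

  // Once the layout is finalized, rewrites each provisional index to its
  // final one. finalIndex[i] is the final position of provisional field i.
  void updateLayoutIndex(const std::vector<unsigned> &finalIndex) {
    for (auto &entry : FieldIndexMap) {
      assert(entry.second < finalIndex.size());
      entry.second = finalIndex[entry.second];
    }
  }
};
```

Keeping the index in a separate map means the frame fields themselves do not need to know which definitions they hold; only the def-to-index mapping is updated when the layout changes.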
In doing so, I also cleaned up a few things and also discovered a few other bugs.
Cleanups:
1. Found out that PromiseFieldId is not used, delete it.
2. Previously, SpillInfo is a vector, which is strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users.
3. Previously, a frame Field struct contains a list of Spills that field corresponds to. This isn't necessary since we only need the layout index for each given definition. This patch removes that list. Instead, we connect each field and definition using the FieldIndexMap.
4. All the loops that process Spills are simplified now because we use a map instead of a vector.
Bugs:
It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function will no longer have them. This means we are dropping some debug information in the ramp function.
The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame.
Differential Revision: https://reviews.llvm.org/D88872
Bug 45566 shows that the process of building the coroutine frame does not
take into account that the lifetimes of different local variables may not
overlap, which means the compiler could generate a smaller frame.
This patch calculates the lifetime range of each alloca using the
StackLifetime class. It then builds non-overlapping sets of allocas whose
lifetime ranges do not overlap. We use the largest type in a
non-overlapping set as the field type in the frame. In the insertSpills
process, if the type of the field is not the same as that of the alloca,
we cast the pointer to the field type to a pointer to the alloca type.
Since the lifetime ranges of the allocas in one non-overlapping set do not
overlap with each other, it should be ok to reuse the storage space
in the frame.
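The grouping idea can be sketched as a greedy pass over live ranges. This is an illustrative model, not the code from the patch: `AllocaInfo`, `buildNonOverlappingSets`, and `frameBytes` are hypothetical names, live ranges are plain integer intervals rather than StackLifetime results, and alignment is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an alloca: a size in bytes and a half-open live
// range [begin, end) in some instruction numbering.
struct AllocaInfo {
  std::size_t size;
  unsigned begin;
  unsigned end;
};

static bool overlaps(const AllocaInfo &a, const AllocaInfo &b) {
  return a.begin < b.end && b.begin < a.end;
}

// Greedily groups allocas so that no two members of a group are live at
// the same time. Allocas are visited largest first, so the first member of
// each group is its largest and determines the size of the shared field.
std::vector<std::vector<AllocaInfo>>
buildNonOverlappingSets(std::vector<AllocaInfo> allocas) {
  std::stable_sort(allocas.begin(), allocas.end(),
                   [](const AllocaInfo &a, const AllocaInfo &b) {
                     return a.size > b.size;
                   });
  std::vector<std::vector<AllocaInfo>> sets;
  for (const AllocaInfo &a : allocas) {
    bool placed = false;
    for (auto &set : sets) {
      bool conflict =
          std::any_of(set.begin(), set.end(),
                      [&](const AllocaInfo &m) { return overlaps(a, m); });
      if (!conflict) {
        set.push_back(a);
        placed = true;
        break;
      }
    }
    if (!placed)
      sets.emplace_back(1, a); // start a new set containing only `a`
  }
  return sets;
}

// Bytes needed when each set shares one field sized for its largest member.
std::size_t frameBytes(const std::vector<std::vector<AllocaInfo>> &sets) {
  std::size_t total = 0;
  for (const auto &set : sets)
    total += set.front().size;
  return total;
}
```

For three allocas of 40, 8, and 16 bytes with live ranges [0, 10), [10, 20), and [5, 15), the 40-byte and 8-byte allocas share one field and the 16-byte alloca gets its own, for 56 bytes instead of 64.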
Test plan: check-llvm, check-clang, cppcoro, folly
Reviewers: junparser, lxfind, modocache
Differential Revision: https://reviews.llvm.org/D87596
Issue Details:
In order to support coroutine splitting, any multi-value PHI node in a coroutine is split into multiple blocks with single-value PHI Nodes, which then allows a subsequent transform to generate `reload` instructions as required (i.e., to reload the value if required if the coroutine has been resumed). This causes issues with EH pads (`catchswitch` and `catchpad`) as all pads within a `catchswitch` must have the same unwind destination, but the coroutine splitting logic may modify them to each have a unique unwind destination if there is a PHI node in the unwind `cleanuppad` that is set from values in the `catchswitch` and `cleanuppad` blocks.
Fix Details:
During splitting, if such a PHI node is detected, then create a "dispatcher" `cleanuppad` as well as the blocks with single-value PHI Nodes: thus the "dispatcher" is the unwind destination and it will detect which predecessor called it and then branch to the appropriate single-value PHI node block, which will then branch back to the original `cleanuppad` block.
Reviewed By: GorNishanov, lxfind
Differential Revision: https://reviews.llvm.org/D88059
This seems to fit the CGSCC updates model better than calling
addNewFunctionInto{Ref,}SCC() on newly created/outlined functions.
Now addNewFunctionInto{Ref,}SCC() are no longer necessary.
However, this doesn't work on newly outlined functions that aren't
referenced by the original function. e.g. if a() was outlined into b()
and c(), but c() is only referenced by b() and not by a(), this will
trigger an assert.
This also fixes an issue I was seeing with newly created functions not
having passes run on them.
Ran check-llvm with expensive checks.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87798
When a spill definition is before CoroBegin, we cannot spill it to the frame immediately after the definition. We have to spill it after the frame is ready.
The current implementation handles this properly for all kinds of instructions except PHINode and InvokeInst, which could also be defined before CoroBegin.
This patch fixes it by moving the CoroBegin dominance check earlier, so that it covers all cases.
Added a test.
Differential Revision: https://reviews.llvm.org/D87810
D66230 attempted to fix a problem that arises when allocas are used before CoroBegin.
It keeps allocas and their uses in place if there are no escapes of or changes to the data before CoroBegin.
Unfortunately that's incorrect.
Consider this code:
    %var = alloca i32
    %1 = getelementptr .. %var; stays put
    %f = call i8* @llvm.coro.begin
    store ... %1
With the D66230 fix, %1 stays put; however, if a store happens after coro.begin and hence modifies the content, this change will not be reflected in the coroutine frame (and the store will eventually be DCEed).
To generalize the problem, if any alias pointer is created before coro.begin for an alloca and that alias pointer is later written to after coro.begin, it will lead to incorrect behavior.
There are also a few other minor issues, such as an incorrect dominance check in the pointer visitor, unhandled memory intrinsics, etc.
This patch attempts to fix some of these issues and make the handling of aliases more robust.
While visiting the alloca pointer, we also keep track of all aliases created that will be used after CoroBegin. We track the offset of each alias, and then recreate these aliases after CoroBegin using these offsets.
It's worth noting that this is not perfect and there will still be cases we cannot handle. I think it's impractical to handle all cases given the current design.
This patch makes it more robust and should be a pure win.
In the meantime, we need to think about how to completely eliminate these issues, likely through the route that @rjmccall mentioned in D66230.
Differential Revision: https://reviews.llvm.org/D86859
This reverts commit 2e43acfed8.
LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the
library which contains SampleProfile.cpp). It is inappropriate for
SampleProfile.cpp to depend on Coroutines.h (circular dependency).
The test inverted dependencies as well:
llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.
Summary:
When a callee coroutine function is inlined into a caller coroutine
function before the coro-split pass, llvm emits "coroutine should
have exactly one defining @llvm.coro.begin". It seems that the coro-early
pass cannot handle this quite well.
So we believe that an unsplit coroutine function should not be inlined.
This patch fixes the issue by not inlining a function if it has the attribute
"coroutine.presplit" (which means the function has not been split yet).
TestPlan: check-llvm
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D85812