Coroutines have weird semantics that don't quite match normal LLVM functions,
so trying to infer even simple attributes based on thier contents can go wrong.
When I playing with Coroutines, I found that it is possible to generate
following IR:
```
%struct = alloca ...
%sub.element = getelementptr %struct, i64 0, i64 index ; index is not
%zero
lifetime.marker.start(%sub.element)
% use of %sub.element
lifetime.marker.end(%sub.element)
store %struct to xxx ; %struct is escaping!
<suspend points>
```
Then the AllocaUseVisitor would collect the lifetime marker for
sub.element and treat it as the lifetime markers of the alloca! So it
judges that the alloca could be put on the stack instead of the frame by
judging the lifetime markers only.
The root cause for the bug is that AllocaUseVisitor collects wrong
lifetime markers.
This patch fixes this.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D112216
In situations where the coroutine function is not split we can just
replace the async.resume by null.
rdar://82591919
Differential Revision: https://reviews.llvm.org/D110191
The OutermostLoad condition is supposed to strip the outermost
DW_OP_deref operation because dbg.declares are implicitly
indirect. This patch makes sure the heuristic is only applied to
dbg.declare intrinsics and only if the outermost instruction is a
load.
This was found while qualifying the latest Swift compiler rebranch.
rdar://82037764
Unfortunatley the IR Verifier doesn't reject debug intrinsics that
have nullptr as arguments, so coro::salvageDebugInfo for now also
needs to deal with them.
rdar://81979541
This patch removes the hand-rolled implementation of salvageDebugInfo
for cast and GEPs and replaces it with a call into
llvm::salvageDebugInfoImpl().
A side-effect of this is that additional redundant convert operations
are introduced, but those don't have any negative effect on the
resulting DWARF expression.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107384
We use the CurrentBlock to determine whether we have already processed a
block. Don't reuse this variable for setting where we should insert the
rematerialization. The rematerialization block is different to the
current block when we rematerialize for coro suspend block users.
Differential Revision: https://reviews.llvm.org/D107573
There was an alias between 'simplifycfg' and 'simplify-cfg' in the
PassRegistry. That was the original reason for this patch, which
effectively removes the alias.
This patch also replaces all occurrances of 'simplify-cfg'
by 'simplifycfg'. Reason for choosing that form for the name is
that it matches the DEBUG_TYPE for the pass, and the legacy PM name
and also how it is spelled out in other passes such as
'loop-simplifycfg', and in other options such as
'simplifycfg-merge-cond-stores'.
I for some reason the name should be changed to 'simplify-cfg' in
the future, then I think such a renaming should be more widely done
and not only impacting the PassRegistry.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D105627
Before this patch we would normally use the ABI alignment which can be
to high for the context alginment.
For spilled values we don't need ABI alignment, since the frame entry's
address is not escaped.
rdar://79664965
Differential Revision: https://reviews.llvm.org/D105288
Code assumes that uses of single predecessor phis are not live accross
suspend points. Cleanup any single predecessor phis preceeding the code
making this assumption.
rdar://76020301
Differential Revision: https://reviews.llvm.org/D105488
The resume partial functions generated for swift suspend points will now
use a Swift mangling suffix.
Await resume partial functions will use the suffix 'TQ'[0-9]+'_' (e.g "...TQ0_")
and suspend resume partial functions will use the suffix 'TY'[0-9]+'_'
(e.g "...TY1_").
Reviewed By: nate_chandler
Differential Revision: https://reviews.llvm.org/D104144
Now we lack a benchmark to measure the performance change for each
commit.
Since coro elide is the main optimization in coroutine module, I wonder
it may be an estimation to count the number of elided coroutine in
private code bases.
e.g., for a certain commit, if we found that the number of elided goes
down, we could find it before the commit check-in.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D105095
Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html
In the existing design, An SCC that contains a coroutine will go through the folloing passes:
Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline
The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat.
As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended.
What we really wanted is this:
Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline
(note that we don't really need to run Inliner again on the ramp function after split).
Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, make it once for real, insert the newly generated SCCs (the clones) back to the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function.
This approach also conforms to how the new pass manager works instead of relying on an adhoc post split cleanup, making it ready for full switch to new pass manager eventually.
By looking at some of the changes to the tests, we can already observe that this changes allows for more optimizations applied to coroutines.
Reviewed By: aeubanks, ChuanqiXu
Differential Revision: https://reviews.llvm.org/D95807
Now we lack a benchmark to measure the performance change for each
commit.
Since coro elide is the main optimization in coroutine module, I wonder
it may be an estimation to count the number of elided coroutine in
private code bases.
e.g., for a certain commit, if we found that the number of elided goes
down, we could find it before the commit check-in.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D105095
CoroElide pass works only when a post-split coroutine is inlined into another post-split coroutine.
In O0, there is no inlining after CoroSplit, and hence no CoroElide can happen.
It's useless to put CoroElide pass in the O0 pipeline and it will never be triggered (unless I miss anything).
Differential Revision: https://reviews.llvm.org/D105066
There is a constraint that coro.suspend instructions need to be in their
own blocks. The coro split pass initially creates IR that obeys this constraint
(which is later checked). Sinking rematerializable instructions into these
blocks breaks that constraint.
Instead rematerialize in the predecessor block to the suspend's single
predecessor block.
Differential Revision: https://reviews.llvm.org/D104051
Types should be defined in function scope instead of a local lexical scope. Field types should be defined inside in its parent type scope.
We were seeing a type defined in a local scope causing trouble to the dwarf emitter where a context is required to be a funciton scope, a namespace or a global scope.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D104937
With new pm becomes the default, the old-style test command becomes exactly the same as the new test command, i.e. the two commands are now redundant.
We should just delete the old command. (unless someone wants to add enable-new-pm=0 to all old commands.
Differential Revision: https://reviews.llvm.org/D104895
Relative to the original patch, an InstCombine test has been
added to show a previously missed pattern, and the Coroutine
test that resulted in the revert has been regenerated.
-----
Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. This previously
happened for the isSized() check, which skipped folds like
distributing a bitcast over a select.
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857.
Previous attempts can be found in D104007 and D101980.
A lot of discussions can be found in those two patches.
To summarize the bug:
When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly.
However, in some cases we find that arguments are still directly used:
When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with byval attribute. A byval attribute is considered to be local to the function (just like alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and latter uses will lead to memory use-after-free.
This is only a problem for arguments with either byval attribute or noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly.
For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutiune as noalias. This can be taken care of in CoroEarly pass.
For byval arguments, if such an argument needs to live across suspensions, we will have to copy their value content to the frame, not just the pointer.
Differential Revision: https://reviews.llvm.org/D104184
Coro-split functions with an active suspend point have their scope line set to
the line of the suspend point. However for compiler generated functions, this
results in debug info with unconventional results: a file named
`<compiler-generated>` with a non-zero line number. The convention for
`<compiler-generated>` is that the line number is zero.
This change propagates the scope line only for non-compiler generated
functions.
Differential Revision: https://reviews.llvm.org/D102412
Transfer the swiftasync attribute to the resume partial function according to
suspend.async specification. It's first argument denotes which argument is the
async context.
rdar://71499498
Differential Revision: https://reviews.llvm.org/D103285
D101841 added this test. It appears to generate different outcome on different platforms.
Make it to only call -coro-split instead of entire O2 pipeline to simplify the test flow.
Hope this will make the test more robust.
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D103418
The current ad-hoc implementation used to determine whether a basic
block is unreachable doesn't work correctly in the general case (for
example it won't detect successors of unreachable blocks as
unreachable). This patch replaces it with the correct API that uses a
DominatorTree to answer the question correctly and quickly.
rdar://77181156
Differential Revision: https://reviews.llvm.org/D102963
Summary: The previous implementation of coro-split didn't collect values
used by dbg instructions into the spills which made a log debug info
unavailable with optimization on.
This patch tries to collect these uses which are used by dbg.values. In
this way, the debugbility of coroutine could be as powerful as normal
functions with optimization on.
To avoid enlarging the coroutine frame, this patch only collects
`dbg.value` whose value is already in the coroutine frame. This decision
may make some debug info getting unavailable. But if we are with
optimization on, the performance issue should be considered first. And
this patch would make the debugbility of coroutine to be better only
without changing the layout of the frame.
Test-plan: check-llvm
Reviewed By: aprantl, lxfind
Differential Revision: https://reviews.llvm.org/D97673
Summary: This patch tries to build debug info for coroutine frame in the
middle end. Although the coroutine frame is constructed and maintained by
the compiler and the programmer shouldn't care about the coroutine frame
by the design of C++20 coroutine,
a lot of programmers told me that they want to see the layout of the
coroutine frame strongly. Although C++ is designed as an abstract layer
so that the programmers shouldn't care about the actual memory in bits,
many experienced C++ programmers are familiar with assembler and
debugger to see the memory layout in fact, After I was been told they
want to see the coroutine frame about 3 times, I think it is an actual
and desired demand.
However, the debug information is constructed in the front end and
coroutine frame is constructed in the middle end. This is a natural and
clear gap. So I could only try to construct the debug information in the
middle end after coroutine frame constructed. It is unusual, but we are
in consensus that the approch is the best one.
One hard part is we need construct the name for variables since there
isn't a map from llvm variables to DIVar. Then here is the strategy this
patch uses:
- The name `__resume_fn `, `__destroy_fn` and `__coro_index ` are
constructed by the patch.
- Then the name `__promise` comes from the dbg.variable of corresponding
dbg.declare of PromiseAlloca, which shows highest priority to
construct the debug information for the member of coroutine frame.
- Then if the member is struct, we would try to get the name of the llvm
struct directly. Then replace ':' and '.' with '_' to make it
printable for debugger.
- If the member is a basic type like integer or double, we would try to
emit the corresponding name.
- Then if the member is a Pointer Type, we would add `Ptr` after
corresponding pointee type.
- Otherwise, we would name it with 'UnknownType'.
Reviewered by: lxfind, aprantl, rjmcall, dblaikie
Differential Revision: https://reviews.llvm.org/D99179
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=49916.
When the size of an alloca is 0, it will trigger an assertion in OptimizedStructLayout when being added to the frame.
Fix it by not adding it at all. We return index 0 (beginning of the frame) for all 0-sized allocas.
Differential Revision: https://reviews.llvm.org/D101841
Summary: The original logic seems to be we could collecting a CoroBegin
if one of the terminators could be dominated by one of coro.destroy,
which doesn't make sense.
This patch rewrites the logics to collect CoroBegin if all of
terminators are dominated by one coro.destroy. If there is no such
coro.destroy, we would call hasEscapePath to evaluate if we should
collect it.
Test Plan: check-llvm
Reviewed by: lxfind
Differential Revision: https://reviews.llvm.org/D100614
This reverts commit fa6b54c44a.
The commited patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920
To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass.
Reviewed By: rjmccall, ChuanqiXu
Differential Revision: https://reviews.llvm.org/D100282
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920
Differential Revision: https://reviews.llvm.org/D100282
This patch updates the linkage name in the DISubprogram of coro-split
functions, which is particularly important for Swift, where the
funclets have a special name mangling. This patch does not affect C++
coroutines, since the DW_AT_specification is expected to hold the
(original) linkage name. I believe this is mostly due to limitations
in AsmPrinter, so we might be able to relax this restriction in the
future.
Differential Revision: https://reviews.llvm.org/D99693
LLVM test Transforms/Coroutine/coro-split-sink-lifetime-O2.ll tries to
check for the absence of a sequence of instructions with several
CHECK-NOT with one of those directives using a variable defined in
another. However CHECK-NOT are checked independently so that is using a
variable defined in a pattern that should not occur in the input.
This commit simplifies the CHECK-NOT block to only check for the
presence of any lifetime start marker since that is effectively what
the test was testing at the moment.
Reviewed By: junparser
Differential Revision: https://reviews.llvm.org/D99856
Summary: Try to insert dbg.declare to entry.resume basic block in resume
function. In this way, we could print alloca such as __promise in
gdb/lldb under O2, which would be beneficial to debug coroutine program.
Test Plan: check-llvm
Reviewed by: aprantl
Differential Revision: https://reviews.llvm.org/D96938
This patch updates the scope line to point to the suspend point. This
makes the first address in the function point to the first source line
in the resume function rather than the function declaration. Without
this the line table "jumps" from the beginning of the function to the
suspend point at the beginning.
rdar://73386346
Differential Revision: https://reviews.llvm.org/D97345
See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks semantic of coro.suspend intrinsic which return to caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in multithread environment.
This patch disable promotion by check whether loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose.
In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutien needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame.
Differential Revision: https://reviews.llvm.org/D96928
Before we used the same argument as the entry point. The resume partial
function might want to use a different ABI for its context argument
Differential Revision: https://reviews.llvm.org/D97333
This doesn't actually reproduce with a dbg.declare(i8* null, ...)
which produces a non-null null Value, but I have seen this show up in
crash logs. I'm suspecting that there may be another pass forcibly
setting the operand to a nullptr.
In the existing logic, we look at the lifetime.start marker of each alloca, and check all uses of the alloca, to see if any pair of the lifetime marker and an use of alloca crosses suspension point.
This approach is unfortunately incorrect. An use of alloca does not need to be a direct use, but can be an indirect use through alias.
Only checking direct uses can miss cases where indirect uses are crossing suspension point.
This can be demonstrated in the newly added test case 007.
In the test case, both x and y are only directly used prior to suspend, but they are captured into an alias, merged through a PHINode (so they couldn't be materialized), and used after CoroSuspend.
If we only check whether the lifetime starts cross suspension points with direct uses, we will put the allocas to the stack, and then capture their addresses in the frame.
Instead of fixing it in D96441 and D96566, this patch takes a different approach which I think is better.
We still checks the lifetime info in the same way as before, but with two differences:
1. The collection of liftime.start is moved into AllocaUseVisitor to make the logic more concentrated.
2. When looking at lifetime.start and use pairs, we not only checks the direct uses as before, but in this patch we check all uses collected by AllocaUseVisitor, which would include all indirect uses through alias. This will make the analysis more accurate without throwing away the lifetime optimization.
Differential Revision: https://reviews.llvm.org/D96922
The new intrinsic replaces the size in one specified AsyncFunctionPointer with
the size in another. This ability is necessary for functions which merely
forward to async functions such as those defined for partial applications.
Reviewed By: aschwaighofer
Differential Revision: https://reviews.llvm.org/D97229
As discussed in D94834, we don't really need to do complicated analysis. It's safe to just drop the tail call attribute.
Differential Revision: https://reviews.llvm.org/D96926