It is a known problem that we can't align the switch-based coroutine
frame if the required alignment exceeds std::max_align_t (usually 16).
We could solve the problem in the middle end by transforming the
allocation dynamically, or in the frontend by emitting a call to an
aligned allocation function.
If we solve it in the frontend, the middle end needs to offer at least
an intrinsic that reports the alignment. This patch adds such an
intrinsic, llvm.coro.align.
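A hedged sketch of how a frontend could combine it with llvm.coro.size
when building the raw allocation (the allocator call and the padding
scheme here are illustrative, not the mandated lowering):
```
%size   = call i64 @llvm.coro.size.i64()
%align  = call i64 @llvm.coro.align.i64()
; Illustrative: over-allocate by the alignment so the frame start can
; be rounded up to a properly aligned address within the block.
%padded = add i64 %size, %align
%mem    = call i8* @malloc(i64 %padded)
```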
Differential Revision: https://reviews.llvm.org/D117542
This fixes bug 49888. The root cause is that
simplifyTerminatorLeadingToRet didn't handle lifetime markers well.
Another issue, also noted in D116327, is that we deleted some inlined
optimization passes in CoroSplit, so simplifyTerminatorLeadingToRet
needs to remove dead instructions by hand.
This patch fixes bug 49888 by making simplifyTerminatorLeadingToRet
skip lifetime markers and bitcast instructions and remove dead
instructions by hand.
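As a hedged illustration (not the exact IR from the bug report), the
block leading to the ret may now look like this, where the bitcast and
the lifetime marker are skipped and, once dead, removed by hand:
```
cleanup:
  %cast = bitcast i64* %x to i8*
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %cast)
  ret void
```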
Reviewed By: junparser
Differential Revision: https://reviews.llvm.org/D116330
This fixes bug 52896.
Simply put, some symmetric transfer optimization opportunities were
invalidated because we deleted some inlined optimization passes in
822b92a. This can cause stack overflow in situations that the design
of coroutines is meant to avoid. This patch fixes it by folding the
constant CmpInst instructions that the deleted passes used to handle.
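A hedged sketch of the pattern (block names are made up): after
earlier simplifications both operands of the compare are constants, so
folding it lets the remaining resume call be marked musttail, which is
what preserves symmetric transfer:
```
%cond = icmp eq i8* null, null   ; both operands constant: folds to true
br i1 %cond, label %do.resume, label %dead.bb
```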
Reviewed By: rjmccall, junparser
Differential Revision: https://reviews.llvm.org/D116327
This fixes bug 49264. Simply put, a coroutine shouldn't be inlined
before CoroSplit, but the marker for a pre-split coroutine was created
in the CoroEarly pass, which runs after the AlwaysInliner pass in the
O0 pipeline. So the AlwaysInliner couldn't detect that it shouldn't
inline a coroutine; hence the error.
This patch sets the presplit attribute in clang and mlir, so the
inliner always sees the attribute before splitting.
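A hedged sketch of the marking (the attribute spelling and value
follow what the pipeline checked at the time; treat them as
illustrative):
```
; The always-inliner now sees the attribute and refuses to inline @f
; until CoroSplit has processed it.
define void @f() "coroutine.presplit"="0" {
  ...
}
```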
Reviewed By: rjmccall, ezhulenev
Differential Revision: https://reviews.llvm.org/D115790
According to [dcl.fct.def.coroutine]/p14:
> If the evaluation of the expression promise.unhandled_exception()
> exits via an exception, the coroutine is considered suspended at the
> final suspend point.
But this was not implemented in Clang before. This patch implements
the feature by marking the coroutine as done at the point of
coro.end(frame, /*InUnwindPath=*/true).
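A minimal sketch of the landing-pad side (the frame pointer and the
surrounding cleanup code are elided):
```
ehcleanup:
  ; Marking the coroutine as done here makes a later destroy() behave
  ; as if the coroutine were suspended at its final suspend point.
  %unused = call i1 @llvm.coro.end(i8* %frame, i1 true) ; InUnwindPath = true
```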
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D115219
I found that the coroutine intrinsic llvm.coro.param in the
documentation (https://llvm.org/docs/Coroutines.html#id101) isn't
actually used, since there is no lowering code for it in LLVM. I also
checked the implementations of libstdc++ and libc++; neither uses
llvm.coro.param. So I am fairly sure that the llvm.coro.param
intrinsic is unused. It is better to remove it to avoid misleading
readers.
Note: according to [class.copy.elision]/p1.3, this optimization is
allowed by the C++ language specification. Let's implement it someday.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D115222
Infinite loops can lead to an IR representation where the lifetime.end
intrinsic is missing. The lifetime-based optimization code then fails
to see that an address escapes (is live) across a suspend.
Eventually, we could detect such situations and disable the
optimization only under those narrower circumstances. For now, do the
correct thing.
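A hedged sketch of the problematic shape (@escape and the block names
are made up):
```
entry:
  %a = alloca i64
  %a.i8 = bitcast i64* %a to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %a.i8)
  br label %loop
loop:                            ; infinite loop: no lifetime.end exists
  call void @escape(i8* %a.i8)   ; the address is live ...
  %sp = call i8 @llvm.coro.suspend(token none, i1 false) ; ... across this suspend
  br label %loop
```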
rdar://83635953
Differential Revision: https://reviews.llvm.org/D110949
Since a coroutine is split into pieces, the compiler moves the
dbg.declare intrinsic to just after the point where the storage is
created, to make sure the corresponding debug instruction is still
available after splitting.
However, this is problematic if the storage instruction is an
InvokeInst, which is a terminator: we can't insert an instruction
after an InvokeInst. This patch moves the dbg.declare intrinsic into
the normal destination of the InvokeInst instead. This should make
sense, since the storage is invalid on the exception path.
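A hedged sketch of the new placement (names are illustrative):
```
  %storage = invoke i8* @create.storage()
          to label %normal unwind label %lpad
normal:
  ; Placed at the front of the normal destination; on the exception
  ; path %storage never becomes valid, so no debug info is lost.
  call void @llvm.dbg.declare(metadata i8* %storage, metadata !var, metadata !DIExpression())
```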
We might emit functions that are private and never called. The coro
split pass only processes functions that might be called. Remove
intrinsics that we can't generate code for.
rdar://84619859
Differential Revision: https://reviews.llvm.org/D114021
While playing with coroutines, I found that it is possible to generate
the following IR:
```
%struct = alloca ...
%sub.element = getelementptr %struct, i64 0, i64 %index ; %index is not zero
lifetime.marker.start(%sub.element)
; use of %sub.element
lifetime.marker.end(%sub.element)
store %struct to xxx ; %struct is escaping!
<suspend points>
```
Then AllocaUseVisitor would collect the lifetime markers for
%sub.element and treat them as the lifetime markers of the alloca! So,
judging from the lifetime markers only, it concludes that the alloca
can be put on the stack instead of in the frame.
The root cause of the bug is that AllocaUseVisitor collects the wrong
lifetime markers.
This patch fixes this.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D112216
In situations where the coroutine function is not split, we can just
replace the async.resume with null.
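A hedged sketch: the pointer produced by
```
%resume.addr = call i8* @llvm.coro.async.resume()
```
only becomes meaningful once the coroutine is split; if the function
is never split, every use of it can be replaced with a null constant.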
rdar://82591919
Differential Revision: https://reviews.llvm.org/D110191
verifyFunction can be really slow on large functions. This can significantly slow down compilation in production.
Given that coroutine passes are fairly stable now, we should only run it in debug mode.
Differential Revision: https://reviews.llvm.org/D109198
The OutermostLoad condition is supposed to strip the outermost
DW_OP_deref operation because dbg.declares are implicitly
indirect. This patch makes sure the heuristic is only applied to
dbg.declare intrinsics and only if the outermost instruction is a
load.
This was found while qualifying the latest Swift compiler rebranch.
rdar://82037764
Unfortunately, the IR Verifier doesn't reject debug intrinsics that
have nullptr as arguments, so coro::salvageDebugInfo for now also
needs to deal with them.
rdar://81979541
This patch removes the hand-rolled implementation of salvageDebugInfo
for casts and GEPs and replaces it with a call into
llvm::salvageDebugInfoImpl().
A side-effect of this is that additional redundant convert operations
are introduced, but those don't have any negative effect on the
resulting DWARF expression.
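For example (a hedged illustration), salvaging a widening cast may now
yield back-to-back convert operations such as:
```
!DIExpression(DW_OP_LLVM_convert, 16, DW_ATE_unsigned,
              DW_OP_LLVM_convert, 32, DW_ATE_unsigned)
```
The second convert subsumes the first, so the extra operation is
redundant but harmless in the lowered DWARF expression.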
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107384
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.
1. Change the return value to I.getOperand(0). Currently users of
salvageDebugInfoImpl() assume that the first operand is
I.getOperand(0). This patch makes this information explicit. A
nice side-effect of this change is that it allows us to salvage
expressions such as add i8 1, %a in the future.
2. Factor out the creation of a DIExpression and return an array of
DIExpression operations instead. This change allows users that
call salvageDebugInfoImpl() in a loop to avoid the costly
creation of temporary DIExpressions and to defer the creation of
a DIExpression until the end.
This patch does not change any functionality.
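As a hedged illustration of the new contract: salvaging the GEP below
returns %base (i.e. I.getOperand(0)) together with the expression ops
{DW_OP_plus_uconst, 8}, leaving the caller to build a DIExpression
once, at the end:
```
%gep = getelementptr inbounds i8, i8* %base, i64 8
```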
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107383
We use CurrentBlock to determine whether we have already processed a
block. Don't reuse this variable to set where we should insert the
rematerialization. The rematerialization block is different from the
current block when we rematerialize for users in coro suspend blocks.
Differential Revision: https://reviews.llvm.org/D107573
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
Before this patch we would normally use the ABI alignment, which can
be too high for the context alignment.
For spilled values we don't need the ABI alignment, since the frame
entry's address does not escape.
rdar://79664965
Differential Revision: https://reviews.llvm.org/D105288
The code assumes that uses of single-predecessor phis are not live
across suspend points. Clean up any single-predecessor phis before the
code that makes this assumption runs.
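A hedged sketch of what gets cleaned up (names are made up):
```
resume.bb:                        ; single predecessor: %pred.bb
  %v = phi i8* [ %p, %pred.bb ]   ; trivially %p; folded away before
                                  ; the liveness assumption is used
```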
rdar://76020301
Differential Revision: https://reviews.llvm.org/D105488
The resume partial functions generated for Swift suspend points will
now use a Swift mangling suffix.
Await resume partial functions will use the suffix 'TQ'[0-9]+'_'
(e.g. "...TQ0_") and suspend resume partial functions will use the
suffix 'TY'[0-9]+'_' (e.g. "...TY1_").
Reviewed By: nate_chandler
Differential Revision: https://reviews.llvm.org/D104144
Right now we lack a benchmark to measure the performance change of
each commit.
Since coroutine elision is the main optimization in the coroutine
module, counting the number of elided coroutines in private code bases
may serve as an estimate.
E.g., if the number of elided coroutines goes down for a certain
commit, we could catch the regression before the commit is checked in.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D105095
Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html
In the existing design, an SCC that contains a coroutine will go through the following passes:
Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline
The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat.
As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended.
What we really wanted is this:
Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline
(note that we don't really need to run Inliner again on the ramp function after split).
Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, run it once for real, insert the newly generated SCCs (the clones) back into the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function.
This approach also conforms to how the new pass manager works instead of relying on an ad-hoc post-split cleanup, making it ready for the eventual full switch to the new pass manager.
Looking at some of the changes to the tests, we can already observe that this change allows more optimizations to be applied to coroutines.
Reviewed By: aeubanks, ChuanqiXu
Differential Revision: https://reviews.llvm.org/D95807