This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patched (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed.
Notes:
- it also includes changes and tests from D90104
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D92887
Add an intrinsic type class to represent the
llvm.experimental.noalias.scope.decl intrinsic, to make code
working with it a bit nicer by hiding the metadata extraction
from view.
Some utilities used by InstCombine, like SimplifyLibCalls, may add new
instructions and replace the uses of a call, but return nullptr because
the inserted call produces multiple results.
Previously, the replaced library calls would get removed by
InstCombine's deleter, but after
292077072e this may not happen, if the
willreturn attribute is missing.
As a work-around, update replaceInstUsesWith to set MadeIRChange, if it
replaces any uses. This catches the cases where it is used as replacer
by utilities used by InstCombine and seems useful in general; updating
uses will modify the IR.
This fixes an expensive-check failure when replacing
@__sinpif/@__cospifi with @__sincospif_sret.
As shown in the test diffs, we could miscompile by
propagating flags that did not exist in the original
code.
The flags required for fmin/fmax reductions will be
fixed in a follow-up patch.
With the addition of the `willreturn` attribute, functions that may
not return (e.g. due to an infinite loop) are well defined, if they are
not marked as `willreturn`.
This patch updates `wouldInstructionBeTriviallyDead` to not consider
calls that may not return as dead.
This patch still provides an escape hatch for intrinsics, which are
still assumed as willreturn unconditionally. It will be removed once
all intrinsics definitions have been reviewed and updated.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94106
If i change it to AssertingVH instead, a number of existing tests fail,
which means we don't consistently remove from the set when deleting blocks,
which means newly-created blocks may happen to appear in that set
if they happen to occupy the same memory chunk as did some block
that was in the set originally.
There are many places where we delete blocks,
and while we could probably consistently delete from LoopHeaders
when deleting a block in transforms located in SimplifyCFG.cpp itself,
transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may
delete blocks, and it doesn't seem good to teach them to deal with it.
Since we at most only ever delete from LoopHeaders,
let's just delegate to WeakVH to do that automatically.
But to be honest, personally, i'm not sure that the idea
behind LoopHeaders is sound.
Insert a llvm.experimental.noalias.scope.decl intrinsic that identifies where a noalias argument was inlined.
This patch includes some refactorings from D90104.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93040
This builds on the restricted after initial revert form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA.
When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes.
This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop.
(I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.)
The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue.
Differential Revision: https://reviews.llvm.org/D94378
Similar to binary operators like fadd/fmul/fsub, propagate shape info
through unary operators (fneg is the only one?).
Differential Revision: https://reviews.llvm.org/D95252
I have previously tried doing that in
b33fbbaa34 / d38205144f,
but eventually it was pointed out that the approach taken there
was just broken wrt how the uses of bonus instructions are updated
to account for the fact that they should now use either bonus instruction
or the cloned bonus instruction. In particluar, all that manual handling
of PHI nodes in successors was just wrong.
But, the fix is actually much much simpler than my initial approach:
just tell SSAUpdate about both instances of bonus instruction,
and let it deal with all the PHI handling.
Alive2 confirms that the reproducers from the original bugs (@pr48450*)
are now handled correctly.
This effectively reverts commit 59560e8589,
effectively relanding b33fbbaa34.
NewBonusInst just took name from BonusInst, so BonusInst has no name,
so BonusInst.getName() makes no sense.
So we need to ask NewBonusInst for the name.
This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned. And they are only available on AIX.
Differential Revision: https://reviews.llvm.org/D94710
If the call result is unused, we should let it get DCEd rather
than replacing it. Also, don't try to replace an existing sincos
with another one (unless it's as part of combining sin and cos).
This avoids an infinite combine loop if the calls are not DCEd
as expected, which can happen with D94106 and lack of willreturn
annotation in hand-crafted IR.
In the motivating cases from https://llvm.org/PR48816 ,
we have a trailing trunc. But that is not required to
reduce the abs width:
https://alive2.llvm.org/ce/z/ECaz-p
...as long as we clear the int-min-is-poison bit (nsw).
We have some existing tests that are affected, and I'm
not sure what the overall implications are, but in general
we favor narrowing operations over preserving nsw/nuw.
If that causes problems, we could restrict this transform
based on type (shouldChangeType() and/or vector vs. scalar).
Differential Revision: https://reviews.llvm.org/D95235
The existing code did not deal with atomic loads correctly. Such loads
are represented as MemoryDefs. Bail out on any MemoryAccess that is not
a MemoryUse.
Because we were not looking for the llvm.coro.id.async intrinsic in the
early coro pass which triggers follow-up passes we relied on the
llvm.coro.end intrinsic being present. This might not be the case in
functions that end in unreachable code.
Differential Revision: https://reviews.llvm.org/D95144
Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
this decreasing instruction count.
Note that we could position this transformation as just hoisting
of the `not` (still, iff y is freely negatible), but the test changes
show a number of regressions, so let's not do that.
Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
this decreasing instruction count.
I'm intentionally structuring it this way, so that the actual fold only
does the fold, and no legality/correctness checks, all of which must be
done by the caller. This allows for the fold code to be more compact
and more easily grokable.
Hoist the successor updating out of the code that deals with branch
weight updating, and hoist the 'has weights' check from the latter,
making code more consistent and easier to follow.
While we already ignore uncond branches, we could still potentially
end up with a conditional branches with identical destinations
due to the visitation order, or because we were called as an utility.
But if we have such a disguised uncond branch,
we still probably shouldn't deal with it here.
The case where BB ends with an unconditional branch,
and has a single predecessor w/ conditional branch
to BB and a single successor of BB is exactly the pattern
SpeculativelyExecuteBB() transform deals with.
(and in this case they both allow speculating only a single instruction)
Well, or FoldTwoEntryPHINode(), if the final block
has only those two predecessors.
Here, in FoldBranchToCommonDest(), only a weird subset of that
transform is supported, and it's glued on the side in a weird way.
In particular, it took me a bit to understand that the Cond
isn't actually a branch condition in that case, but just the value
we allow to speculate (otherwise it reads as a miscompile to me).
Additionally, this only supports for the speculated instruction
to be an ICmp.
So let's just unclutter FoldBranchToCommonDest(), and leave
this transform up to SpeculativelyExecuteBB(). As far as i can tell,
this shouldn't really impact optimization potential, but if it does,
improving SpeculativelyExecuteBB() will be more beneficial anyways.
Notably, this only affects a single test,
but EarlyCSE should have run beforehand in the pipeline,
and then FoldTwoEntryPHINode() would have caught it.
This reverts commit rL158392 / commit d33f4efbfd.
Walking the use list of a Constant (particularly, ConstantData)
is not scalable, since a given constant may be used by many
instructinos in many functions in many modules.
Differential Revision: https://reviews.llvm.org/D94713
I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost
that prevented a cost being calculated for select instructions when using
scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost
to only do special cost calculations for fixed width vectors and fall
back to the base version for scalable vectors.
I have added a simple cost model test for cmps and selects:
test/Analysis/CostModel/sve-cmpsel.ll
and some simple tests that show we vectorize loops with cmp and select:
test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll
Differential Revision: https://reviews.llvm.org/D95039
This adds cost modelling for the inloop vectorization added in
745bf6cf44. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to works OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:
%sa = sext <16 x i8> A to <16 x i32>
%sb = sext <16 x i8> B to <16 x i32>
%m = mul <16 x i32> %sa, %sb
%r = vecreduce.add(%m)
->
R = VMLADAV A, B
There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 are particularly interesting as there are no native i64 add/mul
instructions, leading to the i64 add and mul naturally getting very
high costs.
Also worth mentioning, under NEON there is the concept of a sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in and outer loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.
This patch attempt to improve the costmodelling of in-loop reductions
by:
- Adding some pattern matching in the loop vectorizer cost model to
match extended reduction patterns that are optionally extended and/or
MLA patterns. This marks the cost of the reduction instruction correctly
and the sext/zext/mul leading up to it as free, which is otherwise
difficult to tell and may get a very high cost. (In the long run this
can hopefully be replaced by vplan producing a single node and costing
it correctly, but that is not yet something that vplan can do).
- getExtendedAddReductionCost is added to query the cost of these
extended reduction patterns.
- Expanded the ARM costs to account for these expanded sizes, which is a
fairly simple change in itself.
- Some minor alterations to allow inloop reduction larger than the highest
vector width and i64 MVE reductions.
- An extra InLoopReductionImmediateChains map was added to the vectorizer
for it to efficiently detect which instructions are reductions in the
cost model.
- The tests have some updates to show what I believe is optimal
vectorization and where we are now.
Put together this can greatly improve performance for reduction loop
under MVE.
Differential Revision: https://reviews.llvm.org/D93476
In LoopInterchange, `findInnerReductionPhi()` looks for reduction
variables, which cannot be constants. Update it to return early in that
case.
This also addresses a blocker for removing use-lists from ConstantData,
whose users could be spread across arbitrary modules in the same
LLVMContext.
Differential Revision: https://reviews.llvm.org/D94712
This is NFC-intended and removes the "OperationData"
class which had become nothing more than a recurrence
(reduction) type.
I adjusted the matching logic to distinguish
instructions from non-instructions - that's all that
the "IsLeafValue" member was keeping track of.
If a function doesn't contain loops and does not call non-willreturn
functions, then it is willreturn. Loops are detected by checking
for backedges in the function. We don't attempt to handle finite
loops at this point.
Differential Revision: https://reviews.llvm.org/D94633
In https://llvm.org/PR48810 , we are crashing while trying to
propagate attributes from mempcpy (returns void*) to memcpy
(returns nothing - void).
We can avoid the crash by removing known incompatible
attributes for the void return type.
I'm not sure if this goes far enough (should we just drop all
attributes since this isn't the same function?). We also need
to audit other transforms in LibCallSimplifier to make sure
there are no other cases that have the same problem.
Differential Revision: https://reviews.llvm.org/D95088
This patch applies the idea from D93734 to LoopUnswitch.
It adds support for unswitching on conditions that are only
invariant along certain paths through a loop.
In particular, it targets conditions in the loop header that
depend on values loaded from memory. If either path from
the true or false successor through the loop does not modify
memory, perform partial loop unswitching.
That is, duplicate the instructions feeding the condition in the pre-header.
Then unswitch on the duplicated condition. The condition is now known
in the unswitched version for the 'invariant' path through the original loop.
On caveat of this approach is that one of the loops created can be partially
unswitched again. To avoid this behavior, `llvm.loop.unswitch.partial.disable`
metadata is added to the unswitched loops, to avoid subsequent partial
unswitching.
If that's the approach to go, I can move the code handling the metadata kind
into separate functions.
This increases the cases we unswitch quite a bit in SPEC2006/SPEC2000 &
MultiSource. It also allows us to eliminate a dead loop in SPEC2017's omnetpp
```
Tests: 236
Same hash: 170 (filtered out)
Remaining: 66
Metric: loop-unswitch.NumBranches
Program base patch diff
test-suite...000/255.vortex/255.vortex.test 2.00 23.00 1050.0%
test-suite...T2006/401.bzip2/401.bzip2.test 7.00 55.00 685.7%
test-suite :: External/Nurbs/nurbs.test 5.00 26.00 420.0%
test-suite...s-C/unix-smail/unix-smail.test 1.00 3.00 200.0%
test-suite.../Prolangs-C++/ocean/ocean.test 1.00 3.00 200.0%
test-suite...tions/lambda-0.1.3/lambda.test 1.00 3.00 200.0%
test-suite...yApps-C++/PENNANT/PENNANT.test 2.00 5.00 150.0%
test-suite...marks/Ptrdist/yacr2/yacr2.test 1.00 2.00 100.0%
test-suite...lications/viterbi/viterbi.test 1.00 2.00 100.0%
test-suite...plications/d/make_dparser.test 12.00 24.00 100.0%
test-suite...CFP2006/433.milc/433.milc.test 14.00 27.00 92.9%
test-suite.../Applications/lemon/lemon.test 7.00 12.00 71.4%
test-suite...ce/Applications/Burg/burg.test 6.00 10.00 66.7%
test-suite...T2006/473.astar/473.astar.test 16.00 26.00 62.5%
test-suite...marks/7zip/7zip-benchmark.test 78.00 121.00 55.1%
```
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93764
This reverts commit d97f776be5.
The original problem was due to build failures in shared lib builds. D95079
moved ImportedFunctionsInliningStatistics under Analysis, unblocking
this.
This is related to D94982. We want to call these APIs from the Analysis
component, so we can't leave them under Transforms.
Differential Revision: https://reviews.llvm.org/D95079
Branch/assume conditions in PredicateInfo are currently handled in
a rather ad-hoc manner, with some arbitrary limitations. For example,
an `and` of two `icmp`s will be handled, but an `and` of an `icmp`
and some other condition will not. That also includes the case where
more than two conditions and and'ed together.
This patch makes the handling more general by looking through and/ors
up to a limit and considering all kinds of conditions (though operands
will only be taken for cmps of course).
Differential Revision: https://reviews.llvm.org/D94447
When using 2 InlinePass instances in the same CGSCC - one for other
mandatory inlinings, the other for the heuristic-driven ones - the order
in which the ImportedFunctionStats would be output-ed would depend on
the destruction order of the inline passes, which is not deterministic.
This patch moves the ImportedFunctionStats responsibility to the
InlineAdvisor to address this problem.
Differential Revision: https://reviews.llvm.org/D94982
We were able to remove almost all of the state from
OperationData, so these don't make sense as members
of that class - just pass the RecurKind in as a param.
More streamlining is possible, but I'm trying to avoid
logic/typo bugs while fixing this. Eventually, we should
not need the `OperationData` class.
We were able to remove almost all of the state from
OperationData, so these don't make sense as members
of that class - just pass the RecurKind in as a param.
Loop peeling assumes that the loop's latch is a conditional branch. Add
a check to canPeel that explicitly checks for this, and testcases that
otherwise fail an assertion when trying to peel a loop whose back-edge
is a switch case or the non-unwind edge of an invoke.
Reviewed By: skatkov, fhahn
Differential Revision: https://reviews.llvm.org/D94995
Summary: This is to address bug48712.
The solution in this patch is that when we want to merge two variable a
into the storage frame of variable b only if the alignment of a is
multiple of b.
There may be other strategies. But now I think they are hard to handle
and benefit little. Or we can implement them in the future.
Test-plan: check-llvm
Reviewers: jmorse, lxfind, junparser
Differential Revision: https://reviews.llvm.org/D94891
Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison.
To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` to accept poison, and return poison if the input pointer isn't null.
This makes many transformations like below legal:
```
%p = gep inbounds %x, 1 ; % p is non-null pointer or poison
call void @f(%p) ; instcombine converts this to call void @f(nonnull %p)
```
Instead, this semantics makes propagation of `nonnull` to caller illegal.
The reason is that, passing poison to `nonnull` does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument.
Having `noundef` attribute there re-allows this.
```
define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore
call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p.
ret void
}
```
Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90529
separate sections.
For ThinLTO, all the function profiles without context has been annotated to
outline functions if possible in prelink phase. In postlink phase, profile
annotation in postlink phase is only meaningful for function profile with
context. If the profile is large, it is better to split the profile into two
parts, one with context and one without, so the profile reading in postlink
phase only has to read the part with context. To have the profile splitting,
we extend the ExtBinary format to support different section arrangement. It
will be flexible to add other section layout in the future without the need
to create new class inheriting from ExtBinary class.
Differential Revision: https://reviews.llvm.org/D94435
The pass has dependency on 'TargetTransformInfoWrapperPass', but the
corresponding call to INITIALIZE_PASS_DEPENDENCY was missing.
Differential Revision: https://reviews.llvm.org/D94916
Relative to the original change, this adds a check that the
instruction on which we're replacing operands is safe to speculatively
execute, because that's what we're effectively doing. We're executing
the instruction with the replaced operand, which is fine if it's pure,
but not fine if can cause side-effects or UB (aka is not speculatable).
Additionally, we cannot (generally) replace operands in phi nodes,
as these may refer to a different loop iteration. This is also covered
by the speculation check.
-----
InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.
This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.
Among other things, this also subsumes D94860.
Differential Revision: https://reviews.llvm.org/D94862
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93042
This caused a miscompile in Chromium, see comments on the codereview for
discussion and pointer to a reproducer.
> InstCombine already performs a fold where X == Y ? f(X) : Z is
> transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
> if f(X) only has one use, then we can always directly replace the
> use inside the instruction. To actually be profitable, limit it to
> the case where Y is a non-expr constant.
>
> This could be further extended to replace uses further up a one-use
> instruction chain, but for now this only looks one level up.
>
> Among other things, this also subsumes D94860.
>
> Differential Revision: https://reviews.llvm.org/D94862
This also reverts the follow-up
a003f26539cf4db744655e76c41f4c4a8913f116:
> [llvm] Prevent infinite loop in InstCombine of select statements
>
> This fixes an issue where the RHS and LHS the comparison operation
> creating the predicate were swapped back and forth forever.
>
> Differential Revision: https://reviews.llvm.org/D94934
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.
The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.
This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.
I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.
From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory, has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.
There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have be
vectorized earlier (by scalarzing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).
There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.
I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.
Reviewed By: sanwou01
Differential Revision: https://reviews.llvm.org/D94232
This fixes an issue where the RHS and LHS the comparison operation
creating the predicate were swapped back and forth forever.
Differential Revision: https://reviews.llvm.org/D94934
A previous patch has already changed getInstructionCost to return
an InstructionCost type. This patch changes the other various
getXXXCost functions to return an InstructionCost too. This is a
non-functional change - I've added a few asserts that the costs
are valid in places where we're selecting between vector call
and intrinsic costs. However, since we don't yet return invalid
costs from any of the TTI implementations these asserts should
not fire.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential Revision: https://reviews.llvm.org/D94065
This patch teaches SimplifyCFG::SimplifyBranchOnICmpChain to understand select form of
(x == C1 || x == C2 || ...) / (x != C1 && x != C2 && ...) and optimize them into switch if possible.
D93065 has more context about the transition, including links to the list of optimizations being updated.
Differential Revision: https://reviews.llvm.org/D93943
After much refactoring over the last 2 weeks to the reduction
matching code, I think this change is finally ready.
We effectively broke fmax/fmin vector reduction optimization
when we started canonicalizing to intrinsics in instcombine,
so this should restore that functionality for SLP.
There are still FMF problems here as noted in the code comments,
but we should be avoiding miscompiles on those for fmax/fmin by
restricting to full 'fast' ops (negative tests are included).
Fixing FMF propagation is a planned follow-up.
Differential Revision: https://reviews.llvm.org/D94913
This patch adds the default value of 1 to drop_begin.
In the llvm codebase, 70% of calls to drop_begin have 1 as the second
argument. The interface similar to with std::next should improve
readability.
This patch converts a couple of calls to drop_begin as examples.
Differential Revision: https://reviews.llvm.org/D94858
This is to address https://bugs.llvm.org/show_bug.cgi?id=48626.
When there are musttail calls that use parameters aliasing the newly created coroutine frame, the existing implementation will fatal.
We simply cannot perform CoroElide in such cases. In theory a precise analysis can be done to check whether the parameters of the musttail call
actually alias the frame, but it's very hard to do it before the transformation happens. Also in most cases the existence of musttail call is
generated due to symmetric transfers, and in those cases alias analysis won't be able to tell that they don't alias anyway.
Differential Revision: https://reviews.llvm.org/D94834
This will avoid confusion once we start matching
min/max intrinsics. All of these hacks to accomodate
cmp+sel idioms should disappear once we canonicalize
to min/max intrinsics.
The icmp opcode is now hard-coded in the cost model call.
This will make it easier to eventually remove all opcode
queries for min/max patterns as we transition to intrinsics.
This patch marks some library functions as willreturn. On the first pass, I
excluded most functions that interact with streams/the filesystem.
Along with willreturn, it also adds nounwind to a set of math functions.
There probably are a few additional attributes we can add for those, but
that should be done separately.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94684
This patch changes these functions:
vectorizeLoadInsert
isExtractExtractCheap
foldExtractedCmps
scalarizeBinopOrCmp
getShuffleExtract
foldBitcastShuf
to use the class InstructionCost when calling TTI.get<something>Cost().
This patch is part of a series of patches to use InstructionCost instead of
unsigned/int for the cost model functions.
See this thread for context:
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
See this patch for the introduction of the type:
https://reviews.llvm.org/D91174
ps.:This patch adds the test || !NewCost.isValid(), because we want to
return false when:
!NewCost.isValid && !OldCost.isValid()->the cost to transform it expensive
and
!NewCost.isValid() && OldCost.isValid()
Therefore for simplication we only add test for !NewCost.isValid()
Differential Revision: https://reviews.llvm.org/D94069
InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.
This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.
Among other things, this also subsumes D94860.
Differential Revision: https://reviews.llvm.org/D94862
When removing catchpad's from catchswitch, if that removes a successor,
we need to record that in DomTreeUpdater.
This fixes PostDomTree preservation failure in an existing test.
This appears to be the single issue that i see in my current test coverage.
This is NFC-intended and another step towards supporting
intrinsics as reduction candidates.
The remaining bits of the OperationData class do not make
much sense as-is, so I will try to improve that, but I'm
trying to take minimal steps because it's still not clear
how this was intended to work.
This is another NFC-intended patch to allow matching
intrinsics (example: maxnum) as candidates for reductions.
It's possible that the loop/if logic can be reduced now,
but it's still difficult to understand how this all works.
Expanding from D94808 - we ensure the same InlineAdvisor is used by both
InlinerPass instances. The notion of mandatory inlining is moved into
the core InlineAdvisor: advisors anyway have to handle that case, so
this change also factors out that a bit better.
Differential Revision: https://reviews.llvm.org/D94825
To get into this block we had: !A || B || C
and we checked C in the first 'if' clause
leaving !A || B. But the 2nd 'if' is checking:
A && !B --> !(!A || B)
DestBB might or might not already be a successor of SelectBB,
and it wasn't we need to ensure that we record the fact in DomTree.
The testcase used to crash in lazy domtree updater mode + non-per-function
domtree validity checks disabled.
This is not nice, but it's the best transient solution possible,
and is better than just duplicating the whole function.
The problem is, this function is widely used,
and it is not at all obvious that all the users
could be painlessly switched to operate on DomTreeUpdater,
and somehow i don't feel like porting all those users first.
This function is one of last three that not operate on DomTreeUpdater.
This is not nice, but it's the best transient solution possible,
and is better than just duplicating the whole function.
The problem is, this function is widely used,
and it is not at all obvious that all the users
could be painlessly switched to operate on DomTreeUpdater,
and somehow i don't feel like porting all those users first.
This function is one of last three that not operate on DomTreeUpdater.
This is not nice, but it's the best transient solution possible,
and is better than just duplicating the whole function.
The problem is, this function is widely used,
and it is not at all obvious that all the users
could be painlessly switched to operate on DomTreeUpdater,
and somehow i don't feel like porting all those users first.
This function is one of last three that not operate on DomTreeUpdater.
Even though not all it's users operate on DomTreeUpdater,
it itself internally operates on DomTreeUpdater,
so it must mean everything is fine with that,
so just do that globally.
Summary:
Set the default for the option enabling memory ssa use in the loop sink
pass to true for the new pass manager.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D92486
This is NFC-intended. I'm still trying to figure out
how the loop where this is used works. It does not
seem like we require this data at all, but it's
hard to confirm given the complicated predicates.
In the spirit of commit fc783e91e0 (llvm-svn: 248943) we
shouldn't vectorize stores of non-packed types (i.e. types that
has padding between consecutive variables in a scalar layout,
but being packed in a vector layout).
The problem was detected as a miscompile in a downstream test case.
Reviewed By: anton-afanasyev
Differential Revision: https://reviews.llvm.org/D94446
When building with GCC 10, the following warning is reported:
```
/llvm-project/llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1527:28: warning: unused variable ‘CS’ [-Wunused-variable]
1527 | if (CatchSwitchInst *CS =
```
This change adds a cast to `void` to avoid the warning.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D94456
to Pass.h.
In some compiler passes like SampleProfileLoaderPass, we want to know which
LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum
class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and
it also cannot represent phase in full LTO. The patch extends it to include
full LTO phases and move it from PassBuilder.h to Pass.h, then it is much
easier for PassBuilder to communiate with each pass about current LTO phase.
Differential Revision: https://reviews.llvm.org/D94613
In commit 700d2417d8 the CodeExtractor
was updated so that bitcasts that have lifetime markers that beginning
outside of the region are deduplicated outside the region and are not
used as an output. This caused a discrepancy in the IROutliner, where
in these cases there were arguments added to the aggregate function
that were not needed causing assertion errors.
The IROutliner queries the CodeExtractor twice to determine the inputs
and outputs, before and after `findAllocas` is called with the same
ValueSet for the outputs causing the duplication. This has been fixed
with a dummy ValueSet for the first call.
However, the additional bitcasts prevent us from using the same
similarity relationships that were previously defined by the
IR Similarity Analysis Pass. In these cases, we check whether the
initial version of the region being analyzed for outlining is still the
same as it was previously. If it is not, i.e. because of the additional
bitcast instructions from the CodeExtractor, we discard the region.
Reviewers: yroux
Differential Revision: https://reviews.llvm.org/D94303
We can fold a ? b : false to a & b if is_poison(b) implies that
is_poison(a), at which point we're able to reuse all the usual fold
on ands. In particular, this covers the very common case of
icmp X, C && icmp X, C'. The same applies to ors.
This currently only has an effect if the
-instcombine-unsafe-select-transform=0 option is set.
Differential Revision: https://reviews.llvm.org/D94550
In places where we calculate costs using TTI.getXXXCost() interfaces
I have changed the code to use InstructionCost instead of unsigned.
The change is non functional since InstructionCost behaves in the
same way as an integer for valid costs. Currently the getXXXCost()
functions used in this file do not return invalid costs.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential revision: https://reviews.llvm.org/D94484
promise is a header field but it is not guaranteed that it would be the third
field of the frame due to `performOptimizedStructLayout`.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D94137
The load/store instruction will be transformed to amx intrinsics in the
pass of AMX type lowering. Prohibiting the pointer cast make that pass
happy.
Differential Revision: https://reviews.llvm.org/D94372
Adding sample-profile-suffix-elision-policy attribute to functions whose linkage names are uniquefied so that their unique name suffix won't be trimmed when applying AutoFDO profiles.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D94455
This change modifies the source location formatting from:
LineNumber.Discriminator
to:
LineNumber:ColumnNumber.Discriminator
The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff.
The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy.
Testing:
ninja check-llvm
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D94333
LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning. As a result, the precondition LoopVersioning expects is too strong for this user. At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here.
Really, the whole class structure here is a mess. We should separate the actual versioning from the metadata updates, but that's a bigger problem.
This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317).
The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis.
A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates)
Differential Revision: https://reviews.llvm.org/D93725
Similar to D94125, derive `willreturn` for functions that are `readonly` and
`mustprogress` in FunctionAttrs.
To quote the reasoning from D94125:
Since D86233 we have `mustprogress` which, in combination with
`readonly`, implies `willreturn`. The idea is that every side-effect
has to be modeled as a "write". Consequently, `readonly` means there
is no side-effect, and `mustprogress` guarantees that we cannot "loop"
forever without side-effect.
Reviewed By: jdoerfert, nikic
Differential Revision: https://reviews.llvm.org/D94502
Added a utility function in Value class to print block name and use
block labels for unnamed blocks.
Changed LICM to call this function in its debug output.
Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>
Differential Revision: https://reviews.llvm.org/D93577
This boils down to how we deal with early-increment iterator
over function's basic blocks: not only we need to early-increment,
after that we also need to skip all the blocks
that are scheduled for removal, as per DomTreeUpdater.
Thus supporting lazy DomTreeUpdater mode,
where the domtree updates (and thus block removals)
aren't applied immediately, but are delayed
until last possible moment.
When DomTreeUpdater is in lazy update mode, the blocks
that were scheduled to be removed, won't be removed
until the updates are flushed, e.g. by asking
DomTreeUpdater for a up-to-date DomTree.
From the function's current code, it is pretty evident
that the support for the lazy mode is an afterthought,
see e.g. how we roll-back NumRemoved statistic..
So instead of considering all the unreachable blocks
as the blocks-to-be-removed, simply additionally skip
all the blocks that are already scheduled to be removed
When we are adding edges to the terminator and potentially turning it
into a switch (if it wasn't already), it is possible that the
case we're adding will share it's destination with one of the
preexisting cases, in which case there is no domtree edge to add.
Indeed, this change does not have a test coverage change.
This failure has been exposed in an existing test coverage
by a follow-up patch that switches to lazy domtreeupdater mode,
and removes domtree verification from
SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(),
IOW it does not appear feasible to add dedicated test coverage here.
BB was already always branching to EdgeBB, there is no edge to add.
Indeed, this change does not have a test coverage change.
This failure has been exposed in an existing test coverage
by a follow-up patch that switches to lazy domtreeupdater mode,
and removes domtree verification from
SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(),
IOW it does not appear feasible to add dedicated test coverage here.
SI is the terminator of BB, so the edge we are adding obviously already existed.
Indeed, this change does not have a test coverage change.
This failure has been exposed in an existing test coverage
by a follow-up patch that switches to lazy domtreeupdater mode,
and removes domtree verification from
SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(),
IOW it does not appear feasible to add dedicated test coverage here.
Functions that are renamed under -funique-internal-linkage-names have their debug linkage name updated as well.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93747
This is a more basic pattern that we should handle before trying to solve:
https://llvm.org/PR48640
There might be a better way to think about this because the pre-condition
that I came up with (number of sign bits in the compare constant) misses a
potential transform for each of ugt and ult as commented on in the test file.
Tried to model this is in Alive:
https://rise4fun.com/Alive/juX1
...but I couldn't get the ComputeNumSignBits() pre-condition to work as
expected, so replaced with leading 0/1 preconditions instead.
Name: ugt
Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
%a = ashr %x, C1
%r = icmp ugt i8 %a, C2
=>
%r = icmp slt i8 %x, 0
Name: ult
Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
%a = ashr %x, C1
%r = icmp ult i4 %a, C2
=>
%r = icmp sgt i4 %x, -1
Also approximated in Alive2:
https://alive2.llvm.org/ce/z/u5hCczhttps://alive2.llvm.org/ce/z/__szVL
Differential Revision: https://reviews.llvm.org/D94014
Please see D93747 for more context which tries to make linkage names of internal
linkage functions to be the uniqueified names. This causes a problem with gdb
because breaking using the demangled function name will not work if the new
uniqueified name cannot be demangled. The problem is the generated suffix which
is a mix of integers and letters which do not demangle. The demangler accepts
either all numbers or all letters. This patch simply converts the hash to decimal.
There is no loss of uniqueness by doing this as the precision is maintained.
The symbol names get longer by a few characters though.
Differential Revision: https://reviews.llvm.org/D94154
The existing implementation of parallel region merging applies only to
consecutive parallel regions that have speculatable sequential
instructions in-between. This patch lifts this limitation to expand
merging with any sequential instructions in-between, except calls to
unmergable OpenMP runtime functions. In-between sequential instructions
in the merged region are sequentialized in a "master" region and any
output values are broadcasted to the following parallel regions and the
sequential region continuation of the merged region.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90909
This patch unifies the way recipes and VPValues are printed after the
transition to VPDef.
VPSlotTracker has been updated to iterate over all recipes and all
their defined values to number those. There is no need to number
values in Value2VPValue.
It also updates a few places that only used slot numbers for
VPInstruction. All recipes now can produce numbered VPValues.
This patch fixes a bug that could result in miscompiles (at least
in an OOT target). The problem could be seen by adding checks that
the DominatorTree used in BasicAliasAnalysis and ValueTracking was
valid (e.g. by adding DT->verify() call before every DT dereference
and then running all tests in test/CodeGen).
Problem was that the LegacyPassManager calculated "last user"
incorrectly for passes such as the DominatorTree when not telling
the pass manager that there was a transitive dependency between
the different analyses. And then it could happen that an incorrect
dominator tree was used when doing alias analysis (which was a pretty
serious bug as the alias analysis result could be invalid).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48709
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94138
This patch is part of a series of patches that migrate integer
instruction costs to use InstructionCost. In the function
selectVectorizationFactor I have simply asserted that the cost
is valid and extracted the value as is. In future we expect
to encounter invalid costs, but we should filter out those
vectorization factors that lead to such invalid costs.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential Revision: https://reviews.llvm.org/D92178
Loop peeling as a last step triggers loop simplification and this
can change the loop structure. As a result all cashed values like
latch branch becomes invalid.
Patch re-structure the code to take into account the possible
changes caused by peeling.
Reviewers: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour
Reviewed By: Meinersbur, fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D93686
This is a resubmit of dd6bb367 (which was reverted due to stage2 build failures in 7c63aac), with the additional restriction added to the transform to only consider outer most loops.
As shown in the added test case, ensuring LCSSA is up to date when deleting an inner loop is tricky as we may actually need to remove blocks from any outer loops, thus changing the exit block set. For the moment, just avoid transforming this case. I plan to return to this case in a follow up patch and see if we can do better.
Original commit message follows...
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906