This change adds an assertion to prevent tricky bug related to recursive
approach of building vectorization tree. For loop below takes number of
operands directly from tree entry rather than from scalars.
If the entry at this moment turns out incomplete (i.e. not all operands set)
then not all the dependencies will be seen by the scheduler.
This can lead to failed scheduling (and thus failed vectorization)
for perfectly vectorizable tree.
Here is code example which is likely to fire the assertion:
for (i : VL0->getNumOperands()) {
...
TE->setOperand(i, Operands);
buildTree_rec(Operands, Depth + 1,...);
}
Correct way is two steps process: first set all operands to a tree entry
and then recursively process each operand.
Differential Revision: https://reviews.llvm.org/D75296
This aims to fix a missed inlining case.
If there's a virtual call in the callee on an alloca (stack allocated object) in
the caller, and the callee is inlined into the caller, the post-inline cleanup
would devirtualize the virtual call, but if the next iteration of
DevirtSCCRepeatedPass doesn't happen (under the new pass manager), which is
based on a heuristic to determine whether to reiterate, we may miss inlining the
devirtualized call.
This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.
This is a second commit after a revert
https://reviews.llvm.org/rG4569b3a86f8a4b1b8ad28fe2321f936f9d7ffd43 and a fix
https://reviews.llvm.org/rG41e06ae7ba91.
Differential Revision: https://reviews.llvm.org/D69591
Summary: This fixes the crash that led to the revert of D69591.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75307
This patch deletes some dead code out of SLP vectorizer.
Couple of changes taken out of D57059 to slightly lighten it
plus one more similar case fixed.
Differential Revision: https://reviews.llvm.org/D75276
Summary:
Final patch in series to fix inlining between functions with different
nobuiltin attributes/options, which was specifically an issue in LTO.
See discussion on D61634 for background.
The prior patch in this series (D67923) enabled per-Function TLI
construction that identified the nobuiltin attributes.
Here I have allowed inlining to proceed if the callee's nobuiltins are a
subset of the caller's nobuiltins, but not in the reverse case, which
should be conservatively correct. This is controlled by a new option,
-inline-caller-superset-nobuiltin, which is enabled by default.
Reviewers: hfinkel, gchatelet, chandlerc, davidxl
Subscribers: arsenm, jvesely, nhaehnle, mehdi_amini, eraman, hiraditya, haicheng, dexonsmith, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74162
Summary:
This patch makes EarlyCSE fold equivalent freeze instructions.
Another optimization that I think will be useful is to remove freeze if its operand is used as a branch condition or at llvm.assume:
```
%c = ...
br i1 %c, label %A, ..
A:
%d = freeze %c ; %d can be optimized to %c because %c cannot be poison or undef (or 'br %c' would be UB otherwise)
```
If it make sense for EarlyCSE to support this as well, I will make a patch for this.
Reviewers: spatel, reames, lebedev.ri
Reviewed By: lebedev.ri
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75334
SROA will drop the explicit alignment on allocas when the ABI guarantees
enough alignment. Because the alignment on new load/store instructions
are set based on the alloca's alignment, that means SROA would end up
dropping the alignment from atomic loads and stores, which is not
allowed (see bug). For those, make sure to always carry over the
alignment from the previous instruction.
Differential revision: https://reviews.llvm.org/D75266
Summary:
For now just insert the callback for stores, similar to how MSan tracks
origins. In the future we may want to add callbacks for loads, memcpy,
function calls, CMPs, etc.
Reviewers: pcc, vitalybuka, kcc, eugenis
Reviewed By: vitalybuka, kcc, eugenis
Subscribers: eugenis, hiraditya, #sanitizers, llvm-commits, kcc
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D75312
DSE would mistakenly remove store (2):
a = calloc(n+1)
for (int i = 0; i < n; i++) {
store 1, a[i+1] // (1)
store 0, a[i] // (2)
}
The fix is to do PHI transaltion while looking for clobbering
instructions between the store and the calloc.
Reviewed By: efriedma, bjope
Differential Revision: https://reviews.llvm.org/D68006
When InstCombine initially populates the worklist, it already
performs constant folding and DCE. However, as the instructions
are initially visited in program order, this DCE can pick up only
the last instruction of a dead chain, the rest would only get
picked up in the main InstCombine run.
To avoid this, we instead perform the DCE in separate pass over the
collected instructions in reverse order, which will allow us to
pick up full dead instruction chains. We already need to do this
reverse iteration anyway to populate the worklist, so this
shouldn't add extra cost.
This by itself only fixes a small part of the problem though:
The same basic issue also applies during the main InstCombine loop.
We generally always want DCE to occur as early as possible,
because it will allow one-use folds to happen. Address this by also
performing DCE while adding deferred instructions to the main worklist.
This drops the number of tests that perform more than 2 InstCombine
iterations from ~80 to ~40. There's some spurious test changes due
to operand order / icmp toggling.
Differential Revision: https://reviews.llvm.org/D75008
Use UnaryOperator::CreateFNeg instead.
Summary:
With the introduction of the native fneg instruction, the
fsub -0.0, %x idiom is obsolete. This patch makes LLVM
emit fneg instead of the idiom in all places.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D75130
A recent commit
(https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed
the cost for calls to functions that have a vector version for some
vectorization factor. However, no check is performed for whether the
vectorization factor matches the current one being cost modeled. This leads to
attempts to widen call instructions to a vectorization factor for which such a
function does not exist, which in turn leads to an assertion failure.
This patch adds the check for vectorization factor (i.e. not just that the
called function has a vector version for some VF, but that it has a vector
version for this VF).
Differential revision: https://reviews.llvm.org/D74944
As suggested in D75145 -
I'm not sure why, but several passes have this kind of disable/enable flag
implemented at the pass manager level. But that means we have to duplicate
the flag for both pass managers and add code to check the flag every time
the pass appears in the pipeline.
We want a debug option to see if this pass is misbehaving regardless of the
pass managers, so just add a disablement check at the single point before
any transforms run.
Differential Revision: https://reviews.llvm.org/D75204
CVP currently does not simplify cmps with instructions in the same
block, because LVI getPredicateAt() currently does not provide
much useful information for that case (D69686 would change that,
but is stuck.) However, if the instruction is a Phi node, then
LVI can compute the result of the predicate by threading it into
the predecessor blocks, which allows it simplify some conditions
that nothing else can handle. Relevant code:
6d6a4590c5/llvm/lib/Analysis/LazyValueInfo.cpp (L1904-L1927)
Differential Revision: https://reviews.llvm.org/D72169
InstCombine removes pairs of start+end intrinsics that don't
have anything in between them. Currently this is done by starting
at the start intrinsic and scanning forwards. This patch changes
it to start at the end intrinsic and scan backwards.
The motivation here is as follows: When we process the start
intrinsic, we have not yet looked at the following instructions,
which may still get folded/removed. If they do, we will only be
able to remove the start/end pair on the next iteration. When we
process the end intrinsic, all the instructions before it have
already been visited, and we don't run into this problem.
Differential Revision: https://reviews.llvm.org/D75011
DevirtSCCRepeatedPass iteration. Needs ReviewPublic
This aims to fix a missed inlining case.
If there's a virtual call in the callee on an alloca (stack allocated object) in
the caller, and the callee is inlined into the caller, the post-inline cleanup
would devirtualize the virtual call, but if the next iteration of
DevirtSCCRepeatedPass doesn't happen (under the new pass manager), which is
based on a heuristic to determine whether to reiterate, we may miss inlining the
devirtualized call.
This enables inlining in clang/test/CodeGenCXX/member-function-pointer-calls.cpp.
See discussion on PR44792.
This reverts commit 02ce9d8ef5.
It also reverts the follow-up commits
8f46269f0 "[profile] Don't dump counters when forking and don't reset when calling exec** functions"
62c7d8402 "[profile] gcov_mutex must be static"
Summary:
Loop unswitch hoists branches on loop-invariant conditions. However, if this
condition is poison/undef and the branch wasn't originally reachable, loop
unswitch introduces UB (since the optimized code will branch on poison/undef and
the original one didn't)).
We fix this problem by freezing the condition to ensure we don't introduce UB.
We will now transform the following:
while (...) {
if (C) { A }
else { B }
}
Into:
C' = freeze(C)
if (C') {
while (...) { A }
} else {
while (...) { B }
}
This patch fixes the root cause of the following bug reports (which use the old loop unswitch, but can be reproduced with minor changes in the code and -enable-nontrivial-unswitch):
- https://llvm.org/bugs/show_bug.cgi?id=27506
- https://llvm.org/bugs/show_bug.cgi?id=31652
Reviewers: reames, majnemer, chenli, sanjoy, hfinkel
Reviewed By: reames
Subscribers: hiraditya, jvesely, nhaehnle, filcab, regehr, trentxintong, nlopes, llvm-commits, mzolotukhin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D29015
If we deduplicate OpenMP runtime calls we have multiple `ident_t*` that
represent information like source location. So far, we simply kept the
one used by the replacement call. However, as exposed by PR44893, that
can cause problems if we have stack allocated `ident_t` objects. While
we need to revisit the use of these as well, it is clear that we
eventually want to merge source location information in some way. With
this patch we add the infrastructure to do so but without doing the
actual merge. Instead we pick a global `ident_t` from the replaced
calls, if possible, or create a new one with an unknown location
instead.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74925
body
We started seeing cases where ARC optimizer would move retain calls into
loop bodies, causing imbalance in the number of retain and release
calls, after changes were made to delete inert ARC calls since the inert
calls that used to block code motion are gone.
Fix the bug by setting the CFG hazard flag when visiting a loop header.
rdar://problem/56908836
Summary:
Replacing uses of IV outside of the loop is likely generally useful,
but `rewriteLoopExitValues()` is cautious, and if it isn't told to always
perform the replacement, and there are hard uses of IV in loop,
it doesn't replace.
In [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]],
that prevents `-indvars` from replacing uses of induction variable
after the loop, which might be one of the optimization failures
preventing that code from being vectorized.
Instead, now that the cost model is fixed, i believe we should be
a little bit more optimistic, and also perform replacement
if we believe it is within our budget.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=44668 | PR44668 ]].
Reviewers: reames, mkazantsev, asbirlea, fhahn, skatkov
Reviewed By: mkazantsev
Subscribers: nikic, hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73501
Summary:
In future patches`SCEVExpander::isHighCostExpansionHelper()` will respect the budget allocated by performing TTI cost modelling.
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73705
Summary:
Future patches will make use of TTI to perform cost-model-driven `SCEVExpander::isHighCostExpansionHelper()`
This is a fully NFC patch to make things reviewable.
Reviewers: reames, mkazantsev, wmi, sanjoy
Reviewed By: mkazantsev
Subscribers: hiraditya, zzheng, javed.absar, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73704
This reverts commit 8d22100f66.
There was a functional regression reported (https://bugs.llvm.org/show_bug.cgi?id=44996). I'm not actually sure the patch is wrong, but I don't have time to investigate currently, and this line of work isn't something I'm likely to get back to quickly.
Much like with reassociateShiftAmtsOfTwoSameDirectionShifts(),
as input, we have the following pattern:
icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0
We want to rewrite that as:
icmp eq/ne (and (x shift (Q+K)), y), 0 iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts. so it may now overflow in smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus.
https://bugs.llvm.org/show_bug.cgi?id=44802
As input, we have the following pattern:
Sh0 (Sh1 X, Q), K
We want to rewrite that as:
Sh x, (Q+K) iff (Q+K) u< bitwidth(x)
While we know that originally (Q+K) would not overflow
(because 2 * (N-1) u<= iN -1), we may have looked past extensions of
shift amounts. so it may now overflow in smaller bitwidth.
To ensure that does not happen, we need to ensure that the total maximal
shift amount is still representable in that smaller bitwidth.
If the overflow would happen, (Q+K) u< bitwidth(x) check would be bogus.
https://bugs.llvm.org/show_bug.cgi?id=44802
Code duplication (subsequently removed by refactoring) allowed
a logic discrepancy to creep in here.
We were being conservative about creating a vector binop -- but
not a vector cmp -- in the case where a vector op has the same
estimated cost as the scalar op. We want to be more aggressive
here because that can allow other combines based on reduced
instruction count/uses.
We can reverse the transform in DAGCombiner (potentially with a
more accurate cost model) if this causes regressions.
AFAIK, this does not conflict with InstCombine. We have a
scalarize transform there, but it relies on finding a constant
operand or a matching insertelement, so that means it eliminates
an extractelement from the sequence (so we won't have 2 extracts
by the time we get here if InstCombine succeeds).
Differential Revision: https://reviews.llvm.org/D75062
This patch adds bindings to C and Go for
addCoroutinePassesToExtensionPoints, which is used to add coroutine
passes to the correct locations in PassManagerBuilder.
Differential Revision: https://reviews.llvm.org/D51642
Summary:
There is no need to write out gcdas when forking because we can just reset the counters in the parent process.
Let say a counter is N before the fork, then fork and this counter is set to 0 in the child process.
In the parent process, the counter is incremented by P and in the child process it's incremented by C.
When dump is ran at exit, parent process will dump N+P for the given counter and the child process will dump 0+C, so when the gcdas are merged the resulting counter will be N+P+C.
About exec** functions, since the current process is replaced by an another one there is no need to reset the counters but just write out the gcdas since the counters are definitely lost.
To avoid to have lists in a bad state, we just lock them during the fork and the flush (if called explicitely) and lock them when an element is added.
Reviewers: marco-c
Reviewed By: marco-c
Subscribers: hiraditya, cfe-commits, #sanitizers, llvm-commits, sylvestre.ledru
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74953
This version fixes a buildbot failure cause by picking the wrong insert
point for XORs. We cannot pick the XOR binary operator as insert point,
as it is not guaranteed that both input operands for the overflow
intrinsic are defined before it.
This reverts the revert commit
c7fc0e5da6.
Add a map from BasicBlocks to overlap intervals. For partial writes, we
can keep track of those in IOLs. We only add candidates that are valid
for eliminations.
Reviewers: dmgreen, bryant, asbirlea, Tyker
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D73757
Summary:
Blocks in a loop can be in any order as long as the loop header is the
first block in Blocks.
With some order of Blocks, cloneLoopWithPreheader would trigger the
assertion in addBasicBlockToLoop.
Example:
define void @test(i64 %N) {
preheader.i:
br label %header.i
header.i:
%i = phi i64 [ 0, %preheader.i ], [ %inc.i, %latch.i ]
br label %header.j
header.j:
%j = phi i64 [ 0, %header.i ], [ %inc.j, %latch.j ]
br label %header.k
header.k:
%k = phi i64 [ 0, %header.j ], [ %inc.k, %latch.k ]
call void @baz(i64 %i, i64 %j, i64 %k)
br label %latch.k
latch.k:
%inc.k = add nsw i64 %k, 1
%cmp.k = icmp slt i64 %inc.k, %N
br i1 %cmp.k, label %header.k, label %latch.j
latch.j:
%inc.j = add nsw i64 %j, 1
%cmp.j = icmp slt i64 %inc.j, %N
br i1 %cmp.j, label %header.j, label %latch.i
latch.i:
%inc.i = add nsw i64 %i, 1
%cmp.i = icmp slt i64 %inc.i, %N
br i1 %cmp.i, label %header.i, label %exit.i
exit.i:
ret void
}
declare void @baz(i64, i64, i64)
If the blocks of loop-i is in the order: header.i, latch.k, header.k,
header.j, latch.j, latch.i,
then cloneLoopWithPreheader would trigger the assertion in
addBasicBlockToLoop
assert(contains(SameHeader) && getHeader() == SameHeader->getHeader() &&
"Incorrect LI specified for this loop!");
As latch.k is in both loop-j and loop-k, it would be set as the header
of both loops after adding latch.k.
If we update loop headers during cloning blocks, then after adding
header.k,
the header of loop-k would be updated with header.k,
while the header of loop-j stays as latch.k.
When adding header.j, SameHeader is loop-k, SameHeader->getHeader() is
header.k, but getHeader() is latch.k, which trigger the assertion.
Reviewer: jdoerfert, Meinersbur, fhahn, kbarton, hfinkel, bmahjour,
etiotto
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D74382