Summary:
Following the cleanup in D48202, method foldBlockIntoPredecessor has the
same behavior. Replace its uses with MergeBlockIntoPredecessor.
Remove foldBlockIntoPredecessor.
Reviewers: chandlerc, dmgreen
Subscribers: jlebar, javed.absar, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62751
llvm-svn: 362538
This patch fixes a problem that occurs in LowerSwitch when a switch statement has a PHI node as its condition, and the PHI node only has two incoming blocks, and one of those incoming blocks is through an unreachable default in the switch statement. When this condition occurs, LowerSwitch holds a pointer to the condition value, but removes the switch block as a predecessor of the PHI block, causing the PHI node to be replaced. LowerSwitch then tries to use its stale pointer to the original condition value, causing a crash.
Differential Revision: https://reviews.llvm.org/D62560
llvm-svn: 362427
Extract a willNotOverflow() helper function that is shared between
eliminateOverflowIntrinsic() and strengthenOverflowingOperation().
Use WithOverflowInst for the former.
We'll be able to reuse the same code for saturating intrinsics as
well.
llvm-svn: 362305
Fix for https://bugs.llvm.org/show_bug.cgi?id=31181 and partial fix
for LFTR poison handling issues in general.
When LFTR moves a condition from pre-inc to post-inc, it may now
depend on value that is poison due to nowrap flags. To avoid this,
we clear any nowrap flag that SCEV cannot prove for the post-inc
addrec.
Additionally, LFTR may switch to a different IV that is dynamically
dead and as such may be arbitrarily poison. This patch will correct
nowrap flags in some but not all cases where this happens. This is
related to the adoption of IR nowrap flags for the pre-inc addrec.
(See some of the switch_to_different_iv tests, where flags are not
dropped or insufficiently dropped.)
Finally, there are likely similar issues with the handling of GEP
inbounds, but we don't have a test case for this yet.
Differential Revision: https://reviews.llvm.org/D60935
llvm-svn: 362292
At the moment, LoopPredication completely bails out if it sees a latch of the form:
%cmp = icmp ne %iv, %N
br i1 %cmp, label %loop, label %exit
OR
%cmp = icmp ne %iv.next, %NPlus1
br i1 %cmp, label %loop, label %exit
This is unfortunate since this is exactly the form that LFTR likes to produce. So, go ahead and recognize simple cases where we can.
For pre-increment loops, we leverage the fact that LFTR likes canonical counters (i.e. those starting at zero) and a (presumed) range fact on RHS to discharge the check trivially.
For post-increment forms, the key insight is in remembering that LFTR had to insert a (N+1) for the RHS. CVP can hopefully prove that add nsw/nuw (if there's appropriate range on N to start with). This leaves us both with the post-inc IV and the RHS involving an nsw/nuw add, and SCEV can discharge that with no problem.
This does still need to be extended to handle non-one steps, or other harder patterns of variable (but range restricted) starting values. That'll come later.
Differential Revision: https://reviews.llvm.org/D62748
llvm-svn: 362282
When the object size argument is -1, no checking can be done, so calling the
_chk variant is unnecessary. We already did this for a bunch of these
functions.
rdar://50797197
Differential revision: https://reviews.llvm.org/D62358
llvm-svn: 362272
If we can determine that a saturating add/sub will not overflow based
on range analysis, convert it into a simple binary operation. This is
a sibling transform to the existing with.overflow handling.
Reapplying this with an additional check that the saturating intrinsic
has integer type, as LVI currently does not support vector types.
Differential Revision: https://reviews.llvm.org/D62703
llvm-svn: 362263
Noticed on D62703. LVI only handles plain integers, not vectors of
integers. This was previously not an issue, because vector support
for with.overflow is only a relatively recent addition.
llvm-svn: 362261
If we can determine that a saturating add/sub will not overflow
based on range analysis, convert it into a simple binary operation.
This is a sibling transform to the existing with.overflow handling.
Differential Revision: https://reviews.llvm.org/D62703
llvm-svn: 362242
It looks this fold was already partially happening, indirectly
via some other folds, but with one-use limitation.
No other fold here has that restriction.
https://rise4fun.com/Alive/ftR
llvm-svn: 362217
Previously, this used a statement like this:
Map[A] = Map[B];
This is equivalent to the following:
const auto &Src = Map[B];
auto &Dest = Map[A];
Dest = Src;
The second statement, "auto &Dest = Map[A];" can insert a new
element into the DenseMap, which can potentially grow and reallocate
the DenseMap's internal storage, which will invalidate the existing
reference to the source. When doing the actual assignment,
the Src reference is dereferenced, accessing memory that was
freed when the DenseMap grew.
This issue hasn't shown up when LLVM was built with Clang, because
the right hand side ended up dereferenced before evaulating the
left hand side. (If the value type is a larger data type, Clang doesn't
do this but behaves like GCC.)
With GCC, a cast to Value* isn't enough to make it dereference the
right hand side reference before invoking operator[] (while that is
enough to make Clang/LLVM do the right thing for larger types), but
storing it in an intermediate variable in a separate statement works.
This fixes PR42065.
Differential Revision: https://reviews.llvm.org/D62624
llvm-svn: 362150
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.
If present, the type must match the pointee type of the argument.
The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.
Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.
llvm-svn: 362128
VPlan.h already contains the declaration of VPlanPtr type alias:
using VPlanPtr = std::unique_ptr<VPlan>;
The LoopVectorizationPlanner class also contains the same declaration
of VPlanPtr and therefore LoopVectorize requires a long wording when
its methods return VPlanPtr:
LoopVectorizationPlanner::VPlanPtr
LoopVectorizationPlanner::buildVPlanWithVPRecipes(...)
but LoopVectorize.cpp includes VPlan.h (via LoopVectorizationPlanner.h)
and can use VPlanPtr from that header.
Patch by Pavel Samolysov.
Reviewers: hsaito, rengolin, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D62576
llvm-svn: 362126
Summary:
I'm adding ORE to memset/memcpy formation, with tests,
but mainly this is split off from D61144.
Reviewers: reames, anemet, thegameg, craig.topper
Reviewed By: thegameg
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62631
llvm-svn: 362092
Currently, only the following information is provided by LoopVectorizer
in the case when the CF of the loop is not legal for vectorization:
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing: Cannot prove legality.
But this information is not enough for the root cause analysis; what is
exactly wrong with the loop should also be printed:
LV: Not vectorizing: The exiting block is not the loop latch.
Patch by Pavel Samolysov.
Reviewers: mkuper, hsaito, rengolin, fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D62311
llvm-svn: 362056
Based on the overflow direction information added in D62463, we can
now fold always overflowing signed saturating add/sub to signed min/max.
Differential Revision: https://reviews.llvm.org/D62544
llvm-svn: 362006
Summary:
When we import an alias, we do so by making a clone of the aliasee. Just
as this clone uses the original alias name and linkage, it should also
use the same visibility (not the aliasee's visibility). Otherwise,
linker behavior is affected (e.g. if the aliasee was hidden, but the
alias is not, the resulting imported clone should not be hidden,
otherwise the linker will make the final symbol hidden which is
incorrect).
Reviewers: wmi
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62535
llvm-svn: 361989
Fix PR41279 where critical edges to EHPad are not split.
The fix is to not instrument those critical edges. We used to be able to know
the size of counters right after MST is computed. With this, we have to
pre-collect the instrument BBs to know the size, and then instrument them.
Differential Revision: https://reviews.llvm.org/D62439
llvm-svn: 361882
This reverts commit 53f2f32865.
As reported on D62126, this causes assertion failures if the switch
has incorrect branch_weights metadata, which may happen as a result
of other transforms not handling it correctly yet.
llvm-svn: 361881
In order to fold an always overflowing signed saturating add/sub,
we need to know in which direction the always overflow occurs.
This patch splits up AlwaysOverflows into AlwaysOverflowsLow and
AlwaysOverflowsHigh to pass through this information (but it is
not used yet).
Differential Revision: https://reviews.llvm.org/D62463
llvm-svn: 361858
This was reverted in r360086 as it was supected of causing mysterious test
failures internally. However, it was never concluded that this patch was the
root cause.
> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936
llvm-svn: 361811
This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126
llvm-svn: 361808
The code to preserve LCSSA PHIs currently only properly supports
reduction PHIs and PHIs for values defined outside the latches.
This patch improves the LCSSA PHI handling to cover PHIs for values
defined in the latches.
Fixes PR41725.
Reviewers: efriedma, mcrosier, davide, jdoerfert
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D61576
llvm-svn: 361743
Rather than gating on "isSwitchDense" (resulting in necessesarily
sparse lookup tables even when they were generated), always run
this quite cheap transform.
This transform is useful not just for generating tables.
LowerSwitch also wants this: read LowerSwitch.cpp:257.
Be careful to not generate worse code, by introducing a
SubThreshold heuristic.
Instead of just sorting by signed, generalize the finding of the
best base.
And now that it is run unconditionally, do not replicate its
functionality in SwitchToLookupTable (which could use a Sub
when having a hole is smaller, hence the SubThreshold
heuristic located in a single place).
This simplifies SwitchToLookupTable, and fixes
some ugly corner cases due to the use of signed numbers,
such as a table containing i16 32768 and 32769, of which
32769 would be interpreted as -32768, and now the code thinks
the table is size 65536.
(We still use unconditional subtraction when building a single-register mask,
but I think this whole block should go when the more general sparse
map is added, which doesn't leave empty holes in the table.)
And the reason test4 and test5 did not trigger was documented wrong:
it was because they were not considered sufficiently "dense".
Also, fix generation of invalid LLVM-IR: shl by bit-width.
llvm-svn: 361727
and replace with an equilivent countTrailingZeros.
GCD is much more expensive than this, with repeated division.
This depends on D60823
llvm-svn: 361726
This matches countLeadingOnes() and countTrailingOnes(), and
APInt's countLeadingZeros() and countTrailingZeros().
(as well as __builtin_clzll())
llvm-svn: 361724
Extract method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but has currently no effect,
as we don't compute always overflow for them (thus NFC).
llvm-svn: 361721
The guaranteed no-wrap region is never empty, it always contains at
least zero, so these optimizations don't ever apply.
To make this more obviously true, replace the conversative return
in makeGNWR with an assertion.
llvm-svn: 361698
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.
llvm-svn: 361613
This change relaxes the checks for hasOnlyUniformBranches such that our
region is uniform if:
1. All conditional branches that are direct children are uniform.
2. And either:
a. All sub-regions are uniform.
b. There is one or less conditional branches among the direct
children.
Differential Revision: https://reviews.llvm.org/D62198
llvm-svn: 361610
Summary:
The DeadStoreElimination pass now skips doing
PartialStoreMerging when stores overlap according to
OW_PartialEarlierWithFullLater and at least one of
the stores is having a store size that is different
from the size of the type being stored.
This solves problems seen in
https://bugs.llvm.org/show_bug.cgi?id=41949
for which we in the past could end up with
mis-compiles or assertions.
The content and location of the padding bits is not
formally described (or undefined) in the LangRef
at the moment. So the solution is chosen based on
that we cannot assume anything about the padding bits
when having a store that clobbers more memory than
indicated by the type of the value that is stored
(such as storing an i6 using an 8-bit store instruction).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=41949
Reviewers: spatel, efriedma, fhahn
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62250
llvm-svn: 361605
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.
llvm-svn: 361559
Summary:
The refactoring in r360276 moved the `RunSLPVectorization` flag and added the default explicitly. The default should have been `false`, as before.
The new pass manager used to have SLPVectorization on by default, now it's off in opt, and needs D61617 checked in to enable it in clang.
Reviewers: chandlerc
Subscribers: mehdi_amini, jlebar, eraman, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61955
llvm-svn: 361537
This is reduced from a fuzzer test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890
Usually, demanded elements should be able to simplify shuffle
mask elements that are pointing to undef elements of its source
operands, but that doesn't happen in the test case.
llvm-svn: 361533
Summary:
This PR extends the loop object with more utilities to get loop bounds, step, induction variable, and guard branch. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. Moreover, loop fusion (https://reviews.llvm.org/D55851) is planning to use getGuard() to extend the kind of loops it is able to fuse, e.g. rotated loop with non-constant upper bound, which would have a loop guard.
/// Example:
/// for (int i = lb; i < ub; i+=step)
/// <loop body>
/// --- pseudo LLVMIR ---
/// beforeloop:
/// guardcmp = (lb < ub)
/// if (guardcmp) goto preheader; else goto afterloop
/// preheader:
/// loop:
/// i1 = phi[{lb, preheader}, {i2, latch}]
/// <loop body>
/// i2 = i1 + step
/// latch:
/// cmp = (i2 < ub)
/// if (cmp) goto loop
/// exit:
/// afterloop:
///
/// getBounds
/// getInitialIVValue --> lb
/// getStepInst --> i2 = i1 + step
/// getStepValue --> step
/// getFinalIVValue --> ub
/// getCanonicalPredicate --> '<'
/// getDirection --> Increasing
/// getGuard --> if (guardcmp) goto loop; else goto afterloop
/// getInductionVariable --> i1
/// getAuxiliaryInductionVariable --> {i1}
/// isCanonical --> false
Committed on behalf of @Whitney (Whitney Tsang).
Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara, fhahn
Reviewed By: kbarton
Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60565
llvm-svn: 361517
`fadd` and `fsub` have recently (r351850) been added as `atomicrmw`
operations. This diff adds lowering cases for them to the LowerAtomic
transform.
Patch by Josh Berdine!
llvm-svn: 361512
Summary:
Allow struct fields SRA and dead stores. This works by considering fields accesses from getElementPtr to be considered as a possible pointer root that can be cleaned up.
We check that the variable can be SRA by recursively checking the sub expressions with the new isSafeSubSROAGEP function.
basically this allows the array in following C code to be optimized out
struct Expr {
int a[2];
int b;
};
static struct Expr e;
int foo (int i)
{
e.b = 2;
e.a[i] = 1;
return e.b;
}
Reviewers: greened, bkramer, nicholas, jmolloy
Reviewed By: jmolloy
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61911
llvm-svn: 361460
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions. For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.
We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.
I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.
llvm-svn: 361425
Summary: Avoid visiting an instruction more than once by using a map.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62262
llvm-svn: 361416
The input LoopCost value can be zero, but if so it should be recalculated with the current VF. After that it should always be non-zero.
llvm-svn: 361387
This should be a valid exception to the general rule of not creating new shuffle masks in IR...
because we already do it. :)
Also, DAG combining/legalization will undo this by widening the shuffle back out if needed.
Explanation for how we already do this: SLP or vector source can create chains of insert/extract
as shown in 1 of the examples from PR16739:
https://godbolt.org/z/NlK7rAhttps://bugs.llvm.org/show_bug.cgi?id=16739
And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles.
Differential Revision: https://reviews.llvm.org/D62024
llvm-svn: 361338
Summary:
Because the sort order was not strongly stable on the RHS, whether the
chain could merge would depend on the order of the blocks in the Phi.
EXPENSIVE_CHECKS would shuffle the blocks before sorting, resulting in
non-deterministic merging.
Reviewers: gchatelet
Subscribers: hiraditya, llvm-commits, RKSimon
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62193
llvm-svn: 361281
And handle for self-move. This is required so that llvm::sort can work
with EXPENSIVE_CHECKS, as it will do a random shuffle of the input
which can result in self-moves.
llvm-svn: 361257
This reverts commit rr360902. It caused an assertion failure in
lib/IR/DebugInfoMetadata.cpp: Assertion `(OffsetInBits + SizeInBits <=
FragmentSizeInBits) && "new fragment outside of original fragment"'
failed.
PR41931.
llvm-svn: 361246
Also, break out a helper function, namely foldFNegIntoConstant(...), which performs transforms common between visitFNeg(...) and visitFSub(...).
Differential Revision: https://reviews.llvm.org/D61693
llvm-svn: 361188
This reverts commit 95805bc425.
I've squashed the test fix into this commit.
[DebugInfo] Update loop metadata for inlined loops
Currently, when a loop is cloned while inlining function (A) into function (B)
the loop metadata is copied and then not modified at all. The loop metadata can
encode the loop's start and end DILocations. Therefore, the new inlined loop in
function (B) may have loop metadata which shows start and end locations residing
in function (A).
This patch ensures loop metadata is updated while inlining so that the start and
end DILocations are given the "inlinedAt" operand. I've also added a regression
test for this.
This fix is required for D60831 because that patch uses loop metadata to
determine the DILocation for the branches of new loop preheaders.
Reviewers: aprantl, dblaikie, anemet
Reviewed By: aprantl
Subscribers: eraman, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D61933
llvm-svn: 361149
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D61943
llvm-svn: 361137
Summary:
Currently, when a loop is cloned while inlining function (A) into function (B) the loop metadata is copied and then not modified at all. The loop metadata can encode the loop's start and end DILocations. Therefore, the new inlined loop in function (B) may have loop metadata which shows start and end locations residing in function (A).
This patch ensures loop metadata is updated while inlining so that the start and end DILocations are given the "inlinedAt" operand. I've also added a regression test for this.
This fix is required for D60831 because that patch uses loop metadata to determine the DILocation for the branches of new loop preheaders.
Reviewers: aprantl, dblaikie, anemet
Reviewed By: aprantl
Subscribers: eraman, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D61933
llvm-svn: 361132
This is a follow-up refactoring patch after the introduction of usable TreeEntry pointers in D61706.
The EdgeInfo struct can now use a TreeEntry pointer instead of an index in VectorizableTree.
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D61795
llvm-svn: 361110
Summary:
In D61918 i was looking at dropping it in DAGCombiner `visitShiftByConstant()`,
but as @craig.topper pointed out, it was copied from here.
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
https://rise4fun.com/Alive/ml6
So, why is it there then?
As far as i can tell, it dates all the way back to original check-in rL7793.
I think we should just drop it.
Reviewers: spatel, craig.topper, efriedma, majnemer
Reviewed By: spatel
Subscribers: llvm-commits, craig.topper
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61938
llvm-svn: 361043
With a fix for PR41917: The predecessor list was changing under our feet.
- for (BasicBlock *Pred : predecessors(EntryBlock_)) {
+ while (!pred_empty(EntryBlock_)) {
+ BasicBlock* const Pred = *pred_begin(EntryBlock_);
llvm-svn: 361009
Using dominance vs a set membership check is indistinguishable from a compile time perspective, and the two queries return equivelent results. Simplify code by using the existing function.
llvm-svn: 360976
Summary:
Adds a call to __hwasan_handle_vfork(SP) at each landingpad entry.
Reusing __hwasan_handle_vfork instead of introducing a new runtime call
in order to be ABI-compatible with old runtime library.
Reviewers: pcc
Subscribers: kubamracek, hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D61968
llvm-svn: 360959
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=40645
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop casts
impossible. With the recent addition of DW_OP_LLVM_convert this salvaging is
now possible, and so can be used to fix the attached bug as well as any cases
where SExt instruction results are lost in the debugging metadata. This patch
introduces this fix by expanding the salvage debug info method to cover these
cases using the new operator.
Differential revision: https://reviews.llvm.org/D61184
llvm-svn: 360902
Summary: We should excluded unreachable operands from processing as their DFS visitation order is undefined. When `renameUses` function sorts `OpsToRename` (https://fburl.com/d2wubn60), the comparator assumes that the parent block of the operand has a corresponding dominator tree node. This is not the case for unreachable operands and crashes the compiler.
Reviewers: dberlin, mgrang, davide
Subscribers: efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61154
llvm-svn: 360796
Summary:
The return value of a TryToUnfoldSelect call was not checked, which led to an
incorrectly preserved loop info and some crash.
The original crash was reported on https://reviews.llvm.org/D59514.
Reviewers: davidxl, amehsan
Reviewed By: davidxl
Subscribers: fhahn, brzycki, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61920
llvm-svn: 360780
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=40645
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. With the recent addition of DW_OP_LLVM_convert this
salvaging is now possible, and so can be used to fix the attached bug as
well as any cases where SExt instruction results are lost in the
debugging metadata. This patch introduces this fix by expanding the
salvage debug info method to cover these cases using the new operator.
Differential revision: https://reviews.llvm.org/D61184
llvm-svn: 360772
Instead of patching the original blocks, we now generate new blocks and
delete the old blocks. This results in simpler code with a less twisted
control flow (see the change in `entry-block-shuffled.ll`).
This will make https://reviews.llvm.org/D60318 simpler by making it more
obvious where control flow created and deleted.
Reviewers: gchatelet
Subscribers: hiraditya, llvm-commits, spatel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61736
llvm-svn: 360771
This reduces the number of parameters we need to pass in and they seem a
natural fit in LoopVectorizationCostModel. Also simplifies things for
D59995.
As a follow up refactoring, we could only expose a expose a
shouldUseVectorIntrinsic() helper in LoopVectorizationCostModel, instead
of calling getVectorCallCost/getVectorIntrinsicCost in
InnerLoopVectorizer/VPRecipeBuilder.
Reviewers: Ayal, hsaito, dcaballe, rengolin
Reviewed By: rengolin
Differential Revision: https://reviews.llvm.org/D61638
llvm-svn: 360758
The 3-field form was introduced by D3499 in 2014 and the legacy 2-field
form was planned to be removed in LLVM 4.0
For the textual format, this patch migrates the existing 2-field form to
use the 3-field form and deletes the compatibility code.
test/Verifier/global-ctors-2.ll checks we have a friendly error message.
For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the
2-field form (add i8* null as the third field).
Reviewed By: rnk, dexonsmith
Differential Revision: https://reviews.llvm.org/D61547
llvm-svn: 360742
Port hardware assisted address sanitizer to new PM following the same guidelines as msan and tsan.
Changes:
- Separate HWAddressSanitizer into a pass class and a sanitizer class.
- Create new PM wrapper pass for the sanitizer class.
- Use the getOrINsert pattern for some module level initialization declarations.
- Also enable kernel-kwasan in new PM
- Update llvm tests and add clang test.
Differential Revision: https://reviews.llvm.org/D61709
llvm-svn: 360707
When an outer loop gets deleted by a different pass, before LICM visits
it, we cannot clean up its sub-loops in AliasSetMap, because at the
point we receive the deleteAnalysisLoop callback for the outer loop, the loop
object is already invalid and we cannot access its sub-loops any longer.
Reviewers: asbirlea, sanjoy, chandlerc
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D61904
llvm-svn: 360704
LoopSimplify can preserve MemorySSA after r360270.
But the MemorySSA analysis is retrieved and preserved only when the
EnableMSSALoopDependency is set to true. Use the same conditional to
mark the pass as preserved, otherwise subsequent passes will get an
invalid analysis.
Resolves PR41853.
llvm-svn: 360697
Some atomic loads are implemented as cmpxchg (particularly if large or
floating), and that usually requires write access to the memory involved
or it will segfault.
We can still propagate the constant value to users we understand though.
llvm-svn: 360662
Summary:
CoroFrame was not considering static array allocas, and was only ever reserving a single element in the coroutine frame.
This meant that stores to the non-zero'th element would corrupt later frame data.
Store static array allocas as field arrays in the coroutine frame.
Added test.
Committed by Gor Nishanov on behalf of ben-clayton
Reviewers: GorNishanov, modocache
Reviewed By: GorNishanov
Subscribers: Orlando, capn, EricWF, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61372
llvm-svn: 360636
We have a similar match for patterns ending in a truncate. This
should be ok for all targets because the default expansion would
still likely be better from replacing 2 'and' ops with 1.
Attempt to show the logic equivalence in Alive (which doesn't
currently have funnel-shift in its vocabulary AFAICT):
%shamt = zext i8 %i to i32
%m = and i32 %shamt, 31
%neg = sub i32 0, %shamt
%and4 = and i32 %neg, 31
%shl = shl i32 %v, %m
%shr = lshr i32 %v, %and4
%or = or i32 %shr, %shl
=>
%a = and i8 %i, 31
%shamt2 = zext i8 %a to i32
%neg2 = sub i32 0, %shamt2
%and4 = and i32 %neg2, 31
%shl = shl i32 %v, %shamt2
%shr = lshr i32 %v, %and4
%or = or i32 %shr, %shl
https://rise4fun.com/Alive/V9r
llvm-svn: 360605
Summary:
We hit undefined references building with ThinLTO when one source file
contained explicit instantiations of a template method (weak_odr) but
there were also implicit instantiations in another file (linkonce_odr),
and the latter was the prevailing copy. In this case the symbol was
marked hidden when the prevailing linkonce_odr copy was promoted to
weak_odr. It led to unsats when the resulting shared library was linked
with other code that contained a reference (expecting to be resolved due
to the explicit instantiation).
Add a CanAutoHide flag to the GV summary to allow the thin link to
identify when all copies are eligible for auto-hiding (because they were
all originally linkonce_odr global unnamed addr), and only do the
auto-hide in that case.
Most of the changes here are due to plumbing the new flag through the
bitcode and llvm assembly, and resulting test changes. I augmented the
existing auto-hide test to check for this situation.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, arphaman, dang, llvm-commits, steven_wu, wmi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59709
llvm-svn: 360466
This patch fixes the TreeEntry dangling pointer issue caused by reallocations of VectorizableTree.
Committed on behalf of @vporpo (Vasileios Porpodas)
Differential Revision: https://reviews.llvm.org/D61706
llvm-svn: 360456
The original change introduced a depth limit of 7 which caused a 22% regression
in the Swift MapReduceLazyCollection & Ackermann benchmarks. This new threshold
still ensures that the original test case doesn't hang.
rdar://50359639
llvm-svn: 360444
Summary:
- Constant expressions may not be added in strict postorder as the
forward instruction scan order. Thus, for a constant express (CE0), if
its operand (CE1) is used in an previous instruction, they are not in
postorder. However, different from
`cloneInstructionWithNewAddressSpace`,
`cloneConstantExprWithNewAddressSpace` doesn't bookkeep uninferred
instructions for later resolving. That results in failure of inferring
constant address.
- This patch adds the support to infer constant expression operand
recursively, since there won't be loop, if that operand is another
constant expression.
Reviewers: arsenm
Subscribers: jholewinski, jvesely, wdng, nhaehnle, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61760
llvm-svn: 360431
In certain circumstances, optimizations pick line numbers from debug
intrinsic instructions as the new location for altered instructions. This
is problematic because the line number of a debugging intrinsic is
meaningless (it doesn't produce any machine instruction), only the scope
information is valid. The result can be the line number of a variable
declaration "leaking" into real code from debugging intrinsics, making the
line table un-necessarily jumpy, and potentially different with / without
variable locations.
Fix this by using zero line numbers when promoting dbg.declare intrinsics
into dbg.values: this is safe for debug intrinsics as their line numbers
are meaningless, and reduces the scope for damage / misleading stepping
when optimizations pick locations from the wrong place.
Differential Revision: https://reviews.llvm.org/D59272
llvm-svn: 360415
Summary:
Seeing some issues for windows debug pathological cases with collectBitParts
recursion (1525 levels of recursion!)
Setting the limit to 64 as this should be sufficient - passes all lit cases
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61728
Change-Id: I7f44cdc6c1badf1c2ccbf1b0c4b6afe27ecb39a1
llvm-svn: 360347
The worklist loop that we're returning back to should be able to do the repacement itself. This is how we normally do replacements.
My main motivation was that I observed that we weren't preserving the name of the result when we do this transform. The replacement code in the worklist loop will call takeName as part of the replacement.
Differential Revision: https://reviews.llvm.org/D61695
llvm-svn: 360284
Summary:
Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available.
Do not preserve it in the new pass manager.
Update tests.
Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60833
llvm-svn: 360270
Reassociation's NegateValue moved instructions to the beginning of
blocks (after PHIs) without checking for exception handling pads.
It's possible for reassociation to move something into an exception
handling block so we need to make sure we don't move things too early
in the block. This change advances the insertion point past any
exception handling pads.
If the block we want to move into contains a catchswitch, we cannot
move into it. In that case just create a new neg as if we had not
found an existing neg to move.
Differential Revision: https://reviews.llvm.org/D61089
llvm-svn: 360262
If we fold a branch/switch to an unconditional branch to another dead block we
replace the branch with unreachable, to avoid attempting to fold the
unconditional branch.
Reviewers: davide, efriedma, mssimpso, jdoerfert
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D61300
llvm-svn: 360232
(X | C1) + C2 --> (X | C1) ^ C1 iff (C1 == -C2)
I verified the correctness using Alive:
https://rise4fun.com/Alive/YNV
This transform enables the following transform that already exists in
instcombine:
(X | Y) ^ Y --> X & ~Y
As a result, the full expected transform is:
(X | C1) + C2 --> X & ~C1 iff (C1 == -C2)
There already exists the transform in the sub case:
(X | Y) - Y --> X & ~Y
However this does not trigger in the case where Y is constant due to an earlier
transform:
X - (-C) --> X + C
With this new add fold, both the add and sub constant cases are handled.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61517
llvm-svn: 360185
Fundamentally/generally, we should not have to rely on bailouts/crippling of
folds. In this particular case, I think we always recognize the inverted
predicate min/max pattern, so there should not be any loss of optimization.
Codegen looks better because we are eliminating an fneg.
llvm-svn: 360180
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel
Reviewed By: hfinkel
Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
llvm-svn: 360162
Fixes the main issue in PR41693
When both modes are used, two functions are created:
`sancov.module_ctor`, `sancov.module_ctor.$LastUnique`, where
$LastUnique is the current LastUnique counter that may be different in
another module.
`sancov.module_ctor.$LastUnique` belongs to the comdat group of the same
name (due to the non-null third field of the ctor in llvm.global_ctors).
COMDAT group section [ 9] `.group' [sancov.module_ctor] contains 6 sections:
[Index] Name
[ 10] .text.sancov.module_ctor
[ 11] .rela.text.sancov.module_ctor
[ 12] .text.sancov.module_ctor.6
[ 13] .rela.text.sancov.module_ctor.6
[ 23] .init_array.2
[ 24] .rela.init_array.2
# 2 problems:
# 1) If sancov.module_ctor in this module is discarded, this group
# has a relocation to a discarded section. ld.bfd and gold will
# error. (Another issue: it is silently accepted by lld)
# 2) The comdat group has an unstable name that may be different in
# another translation unit. Even if the linker allows the dangling relocation
# (with --noinhibit-exec), there will be many undesired .init_array entries
COMDAT group section [ 25] `.group' [sancov.module_ctor.6] contains 2 sections:
[Index] Name
[ 26] .init_array.2
[ 27] .rela.init_array.2
By using different module ctor names, the associated comdat group names
will also be different and thus stable across modules.
Reviewed By: morehouse, phosek
Differential Revision: https://reviews.llvm.org/D61510
llvm-svn: 360107
This reverts r357452 (git commit 21eb771dcb).
This was causing strange optimization-related test failures on an internal test. Will followup with more details offline.
llvm-svn: 360086
We don't always get this:
Cond ? -X : -Y --> -(Cond ? X : Y)
...even with the legacy IR form of fneg in the case with extra uses,
and we miss matching with the newer 'fneg' instruction because we
are expecting binops through the rest of the path.
Differential Revision: https://reviews.llvm.org/D61604
llvm-svn: 360075
Properly initialize store type to null then ensure we find a real store type in the chain.
Fixes scan-build null dereference warning and makes the code clearer.
llvm-svn: 360031
Optimization pass lib/Transforms/IPO/GlobalOpt.cpp needs to insert
DW_OP_deref_size instead of DW_OP_deref to be compatible with big-endian
targets for same reasons as in D59687.
Differential Revision: https://reviews.llvm.org/D60611
llvm-svn: 360013
Summary:
It is a common thing to loop over every `PHINode` in some `BasicBlock`
and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block.
`replaceSuccessorsPhiUsesWith()` already had code to do that,
it just wasn't a function.
So outline it into a new function, and use it.
Reviewers: chandlerc, craig.topper, spatel, danielcdh
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61013
llvm-svn: 359996
Summary:
There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()`
and `PHINode::getNumOperands()`, but no function to replace every
specified `BasicBlock*` predecessor with some other specified `BasicBlock*`.
Clearly, there are a lot of places that could use that functionality.
Reviewers: chandlerc, craig.topper, spatel, danielcdh
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61011
llvm-svn: 359995
Summary:
There is `Instruction::getNumSuccessors()`, `Instruction::getSuccessor()`
and `Instruction::setSuccessor()`, but no function to replace every
specified `BasicBlock*` successor with some other specified `BasicBlock*`.
I've found one place where it should clearly be used.
Reviewers: chandlerc, craig.topper, spatel, danielcdh
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61010
llvm-svn: 359994
Summary:
If `deleteDeadLoop()` is called on such a loop, that has "bad" exit block,
one that e.g. has no terminator instruction, the `DIBuilder::insertDbgValueIntrinsic()`
will be told to insert the Dbg Value Intrinsic after `nullptr`
(since there is no first non-PHI instruction), which will cause it to not insert
those instructions into any basic block. The instructions will be parent-less,
and IR verifier will complain. It is rather obvious to track down the root cause
when that happens, so let's just assert it never happens.
Reviewers: sanjoy, davide, vsk
Reviewed By: vsk
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61008
llvm-svn: 359993
Summary:
Inalloca parameters require special handling in some optimizations.
This change causes globalopt to strip the inalloca attribute from
function parameters when it is safe to do so, removes the special
handling for inallocas from argpromotion, and replaces it with a
simple check that causes argpromotion to skip functions that receive
inallocas (for when the pass is invoked on code that didn't run
through globalopt first). This also avoids a case where argpromotion
would incorrectly try to pass an inalloca in a register.
Fixes PR41658.
Reviewers: rnk, efriedma
Reviewed By: rnk
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61286
llvm-svn: 359743
Summary: Fix a transformation bug where two scopes share a common instrution to hoist.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61405
llvm-svn: 359736
The demanded elts rules introduced for GEPs in https://reviews.llvm.org/rL356293 replaced vector constants with undefs (by design). It turns out that the LangRef disallows such cases when indexing structs. The right fix is probably to relax the langref requirement, and update other passes to expect the result, but for the moment, limit the transform to avoid compiler crashes.
This should fix https://bugs.llvm.org/show_bug.cgi?id=41624.
llvm-svn: 359633
Summary:
Match NewPassManager behavior: add option for interleaved loops in the
old pass manager, and use that instead of the flag used to disable loop unroll.
No changes in the defaults.
Reviewers: chandlerc
Subscribers: mehdi_amini, jlebar, dmgreen, hsaito, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61030
llvm-svn: 359615
Summary:
When a variable goes into scope several times within a single function
or when two variables from different scopes share a stack slot it may
be incorrect to poison such scoped locals at the beginning of the
function.
In the former case it may lead to false negatives (see
https://github.com/google/sanitizers/issues/590), in the latter - to
incorrect reports (because only one origin remains on the stack).
If Clang emits lifetime intrinsics for such scoped variables we insert
code poisoning them after each call to llvm.lifetime.start().
If for a certain intrinsic we fail to find a corresponding alloca, we
fall back to poisoning allocas for the whole function, as it's now
impossible to tell which alloca was missed.
The new instrumentation may slow down hot loops containing local
variables with lifetime intrinsics, so we allow disabling it with
-mllvm -msan-handle-lifetime-intrinsics=false.
Reviewers: eugenis, pcc
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60617
llvm-svn: 359536
Follow-up to:
rL359482
Avoid this potential problem throughout by giving the type a name
and verifying the assumption that both operands are the same type.
llvm-svn: 359485
PVS Studio's copy+paste recognizer was seeing this as a typo, technically Op0/Op1 in a fcmp should always be the same type, but we might as well avoid the issue.
Reported in https://www.viva64.com/en/b/0629/
llvm-svn: 359482
This change aims at making the file format be compatible with the
way LLVM handles command line options.
Differential Revision: https://reviews.llvm.org/D60970
llvm-svn: 359462
This enables the pass to be used in the absence of
TargetTransformInfo. When the argument isn't passed, the factory
defaults to UninitializedAddressSpace and the flat address space is
obtained from the TargetTransformInfo as before this change. Existing
users won't have to change.
Patch by Kevin Petit.
Differential Revision: https://reviews.llvm.org/D60602
llvm-svn: 359290
isValidCandidateForColdCC is much more expensive than
TTI.useColdCCForColdCall, which by default just returns false. Avoid
doing this work if we're not going to look at the answer anyway.
This change is NFC, but I see significant compile time improvements on
some code with pathologically many functions.
llvm-svn: 359253
I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment.
P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these.
(Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.)
Reviewed By: dblaikie
Author: Arthur O'Dwyer
Differential Revision: https://reviews.llvm.org/D54885
llvm-svn: 359236
it keeps track of becomes too large
ARC optimizer does a top-down and a bottom-up traversal of the whole
function to pair up retain and release instructions and remove them.
This can be expensive if the number of instructions in the function and
pointer states it tracks are large since it has to look at each pointer
state and determine whether the instruction being visited can
potentially use the pointer.
This patch adds a command line option that sets a limit to the number of
pointers it tracks.
rdar://problem/49477063
Differential Revision: https://reviews.llvm.org/D61100
llvm-svn: 359226
When evaluating a store through a bitcast, the evaluator tries to move the
bitcast from the pointer onto the stored value. If the cast is invalid, it
tries to "introspect" the type to get a valid cast by obtaining a pointer to
the initial element (if the type is nested, this may require walking several
initial elements).
In some situations it is possible to get a bitcast on a load (e.g. with
unions, where the bitcast may not be the same type as the store). However,
equivalent logic to the store to introspect the type is missing. This patch
add this logic.
Note, when developing the patch I was unhappy with adding similar logic
directly to the load case as it could get out of step. Instead, I have
abstracted the "introspection" into a helper function, with the specifics
being handled by a passed-in lambda function.
Differential Revision: https://reviews.llvm.org/D60793
llvm-svn: 359205
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.
Reviewers: chandlerc, jgorbe
Subscribers: jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61091
llvm-svn: 359167
Do not wrap the contents of printFusionCandidates in the LLVM_DEBUG macro. This
fixes an unused variable warning generated when compiling without asserts but
with -DENABLE_LLVM_DUMP.
Differential Revision: https://reviews.llvm.org/D61035
llvm-svn: 359161
Summary: The code did not check if operand was undef before casting it to Instruction.
Reviewers: RKSimon, ABataev, dtemirbulatov
Reviewed By: ABataev
Subscribers: uabelho
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61024
llvm-svn: 359136
This patch rewrites the existing PACKSS/PACKUS constant folding code to expand as a generic expansion.
This is a first NFCI step toward expanding PACKSS/PACKUS intrinsics which are acting as non-saturating truncations (although technically the expansion could be used in all cases - but we'll probably want to be conservative).
llvm-svn: 359111
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
Summary:
Enabling MemorySSA in the old pass manager leads to MemorySSA being run
twice due to the fact that LCSSA and LoopSimplify do not preserve
MemorySSA. This is the first step to address that: target LCSSA.
LCSSA does not make any changes that invalidate MemorySSA, so it
preserves it by design. It must preserve AA as well, for this to hold.
After this patch, MemorySSA is still run twice in the old pass manager.
Step two follows: target LoopSimplify.
Subscribers: mehdi_amini, jlebar, Prazek, llvm-commits, george.burgess.iv, chandlerc
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60832
llvm-svn: 359032
DominatorTree::dominate.
ARC contract pass has an optimization that replaces the uses of the
argument of an ObjC runtime function call with the call result.
For example:
; Before optimization
%1 = tail call i8* @foo1()
%2 = tail call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %1)
store i8* %1, i8** @g0, align 8
; After optimization
%1 = tail call i8* @foo1()
%2 = tail call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %1)
store i8* %2, i8** @g0, align 8 // %1 is replaced with %2
Before replacing the argument use, DominatorTree::dominate is called to
determine whether the user instruction is dominated by the ObjC runtime
function call instruction. The call to DominatorTree::dominate can be
expensive if the two instructions belong to the same basic block and the
size of the basic block is large. This patch checks the basic block size
and just bails out if the size exceeds the limit set by command line
option "arc-contract-max-bb-size".
rdar://problem/49477063
Differential Revision: https://reviews.llvm.org/D60900
llvm-svn: 359027
If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.
Differential Revision: https://reviews.llvm.org/D59703
llvm-svn: 359000
Converting InlineCost interface and its internals into CallBase usage.
Inliners themselves are still not converted.
Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60636
llvm-svn: 358982
In some circumstances we can end up with setup costs that are very complex to
compute, even though the scevs are not very complex to create. This can also
lead to setupcosts that are calculated to be exactly -1, which LSR treats as an
invalid cost. This patch puts a limit on the recursion depth for setup cost to
prevent them taking too long.
Thanks to @reames for the report and test case.
Differential Revision: https://reviews.llvm.org/D60944
llvm-svn: 358958
If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store.
The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead.
Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined*, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date.
Differential Revision: https://reviews.llvm.org/D60659
llvm-svn: 358919
In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask.
llvm-svn: 358913
Back in August, r340525 introduced a dependency on the assumption
cache tracker in the ipsccp pass, but that commit missed a call to
INITIALIZE_PASS_DEPENDENCY, which leaves the assumption cache
improperly registered if SCCP is the only thing that pulls it in.
llvm-svn: 358903
Currently, we do not expose BPI to loop passes at all. In the old pass manager, we appear to have been ignoring the fact that LCSSA and/or LoopSimplify didn't preserve BPI, and making it available to the following loop passes anyways. In the new one, it's invalidated before running any loop pass if either LCSSA or LoopSimplify actually make changes. If they don't make changes, then BPI is valid and available. So, we go ahead and teach LCSSA and LoopSimplify how to preserve BPI for consistency between old and new pass managers.
This patch avoids an invalidation between the two requires in the following trivial pass pipeline:
opt -passes="requires<branch-prob>,loop(no-op-loop),requires<branch-prob>"
(when the input file is one which requires either LCSSA or LoopSimplify to canonicalize the loops)
Differential Revision: https://reviews.llvm.org/D60790
llvm-svn: 358901
This reverts commit 7bf4d7c07f2fac862ef34c82ad0fef6513452445.
After thinking about this more, this isn't right, the range is not exact
in the same sense as makeExactICmpRegion(). This needs a separate
function.
llvm-svn: 358876
Following D60632 makeGuaranteedNoWrapRegion() always returns an
exact nowrap region. Rename the function accordingly. This is in
line with the naming of makeExactICmpRegion().
llvm-svn: 358875
Summary:
Teach CorrelatedValuePropagation to also handle sub instructions in addition to add. Relatively simple since makeGuaranteedNoWrapRegion already understood sub instructions. Only subtle change is which range is passed as "Other" to that function, since sub isn't commutative.
Note that CorrelatedValuePropagation::processAddSub is still hidden behind a default-off flag as IndVarSimplify hasn't yet been fixed to strip the added nsw/nuw flags and causes a miscompile. (PR31181)
Reviewers: sanjoy, apilipenko, nikic
Reviewed By: nikic
Subscribers: hiraditya, jfb, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60036
llvm-svn: 358816
This is a follow-up to r291037+r291258, which used null debug locations
to prevent jumpy line tables.
Using line 0 locations achieves the same effect, but works better for
crash attribution because it preserves the right inline scope.
Differential Revision: https://reviews.llvm.org/D60913
llvm-svn: 358791
Summary:
Make the flags in LICM + MemorySSA tuning options in the old and new
pass managers.
Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60490
llvm-svn: 358772
Summary:
Trying to add the plumbing necessary to add tuning options to the new pass manager.
Testing with the flags for loop vectorize.
Reviewers: chandlerc
Subscribers: sanjoy, mehdi_amini, jlebar, steven_wu, dexonsmith, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59723
llvm-svn: 358763
removeUsers uses a work list to collect indirect users and call remove()
on those functions. However it has a bug (`if (!Visited.insert(UU).second)`).
Actually, we don't have to collect indirect users.
After the merge of F and G, G's callers will be considered (added to
Deferred). If G's callers can be merged, G's callers' callers will be
considered.
Update the test unnamed-addr-reprocessing.ll to make it clear we can
still merge indirect callers.
llvm-svn: 358741
code to `CallBase`.
This patch focuses on the legacy PM, call graph, and some of inliner and legacy
passes interacting with those APIs from `CallSite` to the new `CallBase` class.
No interesting changes.
Differential Revision: https://reviews.llvm.org/D60412
llvm-svn: 358739
We would previously drop the COMDAT on the thunk we generated when replacing a
function body with the forwarding thunk. This would result in a function that
may have been multiply emitted and multiply merged to be emitted with the same
name without the COMDAT. This is a hard error with PE/COFF where the COMDAT is
used for the deduplication of Value Witness functions for Swift.
llvm-svn: 358728