If we cannot prove that the check is trivially true, but can prove that it either
fails on the 1st iteration or never fails, we can replace it with a first-iteration check.
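A minimal IR sketch of the kind of check this applies to (the function and
values are invented for illustration, not taken from the patch):
```
define i32 @sketch(i32 %lo, i32 %n) {
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %guarded ]
  ; %iv only grows, so "%iv >= %lo" can only go from false to true: it
  ; either fails on the very first iteration or never fails, and is
  ; therefore equivalent to checking "0 >= %lo" once in %entry.
  %ok = icmp sge i32 %iv, %lo
  br i1 %ok, label %guarded, label %fail

guarded:
  %iv.next = add nuw nsw i32 %iv, 1
  %done = icmp eq i32 %iv.next, %n
  br i1 %done, label %exit, label %loop

fail:
  ret i32 -1

exit:
  ret i32 0
}
```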
Differential Revision: https://reviews.llvm.org/D88527
Reviewed By: skatkov
There might be some demanded/known bits way to generalize this,
but I'm not seeing it right now.
This came up as a regression when I was looking at a different
demanded bits improvement.
https://rise4fun.com/Alive/5fl
Name: general
Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0
%a1 = add i8 %x, C1
%a2 = and i8 %x, C2
%r = sub i8 %a1, %a2
=>
%r = and i8 %a1, ~C2
Name: test 1
%a1 = add i8 %x, 192
%a2 = and i8 %x, 10
%r = sub i8 %a1, %a2
=>
%r = and i8 %a1, -11
Name: test 2
%a1 = add i8 %x, -108
%a2 = and i8 %x, 3
%r = sub i8 %a1, %a2
=>
%r = and i8 %a1, -4
Summary:
Refactor SinkAndHoistLICMFlags from a struct to a class with accessors and constructors to allow other
classes to construct flags with meaningful defaults while not exposing LICM internal details.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90482
Sometimes an instruction we are trying to widen is used by the IV
(which means the instruction is the IV increment). Currently this may
prevent its widening. We should ignore such a user because it will be
dead once the transform is done anyway.
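A hedged sketch of the situation (names invented):
```
loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  ; %iv.next is the instruction being widened, and the IV phi %iv is one
  ; of its users. That use dies together with the narrow IV once the
  ; loop is rewritten with a wide IV, so it should not block widening.
  %iv.next = add i32 %iv, 1
  %wide = zext i32 %iv.next to i64
```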
Differential Revision: https://reviews.llvm.org/D90920
Reviewed By: fhahn
In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it is considered escaped.
This is a bit too conservative. Specifically, in non-optimized builds it is common to see code patterns that first store an alloca somewhere and then load it right away.
These uses should be handled without conservatively marking the alloca as escaped.
This patch tracks how the memory location that an alloca pointer is stored into is being used. As long as we only load from that location and do nothing else with it, we can still
consider the original alloca non-escaping and keep it on the stack instead of putting it on the frame.
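A minimal sketch of the pattern, assuming a typical non-optimized lowering
(names invented):
```
%data = alloca i32
%slot = alloca i32*
; The address of %data is stored into %slot, but %slot is only ever
; loaded from, so %data has not really escaped and can stay on the stack.
store i32* %data, i32** %slot
%reloaded = load i32*, i32** %slot
store i32 42, i32* %reloaded
```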
Differential Revision: https://reviews.llvm.org/D91305
InstCombine canonicalizes 'sub nuw' instructions to 'add' without the
`nuw` flag. The typical case where we see it is decrementing induction
variables. For them, IndVars fails to prove that widening is legal,
and inserts unprofitable `zext`s.
This patch adds recognition of such patterns using SCEV.
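For illustration (a hypothetical fragment, not from the patch):
```
; Before canonicalization the decrement visibly cannot wrap:
%dec = sub nuw i32 %iv, 1
; InstCombine rewrites it without the flag, but SCEV can still prove
; that the equivalent add does not wrap, so the IV can be widened
; without a zext:
%dec2 = add i32 %iv, -1
```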
Differential Revision: https://reviews.llvm.org/D89550
Reviewed By: fhahn, skatkov
We need to be able to call function pointers. Inline the dispatch
function.
Also inline the context projection function.
Transfer debug locations from the suspend point to the inlined functions.
Use the function argument index instead of the function argument in
coro.id.async. This solves any spurious use issues.
Coerce the arguments of the tail call function at a suspend point. The LLVM
optimizer seems to drop casts leading to a vararg intrinsic.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D91098
Previously the inliner did a bit of a hack by adding ref edges for all
new edges introduced by performing an inline before calling
updateCGAndAnalysisManagerForPass(). This was because
updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
edges.
This adds handling of non-trivial call edges to
updateCGAndAnalysisManagerForPass(). The inliner called
updateCGAndAnalysisManagerForFunctionPass() since it was handling adding
newly introduced edges (so updateCGAndAnalysisManagerForPass() would
only have to handle promotion), but now it needs to call
updateCGAndAnalysisManagerForCGSCCPass() since
updateCGAndAnalysisManagerForPass() is now handling the new call edges
and function passes cannot add new edges.
We follow the previous path of adding trivial ref edges then letting promotion
handle changing the ref edges to call edges and the CGSCC updates. So
this still does not allow adding call edges that result in an addition
of a non-trivial ref edge.
This is in preparation for better detecting devirtualization. Previously
since the inliner itself would add ref edges,
updateCGAndAnalysisManagerForPass() would think that promotion and thus
devirtualization had happened after any sort of inlining.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D91046
Before this change, DFSan always performed the propagation. Without
origin tracking, such flows are harder to understand. After this
change, the flag is off by default.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D91234
This reverts commits:
* [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. (b873aba394)
* [LoopVectorizer] Silence warning in GetRegUsage. (9ff701100a)
We have a frequent pattern where we merge two KnownBits to get the common/shared bits, and I just fell for the gotcha of trying to use the & operator to merge them.
This patch silences the warning:
error: lambda capture 'DL' is not used [-Werror,-Wunused-lambda-capture]
auto GetRegUsage = [&DL, &TTI=TTI](Type *Ty, ElementCount VF) {
~^~~
1 error generated.
Introduced in:
https://reviews.llvm.org/rGb873aba3943c067a5efd5303cbdf5aeb0732cf88
This is more accurate than dividing the bitwidth (scaled by the element count) by the
maximum register size, as it can simply reuse whatever has been calculated for
legalization of these types.
This change is also necessary when calculating register usage for scalable vectors, where
the legalization of these types cannot be done based on the widest register size, because
that does not take the 'vscale' component into account.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D91059
Delete abs/nabs handling in the EarlyCSE pass to avoid bugs related to
hashing values. After abs/nabs are canonicalized to intrinsics in D87188,
we should get CSE ability for abs/nabs back.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D90734
Tracking local variables across suspend points is still somewhat incomplete.
Consider this coroutine snippet:
```
resumable foo() {
int x[10] = {};
int a = 3;
co_await std::experimental::suspend_always();
a++;
x[0] = 1;
a += 2;
x[1] = 2;
a += 3;
x[2] = 3;
}
```
We can't manage to print `a` or `x` if they turn out to be allocas during
CoroSplit (which happens if you build this code with `-O0` prior to this
commit):
```
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000100003729 main-noprint`foo() at main-noprint.cpp:43:5
40 co_await std::experimental::suspend_always();
41 a++;
42 x[0] = 1;
-> 43 a += 2;
44 x[1] = 2;
45 a += 3;
46 x[2] = 3;
(lldb) p x
error: <user expression 21>:1:1: use of undeclared identifier 'x'
x
^
```
The generated IR contains a `llvm.dbg.declare` for `x` in its initialization
basic block. After CoroSplit, the `llvm.dbg.declare` might not dominate all
uses of `x`, and we lose debugging quality.
Add `llvm.dbg.value`s to all relevant basic blocks so that if later
transformations break the dominance, reliable debug info is already in
place. For instance, this BB:
```
await.ready:
...
%arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
...
%arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
...
%arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```
becomes:
```
await.ready:
...
call void @llvm.dbg.value(metadata [10 x i32]* %x.reload.addr, metadata !751, metadata !DIExpression()), !dbg !753
...
%arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
...
%arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
...
%arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```
Differential Revision: https://reviews.llvm.org/D90772
This is a prep step for widening induction variables in LoopFlatten if this is
possible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle as both
passes were already close after each other.
Differential Revision: https://reviews.llvm.org/D90402
This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't
run into problems of a loop pass deleting an (inner) loop.
Differential Revision: https://reviews.llvm.org/D90940
This patch turns VPWidenSelectRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D84682
This was missing as discovered by the SystemZ multistage bot:
http://lab.llvm.org:8011/#/builders/8, where wrong code resulted when this
extension was not performed.
Thanks for review by Ulrich Weigand and Roman Lebedev.
Differential Revision: https://reviews.llvm.org/D90760
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they are tagged with llvm.loop.isvectorized, which makes them
simpler to recognize.
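For reference, the tagging looks roughly like this (metadata ids are
illustrative):
```
br i1 %done, label %exit, label %loop, !llvm.loop !0

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.isvectorized", i32 1}
```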
We have wanted to do this for a while but hit issues with low overhead
loops being reverted due to difficult register allocation. With recent
changes that seems to be less of an issue now.
Differential Revision: https://reviews.llvm.org/D90055
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.
With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.
Differential Revision: https://reviews.llvm.org/D89896
Interfaces changed to take `ElementCount` as parameters:
* LoopVectorizationPlanner::buildVPlans
* LoopVectorizationPlanner::buildVPlansWithVPRecipes
* LoopVectorizationCostModel::selectVectorizationFactor
This patch is NFC for fixed-width vectors.
Reviewed By: dmgreen, ctetreau
Differential Revision: https://reviews.llvm.org/D90879
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower-case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.
This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.
I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.
Reviewed By: mehdi_amini, fghanim
Differential Revision: https://reviews.llvm.org/D91109
Prior to D89768, any alloca that's used after suspension points will be put onto the coroutine frame, and hence it will always be reloaded in the resume function.
However, D89768 introduced a more precise way to determine whether an alloca should live on the frame. Allocas that are only used within one suspension region (and hence do not need to live across suspension points) will not be put on the frame. They will remain local to the resume function.
When creating the new entry for the .resume function, the existing logic only moved the allocas from the old entry to the new entry. This covers every alloca from the old entry. However, allocas that are defined after coro.begin are put into a separate basic block during CoroSplit (the PostSpill basic block). We need to make sure these allocas are moved to the new entry as well if they are used.
This patch walks through all allocas and checks whether they are still used but not reachable from the new entry; if so, we move them to the new entry.
Differential Revision: https://reviews.llvm.org/D90977
Introduce struct FlattenInfo to group some of the bookkeeping. Besides this
being a bit of a clean-up, it is a prep step for next additions (D90640). I
could take things a bit further, but thought this was a good first step, also
so as not to make this change too large.
Differential Revision: https://reviews.llvm.org/D90408
This patch turns VPWidenCall into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D84681
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.
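A hypothetical fragment of the kind of input involved (not the actual
reproducer):
```
; The overflow flag of a vector with.overflow intrinsic is a vector of
; i1, so the rebuilt tuple must also carry <2 x i1>, not a scalar i1:
%res = call { <2 x i32>, <2 x i1> } @llvm.uadd.with.overflow.v2i32(<2 x i32> %a, <2 x i32> %b)
%ov = extractvalue { <2 x i32>, <2 x i1> } %res, 1
```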
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D89628
As raised by @nlopes on D90382: if this is not a rotate, then the select was blocking poison from the 'shift-by-zero' non-TVal, but a funnel shift won't, so freeze it.
From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.
This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.
Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.
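A minimal invented example of a loop this now allows deleting:
```
define void @sketch(i32 %n) {
entry:
  br label %loop

loop:
  ; No observable side effects; the mustprogress metadata means
  ; LoopDeletion does not have to prove termination before deleting it.
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %i.next = add i32 %i, 1
  %c = icmp ne i32 %i.next, %n
  br i1 %c, label %loop, label %exit, !llvm.loop !0

exit:
  ret void
}

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.mustprogress"}
```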
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86844
Currently, LoopDeletion refuses to remove dead loops with no exit blocks
because it cannot statically determine the control flow after it removes
the block. This leads to miscompiles if the loop is an infinite loop and
should've been removed.
Differential Revision: https://reviews.llvm.org/D90115
Results of convergent operations are implicitly affected by the
enclosing control flows and should not be hoisted out of arbitrary
loops.
Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>
Differential Revision: https://reviews.llvm.org/D90361
The `llvm.coro.suspend.async` intrinsic takes a function pointer as its
argument that describes how to restore the current continuation's
context from the context argument of the continuation function. Before
we assumed that the current context can be restored by loading from the
context argument's first pointer field (`first_arg->caller_context`).
This allows for defining suspension points that reuse the current
context, for example.
Also:
llvm.coro.id.async lowering: add the llvm.coro.prepare.async intrinsic,
which blocks inlining until after the async coroutine is split.
Also, change the position of the context size in the async function
pointer struct:

  struct async_function_pointer {
    uint32_t relative_function_pointer_to_async_impl;
    uint32_t context_size;
  };
And make the position of the `async context` argument configurable. The
position is specified by the `llvm.coro.id.async` intrinsic.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90783
This patch changes the type of Start, End in VFRange to be an ElementCount
instead of `unsigned`. This is done as preparation to make VPlans for
scalable vectors, but is otherwise NFC.
Reviewed By: dmgreen, fhahn, vkmr
Differential Revision: https://reviews.llvm.org/D90715
This moves WidenIV from IndVarSimplify to Utils/SimplifyIndVar so that we have
createWideIV available as a generic helper utility. I.e., this is not only
useful in IndVarSimplify, but could also be useful for other loop transformations. For
example, motivation for this refactoring is the loop flatten transformation: if
induction variables in a loop nest can be widened, we can avoid having to
perform certain overflow checks, enabling this transformation.
Differential Revision: https://reviews.llvm.org/D90421
When replacing an assume(false) with a store, we have to be more careful
with the order we insert the new access. This patch updates the code to
look at the accesses in the block to find a suitable insertion point.
Alternatively we could check the defining access of the assume, but IIRC
there has been some discussion about making assume() readnone, so
looking at the access list might be more future proof.
Fixes PR48072.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90784
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Reapply with fix for memory sanitizer failure and sphinx failure.
Differential Revision: https://reviews.llvm.org/D90612
This patch adds the `async` lowering of coroutines.
This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.
This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.
rdar://70097093
Differential Revision: https://reviews.llvm.org/D90612
This is slightly better compile-time-wise,
since we avoid potentially costly known-bits analysis that will
ultimately not allow us to actually do anything with said `add`.
This reverts commit 59b22e495c.
That commit broke building for ARM and AArch64, reproducible like this:
$ cat apedec-reduced.c
a;
b(e) {
int c;
unsigned d = f();
c = d >> 32 - e;
return c;
}
g() {
int h = i();
if (a)
h = h << a | b(a);
return h;
}
$ clang -target aarch64-linux-gnu -w -c -O3 apedec-reduced.c
clang: ../lib/Transforms/InstCombine/InstructionCombining.cpp:3656: bool llvm::InstCombinerImpl::run(): Assertion `DT.dominates(BB, UserParent) && "Dominance relation broken?"' failed.
Same thing for e.g. an armv7-linux-gnueabihf target.
There is already an API in BasicBlock that checks and returns the musttail call if it precedes the return instruction.
Use it instead of manually checking in each place.
Differential Revision: https://reviews.llvm.org/D90693
InstCombine is quite aggressive in doing the opposite transform,
folding `add` of operands with no common bits set into an `or`,
and not many things support that new pattern.
In this case, teaching Reassociate about it is easy,
there's preexisting art for `sub`/`shl`:
just convert such an `or` into an `add`:
https://rise4fun.com/Alive/Xlyv
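A small invented example of the conversion:
```
; %lo and %hi provably share no set bits ...
%lo = and i8 %a, 15
%hi = shl i8 %b, 4
%o  = or i8 %lo, %hi
; ... so Reassociate may treat the or as:
%s  = add i8 %lo, %hi
```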
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.
With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.
Differential Revision: https://reviews.llvm.org/D89896
This relaxes the one-use restriction on that `sub` fold,
since apparently the addition of Negator broke
preexisting `C-(C2-X) --> X+(C-C2)` (with C=0) fold.
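For illustration, the fold with C = 0 and C2 = 42 (invented values):
```
%t = sub i8 42, %x
%r = sub i8 0, %t
; folds to:
%r2 = add i8 %x, -42
```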
Vectors where all elements have the same known constant range are treated as a
single constant range in the lattice. When bitcasting such vectors, there is a
mismatch between the width of the lattice value (single constant range) and
the original operands (vector). Go to overdefined in that case.
Fixes PR47991.
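A hypothetical fragment showing the mismatch (not the PR reproducer):
```
; Every element of %v has the known range [0, 256) ...
%v = and <2 x i32> %x, <i32 255, i32 255>
; ... but the bitcast changes the element layout, so the single scalar
; range no longer describes the result; go to overdefined instead:
%b = bitcast <2 x i32> %v to <4 x i16>
```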
The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.
This should allow us to begin resolving PR46896 et al.
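A sketch of a general (non-rotate) funnel shift pattern with invented
operands (modulo the shift-by-zero poison concern noted for D90382 above,
which is handled by freezing):
```
%shl = shl i32 %x, %amt
%sub = sub i32 32, %amt
%shr = lshr i32 %y, %sub
%or  = or i32 %shl, %shr
; can now become:
%fsh = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %amt)
```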
Differential Revision: https://reviews.llvm.org/D90625
If we know that some check will not be executed on the last iteration, we can use this
fact to eliminate it.
Differential Revision: https://reviews.llvm.org/D88210
Reviewed By: ebrevnov
This patch enhances computeOutliningColdRegionsInfo() to allow it to
consider regions containing a single basic block and a single
predecessor as candidates for partial inlining.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D89911
This reverts the revert commit 408c4408fa.
This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.
Original message:
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.
This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.
This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.
I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.
Additionally, always pass a default filename (in $cwd if a directory
name is not specified via the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.
Change the existing memprof tests to specify log_path=stderr when that
was being relied on.
Depends on D89086.
Differential Revision: https://reviews.llvm.org/D89087
This patch updates DSE + MemorySSA to use the same check as the legacy
implementation to determine if a location is killed by a free call.
This changes the existing behavior so that a free does not kill
locations before the start of the freed pointer.
This should fix PR48036.
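A minimal invented example of the distinction:
```
define void @sketch(i8* %p) {
  %before = getelementptr i8, i8* %p, i64 -1
  store i8 1, i8* %before   ; before the freed pointer: NOT killed
  store i8 2, i8* %p        ; at the freed pointer: killed by the free
  call void @free(i8* %p)
  ret void
}

declare void @free(i8*)
```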
This reverts the revert commit a1b53db324.
This patch includes a fix for a reported issue, caused by
matchSelectPattern returning UMIN for selects of pointers in
some cases by looking through some connected casts.
For now, ensure integer intrinsics are only returned for selects of
ints or int vectors.
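A hypothetical fragment of the problematic shape (not the reported case):
```
; A UMIN-shaped select over pointers, seen through ptrtoint casts;
; returning an integer min/max intrinsic for this would be wrong:
%pi = ptrtoint i8* %p to i64
%qi = ptrtoint i8* %q to i64
%c  = icmp ult i64 %pi, %qi
%m  = select i1 %c, i8* %p, i8* %q
```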
This is the last of the rotate->funnel shift InstCombine generalizations for PR46896
We still have foldGuardedRotateToFunnelShift to deal with in AggressiveInstCombine
Differential Revision: https://reviews.llvm.org/D90382
Previously, !noalias and !alias.scope metadata on the call site was
applied as part of CloneAliasScopeMetadata(), which short-circuits
if the callee does not use any noalias metadata itself. However,
these two things have no relation to each other.
Consistently apply !noalias and !alias.scope metadata by integrating
this into an existing function that handled !llvm.access.group and
!llvm.mem.parallel_loop_access metadata. The handling for all of
these metadata kinds is essentially the same.
This reverts commit 1922570489.
This appears to cause a crash in the following example
a, b, c;
l() {
int e = a, f = l, g, h, i, j;
float *d = c, *k = b;
for (;;)
for (; g < f; g++) {
k[h] = d[i];
k[h - 1] = d[j];
h += e << 1;
i += e;
}
}
clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c
llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain,
introduce an enum with the bit field positions for the access info.
Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.
With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:
         code size   boot time
before   92824064    6.18s
after    38822400    6.65s
[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76
Depends on D90425
Differential Revision: https://reviews.llvm.org/D90426
This is a workaround for poor heuristics in the backend where we can
end up materializing the constant multiple times. This is particularly
bad when using outlined checks because we materialize it for every call
(because the backend considers it trivial to materialize).
As a result, the field containing the shadow base value will always
be set, so simplify the code taking that into account.
Differential Revision: https://reviews.llvm.org/D90425
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
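A sketch of the resulting metadata under this change (weights invented):
```
br i1 %c, label %hot, label %cold, !prof !0

!0 = !{!"branch_weights", i64 2000, i64 16}
```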
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
This patch modifies two for loops to use the range based syntax.
Since they are equivalent, this patch is tagged NFC.
Differential Revision: https://reviews.llvm.org/D90069
- As convergent intrinsics/calls can only be moved to
control-equivalent blocks, or more precisely the same divergent
branch, PRE needs to skip them.
Differential Revision: https://reviews.llvm.org/D90391
Currently isOverwrite returns OW_MaybePartial even for accesses known not to overlap. This is not a big problem for the legacy implementation (since isPartialOverwrite follows isOverwrite and clarifies the result). In contrast, the SSA-based version does a lot of work only to later find out that the accesses don't overlap. Besides the negative impact on compile time, we quickly reach MemorySSAPartialStoreLimit and miss optimization opportunities.
Note: In fact, I think it would be a cleaner implementation if isOverwrite returned a fully clarified result in the first place, without the need to call isPartialOverwrite. This can be done as a follow-up. What do you think?
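A minimal invented example of such non-overlapping accesses:
```
store i32 0, i32* %p
%q = getelementptr inbounds i32, i32* %p, i64 1
store i32 1, i32* %q   ; disjoint from the first store: no (partial) overwrite
```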
Reviewed By: fhahn, asbirlea
Differential Revision: https://reviews.llvm.org/D90371