We used to combine only intrinsics, turning them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.
We can do the same thing for generic load/stores.
Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).
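For illustration, a rough sketch of the shape we now match (IR names invented
here; the actual combine runs on the SelectionDAG):
  %v = load <4 x float>* %addr, align 8
  %addr.next = getelementptr <4 x float>* %addr, i32 1
  ; the load plus the pointer increment can be selected together as a
  ; single VLD1_UPD that writes the updated base address back.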
Differential Revision: http://reviews.llvm.org/D6585
llvm-svn: 223862
Revert r223764, which taught instcombine to canonicalize integer-based element
extraction patterns.
This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely
also on ARM and PPC. I really don't know how, but reverting now that I've
confirmed this is actually the culprit. I have a reproduction as well and so
should be able to restore this shortly.
This reverts commit r223764.
Original commit log follows:
Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.
Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of a
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.
All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.
With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.
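As a hypothetical illustration (not from the patch itself), the rewrite on a
two-float payload such as std::complex<float> looks roughly like:
  ; before: "element extraction" via integer math
  %bits = load i64* %p
  %re.i = trunc i64 %bits to i32
  %re = bitcast i32 %re.i to float
  %hi = lshr i64 %bits, 32
  %im.i = trunc i64 %hi to i32
  %im = bitcast i32 %im.i to float
  ; after: a vector load plus element extraction
  %vp = bitcast i64* %p to <2 x float>*
  %v = load <2 x float>* %vp
  %re.2 = extractelement <2 x float> %v, i32 0
  %im.2 = extractelement <2 x float> %v, i32 1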
For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
integer loads and stores. SSA values are tremendously more powerful
than "copy" intrinsics. Not doing this regresses massive amounts of
LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
SROA or every memcpy of a trivially copyable struct will prevent SSA
formation of the members of that struct. It essentially turns off
SROA.
- The closest alternative is to actually split the loads and stores when
partitioning with SROA, but this has all of the downsides historically
discussed of splitting up loads and stores -- the wide-store
information is fundamentally lost. We would also see performance
regressions for bitfield-heavy code and other places where the
integers aren't really intended to be split without seemingly
arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
a choice to make IMO.
llvm-svn: 223813
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532. Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.
I have a follow-up patch prepared for `clang`. If this breaks other
sub-projects, I apologize in advance :(. Help me compile it on Darwin and
I'll try to fix it. FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.
This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.
Here's a quick guide for updating your code:
- `Metadata` is the root of a class hierarchy with three main classes:
`MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from
the `Value` class hierarchy. It is typeless -- i.e., instances do
*not* have a `Type`.
- `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
- `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.
If you're referring solely to resolved `MDNode`s -- post graph
construction -- just use `MDNode*`.
- `MDNode` (and the rest of `Metadata`) have only limited support for
`replaceAllUsesWith()`.
As long as an `MDNode` is pointing at a forward declaration -- the
result of `MDNode::getTemporary()` -- it maintains a side map of its
uses and can RAUW itself. Once the forward declarations are fully
resolved RAUW support is dropped on the ground. This means that
uniquing collisions on changing operands cause nodes to become
"distinct". (This already happened fairly commonly, whenever an
operand went to null.)
If you're constructing complex (non-self-referential) `MDNode` cycles,
you need to call `MDNode::resolveCycles()` on each node (or on a
top-level node that somehow references all of the nodes). Also,
don't do that. Metadata cycles (and the RAUW machinery needed to
construct them) are expensive.
- An `MDNode` can only refer to a `Constant` through a bridge called
`ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).
As a side effect, accessing an operand of an `MDNode` that is known
to be, e.g., `ConstantInt`, takes three steps: first, cast from
`Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
third, cast down to `ConstantInt`.
The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
metadata schema owners transition away from using `Constant`s when
the type isn't important (and they don't care about referring to
`GlobalValue`s).
In the meantime, I've added transitional API to the `mdconst`
namespace that matches semantics with the old code, in order to
avoid adding the error-prone three-step equivalent to every call
site. If your old code was:
MDNode *N = foo();
bar(isa <ConstantInt>(N->getOperand(0)));
baz(cast <ConstantInt>(N->getOperand(1)));
bak(cast_or_null <ConstantInt>(N->getOperand(2)));
bat(dyn_cast <ConstantInt>(N->getOperand(3)));
bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
you can trivially match its semantics with:
MDNode *N = foo();
bar(mdconst::hasa <ConstantInt>(N->getOperand(0)));
baz(mdconst::extract <ConstantInt>(N->getOperand(1)));
bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2)));
bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3)));
bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
and when you transition your metadata schema to `MDInt`:
MDNode *N = foo();
bar(isa <MDInt>(N->getOperand(0)));
baz(cast <MDInt>(N->getOperand(1)));
bak(cast_or_null <MDInt>(N->getOperand(2)));
bat(dyn_cast <MDInt>(N->getOperand(3)));
bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
- A `CallInst` -- specifically, intrinsic instructions -- can refer to
metadata through a bridge called `MetadataAsValue`. This is a
subclass of `Value` where `getType()->isMetadataTy()`.
`MetadataAsValue` is the *only* class that can legally refer to a
`LocalAsMetadata`, which is a bridged form of non-`Constant` values
like `Argument` and `Instruction`. It can also refer to any other
`Metadata` subclass.
(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)
llvm-svn: 223802
Removed some duplicate test cases from the file /test/Transforms/InstCombine/shift.ll.
test54 and test57 were duplicates of each other.
test55 and test58 were duplicates of each other.
(Removed test57 and test58)
llvm-svn: 223767
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.
Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of a
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and stores which remain, it uses integer math to recombine the values
into a large integer to load or store.
All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaches InstCombine to use this
representation.
With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.
For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
integer loads and stores. SSA values are tremendously more powerful
than "copy" intrinsics. Not doing this regresses massive amounts of
LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
SROA or every memcpy of a trivially copyable struct will prevent SSA
formation of the members of that struct. It essentially turns off
SROA.
- The closest alternative is to actually split the loads and stores when
partitioning with SROA, but this has all of the downsides historically
discussed of splitting up loads and stores -- the wide-store
information is fundamentally lost. We would also see performance
regressions for bitfield-heavy code and other places where the
integers aren't really intended to be split without seemingly
arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
a choice to make IMO.
Differential Revision: http://reviews.llvm.org/D6548
llvm-svn: 223764
Disallow complex types of function-local metadata. The only valid
function-local metadata is an `MDNode` whose sole argument is a
non-metadata function-local value.
Part of PR21532.
llvm-svn: 223564
Reapply r223347, with a fix to not crash on uninserted instructions (or more
precisely, instructions in uninserted blocks). bugpoint was able to reduce
the test case, but it is still somewhat large (and relies on setting
things up to be simplified during inlining), so I've not included it here.
Nevertheless, it is clear what is going on and why.
Original commit message:
Restrict somewhat the memory-allocation pointer cmp opt from r223093
Based on review comments from Richard Smith, restrict this optimization from
applying to globals that might resolve lazily to other dynamically-loaded
modules, and also from dynamic allocas (which might be transformed into malloc
calls). In short, take extra care that the compared-to pointer is really
simultaneously live with the memory allocation.
llvm-svn: 223371
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:
OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
Since it's just a one-liner, I've added the equivalent BSWAP fold to the DAGCombiner as well:
fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))
Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.
Differential Revision: http://reviews.llvm.org/D6407
llvm-svn: 223349
Based on review comments from Richard Smith, restrict this optimization from
applying to globals that might resolve lazily to other dynamically-loaded
modules, and also from dynamic allocas (which might be transformed into malloc
calls). In short, take extra care that the compared-to pointer is really
simultaneously live with the memory allocation.
llvm-svn: 223347
This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x);
rdar://19049359
Differential Revision: http://reviews.llvm.org/D6496
llvm-svn: 223270
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n
(Valid for non-negative n: reinterpreting x as unsigned sends all negative
values above any such n, so a single unsigned compare covers both checks.)
llvm-svn: 223224
We were assuming that each back-edge in a region represented a unique
loop, which is not always the case. We need to use LoopInfo to
correctly determine which back-edges are loops.
llvm-svn: 223199
Such loops shouldn't be vectorized due to their form.
After applying loop-rotate (+simplifycfg) the tests again start to check
what they are intended to check.
llvm-svn: 223170
Follow-up from r222926. Also handle multiple destinations from merged
cases on multiple and subsequent phi instructions.
rdar://problem/19106978
llvm-svn: 223135
Load instructions are inserted into loop preheaders when sinking stores
and later removed if not used by the SSA updater. Avoid sinking if the
loop has no preheader, thereby avoiding crashes. This fixes one more side effect
of not handling indirectbr instructions properly on LoopSimplify.
llvm-svn: 223119
System memory allocation functions, which are identified at the IR level by the
noalias attribute on the return value, must return a pointer into a memory region
disjoint from any other memory accessible to the caller. We can use this
property to simplify pointer comparisons between allocated memory and local
stack addresses and the addresses of global variables. Neither the stack nor
global variables can overlap with the region used by the memory allocator.
Fixes PR21556.
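A minimal sketch of a comparison this enables folding, assuming the usual
noalias-annotated malloc declaration (names invented):
  @g = global i32 0
  declare noalias i8* @malloc(i64)
  define i1 @f() {
    %m = call i8* @malloc(i64 4)
    %c = icmp eq i8* %m, bitcast (i32* @g to i8*)
    ret i1 %c   ; foldable to false: the fresh allocation cannot be @g
  }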
llvm-svn: 223093
An unreachable default destination can be exploited by other optimizations, and
SDag lowering is now prepared to handle them efficiently.
For example, branches to the unreachable destination will be optimized away,
such as in the case of range checks for switch lookup tables.
On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and
Chromium by 30 kB).
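Sketch (labels invented): with a default block that is simply unreachable, the
bounds check guarding a lookup table for the cases can be dropped:
  switch i32 %x, label %default [
    i32 0, label %a
    i32 1, label %b
  ]
  default:                ; provably never taken
    unreachable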
llvm-svn: 223050
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
llvm-svn: 222936
We may be in a situation where the icmps might not be near each other in
a tree of or instructions. Try to dig out related compare instructions
and see if they combine.
N.B. This won't fire on deep trees of compares because rewriting the
tree might end up creating a net increase in IR. We may have to resort
to something more sophisticated if this is a real problem.
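A rough sketch of the shape this handles (values invented); the two foldable
compares are not adjacent in the or tree:
  %c1 = icmp eq i32 %x, 0
  %c2 = icmp eq i32 %x, 1
  %t1 = or i1 %c1, %other
  %t2 = or i1 %t1, %c2
  ; %c1 and %c2 can still be dug out and combined, e.g. into
  ; icmp ult i32 %x, 2, despite %other sitting between them.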
llvm-svn: 222928
Loop simplify skips exit-block insertion when exits contain indirectbr
instructions. This leads to an assertion in LICM when trying to sink
stores out of non-dedicated loop exits containing indirectbr
instructions. This patch fixes this issue by re-checking for dedicated
exits in LICM prior to store sink attempts.
Differential Revision: http://reviews.llvm.org/D6414
rdar://problem/18943047
llvm-svn: 222927
Switch cases statements with sequential values that branch to the same
destination BB may often be handled together in a single new source BB.
In this scenario we need to remove remaining incoming values from PHI
instructions in the destination BB so as to match the number of source
branches.
Differential Revision: http://reviews.llvm.org/D6415
rdar://problem/19040894
llvm-svn: 222926
Fixed missing dominance check.
Original commit message:
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
if (idx < tablesize)
r = table[idx]; // table does not contain default_value
else
r = default_value;
if (r != default_value)
...
Is optimized to:
cond = idx < tablesize;
if (cond)
r = table[idx];
else
r = default_value;
if (cond)
...
Jump threading will then eliminate the second if(cond).
llvm-svn: 222891
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
if (idx < tablesize)
r = table[idx]; // table does not contain default_value
else
r = default_value;
if (r != default_value)
...
Is optimized to:
cond = idx < tablesize;
if (cond)
r = table[idx];
else
r = default_value;
if (cond)
...
Jump threading will then eliminate the second if(cond).
llvm-svn: 222872
This restores our ability to optimize:
(X & C) ? X & ~C : X into X & ~C
(X & C) ? X : X & ~C into X
(X & C) ? X | C : X into X
(X & C) ? X : X | C into X | C
llvm-svn: 222868
This reverts commit r210006, it miscompiled libapr which is used in who
knows how many projects.
A test has been added to ensure that we don't regress again.
I'll work on a rewrite of what the optimization was trying to do later.
llvm-svn: 222856
If solveBlockValue() needs results from predecessors that are not already
computed, it returns false with the intention of resuming when the dependencies
have been resolved. However, the computation would never be resumed since an
'overdefined' result had been placed in the cache, preventing any further
computation.
The point of placing the 'overdefined' result in the cache seems to have been
to break cycles, but we can check for that when inserting work items in the
BlockValue stack instead. This makes the "stop and resume" mechanism of
solveBlockValue() work as intended, unlocking more analysis.
Using this patch shaves 120 KB off a 64-bit Chromium build on Linux.
I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in
compile time.
Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me.
Differential Revision: http://reviews.llvm.org/D6397
llvm-svn: 222768
Canonicalize the type of a store based on the type of the value being
stored rather than the pointer type.
This change is analogous to r220138 which changed the canonicalization
for loads. The rationale is the same: memory does not have a type,
operations (and thus the values they produce) have a type. We should
match that type as closely as possible rather than reading some form of
semantics into the pointer type.
With this change, loads and stores should no longer be made with
nonsensical types for the values that they load and store. This is
particularly important when trying to match specific loaded and stored
types in the process of doing other instcombines, which is what led me
down this twisty maze of miscanonicalization.
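A small before/after sketch of the store side (names invented):
  ; before: the pointer type drove the stored type
  %p.i = bitcast float* %p to i32*
  %v.i = bitcast float %v to i32
  store i32 %v.i, i32* %p.i
  ; after: the store uses the type of the value being stored
  store float %v, float* %p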
I've put quite some effort into looking through IR to find places where
LLVM's optimizer was being unreasonably conservative in the face of
mismatched load and store types, however it is possible (let's say,
likely!) I have missed some. If you see regressions here, or from
r220138, the likely cause is some part of LLVM failing to cope with load
and store types differing. Test cases appreciated, it is important that
we root all of these out of LLVM.
llvm-svn: 222748
This reinstates r220277 with a fix: clearly, only exactly equal width
ptrtoint and inttoptr casts are no-op casts; it says so right there in the
LangRef. Make the code agree.
Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.
To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.
These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.
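As a hedged example (assuming a datalayout where pointers and i64 have the
same width), an available value that needs one of these casts:
  store i64 %v, i64* %p
  %q = bitcast i64* %p to i8**
  %l = load i8** %q
  ; %l is available as "inttoptr i64 %v to i8*" once the datalayout
  ; confirms the widths match.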
I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.
llvm-svn: 222739
This handles cases where we are comparing a masked value against itself.
The analysis could be further improved by making it recursive but such
expense is not currently justified.
llvm-svn: 222716
We would create an instruction but not insert it.
Leaving the unused instruction uninserted would lead to a verification
failure.
This fixes PR21653.
llvm-svn: 222659
We tried to get the result of DataLayout::getLargestLegalIntTypeSize but
we didn't have a DataLayout. This resulted in opt crashing.
This fixes PR21651.
llvm-svn: 222645
Introduced new target-independent intrinsics in order to support masked vector
loads and stores. The loop vectorizer optimizes loops containing conditional
memory accesses by generating these intrinsics for the existing AVX2 and
AVX-512 targets. The vectorizer asks the target about the availability of
masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
A scalarizer for other targets (those without AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 222632
Fixes the self-host failure. Note that this commit activates dominator
analysis in the combiner by default (like the original commit did).
llvm-svn: 222590
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.
This manifested as an assertion failure when we compared the various
types: we had a size mismatch.
This fixes PR21480.
llvm-svn: 222499
Currently LoopUnroll generates a prologue loop before the main loop
body to execute the first N % UnrollFactor iterations. This loop is also
used if the trip count can overflow; that is determined by a runtime check.
However, we've been mistakenly optimizing this loop into linear code for
UnrollFactor = 2, not taking into account that it also serves as a safe
version of the loop if its trip count overflows.
llvm-svn: 222451
This reverts commit r222142. This is causing/exposing an execution-time regression
in spec2006/gcc and coremark on AArch64/A57/Ofast.
Conflicts:
test/Transforms/Reassociate/optional-flags.ll
llvm-svn: 222398
When the BasicBlock containing the return instruction has a PHI with 2
incoming values, FoldReturnIntoUncondBranch will remove the no longer
used incoming value and remove the no longer needed phi as well. This
leaves us with a BB that no longer has a PHI, but the subsequent call
to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not
remove the return instruction (which still uses the result of the call
instruction). This prevents EliminateRecursiveTailCall from removing
the value, as it is still being used in a basic block which has no
predecessors.
The basic block cannot be erased on the spot, because its iterator is
still being used in runTRE.
This issue was exposed when removing the threshold on size for lifetime
marker insertion for named temporaries in clang. The testcase is a much
reduced version of peelOffOuterExpr(const Expr*, const ExplodedNode *)
from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp.
llvm-svn: 222354
AliasSetTracker::addUnknown may create an AliasSet devoid of pointers
just to contain an instruction if no suitable AliasSet already exists.
It will then call AliasSet::addUnknownInst, and we will be done.
However, it's possible for addUnknown to choose an existing AliasSet and
call addUnknownInst on it.
If this were to occur, we are in a bit of a pickle: removing pointers
from the AliasSet can cause the entire AliasSet to become destroyed,
taking our unknown instructions out with them.
Instead, keep track of whether or not our AliasSet has any unknown
instructions.
This fixes PR21582.
llvm-svn: 222338
We would attempt to replace an frem's operand with the same operand.
This would cause InstCombine to think real work was done, causing
InstCombine to enter an infinite loop.
This fixes the second part of PR21576.
llvm-svn: 222265
EarlyCSE was giving up on the current instruction immediately when it
recognized that the current instruction made a previous store trivially dead.
There's no reason to do this. Once the previous store has been deleted, it's
perfectly legal to remember the value of the current store (for value
forwarding) and the fact that the store occurred (it could be dead too!).
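A minimal sketch (values invented) of the situation described:
  store i32 0, i32* %p    ; made trivially dead by the next store
  store i32 %v, i32* %p   ; we can still record that *%p now holds %v
  %r = load i32* %p       ; ... so this load forwards %v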
Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D6301
llvm-svn: 222241
It is impossible for (x & INT_MAX) == 0 && x == INT_MAX to ever be true:
if x were INT_MAX, then x & INT_MAX would be INT_MAX, which is nonzero.
While this sort of reasoning should normally live in InstSimplify,
the machinery that derives this result is not trivial to split out.
llvm-svn: 222230
I added a pessimization in r217102 to prevent miscompiles when the
incremented induction variable was used in a comparison; it would be
poison.
Try to use the incremented induction variable more often when we can be
sure that the increment won't end in poison.
Differential Revision: http://reviews.llvm.org/D6222
llvm-svn: 222213
When converting a switch to a lookup table we might have to generate a bitmask
to encode and check for holes in the original switch statement.
The type of this mask depends on the number of switch cases, which can
result in illegal types for pretty much all architectures.
To avoid unnecessary type legalization and to help FastISel, this commit
increases the size of the bitmask to the next power-of-2 value when necessary.
This fixes rdar://problem/18984639.
llvm-svn: 222168
This is a simple optimization for switch table lookup:
It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output.
Example:
int f1(int x) {
switch (x) {
case 0: return 10;
case 1: return 11;
case 2: return 12;
case 3: return 13;
}
return 0;
}
generates:
define i32 @f1(i32 %x) #0 {
entry:
%0 = icmp ult i32 %x, 4
br i1 %0, label %switch.lookup, label %return
switch.lookup:
%switch.offset = add i32 %x, 10
ret i32 %switch.offset
return:
ret i32 0
}
llvm-svn: 222121
This adds back r222061, but now calls initializePAEvalPass from the correct
library to avoid link problems.
Original message:
Don't make assumptions about the name of private global variables.
Private variables can be renamed, so it is not reliable to make
decisions on the name.
The name is also dropped by the assembler before getting to the
linker, so using the name causes a disconnect between how LLVM makes a
decision (var name) and how the linker makes a decision (section it is
in).
This patch changes one case where we were looking at the variable name to use
the section instead.
Test tuning by Michael Gottesman.
llvm-svn: 222117
Private variables can be renamed, so it is not reliable to make
decisions on the name.
The name is also dropped by the assembler before getting to the
linker, so using the name causes a disconnect between how LLVM makes a
decision (var name) and how the linker makes a decision (section it is
in).
This patch changes one case where we were looking at the variable name to use
the section instead.
Test tuning by Michael Gottesman.
llvm-svn: 222061
We would attempt to replace a fptrunc of an frem with an identical
fptrunc. This would cause the new fptrunc to be added to the worklist.
Of course, this results in an infinite loop because we will keep
visiting the newly created fptruncs.
This fixes PR21576.
llvm-svn: 222040
doing Load PRE"
This commit updates the failing test in
Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll
The failing test is sensitive to the order in which we process loads. This
version turns on the RPO traversal instead of the DT traversal in GVN.
The new test code is functionally the same; just the order of the loads
that are eliminated is swapped.
This new version also fixes an issue where GVN splits a critical edge and
potentially invalidates the RPO/DT iterator.
llvm-svn: 222039