case where a static caller is itself inlined everywhere else, and
thus may go away if it doesn't get too big due to inlining other
things into it. If there are references to the caller other than
calls, it will not be removed; account for this.
This results in same-day completion of the case in PR8853.
llvm-svn: 122821
when safe.
The testcase is basically this nested loop:
void foo(char *X) {
  for (int i = 0; i != 100; ++i)
    for (int j = 0; j != 100; ++j)
      X[j+i*100] = 0;
}
which now gets turned into a single memset. clang -O3 doesn't optimize
this yet, though, due to a phase ordering issue I haven't analyzed.
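In C terms, the formed idiom is roughly this (a sketch, assuming the
usual libc memset):

#include <string.h>

void foo(char *X) {
  memset(X, 0, 100 * 100);   /* the nested loop collapses to one memset */
}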
llvm-svn: 122806
instruction *after* the store. The store will always be deleted
if the transformation kicks in, so we'd do an N^2 scan of every
loop block. Whoops.
llvm-svn: 122805
FunctionPass. It probably doesn't have a reason to be a LoopPass, as it will
likely drop the simple fixed point and either use RPO iteration or Duncan's
approach in instsimplify of only revisiting instructions that have changed.
The next step is to preserve LoopSimplify. This looks like it won't be too hard,
although the pass manager doesn't actually seem to respect when non-loop passes
claim to preserve LCSSA or LoopSimplify. This will have to be fixed.
llvm-svn: 122791
that are allowed to have metadata operands are intrinsic calls,
and the only ones that take metadata currently return void.
Just reject all void instructions, which should not be value
numbered anyway. To future-proof things, add an assert to the
getHashValue impl for calls to check that metadata operands
aren't present.
llvm-svn: 122759
nested values, so they can change and drop to null, which can
change the hash and cause havoc.
It turns out that it isn't a good idea to value number stuff
with metadata operands anyway, so... don't.
llvm-svn: 122758
capacity on the Visited SmallPtrSet. On 403.gcc, this is about a 4.5% speedup of
CodeGenPrepare time (which itself is 10% of time spent in the backend).
This is progress towards PR8889.
llvm-svn: 122741
of instcombine that is currently in the middle of the loop pass pipeline. This
commit only checks in the pass; it will hopefully be enabled by default later.
llvm-svn: 122719
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy. Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.
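As a sketch (a hypothetical loop, not the actual testcase), this is the
kind of loop the new check has to reject:

void shift(char *p, int n) {
  /* For n > 64 the stores land in the range later iterations read from, */
  /* so memcpy(p + 64, p, n) would not preserve the loop's behavior.     */
  for (int i = 0; i < n; ++i)
    p[i + 64] = p[i];
}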
llvm-svn: 122712
isExitBlockDominatedByBlockInLoop is a relic of the days when domtree was
*just* a tree and didn't have DFS numbers. Checking DFS numbers is faster
and easier than "limiting the search of the tree".
llvm-svn: 122702
in the PR, the pass could break LCSSA form when inserting preheaders. It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.
llvm-svn: 122695
header for now for memset/memcpy opportunities. It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning
"for loops" into two-basic-block loops that loop-idiom was ignoring.
With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:
for (j=0; j<MAX_history; ++j) {
  history_new[i][j+1] = history[2*i][j];
}
Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine. Woo.
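Assuming a scalar element type, each of those loops becomes roughly:

  memcpy(&history_new[i][1], &history[2*i][0],
         MAX_history * sizeof(history[0][0]));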
llvm-svn: 122685
maintains the guarantee that the DenseSet expects two elements it contains to
not go from unequal to equal under its nose.
As a side-effect, this also lets us switch from iterating to a fixed-point to
actually maintaining a work queue of functions to look at again, and we don't
add thunks to our work queue so we don't need to detect and ignore them.
llvm-svn: 122677
pipeline to be caught by instcombine, and it's not feasible to catch them in SimplifyCFG because the
use-lists are in an inconsistent state at the point where it could know that it needs to simplify them.
Instead, have CodeGenPrepare look for trivially redundant PHIs as part of its general cleanup effort.
llvm-svn: 122516
if both A op B and A op C simplify. This fires fairly often but doesn't
make that much difference. On gcc-as-one-file it removes two "and"s and
turns one branch into a select.
llvm-svn: 122399
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.
llvm-svn: 122378
visit instructions before their uses, since InstructionSimplify does a
better job in that case. All this prompted by Frits van Bommel.
llvm-svn: 122343
it could only be tested indirectly, via instcombine, gvn or some other
pass that makes use of InstructionSimplify, which means that testcases
had to be carefully contrived to dance around any other transformations
that that pass did.
llvm-svn: 122264
argument. The generated alloca has to have at least the alignment of the
byval; if not, the client may be making assumptions that the new alloca won't
satisfy.
llvm-svn: 122234
This resolves a README entry and technically resolves PR4916,
but we still get poor code for the testcase in that PR because
GVN isn't CSE'ing uadd with add, filed as PR8817.
Previously we got:
_test7:                                 ## @test7
        addq    %rsi, %rdi
        cmpq    %rdi, %rsi
        movl    $42, %eax
        cmovaq  %rsi, %rax
        ret
Now we get:
_test7:                                 ## @test7
        addq    %rsi, %rdi
        movl    $42, %eax
        cmovbq  %rsi, %rax
        ret
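For reference, a plausible C shape for test7 (hypothetical; the real
testcase is the one in the PR):

#include <stdint.h>

uint64_t test7(uint64_t a, uint64_t b) {
  uint64_t s = a + b;        /* wraps on overflow */
  return s < a ? b : 42;     /* carry check, recognized as uadd.with.overflow */
}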
llvm-svn: 122182
the old thing end up on the instcombine worklist. Not doing this
can cause an extra top-level iteration of instcombine, burning
compile time.
llvm-svn: 122179
sadd formed is half the size of the original type. We can
now compile this into a sadd.i8:
unsigned char X(char a, char b) {
  int res = a + b;
  if ((unsigned)(res + 128) > 255U)
    abort();
  return res;
}
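The "(unsigned)(res+128) > 255U" test is exactly a signed i8 overflow
check; a sketch of the equivalent in builtin form (using the Clang/GCC
__builtin_add_overflow spelling, and assuming char is signed):

#include <stdlib.h>

unsigned char X2(char a, char b) {
  char res;
  if (__builtin_add_overflow(a, b, &res))  /* maps to llvm.sadd.with.overflow.i8 */
    abort();
  return res;
}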
llvm-svn: 122178
checking to see if the high bits of the original add result were dead.
Inserting a smaller add and zexting back to that size is not good enough.
This is likely to be the fix for PR8816.
llvm-svn: 122177
which have trapping constant exprs in them due to PHI nodes.
Eliminating them can cause the constant expr to be evaluated
on new paths if the input edges are critical.
llvm-svn: 122164
on the DragonEgg self-host bot. Unfortunately, the testcase is pretty messy and doesn't reduce well due to
interactions with other parts of InstCombine.
llvm-svn: 122072
a null endptr argument, because they may write to errno.
This fixes a selfhost miscompile observed on Linux targets when TBAA
was enabled.
llvm-svn: 122014
dragonegg self-host buildbot. Original commit message:
Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed addition, but could
be generalized if someone works out the magic constant formulas for other operations.
llvm-svn: 121965
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed addition, but could
be generalized if someone works out the magic constant formulas for other operations.
Fixes <rdar://problem/8558713>.
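The matched shape looks roughly like this (a sketch; the magic-constant
variants are not shown, and safe_add is a hypothetical name):

#include <limits.h>
#include <stdlib.h>

int safe_add(int a, int b) {
  long long wide = (long long)a + b;     /* addition in a wider type */
  if (wide > INT_MAX || wide < INT_MIN)  /* explicit overflow check */
    abort();
  return (int)wide;                      /* folds to llvm.sadd.with.overflow.i32 */
}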
llvm-svn: 121905
When it sees a promising select, it now tries to figure out whether the condition of the select is known in any of the predecessors; if so, it maps the operands appropriately.
llvm-svn: 121859
which is simpler than finding a place to insert in BB.
- Don't perform the 'if condition hoisting' xform on certain
i1 PHIs, as it interferes with switch formation.
This re-fixes "example 7", hopefully without breaking the world.
llvm-svn: 121764
first, it can kick in on blocks whose conditions have been
folded to a constant, even though one of the edges will be
trivially folded.
second, it doesn't clean up the "if diamond" that it just
eliminated. This is a problem because other simplifycfg
xforms kick in depending on the order of block visitation,
causing pointless work.
llvm-svn: 121762
when simplifying, allowing them to be eagerly turned into switches. This
is the last step required to get "Example 7" from this blog post:
http://blog.regehr.org/archives/320
On X86, we now generate this machine code, which (to my eye) seems better
than the ICC generated code:
_crud:                                  ## @crud
## BB#0:                                ## %entry
        cmpb    $33, %dil
        jb      LBB0_4
## BB#1:                                ## %switch.early.test
        addb    $-34, %dil
        cmpb    $58, %dil
        ja      LBB0_3
## BB#2:                                ## %switch.early.test
        movzbl  %dil, %eax
        movabsq $288230376537592865, %rcx ## imm = 0x400000017001421
        btq     %rax, %rcx
        jb      LBB0_4
LBB0_3:                                 ## %lor.rhs
        xorl    %eax, %eax
        ret
LBB0_4:                                 ## %lor.end
        movl    $1, %eax
        ret
llvm-svn: 121690
location in simplifycfg. In the old days, SimplifyCFG was never run on
the entry block, so we had to scan over all preds of the BB passed into
simplifycfg to do this xform, now we can just check blocks ending with
a condbranch. This avoids a scan over all preds of every simplified
block, which should be a significant compile-time perf win on functions
with lots of edges. No functionality change.
llvm-svn: 121668
(x & 2^n) ? 2^m+C : C
we can offset both arms by C to get the "(x & 2^n) ? 2^m : 0" form, optimize the
select to a shift and apply the offset afterwards.
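A worked instance with n=3, m=4, C=4 (hypothetical values and function
names):

int f(int x) {
  return (x & 8) ? 20 : 4;       /* (x & 2^3) ? 2^4 + 4 : 4 */
}

int f_opt(int x) {
  return ((x & 8) << 1) + 4;     /* select becomes a shift; C re-applied after */
}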
llvm-svn: 121609
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.
llvm-svn: 121120
(if available) as we go so that we get simple constantexprs, not insane ones.
This fixes the failure of clang/test/CodeGenCXX/virtual-base-ctor.cpp
that the previous iteration of this patch had.
llvm-svn: 121111
optimization.
Consider:
static void foo() {
  A = alloca
  ...
}
static void bar() {
  B = alloca
  ...
  call foo();
}
void main() {
  bar()
}
The inliner proceeds bottom up, but let's pretend it decides not to inline foo
into bar. When it gets to main, it inlines bar into main(), and says "hey, I
just inlined an alloca B into main, let's remember that." Then it keeps going
and finds that it now contains a call to foo. It decides to inline foo into
main, and says "hey, foo has an alloca A, and I have an alloca B from another
inlined call site, let's reuse it." The problem with this, of course, is that
the lifetimes of A and B are nested, not disjoint.
Unfortunately I can't create a reasonable testcase for this: the one in the
PR is both huge and extremely sensitive, because minor tweaks end up
causing foo to get inlined into bar too early. We already have tests for the
basic alloca merging optimization and this does not break them.
llvm-svn: 120995
memcpy's like:
  memcpy(A, B)
  memcpy(A, C)
we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:
  memcpy(A, B)
  memcpy(A, A)
which is not correct to transform into:
  memcpy(A, A)
This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!
llvm-svn: 120974
Should have no functional change other than the order of two transformations that are mutually exclusive and the exact formatting of debug output.
Internally, it now stores the ConstantInt*s as Constant*s, and actual undef values instead of nulls.
llvm-svn: 120946
1. if the underlying pointer passed in can be resolved
to any argument or alloca, then we don't need to scan.
Previously we would only avoid the scan if the alloca
or byval was actually considered dead.
2. The dead store processing code is itself completely
dead and didn't handle volatile stores right anyway,
so delete it. This allows simplifying the interface
to RemoveAccessedObjects.
llvm-svn: 120467
made sense to me. We now have a set of dead stack objects, and
they become live when loaded. Fix a theoretical problem where
we'd pass in the wrong pointer to the alias query.
llvm-svn: 120465
If the call might read all the allocas, stop scanning early.
Convert a vector to smallvector, shrink SmallPtrSet to 16 instead
of 64 to avoid crazy linear scans.
llvm-svn: 120463
about pairs of AA::Location's instead of looking for MemDep's
"Def" predicate. This is more powerful and general, handling
memset/memcpy/store all uniformly, and implementing PR8701 and
probably obsoleting parts of memcpyoptimizer.
This also fixes an obscure bug with init.trampoline and i8
stores, but I'm not surprised it hasn't been hit yet. Enhancing
init.trampoline to carry the size that it stores would allow
DSE to be much more aggressive about optimizing them.
llvm-svn: 120406
contains "ref".
Enhance DSE to use a modref query instead of a store-specific hack
to generalize the "ignore may-alias stores" optimization to handle
memset and memcpy.
llvm-svn: 120368
1. Don't bother trying to optimize:
  lifetime.end(ptr)
  store(ptr)
as it is undefined, and therefore shouldn't exist.
2. Move the 'storing a loaded pointer' xform up, simplifying
the may-aliased store code.
llvm-svn: 120359
by my recent GVN improvement. Rather than looking through just a single
layer of PHI nodes when attempting to sink GEPs, we need to iteratively
look through arbitrary PHI nests.
llvm-svn: 120202
fairly systematic way in instcombine. Some of these cases were already dealt
with, in which case I removed the existing code. The case of Add has a bunch of
funky logic which covers some of this plus a few variants (considers shifts to be
a form of multiplication), which I didn't touch. The simplification performed is:
A*B+A*C -> A*(B+C). The improvement is to do this in cases that were not already
handled [such as A*B-A*C -> A*(B-C), which was reported on the mailing list], and
also to do it more often by not checking for "only one use" if "B+C" simplifies.
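A minimal illustration (hypothetical functions):

int f(int a, int b, int c) { return a*b + a*c; }   /* -> a*(b+c), handled before */
int g(int a, int b, int c) { return a*b - a*c; }   /* -> a*(b-c), handled now    */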
llvm-svn: 120024
allowing the memcpy to be eliminated.
Unfortunately, the requirements on byval's without explicit
alignment are really weak and impossible to predict in the
mid-level optimizer, so this doesn't kick in much with current
frontends. The fix is to change clang to set alignment on all
byval arguments.
llvm-svn: 119916
preserves LCSSA form out of ScalarEvolution and into the LoopInfo
class. Use it to check that SimplifyInstruction simplifications
are not breaking LCSSA form. Fixes PR8622.
llvm-svn: 119727
this was a tree of hashtables, and a query recursed into the table for the immediate dominator ad infinitum
if the initial lookup failed. This led to really bad performance on tall, narrow CFGs.
We can instead replace it with what is conceptually a multimap of value numbers to leaders (actually
represented by a hashtable with a list of Value*'s as the value type), and then
determine which leader from that set to use very cheaply thanks to the DFS numberings maintained by
DominatorTree. Because there are typically few duplicates of a given value, this scan tends to be
quite fast. Additionally, we use a custom linked list and BumpPtr allocation to avoid any unnecessary
allocation in representing the value-side of the multimap.
This change brings with it a 15% (!) improvement in the total running time of GVN on 403.gcc, which I
think is pretty good considering that includes all the "real work" being done by MemDep as well.
The one downside to this approach is that we can no longer use GVN to perform simple conditional propagation,
but that seems like an acceptable loss since we now have LVI and CorrelatedValuePropagation to pick up
the slack. If you see conditional propagation that's not happening, please file bugs against LVI or CVP.
llvm-svn: 119714
refusing to optimize two memcpy's like this:
  copy A <- B
  copy C <- A
if it couldn't prove that noalias(B,C). We can eliminate
the copy by producing a memmove instead of memcpy.
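In C terms the sketch is (fwd and fwd_opt are hypothetical names):

#include <string.h>

void fwd(char *A, char *B, char *C, size_t n) {
  memcpy(A, B, n);     /* copy A <- B */
  memcpy(C, A, n);     /* copy C <- A: can be forwarded to read from B */
}

/* When noalias(B, C) can't be proven, the forwarded copy must be a memmove: */
void fwd_opt(char *A, char *B, char *C, size_t n) {
  memcpy(A, B, n);
  memmove(C, B, n);
}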
llvm-svn: 119694
if it is passed as a byval argument. The byval argument will only be
read, so it is safe to read from the original global instead. This allows
us to promote away the %agg.tmp alloca in PR8582.
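Roughly the pattern involved, sketched (G, use, and caller are
hypothetical names):

struct S { int v[8]; };
static const struct S G = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
void use(struct S s);     /* pass-by-value lowers to a byval argument */

void caller(void) {
  struct S tmp = G;       /* the %agg.tmp alloca, filled by a memcpy from G */
  use(tmp);               /* the byval callee only reads, so G can be passed directly */
}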
llvm-svn: 119686
instructions out of InstCombine and into InstructionSimplify. While
there, introduce an m_AllOnes pattern to simplify matching with integers
and vectors with all bits equal to one.
llvm-svn: 119536
hasConstantValue. I was leery of using SimplifyInstruction
while the IR was still in a half-baked state, which is the
reason for delaying the simplification until the IR is fully
cooked.
llvm-svn: 119494
systematically, CollapsePhi will always return null here. Note
that CollapsePhi did an extra check, isSafeReplacement, which
the SimplifyInstruction logic does not do. I think that check
was bogus - I guess we will soon find out! (It was originally
added in commit 41998 without a testcase).
llvm-svn: 119456
offload the work to hasConstantValue rather than do something more
complicated (such as handling mutually recursive phis) because (1) it is
not clear it is worth it; and (2) if it is worth it, maybe such logic
would be better placed in hasConstantValue. Adjust some GVN tests
which are now cleaned up much further (eg: all phi nodes are removed).
llvm-svn: 119043
SimplifyAssociativeOrCommutative) "(A op C1) op C2" -> "A op (C1 op C2)",
which previously was only done if C1 and C2 were constants, to occur whenever
"C1 op C2" simplifies (a la InstructionSimplify). Since the simplifying operand
combination can no longer be assumed to be the right-hand terms, consider all of
the possible permutations. When compiling "gcc as one big file", transform 2
(i.e. using right-hand operands) fires about 4000 times but it has to be said
that most of the time the simplifying operands are both constants. Transforms
3, 4 and 5 each fired once. Transform 6, which is an existing transform that
I didn't change, never fired. With this change, the testcase is now optimized
perfectly with one run of instcombine (previously it required instcombine +
reassociate + instcombine, and it may just have been luck that this worked).
llvm-svn: 119002
"%z = %x and %y". If GVN can prove that %y equals %x, then it turns
this into "%z = %x and %x". With the new code, %z will be replaced
with %x everywhere (and then deleted). Previously %z would be value
numbered too, which is a waste of time. Also, while a clever value
numbering algorithm would give %z the same value number as %x, our
current one doesn't do so (at least I don't think it does). The new
logic has an essentially equivalent effect to what you would get if
%z was given the same value number as %x, i.e. it should make value
numbering smarter. While there, get hold of target data once at the
start rather than a gazillion times all over the place.
llvm-svn: 118923
testing for dereferenceable pointers into a helper function,
isDereferenceablePointer. Teach it how to reason about GEPs
with simple non-zero indices.
Also eliminate ArgumentPromotion's IsAlwaysValidPointer,
which didn't check for weak externals or out of range gep
indices.
llvm-svn: 118840
references. For example, this allows gvn to eliminate the load in
this example:
void foo(int n, int* p, int *q) {
  p[0] = 0;
  p[1] = 1;
  if (n) {
    *q = p[0];
  }
}
}
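After GVN, the body is effectively (a sketch):

void foo(int n, int *p, int *q) {
  p[0] = 0;
  p[1] = 1;
  if (n) {
    *q = 0;   /* the load of p[0] is replaced by the stored value */
  }
}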
llvm-svn: 118714
to optionally look for constant or local (alloca) memory.
Teach BasicAliasAnalysis::pointsToConstantMemory to look through Select
and Phi nodes, and to support looking for local memory.
Remove FunctionAttrs' PointsToLocalOrConstantMemory function, now that
AliasAnalysis knows all the tricks that function knew.
llvm-svn: 118412
consider it to be readonly. In fact, don't even consider it to be
readonly if it does a volatile load from an AllocaInst either (it
is debatable as to whether readonly would be correct or not in this
case; play safe for the moment). This fixes PR8279.
llvm-svn: 117783
This code had previously used 2*N, where N is the mask length, to represent
undef. That is not safe because the shufflevector operands may have more
than N elements -- they don't have to match the result type.
llvm-svn: 117721
Allow splats even if they don't match either of the original shuffles,
possibly due to undef entries in the shuffle masks. Radar 8597790.
Also fix some 80-column violations.
llvm-svn: 117719
needs to be guaranteed never to be run on an unreachable block. However, earlier block simplifications may have
changed the CFG to make blocks that were reachable when we began our iteration unreachable by the time we try to
simplify them. (Note that this also means that our depth-first iterators were potentially being invalidated).
This should not have a large impact on code quality, since later runs of instcombine should pick up these simplifications.
Fixes PR8506.
llvm-svn: 117709
it isn't unreachable and should not be zapped. The check for the entry block
was missing in one case: a block containing an unwind instruction. While there,
do some small cleanups: "M" is not a great name for a Function* (it would be
more appropriate for a Module*), change it to "Fn"; use Fn in more places.
llvm-svn: 117224
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.
Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.
I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.
llvm-svn: 116820
perform initialization without static constructors AND without explicit initialization
by the client. For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve. I hope to be able to relax
the latter requirement in the future.
llvm-svn: 116334
formulae which become illegal as a result of the offset updating don't
escape.
This is for rdar://8529692. No testcase yet, because the given cases
hit use-list ordering differences.
llvm-svn: 116093
This doesn't usually matter, because the other heuristics usually
succeed regardless, but it's good to keep the register use
bookkeeping consistent.
llvm-svn: 116005
Anyone interested in more general PRE would be better served by implementing it separately, to get real
anticipation calculation, etc.
llvm-svn: 115337
The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.
Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics.
MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces. Optimizations may occur on these forms and the
result cast back to x86_mmx, provided the result feeds into a
pre-existing x86_mmx operation.
The point of all this is to prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.
llvm-svn: 115243
Because of this, we cannot use the Simplify* APIs, as they can assert-fail on unreachable code. Since it's not easy to determine
if a given threading will cause a block to become unreachable, simply defer simplification to later InstCombine and/or
DCE passes.
llvm-svn: 115082
register pressure and thus excess spills, which we don't currently recover from well. This should
be re-evaluated in the future if our ability to generate good spills/splits improves.
Partial fix for <rdar://problem/7635585>.
llvm-svn: 114919
This reverts revision 114633. It was breaking llvm-gcc-i386-linux-selfhost.
It seems there is a downstream bug that is exposed by
-cgp-critical-edge-splitting=0. When that bug is fixed, this patch can go back
in.
Note that the changes to tailcallfp2.ll are not reverted. They were good and
are required.
llvm-svn: 114859
Splitting critical edges at the merge point only addressed part of the issue; it is also possible for non-post-domination
to occur when the path from the load to the merge has branches in it. Unfortunately, full anticipation analysis is
time-consuming, so for now approximate it. This is strictly more conservative than real anticipation, so we will miss
some cases that real PRE would allow, but we also no longer insert loads into paths where they didn't exist before. :-)
This is a very slight net positive on SPEC for me (0.5% on average). Most of the benchmarks are largely unaffected, but
when it pays off it pays off decently: 181.mcf improves by 4.5% on my machine.
llvm-svn: 114785
"external" even when doing lazy bitcode loading. This was broken because
a function that is not materialized fails the !isDeclaration() test.
llvm-svn: 114666
truncates are free only in the case where the extended type is legal but the
load type is not. If both types are illegal, such as when they are too big,
the load may not be legalized into an extended load.
llvm-svn: 114568
load when the type of the load is not legal, even if truncates are not free.
The load is going to be legalized to an extending load anyway.
llvm-svn: 114488
walking the asm arguments once and stashing their Values. This is
wrong because the same memory location can be in the list twice, and
if the first one has a sunkaddr substituted, the stashed value for the
second one will be wrong (use-after-free). PR8154.
llvm-svn: 114104
to expose greater opportunities for store narrowing in codegen. This patch fixes a potential
infinite loop in instcombine caused by one of the introduced transforms being overly aggressive.
llvm-svn: 113763
This can result in increased opportunities for store narrowing in code generation. Update a number of
tests for this change. This fixes <rdar://problem/8285027>.
Additionally, because this inverts the order of ors and ands, some patterns for optimizing or-of-and-of-or
no longer fire in instances where they did originally. Add a simple transform which recaptures most of these
opportunities: if we have an or-of-constant-or and have failed to fold away the inner or, commute the order
of the two ors, to give the non-constant or a chance for simplification instead.
llvm-svn: 113679
not unrolling loops that contain calls that would be better off getting inlined. This mostly
comes up when an interleaved devirtualization pass has devirtualized a call which the inliner
will inline on a future pass. Thus, rather than blocking all loops containing calls, add
a metric for "inline candidate calls" and block loops containing those instead.
llvm-svn: 113535
unrolling threshold to the optimize-for-size threshold. Basically, for loops containing calls, unrolling
can still be profitable as long as the loop is REALLY small.
llvm-svn: 113439
The threshold value of 50 is arbitrary, and I chose it simply by analogy to the inlining thresholds, where
the baseline unrolling threshold is slightly smaller than the baseline inlining threshold. This could
undoubtedly use some tuning.
llvm-svn: 113306
turning (fptrunc (sqrt (fpext x))) -> (sqrtf x) is great, but we have
to delete the original sqrt as well. Not doing so causes us to do
two sqrt's when building with -fmath-errno (the default on linux).
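The pattern in question looks like this in C (half_sqrt is a
hypothetical name):

#include <math.h>

float half_sqrt(float x) {
  /* fptrunc(sqrt(fpext x)) -> sqrtf(x); the original sqrt call must be
     erased as well, or -fmath-errno builds compute the square root twice. */
  return (float)sqrt((double)x);
}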
llvm-svn: 113260
Switch from isWeakForLinker to mayBeOverridden which is more accurate.
Add more statistics and debugging info. Add comments. Move static function
outside anonymous namespace.
llvm-svn: 113190
in the duplicated block instead of duplicating them.
Duplicating them into the end of the loop and the preheader
means that we got a phi node in the header of the loop,
which prevented LICM from hoisting them. GVN would
usually come around later and merge the duplicated
instructions so we'd get reasonable output... except that
anything dependent on the shoulda-been-hoisted value can't
be hoisted. In PR5319 (which this fixes), a memory value
didn't get promoted.
llvm-svn: 113134
Loop::hasLoopInvariantOperands method. Remove
a useless and confusing Loop::isLoopInvariant(Instruction)
method, which didn't do what you thought it did.
No functionality change.
llvm-svn: 113133
location is being re-stored to the memory location. We would get
a dangling pointer from the SSAUpdate data structure and miss a
use. This fixes PR8068
llvm-svn: 113042
I'm sure it is harmless. Original commit message:
If PrototypeValue is erased in the middle of using the SSAUpdater
then the SSAUpdater may access freed memory. Instead, simply pass
in the type and name explicitly, which is all that was used anyway.
llvm-svn: 112810
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing mmx stuff without inserting proper emms calls.
In the short term, force off MMX datatypes. In the long term,
the X86 backend should not select generic vector types to MMX
registers. This is being worked on, but won't be done in time
for 2.8. rdar://8380055
llvm-svn: 112696
two are weak, we make them thunks to a new strong function) so don't iterate
through the function list as we're modifying it.
Also add back the outermost loop which got removed during the cleanups.
llvm-svn: 112595
This actually exposed an infinite recursion bug in ComputeValueKnownInPredecessors which theoretically already existed (in JumpThreading's
handling of and/or of i1's), but never manifested before. This patch adds a tracking set to prevent this case.
llvm-svn: 112589
instead of PromoteMemToReg. This allows it to stop using DF and DT,
eliminating a computation of DT and DF from clang -O3. Clang is now
down to 2 runs of DomFrontier.
llvm-svn: 112457
assertingvh so we get a violent explosion if the pointer dangles.
2) Fix AliasSetTracker::deleteValue to remove call sites with
by-pointer comparisons instead of by-alias queries. Using
findAliasSetForCallSite can cause alias sets to get merged
when they shouldn't, and can also miss alias sets when the
call is readonly.
#2 fixes PR6889, which only repros with a .c file :(
llvm-svn: 112452
LICM correctly. When sinking an instruction, it should not add
entries for the sunk instruction to the AST; it should remove
the entry for the sunk instruction. The blocks being sunk to
are not in the loop, so their instructions shouldn't be in the
AST (yet)!
llvm-svn: 112447
keeping them around until the pass is destroyed, keep them
around a) just when useful (not for outer loops) and b) destroy
them right after we use them. This should reduce memory use
and fixes potential bugs where a loop is deleted and another
loop gets allocated to the same address.
llvm-svn: 112446
LSRInstance data structures up to date. This fixes some
pessimizations caused by stale data which will be exposed
in an upcoming change.
llvm-svn: 112440
  A = shl x, 42
  ...
  B = lshr ..., 38
which can be transformed into:
  A = shl x, 4
  ...
iff we can prove that the would-be-shifted-in bits
are already zero. This eliminates two shifts in the testcase
and allows elimination of the whole i128 chain in the real example.
llvm-svn: 112314
framework, which is good at ripping through bitfield
operations. This generalizes a bunch of the existing
xforms that instcombine does, such as
(x << c) >> c -> and
to handle intermediate logical nodes. This is useful for
ripping up the "promote to large integer" code produced by
SRoA.
llvm-svn: 112304
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:
  %94 = zext i16 %93 to i32       ; <i32> [#uses=2]
  %96 = lshr i32 %94, 8           ; <i32> [#uses=1]
  %101 = trunc i32 %96 to i8      ; <i8> [#uses=1]
This also unblocks other xforms from happening, now clang is able to compile:
struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B + A.C + A.D; }
into:
_foo:                                   ## @foo
## BB#0:                                ## %entry
        pshufd  $1, %xmm0, %xmm2
        addss   %xmm0, %xmm2
        movdqa  %xmm1, %xmm3
        addss   %xmm2, %xmm3
        pshufd  $1, %xmm1, %xmm0
        addss   %xmm3, %xmm0
        ret
on x86-64, instead of:
_foo:                                   ## @foo
## BB#0:                                ## %entry
        movd    %xmm0, %rax
        shrq    $32, %rax
        movd    %eax, %xmm2
        addss   %xmm0, %xmm2
        movapd  %xmm1, %xmm3
        addss   %xmm2, %xmm3
        movd    %xmm1, %rax
        shrq    $32, %rax
        movd    %eax, %xmm0
        addss   %xmm3, %xmm0
        ret
This seems pretty close to optimal to me, at least without
using horizontal adds. This also triggers in lots of other
code, including SPEC.
llvm-svn: 112278
fix: add a flag to MapValue and friends which indicates whether
any module-level mappings are being made. In the common case of
inlining, no module-level mappings are needed, so MapValue doesn't
need to examine non-function-local metadata, which can be very
expensive in the case of a large module with really deep metadata
(e.g. a large C++ program compiled with -g).
This flag is a little awkward; perhaps eventually it can be moved
into the ClonedCodeInfo class.
llvm-svn: 112190
which does the same thing. This eliminates redundant code and
handles MDNodes better. MDNode linking still doesn't fully
work yet though.
llvm-svn: 111941
that it avoids a lot of unnecessary cloning by avoiding remapping
MDNode cycles when none of the nodes in the cycle actually need to
be remapped. Also it uses the new temporary MDNode mechanism.
llvm-svn: 111922
from the LHS should disable reconsidering that pred on the
RHS. However, knowing something about the pred on the RHS
shouldn't disable subsequent additions on the RHS from
happening.
llvm-svn: 111349
uninteresting, just put all the operands on one list and make
GenerateReassociations make the decision about what's interesting.
This is simpler, and it avoids an extra ScalarEvolution::getAddExpr call.
llvm-svn: 111133
- Eliminate redundant successors.
- Convert an indirectbr with one successor into a direct branch.
Also, generalize SimplifyCFG to be able to be run on a function entry block.
It knows quite a few simplifications which are applicable to the entry
block, and it only needs a few checks to avoid trouble with the entry block.
llvm-svn: 111060
patterns generated by clang for transpose of a matrix in generic vectors. This is made
of two parts:
1) Propagating vector extracts of hi/lo half into their users
2) Recognizing an insertion of even elements followed by the odd elements as an unpack.
Testcase to come, but this shrinks the # of shuffle instructions generated on x86 from ~40 to the minimal 8.
llvm-svn: 110734
Further clean up the comparison function by removing overly generalized
"domains".
Remove all understanding of ELF aliases and simplify folding code and comments.
llvm-svn: 110434
eliminate several const_casts.
Make CallSite implicitly convertible to ImmutableCallSite.
Rename the getModRefBehavior for intrinsic IDs to
getIntrinsicModRefBehavior to avoid overload ambiguity with CallSite,
which happens to be implicitly convertible to bool.
llvm-svn: 110155
Start cleaning up MergeFunctions to look more like the rest of LLVM. The
primary change here is to move the methods responsible for comparison into the
new FunctionComparator object. Some comments added. There's more to do.
llvm-svn: 110021
exactly what bugpoint expected it to do.
There was also only one user of
BlockExtractorPass(const std::vector<BasicBlock*> &B), so just remove it and
make BlockExtractorPass read BlockFile.
This fixes bugpoint's block extraction.
Nick, please review.
llvm-svn: 109936
alloca instructions (constrained by their internal encoding),
and add error checking for it. Fix an instcombine bug which
generated huge alignment values (null is infinitely aligned).
This fixes undefined behavior noticed by John Regehr.
llvm-svn: 109643