of the operand array
the motivation for this patch are laid out in my mail to llvm-commits:
more efficient access to operands and callee, faster callgraph-construction,
smaller compiler binary
llvm-svn: 101364
numerator is an induction variable. For example, with code like this:
for (i=0;i<n;++i)
x[i%n] = 0;
IndVarSimplify will now recognize that i is always less than n inside
the loop, and eliminate the remainder.
llvm-svn: 101113
expression is a UDiv and it doesn't appear that the UDiv came from
the user's source.
ScalarEvolution has recently figured out how to compute a tripcount
expression for the inner loop in
SingleSource/Benchmarks/Shootout/sieve.c, using a udiv. Emitting a
udiv instruction dramatically slows down the enclosing loop.
llvm-svn: 101068
a ScalarEvolution bug with overflow handling is fixed, the normal analysis
code will automatically decline to operate on the icmp instructions which
are responsible for the loop exit.
llvm-svn: 101032
instead of deleting just the user. This makes it more consistent with
other code in IndVarSimplify, and theoretically can eliminate more users
earlier.
llvm-svn: 101027
the loop exit test. This usually doesn't come up for a variety of
reasons, but it isn't impossible, so make IndVarSimplify handle it
conservatively.
llvm-svn: 101008
variables. For example, with code like this:
for (i=0;i<n;++i)
if (i<n)
x[i] = 0;
IndVarSimplify will now recognize that i is always less than n inside
the loop, and eliminate the if.
llvm-svn: 101000
into adjacent loops. Also, ensure that the insert position is
dominated by the loop latch of any loop in the post-inc set which
has a latch.
llvm-svn: 100906
forced constant is changed to a constant, we would end
up adding the instruction to the wrong worklist,
preventing it from being properly revisited. This fixes
rdar://7832370
llvm-svn: 100837
explicitly split into stride-and-offset pairs. Also, add the
ability to track multiple post-increment loops on the same expression.
This refines the concept of "normalizing" SCEV expressions used for
to post-increment uses, and introduces a dedicated utility routine for
normalizing and denormalizing expressions.
This fixes the expansion of expressions which are post-increment users
of more than one loop at a time. More broadly, this takes LSR another
step closer to being able to reason about more than one loop at a time.
llvm-svn: 100699
undefs in branches/switches, we have two cases: a branch on a literal
undef or a branch on a symbolic value which is undef. If we have a
literal undef, the code was correct: forcing it to a constant is the
right thing to do.
If we have a branch on a symbolic value that is undef, we should force
the symbolic value to a constant, which then makes the successor block
live. Forcing the condition of the branch to being a constant isn't
safe if later paths become live and the value becomes overdefined. This
is the case that 'forcedconstant' is designed to handle, so just use it.
This fixes rdar://7765019 but there is no good testcase for this, the
one I have is too insane to be useful in the future.
llvm-svn: 100478
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
llvm-svn: 100304
exits the loop. With this information we can guarantee
the iteration count of the loop is bounded by the
compare. I think this xforms is finally safe now.
llvm-svn: 100285
checker. Amusingly, we already had tests that we should
have rejects because they would be miscompiled in the
testsuite.
The remaining issue with this is that we don't check that
the branch causes us to exit the loop if it fails, so we
don't actually know if we remain in bounds.
llvm-svn: 100284
to a signed vs unsigned value depending on the sign of the
constant fp means that we can't distinguish between a
truly negative number and a positive number so large the
32nd bit is set. So, do don't this!
llvm-svn: 100283
this cleans up a bunch of code and also fixes several crashes and
miscompiles. More to come unfortunately, this optimization
is quite broken.
llvm-svn: 100270
Added support for address spaces and added a isVolatile field to memcpy, memmove, and memset,
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
llvm-svn: 100191
is necessary. Inherits from new templated baseclass CallSiteBase<>
which is highly customizable. Base CallSite on it too, in a configuration
that allows full mutation.
Adapt some call sites in analyses to employ ImmutableCallSite.
llvm-svn: 100100
generate wrong code pretty much anywhere AFAICT.
A case that hits the bug reproducibly is impossible,
but the situation was like this:
Addr = ...
Store -> Addr
Addr2 = GEP , 0, 0
Store -> Addr2
Handling the first store, the code changed replaced Addr
with a sunkaddr and deleted Addr, but not its table
entry. Code in OptimizedBlock replaced Addr2 with a
bitcast; if that happened to reuse the memory of Addr,
the old table entry was erroneously found when handling
the second store.
llvm-svn: 100044
e.g., llvm.memcpy.i32(i8*, i8*, i32, i32) -> llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)
A update of langref will occur in a subsequent checkin.
llvm-svn: 99928
I have audited all getOperandNo calls now, fixing
hidden assumptions. CallSite related uglyness will
be eliminated successively.
Note this patch has a long and griveous history,
for all the back-and-forths have a look at
CallSite.h's log.
llvm-svn: 99399
so that the SCEVExpander doesn't retain a dangling pointer as its
insert position. The dangling pointer in this case wasn't ever used
to insert new instructions, but it was causing trouble with
SCEVExpander's code for automatically advancing its insert position
past debug intrinsics.
This fixes use-after-free errors that valgrind noticed in
test/Transforms/IndVarSimplify/2007-06-06-DeleteDanglesPtr.ll and
test/Transforms/IndVarSimplify/exit_value_tests.ll.
llvm-svn: 99036
This time I did a self-hosted bootstrap on Linux x86-64,
with no problems. Let's see how darwin 64-bit self-hosting
goes. At the first sign of failure I'll back this out.
Maybe the valgrind bots give me a hint of what may be wrong
(it at all).
llvm-svn: 98957
out the remainder of the calls that we should lower in some way and
move the tests to the new correct directory. Fix up tests that are now
optimized more than they were before by -instcombine.
llvm-svn: 97875
can be used in more places. Add an argument for the TargetData that
most of them need. Update for the getInt8PtrTy() change. Should be
no functionality change.
llvm-svn: 97844
predecessors before returning. Otherwise, if multiple predecessor edges need
splitting, we only get one of them per iteration. This makes a small but
measurable compile time improvement with -enable-full-load-pre.
llvm-svn: 97521
which branch on undef to branch on a boolean constant for the edge
exiting the loop. This helps ScalarEvolution compute trip counts for
loops.
Teach ScalarEvolution to recognize single-value PHIs, when safe, and
ForgetSymbolicName to forget such single-value PHI nodes as apprpriate
in ForgetSymbolicName.
llvm-svn: 97126
argument is non-null, pass it along to PHITranslateSubExpr so that it can
prefer using existing values that dominate the PredBB, instead of just
blindly picking the first equivalent value that it finds on a uselist.
Also when the DominatorTree is specified, have PHITranslateValue filter
out any result that does not dominate the PredBB. This is basically just
refactoring the check that used to be in GetAvailablePHITranslatedSubExpr
and also in GVN.
Despite my initial expectations, this change does not affect the results
of GVN for any testcases that I could find, but it should help compile time.
Before this change, if PHITranslateSubExpr picked a value that does not
dominate, PHITranslateWithInsertion would then insert a new value, which GVN
would later determine to be redundant and would replace. By picking a good
value to begin with, we save GVN the extra work of inserting and then
replacing a new value.
llvm-svn: 97010
induction variable value and a loop-variant value, don't force the
insert position to be at the post-increment position, because it may
not be dominated by the loop-variant value. This fixes a
use-before-def problem noticed on PPC.
llvm-svn: 96774
strides in foreign loops. This helps locate reuse opportunities
with existing induction variables in foreign loops and reduces
the need for inserting new ones. This fixes rdar://7657764.
llvm-svn: 96629
a loop exit value, so that if a loop gets deleted, ScalarEvolution
isn't stick holding on to dangling SCEVAddRecExprs for that loop. This
fixes PR6339.
llvm-svn: 96626
with multiplication by constants distributed through, occasionally
those subexpressions can include both x and -x. For now, if this
condition is discovered within LSR, just prune such cases away,
as they won't be profitable. This fixes a "zero allocated in a
base register" assertion failure.
llvm-svn: 96177
and add a doxygen comment.
Cache the phi entry to avoid doing tons of
PHINode::getBasicBlockIndex calls in the common case.
On my insane testcase from re2c, this speeds up CGP from
617.4s to 7.9s (78x).
llvm-svn: 96083
bug fixes, and with improved heuristics for analyzing foreign-loop
addrecs.
This change also flattens IVUsers, eliminating the stride-oriented
groupings, which makes it easier to work with.
llvm-svn: 95975
block. Other blocks may have pointer cycles that will crash
basicaa and other alias analyses. In any case, there is no
point wasting cycles optimizing dead blocks. This fixes
rdar://7635088
llvm-svn: 95852
Initial skeleton and SCEVUnknown lowering implemented,
the rest should come relatively quickly. Move testcase
to new directory.
Move pass to right before SimplifyLibCalls - which is
moved down a bit so we can take advantage of a few opts.
llvm-svn: 95628
short-circuited conditions to AND/OR expressions, and those expressions
are often converted back to a short-circuited form in code gen. The
original source order may have been optimized to take advantage of the
expected values, and if we reassociate them, we change the order and
subvert that optimization. Radar 7497329.
llvm-svn: 95333
The SRThreshold value makes perfect sense for checking if an entire aggregate
should be promoted to a scalar integer, but it is not so good for splitting
an aggregate into its separate elements. A struct may contain a large embedded
array along with some scalar fields that would benefit from being split apart
by SROA. Even if the total aggregate size is large, it may still be good to
perform SROA. Thus, the most important piece of this patch is simply moving
the aggregate size comparison vs. SRThreshold so that it guards only the
aggregate promotion.
We have also been checking the number of elements to decide if an aggregate
should be split up. The limit of "SRThreshold/4" seemed rather arbitrary,
and I don't think it's very useful to derive this limit from SRThreshold
anyway. I've collected some data showing that the current default limit of
32 (since SRThreshold defaults to 128) is a reasonable cutoff for struct
types. One thing suggested by the data is that distinguishing between structs
and arrays might be useful. There are (obviously) a lot more large arrays
than large structs (as measured by the number of elements and not the total
size -- a large array inside a struct still counts as a single element given
the way we do SROA right now). Out of 8377 arrays where we successfully
performed SROA while compiling a large set of benchmarks, only 16 of them had
more than 8 elements. And, for those 16 arrays, it's not at all clear that
SROA was actually beneficial. So, to offset the compile time cost of
investigating more large structs for SROA, the patch lowers the limit on array
elements to 8.
This fixes Apple Radar 7563690.
llvm-svn: 95224
disabled by default. This divides the existing load PRE code into 2 phases:
first it checks that it is safe to move the load to each of the predecessors
where it is unavailable, and then if it is safe, the code is changed to move
the load. Radar 7571861.
llvm-svn: 95007
parameter with a default value, instead of just hardcoding it in the
implementation. The limit of MaxLookup = 6 was introduced in r69151 to fix
a performance problem with O(n^2) behavior in instcombine, but the scalarrepl
pass is relying on getUnderlyingObject to go all the way back to an AllocaInst.
Making the limit part of the method signature makes it clear that by default
the result is limited and should help avoid similar problems in the future.
This fixes pr6126.
llvm-svn: 94433
for arbitrary terminators in predecessors, don't assume
it is a conditional or uncond branch. The testcase shows
an example where they can happen with switches.
llvm-svn: 94323
handle the case when we can infer an input to the xor
from all inputs that agree, instead of going into an
infinite loop. Another part of PR6199
llvm-svn: 94321
missing ones are libsupport, libsystem and libvmcore. libvmcore is
currently blocked on bugpoint, which uses EH. Once it stops using
EH, we can switch it off.
This #if 0's out 3 unit tests, because gtest requires RTTI information.
Suggestions welcome on how to fix this.
llvm-svn: 94164
loop-variant components, adds must be inserted after the increment.
Keep track of the increment position for this case, and insert
these adds in the correct location.
llvm-svn: 94110
This new version is much more aggressive about doing "full" reduction in
cases where it reduces register pressure, and also more aggressive about
rewriting induction variables to count down (or up) to zero when doing so
reduces register pressure.
It currently uses fairly simplistic algorithms for finding reuse
opportunities, but it introduces a new framework allows it to combine
multiple strategies at once to form hybrid solutions, instead of doing
all full-reduction or all base+index.
llvm-svn: 94061
than the scaled register. This makes it more likely that subsequent
AddrModeMatcher queries will match the new address the same way as the
old, instead of accidentally matching what had been the base register
as the new scaled register, and then failing to match the scaled register.
This fixes some problems with address-mode sinking multiple muls into a
block, which will be a lot more common with some upcoming
LoopStrengthReduction changes.
llvm-svn: 93935
are the same. I had already fixed a similar problem where the source and
destination were different bitcasts derived from the same alloca, but the
previous fix still did not handle the case where both operands are exactly
the same value. Radar 7552893.
llvm-svn: 93848
in JT.
2) When cloning blocks for PHI or xor conditions, use
instsimplify to simplify the code as we go. This allows us to
squish common cases early in JT which opens up opportunities for
subsequent iterations, and allows it to completely simplify the
testcase.
llvm-svn: 93253
condition is a xor with a phi node. This eliminates nonsense
like this from 176.gcc in several places:
LBB166_84:
testl %eax, %eax
- setne %al
- xorb %cl, %al
- notb %al
- testb $1, %al
- je LBB166_85
+ je LBB166_69
+ jmp LBB166_85
This is rdar://7391699
llvm-svn: 93221
on the example in PR4216. This doesn't trigger in the testsuite,
so I'd really appreciate someone scrutinizing the logic for
correctness.
llvm-svn: 92458
when a consequtive sequence of elements all satisfies the
predicate. Like the double compare case, this generates better
code than the magic constant case and generalizes to more than
32/64 element array lookups.
Here are some examples where it triggers. From 403.gcc, most
accesses to the rtx_class array are handled, e.g.:
@rtx_class = constant [153 x i8] c"xxxxxmmmmmmmmxxxxxxxxxxxxmxxxxxxiiixxxxxxxxxxxxxxxxxxxooxooooooxxoooooox3x2c21c2222ccc122222ccccaaaaaa<<<<<<<<<<<<<<<<<<111111111111bbooxxxxxxxxxxcc2211x", align 32 ; <[153 x i8]*> [#uses=547]
%142 = icmp eq i8 %141, 105
@rtx_class = constant [153 x i8] c"xxxxxmmmmmmmmxxxxxxxxxxxxmxxxxxxiiixxxxxxxxxxxxxxxxxxxooxooooooxxoooooox3x2c21c2222ccc122222ccccaaaaaa<<<<<<<<<<<<<<<<<<111111111111bbooxxxxxxxxxxcc2211x", align 32 ; <[153 x i8]*> [#uses=543]
%165 = icmp eq i8 %164, 60
Also, most of the 59-element arrays (mode_class/rid_to_yy, etc)
optimized before are actually range compares. This lets 32-bit
machines optimize them.
400.perlbmk has stuff like this:
400.perlbmk: PL_regkind, even for 32-bit:
@PL_regkind = constant [62 x i8] c"\00\00\02\02\02\06\06\06\06\09\09\0B\0B\0D\0E\0E\0E\11\12\12\14\14\16\16\18\18\1A\1A\1C\1C\1E\1F !!!$$&'((((,-.///88886789:;8$", align 32 ; <[62 x i8]*> [#uses=4]
%811 = icmp ne i8 %810, 33
@PL_utf8skip = constant [256 x i8] c"\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\03\03\03\03\03\03\03\03\03\03\03\03\03\03\03\03\04\04\04\04\04\04\04\04\05\05\05\05\06\06\07\0D", align 32 ; <[256 x i8]*> [#uses=94]
%12 = icmp ult i8 %10, 2
etc.
llvm-svn: 92426
two elements match or don't match with two comparisons. For
example, the testcase compiles into:
define i1 @test5(i32 %X) {
%1 = icmp eq i32 %X, 2 ; <i1> [#uses=1]
%2 = icmp eq i32 %X, 7 ; <i1> [#uses=1]
%R = or i1 %1, %2 ; <i1> [#uses=1]
ret i1 %R
}
This generalizes the previous xforms when the array is larger than
64 elements (and this case matches) and generates better code for
cases where it overlaps with the magic bitshift case.
This generalizes more cases than you might expect. For example,
400.perlbmk has:
@PL_utf8skip = constant [256 x i8] c"\01\01\01\...
%15 = icmp ult i8 %7, 7
403.gcc has:
@rid_to_yy = internal constant [114 x i16] [i16 259, i16 260, ...
%18 = icmp eq i16 %16, 295
and xalancbmk has a bunch of examples, such as
_ZN11xercesc_2_5L15gCombiningCharsE and _ZN11xercesc_2_5L10gBaseCharsE.
llvm-svn: 92417
arrays with variable indices into a comparison of the index
with a constant. The most common occurrence of this that
I see by far is stuff like:
if ("foobar"[i] == '\0') ...
which we compile into: if (i == 6), saving a load and
materialization of the global address. This also exposes
loop trip count information to later passes in many cases.
This triggers hundreds of times in xalancbmk, which is where I first
noticed it, but it also triggers in many other apps. Here are a few
interesting ones from various apps:
@must_be_connected_without = internal constant [8 x i8*] [i8* getelementptr inbounds ([3 x i8]* @.str64320, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str27283, i64 0, i64 0), i8* getelementptr inbounds ([4 x i8]* @.str71327, i64 0, i64 0), i8* getelementptr inbounds ([4 x i8]* @.str72328, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str18274, i64 0, i64 0), i8* getelementptr inbounds ([6 x i8]* @.str11267, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str32288, i64 0, i64 0), i8* null], align 32 ; <[8 x i8*]*> [#uses=2]
%scevgep.i = getelementptr [8 x i8*]* @must_be_connected_without, i64 0, i64 %indvar.i ; <i8**> [#uses=1]
%17 = load ...
%18 = icmp eq i8* %17, null ; <i1> [#uses=1]
-> icmp eq i64 %indvar.i, 7
@yytable1095 = internal constant [84 x i8] c"\12\01(\05\06\07\08\09\0A\0B\0C\0D\0E1\0F\10\11266\1D: \10\11,-,0\03'\10\11B6\04\17&\18\1945\05\06\07\08\09\0A\0B\0C\0D\0E\1E\0F\10\11*\1A\1B\1C$3+>#%;<IJ=ADFEGH9KL\00\00\00C", align 32 ; <[84 x i8]*> [#uses=2]
%57 = getelementptr inbounds [84 x i8]* @yytable1095, i64 0, i64 %56 ; <i8*> [#uses=1]
%mode.0.in = getelementptr inbounds [9 x i32]* @mb_mode_table, i64 0, i64 %.pn ; <i32*> [#uses=1]
load ...
%64 = icmp eq i8 %58, 4 ; <i1> [#uses=1]
-> icmp eq i64 %.pn, 35 ; <i1> [#uses=0]
@gsm_DLB = internal constant [4 x i16] [i16 6554, i16 16384, i16 26214, i16 32767]
%scevgep.i = getelementptr [4 x i16]* @gsm_DLB, i64 0, i64 %indvar.i ; <i16*> [#uses=1]
%425 = load %scevgep.i
%426 = icmp eq i16 %425, -32768 ; <i1> [#uses=0]
-> false
llvm-svn: 92411
pointer to int casts that confuse later optimizations. See PR3351
for details.
This improves but doesn't complete fix 483.xalancbmk because llvm-gcc
does this xform in GCC's "fold" routine as well. Clang++ will do
better I guess.
llvm-svn: 92408
positive and negative forms of constants together. This
allows us to compile:
int foo(int x, int y) {
return (x-y) + (x-y) + (x-y);
}
into:
_foo: ## @foo
subl %esi, %edi
leal (%rdi,%rdi,2), %eax
ret
instead of (where the 3 and -3 were not factored):
_foo:
imull $-3, 8(%esp), %ecx
imull $3, 4(%esp), %eax
addl %ecx, %eax
ret
this started out as:
movl 12(%ebp), %ecx
imull $3, 8(%ebp), %eax
subl %ecx, %eax
subl %ecx, %eax
subl %ecx, %eax
ret
This comes from PR5359.
llvm-svn: 92381
non-templated IRBuilderBase class. Move that large CreateGlobalString
out of line, eliminating the need to #include GlobalVariable.h in IRBuilder.h
llvm-svn: 92227
SDISel. This optimization was causing simplifylibcalls to
introduce type-unsafe nastiness. This is the first step, I'll be
expanding the memcmp optimizations shortly, covering things that
we really really wouldn't want simplifylibcalls to do.
llvm-svn: 92098
load is needed when we have a small store into a large alloca (at which
point we get a load/insert/store sequence), but when you do a full-sized
store, this load ends up being dead.
This dead load is bad in really large nasty testcases where the load ends
up causing mem2reg to insert large chains of dependent phi nodes which only
ADCE can delete. Instead of doing this, just don't insert the dead load.
This fixes rdar://6864035
llvm-svn: 91917
missing check that an array reference doesn't go past the end of the array,
and remove some redundant checks for in-bound array and vector references
that are no longer needed.
llvm-svn: 91897
by merging all returns in a function into a single one, but simplifycfg
currently likes to duplicate the return (an unfortunate choice!)
llvm-svn: 91890
instead of stored. This reduces memdep memory usage, and also eliminates a bunch of
weakvh's. This speeds up gvn on gcc.c-torture/20001226-1.c from 23.9s to 8.45s (2.8x)
on a different machine than earlier.
llvm-svn: 91885
load to avoid even messing around with SSAUpdate at all. In this case (which
is very common, we can just use the input value directly).
This speeds up GVN time on gcc.c-torture/20001226-1.c from 36.4s to 16.3s,
which still isn't great, but substantially better and this is a simple speedup
that applies to lots of different cases.
llvm-svn: 91851
two-element arrays. After restructuring the SROA code, it was not safe to
do this without adding more checking. It is not clear that this special-case
has really been useful, and removing this simplifies the code quite a bit.
llvm-svn: 91828
implement some optimizations for MIN(MIN()) and MAX(MAX()) and
MIN(MAX()) etc. This substantially improves the code in PR5822 but
doesn't kick in much elsewhere. 2 max's were optimized in
pairlocalalign and one in smg2000.
llvm-svn: 91814
Use the presence of NSW/NUW to fold "icmp (x+cst), x" to a constant in
cases where it would otherwise be undefined behavior.
Surprisingly (to me at least), this triggers hundreds of the times in
a few benchmarks: lencode, ldecode, and 466.h264ref seem to *really*
like this.
llvm-svn: 91812
where instcombine would have to split a critical edge due to a
phi node of an invoke. Since instcombine can't change the CFG,
it has to bail out from doing the transformation.
llvm-svn: 91763
* change FindElementAndOffset to return a uint64_t instead of unsigned, and
to identify the type to be used for that result in a GEP instruction.
* move "isa<ConstantInt>" to be first in conditional.
* replace some dyn_casts with casts.
* add a comment about handling mem intrinsics.
llvm-svn: 91762
contains another loop, or an instruction. The loop form is
substantially more efficient on large loops than the typical
code it replaces.
llvm-svn: 91654
of 91296 that caused trouble -- the Processed list needs to be
preserved for the livetime of the pass, as AddUsersIfInteresting
is called from other passes.
llvm-svn: 91641
problem", this broke llvm-gcc bootstrap for release builds on
x86_64-apple-darwin10.
This reverts commit db22309800b224a9f5f51baf76071d7a93ce59c9.
llvm-svn: 91534
found last time. Instead of trying to modify the IR while iterating over it,
I've change it to keep a list of WeakVH references to dead instructions, and
then delete those instructions later. I also added some special case code to
detect and handle the situation when both operands of a memcpy intrinsic are
referencing the same alloca.
llvm-svn: 91459
isPodLike type trait. This is a generally useful type trait for
more than just DenseMap, and we really care about whether something
acts like a pod, not whether it really is a pod.
llvm-svn: 91421
While scanning through the uses of an alloca, keep track of the current offset
relative to the start of the alloca, and check memory references to see if
the offset & size correspond to a component within the alloca. This has the
nice benefit of unifying much of the code from isSafeUseOfAllocation,
isSafeElementUse, and isSafeUseOfBitCastedAllocation. The code to rewrite
the uses of a promoted alloca, after it is determined to be safe, is
reorganized in the same way.
Also, when rewriting GEP instructions, mark them as "in-bounds" since all the
indices are known to be safe.
llvm-svn: 91184
value size. This only manifested when memdep inprecisely returns clobber,
which is do to a caching issue in the PR5744 testcase. We can 'efficiently
emulate' this by using '-no-aa'
llvm-svn: 91004
phi translation of complex expressions like &A[i+1]. This has the
following benefits:
1. The phi translation logic is all contained in its own class with
a strong interface and verification that it is self consistent.
2. The logic is more correct than before. Previously, if intermediate
expressions got PHI translated, we'd miss the update and scan for
the wrong pointers in predecessor blocks. @phi_trans2 is a testcase
for this.
3. We have a lot less code in memdep.
We can handle phi translation across blocks of things like @phi_trans3,
which is pretty insane :).
This patch should fix the miscompiles of 255.vortex, and I tested it
with a bootstrap of llvm-gcc, llvm-test and dejagnu of course.
llvm-svn: 90926
handle cases like this:
void test(int N, double* G) {
long j;
for (j = 1; j < N - 1; j++)
G[j+1] = G[j] + G[j+1];
}
where G[1] isn't live into the loop.
llvm-svn: 90041
array indexes. The "complex" case of SRoA still handles them, and correctly.
This fixes a weirdness where we'd correctly avoid transforming A[0][42] if
the 42 was too large, but we'd only do it if it was one gep, not two separate
ones.
llvm-svn: 90007
generates store to undef and some generates store to null as the idiom
for undefined behavior. Since simplifycfg zaps both, don't remove the
undefined behavior in instcombine.
llvm-svn: 89971
tests/Transforms/InstCombine/shufflemask-undef.ll. If
anyone cares, the use of 2*e here (and the equivalent
all over the place in instcombine) seems wrong, though
harmless: it should really be twice the length of the
input vector. I think shufflevector used to require
that the mask have the same length as the input, but I
don't think that's true any more. I don't care enough
about vectors to do anything about this...
llvm-svn: 89456
they are lowered to instruction sequences more complex than a simple
load, such that CodeGen cannot rematerialize them, a reload from a
spill slot is likely to be cheaper than the complex sequence.
llvm-svn: 89374
cannot be folded into target cmp instruction.
- Avoid a phase ordering issue where early cmp optimization would prevent the
later count-to-zero optimization.
- Add missing checks which could cause LSR to reuse stride that does not have
users.
- Fix a bug in count-to-zero optimization code which failed to find the pre-inc
iv's phi node.
- Remove, tighten, loosen some incorrect checks disable valid transformations.
- Quite a bit of code clean up.
llvm-svn: 86969
making the new LVI stuff smart enough to subsume some special
cases in the old code. Disable them when LVI is around, the
testcase still passes.
llvm-svn: 86951
start using them in a trivial way when -enable-jump-threading-lvi
is passed. enable-jump-threading-lvi will be my playground for
awhile.
llvm-svn: 86789
debug intrinsics, and an unconditional branch when possible. This
reuses the TryToSimplifyUncondBranchFromEmptyBlock function split
out of simplifycfg.
llvm-svn: 86722
just one level deep. On the testcase we go from getting this:
F1: ; preds = %T2
%F = and i1 true, %cond ; <i1> [#uses=1]
br i1 %F, label %X, label %Y
to a fully threaded:
F1: ; preds = %T2
br label %Y
This changes gets us to the point where we're forming (too many) switch
instructions on doug's strswitch testcase.
llvm-svn: 86646
except that the result may not be a constant. Switch jump threading to
use it so that it gets things like (X & 0) -> 0, which occur when phi preds
are deleted and the remaining phi pred was a zero.
llvm-svn: 86637
This patch forbids implicit conversion of DenseMap::const_iterator to
DenseMap::iterator which was possible because DenseMapIterator inherited
(publicly) from DenseMapConstIterator. Conversion the other way around is now
allowed as one may expect.
The template DenseMapConstIterator is removed and the template parameter
IsConst which specifies whether the iterator is constant is added to
DenseMapIterator.
Actually IsConst parameter is not necessary since the constness can be
determined from KeyT but this is not relevant to the fix and can be addressed
later.
Patch by Victor Zverovich!
llvm-svn: 86636
here:
1) We need to avoid processing sigma nodes as phi nodes for constraint generation.
2) We need to generate constraints for comparisons against constants properly.
This includes our first working ABCD test!
llvm-svn: 86498
graphs being produced. The cause was that we were incorrectly marking sigma instructions as
processed after handling the sigma-specific constraints for them, potentially neglecting to
process them as normal instructions as well.
Unfortunately, the testcase that inspired this still doesn't work because of a bug in the solver,
which is next on the list to debug.
llvm-svn: 86486
when both the source and dest are illegal types, since it would cause
the phi to grow (for example, we shouldn't transform test14b's phi to
a phi on i320). This fixes an infinite loop on i686 bootstrap with
phi slicing turned on, so turn it back on.
llvm-svn: 86483
not turn a PHI in a legal type into a PHI of an illegal type, and
add a new optimization that breaks up insane integer PHI nodes into
small pieces (PR3451).
llvm-svn: 86443
(eliminating some extends) if the new type of the
computation is legal or if both the source and dest
are illegal. This prevents instcombine from changing big
chains of computation into i64 on 32-bit targets for
example.
llvm-svn: 86398
predicates. This allows us to jump thread things like:
_ZN12StringSwitchI5ColorE4CaseILj7EEERS1_RAT__KcRKS0_.exit119:
%tmp1.i24166 = phi i8 [ 1, %bb5.i117 ], [ %tmp1.i24165, %_Z....exit ], [ %tmp1.i24165, %bb4.i114 ]
%toBoolnot.i87 = icmp eq i8 %tmp1.i24166, 0 ; <i1> [#uses=1]
%tmp4.i90 = icmp eq i32 %tmp2.i, 6 ; <i1> [#uses=1]
%or.cond173 = and i1 %toBoolnot.i87, %tmp4.i90 ; <i1> [#uses=1]
br i1 %or.cond173, label %bb4.i96, label %_ZN12...
Where it is "obvious" that when coming from %bb5.i117 that the 'and' is always
false. This triggers a surprisingly high number of times in the testsuite,
and gets us closer to generating good code for doug's strswitch testcase.
This also make a bunch of other code in jump threading redundant, I'll rip
out in the next patch. This survived an enable-checking llvm-gcc bootstrap.
llvm-svn: 86264
to EmitGEPOffset.
Implement some new transforms for optimizing
subtracts of two pointer to ints into the same vector. This happens
for C++ iterator idioms for example, stringmap takes a const char*
that points to the start and end of a string. Once inlined, we want
the pointer difference to turn back into a length.
This is rdar://7362831.
llvm-svn: 86021
functions that don't have local linkage. Basically, we need to be more
careful about propagating argument information to functions whose results
we aren't tracking. This fixes a miscompilation of
LLVMCConfigurationEmitter.cpp when built with an llvm-gcc that has ipsccp
enabled.
llvm-svn: 85923
function to calls of that function, regardless of whether it has local
linkage or has its address taken. Not escaping should only affect
whether we make an aggressive assumption about the arguments to a
function, not whether we can track the result of it.
llvm-svn: 85795
a DenseMap. Doing this required being aware of subtle iterator
invalidation issues, but it provides a big speedup. In a
release-asserts build, this sped up optimizing 403.gcc from
1.34s -> 0.79s (IPSCCP) and 1.11s -> 0.44s (SCCP).
This commit also conflates in a bunch of general cleanups, sorry.
llvm-svn: 85788
phis, it didn't preserve the alignment of the load. This is a missed
optimization of the alignment is high and a miscompilation when the
alignment is low.
llvm-svn: 85736
PHI operands by the predecessor order, sort them by the order used by the
first PHI in the block. This is still suffucient to expose duplicates.
llvm-svn: 85634
Checks on Demand algorithm which looks at arbitrary branches instead of loop
iterations. This is GSoC work by Andre Tavares with only editorial changes
applied!
llvm-svn: 85382
GEPs (more than one non-zero index) into simple GEPs (at most one
non-zero index). In some simple experiments using this it's not
uncommon to see 3% overall code size wins, because it exposes
redundancies that can be eliminated, however it's tricky to use
because instcombine aggressively undoes the work that this pass does.
llvm-svn: 85144
strides for now, because it doesn't handle them correctly. This fixes a
miscompile of SingleSource/Benchmarks/Misc-C++/ray.
This problem was usually hidden because indvars transforms such induction
variables into negations of canonical induction variables.
llvm-svn: 85118
used elsewhere - an exit block is a block outside the loop branched to
from within the loop. An exiting block is a block inside the loop that
branches out.
llvm-svn: 85019
Update all analysis passes and transforms to treat free calls just like FreeInst.
Remove RaiseAllocations and all its tests since FreeInst no longer needs to be raised.
llvm-svn: 84987
Analysis/ConstantFolding.cpp. This doesn't change the behavior of
instcombine but makes other clients of ConstantFoldInstruction
able to handle loads. This was partially extracted from Eli's patch
in PR3152.
llvm-svn: 84836
Most changes are cleanup, but there is 1 correctness fix:
I fixed InstCombine so that the icmp is removed only if the malloc call is removed (which requires explicit removal because the Worklist won't DCE any calls since they can have side-effects).
llvm-svn: 84772
"In the existing code, if the load and the value to replace it with are
of different types *and* target data is available, it tries to use the
target data to coerce the replacement value to the type of the load.
Otherwise, it skips all effort to handle the type mismatch and just
feeds the wrongly-typed replacement value to replaceAllUsesWith, which
triggers an assertion.
The patch replaces it with an outer if checking for type mismatch, and
an inner if-else that checks whether target data is available and, if
not, returns false rather than trying to replace the load."
Patch by Kenneth Uildriks!
llvm-svn: 84739
the estimated code size and the number of blocks when deciding whether to
do a non-trivial unswitch. This protects it from some very undesirable
worst-case behavior on large numbers of loop-unswitchable conditions, such
as in the testcase in PR5259.
llvm-svn: 84661
when the invoke had multiple return values: it set the lattice value only on the
extractvalue.
This caused the invoke's lattice value to remain the default (undefined), and
later propagated to extractvalue's operand, which incorrectly introduces
undefined behavior.
llvm-svn: 84637
Update testcases that rely on malloc insts being present.
Also prematurely remove MallocInst handling from IndMemRemoval and RaiseAllocations to help pass tests in this incremental step.
llvm-svn: 84292
don't bother every time going around the main worklist. This speeds up a
release-asserts opt -std-compile-opts on 403.gcc by about 4% (1.5s). It
seems to speed up the most expensive instances of instcombine by ~10%.
llvm-svn: 84171
instruction (which disqualifies stores, unreachable, etc) and at least the
first operand is a constant. This filters out a lot of obvious cases that
can't be folded. Also, switch the IRBuilder to a TargetFolder, which tries
harder.
llvm-svn: 84170
BasicBlocks, so that it doesn't blindly procede in the presence of
large individual BasicBlocks. This addresses a class of code-size
expansion problems.
llvm-svn: 83992
it to visit instructions from the start of the function to the
end of the function in the first path. This greatly speeds up
some pathological cases (e.g. PR5150).
Try #3, this time with some unneeded debug info stuff removed
which was causing dead pointers to be added to the worklist.
llvm-svn: 83818
it to visit instructions from the start of the function to the
end of the function in the first path. This greatly speeds up
some pathological cases (e.g. PR5150).
llvm-svn: 83814
into a shuffle even if it was used by another insertelement. If the
visitation order of instcombine was wrong, this would turn a chain of
insertelements into a chain of shufflevectors, which was quite painful.
Since CollectShuffleElements handles these cases, the code can just
be nuked.
llvm-svn: 83810
input the the mul is a zext from bool, just that it is all zeros
other than the low bit. This fixes some phase ordering issues
that would cause us to miss some xforms in mul.ll when the worklist
is visited differently.
llvm-svn: 83794
it to visit instructions from the start of the function to the
end of the function in the first path. This greatly speeds up
some pathological cases (e.g. PR5150).
llvm-svn: 83790
For now the metadata of sinked/hoisted instructions is still wrong, but that'll
be fixed when instructions will have debug metadata directly attached.
llvm-svn: 83786
done by condprop, but do it in a much more general form. The
basic idea is that we can do a limited form of tail duplication
in the case when we have a branch on a phi. Moving the branch
up in to the predecessor block makes instruction selection
much easier and encourages chained jump threadings.
llvm-svn: 83759
from GVN, this also speeds it up, inserts fewer PHI nodes (see the
testcase) and allows it to remove more loads (due to fewer PHI nodes
standing in the way).
llvm-svn: 83746
DemoteRegToStack. This makes it more efficient (because it isn't
creating a ton of load/stores that are eventually removed by a later
mem2reg), and more slightly more effective (because those load/stores
don't get in the way of threading).
llvm-svn: 83706
to declare that they preserve other passes without needing to pull in
additional header file or library dependencies. Convert MachineFunctionPass
and CodeGenLICM to make use of this.
llvm-svn: 83555
already on the worklist, and print Visited when an instruction is about to be
visited. Net, on one input, this reduced the output size by at least 9x.
llvm-svn: 83510
the new predicates I added) instead of going through a context and doing a
pointer comparison. Besides being cheaper, this allows a smart compiler
to turn the if sequence into a switch.
llvm-svn: 83297
that are phi nodes. Also tighten up FoldOpIntoPhi to treat constantexpr
operands to phis just like other variables, avoiding moving constantexpr
computations around.
Patch by Daniel Dunbar.
llvm-svn: 82913
from a piece of a large store when both are in the same block.
This allows clang to compile the testcase in PR4216 to this code:
_test_bitfield:
movl 4(%esp), %eax
movl %eax, %ecx
andl $-65536, %ecx
orl $32962, %eax
andl $40186, %eax
orl %ecx, %eax
ret
This is not ideal, but is a whole lot better than the code produced
by llvm-gcc:
_test_bitfield:
movw $-32574, %ax
orw 4(%esp), %ax
andw $-25350, %ax
movw %ax, 4(%esp)
movw 7(%esp), %cx
shlw $8, %cx
movzbl 6(%esp), %edx
orw %cx, %dx
movzwl %dx, %ecx
shll $16, %ecx
movzwl %ax, %eax
orl %ecx, %eax
ret
and dramatically better than that produced by gcc 4.2:
_test_bitfield:
pushl %ebx
call L3
"L00000000001$pb":
L3:
popl %ebx
movl 8(%esp), %eax
leal 0(,%eax,4), %edx
sarb $7, %dl
movl %eax, %ecx
andl $7168, %ecx
andl $-7201, %ebx
movzbl %dl, %edx
andl $1, %edx
sall $5, %edx
orl %ecx, %ebx
orl %edx, %ebx
andl $24, %eax
andl $-58336, %ebx
orl %eax, %ebx
orl $32962, %ebx
movl %ebx, %eax
popl %ebx
ret
llvm-svn: 82439
so that nonlocal and partially redundant loads can use it as well.
The testcase shows examples of craziness this can handle. This triggers
*many* times in 176.gcc.
llvm-svn: 82403
(and load -> load) when the base pointers must alias but when
they are different types. This occurs very very frequently in
176.gcc and other code that uses bitfields a lot.
llvm-svn: 82399
constants out of loops. These aren't covered by the regular LICM
pass, because in LLVM IR constants don't require separate
instructions. They're not always covered by the MachineLICM pass
either, because it doesn't know how to unfold folded constant-pool
loads. This is somewhat experimental at this point, and off by
default.
llvm-svn: 82076
more than one phi, since that leads to higher register pressure on
entry to the phi. This is especially problematic when the phi is in
a loop header, as it increases register pressure throughout the loop.
llvm-svn: 81993
loop exit edge -- new PHIs may be needed not only for the additional
splits that are made to preserve LoopSimplify form, but also for the
original split. Factor out the code that inserts new PHIs so that it
can be used for both. Remove LoopRotation.cpp's code for manually
updating LCSSA form, as it is now redundant. This fixes PR4934.
llvm-svn: 81363
that get created during loop unswitching, and fix SplitBlockPredecessors'
LCSSA updating code to create new PHIs instead of trying to just move
existing ones.
Also, optimize Loop::verifyLoop, since it gets called a lot. Use
searches on a sorted list of blocks instead of calling the "contains"
function, as is done in other places in the Loop class, since "contains"
does a linear search. Also, don't call verifyLoop from LoopSimplify or
LCSSA, as the PassManager is already calling verifyLoop as part of
LoopInfo's verifyAnalysis.
llvm-svn: 81221
extractelement operations into a bitcast of the pointer,
then a gep, then a scalar load. Disable this when the vector
only has one element, because it leads to infinite loops in
instcombine (PR4908).
This transformation seems like a really bad idea to me, as it
will likely disable CSE of vector load/stores etc and can be
better done in the code generator when profitable. This
goes all the way back to the first days of packed types,
r25299 specifically.
I'll let those people who care about the performance of vector
code decide what to do with this.
llvm-svn: 81185
- I think there are more instances of this, but I think they are fixed in Dan's
incoming patch. This one was preventing me from doing a bugpoint reduction
though.
llvm-svn: 81103
Constant uniquing tables. This allows distinct ConstantExpr objects
with the same operation and different flags.
Even though a ConstantExpr "a + b" is either always overflowing or
never overflowing (due to being a ConstantExpr), it's still necessary
to be able to represent it both with and without overflow flags at
the same time within the IR, because the safety of the flag may
depend on the context of the use. If the constant really does overflow,
it wouldn't ever be safe to use with the flag set, however the use
may be in code that is never actually executed.
This also makes it possible to merge all the flags tests into a single test.
llvm-svn: 80998
instead of a bool argument, and to do the dominator check itself.
This makes it eaiser to use when DominatorTree information is
available.
llvm-svn: 80920
simplifylibcalls optimization is thus valid for C++ but not C.
It's not important enough to worry about for C++ apps, so just
remove it.
rdar://7191924
llvm-svn: 80887
changes: SimplifyDemandedBits can't use the builder yet because it
has the wrong insertion point. This fixes a crash building
MultiSource/Benchmarks/PAQ8p
llvm-svn: 80537
is itself a bitcast. Since we have gep(bitcast(bitcast(y))) in this
case, just wait for the two bitcasts to get zapped. This prevents
instcombine from confusing some aliasing stuff, and allows it to
directly eliminate the load in the testcase.
llvm-svn: 80508
workslist and is set to insert new instructions before the current one.
Convert a bunch of stuff that used to call InsertNewInstBefore over to
use it, greatly simplifying code and making it more natural.
There is still a lot more to go, but this is a good start.
llvm-svn: 80492
if the operand is not an instruction.
Simplify most uses of AddOperandsToWorkList to use AddValue and
inline it into the one remaining callsite.
llvm-svn: 80488
former looks too much like AddUsersToWorkList and keeps
confusing me.
Remove AddSoonDeadInstToWorklist and change its two callers
to do the same thing in a simpler way.
llvm-svn: 80486
into their callers. simplify ReplaceInstUsesWith. Make
EraseInstFromFunction only add operands to the worklist if
there aren't too many of them (this was a scalability win
for crazy programs that was only infrequently enforced).
Switch more code to using EraseInstFromFunction instead of
duplicating it inline. Change some fcmp/icmp optimizations
to modify fcmp/icmp in place instead of creating a new one
and deleting the old one just to change the predicate.
llvm-svn: 80483
and introduce a new Instruction::isIdenticalTo which tests for full
identity, including the SubclassOptionalData flags. Also, fix the
Instruction::clone implementations to preserve the SubclassOptionalData
flags. Finally, teach several optimizations how to handle
SubclassOptionalData correctly, given these changes.
This fixes the counterintuitive behavior of isIdenticalTo not comparing
the full value, and clone not returning an identical clone, as well as
some subtle bugs that could be caused by these.
Thanks to Nick Lewycky for reporting this, and for an initial patch!
llvm-svn: 80038
sinking code, since they are special. If the loop preheader happens
to be the entry block of a function, don't sink static allocas
out of it. This fixes PR4775.
llvm-svn: 80010
the new load by the old load instead of by the extract element because
a store could have occurred between the load and extract element.
llvm-svn: 78891
- Part of optimal static profiling patch sequence by Andreas Neustifter.
- Store edge, block, and function information separately for each functions
(instead of in one giant map).
- Return frequencies as double instead of int, and use a sentinel value for
missing information.
llvm-svn: 78477
appropriate. Patch per report on llvmdev. No testcase because the
original report didn't come with a testcase, and I can't come up with a case
that actually fails.
llvm-svn: 77986
a Twine, e.g., for names).
- I am a little ambivalent about this; we don't want the string conversion of
utostr, but using overload '+' mixed with string and integer arguments is
sketchy. On the other hand, this particular usage is something of an idiom.
llvm-svn: 77579
- Some clients which used DOUT have moved to DEBUG. We are deprecating the
"magic" DOUT behavior which avoided calling printing functions when the
statement was disabled. In addition to being unnecessary magic, it had the
downside of leaving code in -Asserts builds, and of hiding potentially
unnecessary computations.
llvm-svn: 77019
- Yay for '-'s and simplifications!
- I kept StringMap::GetOrCreateValue for compatibility purposes, this can
eventually go away. Likewise the StringMapEntry Create functions still follow
the old style.
- NIFC.
llvm-svn: 76888
also apply to vectors. This allows us to compile this:
#include <emmintrin.h>
__m128i a(__m128 a, __m128 b) { return a==a & b==b; }
__m128i b(__m128 a, __m128 b) { return a!=a | b!=b; }
to:
_a:
cmpordps %xmm1, %xmm0
ret
_b:
cmpunordps %xmm1, %xmm0
ret
with clang instead of to a ton of horrible code.
llvm-svn: 76863
Getelementptrs that are defined to wrap are virtually useless to
optimization, and getelementptrs that are undefined on any kind
of overflow are too restrictive -- it's difficult to ensure that
all intermediate addresses are within bounds. I'm going to take
a different approach.
Remove a few optimizations that depended on this flag.
llvm-svn: 76437
insertelement/extractelement.
I'm not entirely sure this is precisely what we want to do: should we
prefer bitcast(insertelement) or insertelement(bitcast)? Similarly. should we
prefer extractelement(bitcast) or bitcast(extractelement)?
llvm-svn: 76345
all values belonging to the intersection will belong to the resulting range.
The former was inconsistent about that point (either way is fine, just pick
one.) This is part of PR4545.
llvm-svn: 76289
GEPOperator's hasNoPointer0verflow(), and make a few places in instcombine
that create GEPs that may overflow clear the NoOverflow value. Among
other things, this partially addresses PR2831.
llvm-svn: 76252
in a convenient manner, factoring out some common code from
InstructionCombining and ValueTracking. Move the contents of
BinaryOperators.h into Operator.h and use Operator to generalize them
to support ConstantExprs as well as Instructions.
llvm-svn: 76232
isSafeToSpeculativelyExecute. The new method is a bit closer to what
the callers actually care about in that it rejects more things callers
don't want. It also adds more precise handling for integer
division, and unifies code for analyzing the legality of a speculative
load.
llvm-svn: 76150
This adds location info for all llvm_unreachable calls (which is a macro now) in
!NDEBUG builds.
In NDEBUG builds location info and the message is off (it only prints
"UREACHABLE executed").
llvm-svn: 75640
(I think it's reasonably clear that we want to have a canonical form for
constructs like this; if anyone thinks that a select is not the best
canonical form, please tell me.)
llvm-svn: 75531
using the Curiously Recurring Template Pattern with LoopBase.
This will help further refactoring, and future functionality for
Loop. Also, Headers can now foward-declare Loop, instead of pulling
in LoopInfo.h or doing tricks.
llvm-svn: 75519
the changes are allowed by not calling this function for bitcasts.
The Instruction::AShr case is dead because
SimplifyDemandedInstructionBits handles that case.
llvm-svn: 75514
This involves temporarily hard wiring some parts to use the global context. This isn't ideal, but it's
the only way I could figure out to make this process vaguely incremental.
llvm-svn: 75445
Make llvm_unreachable take an optional string, thus moving the cerr<< out of
line.
LLVM_UNREACHABLE is now a simple wrapper that makes the message go away for
NDEBUG builds.
llvm-svn: 75379
per icmp predicate out of predsimplify and into ConstantRange.
Add another utility method that determines whether one range is a subset of
another. Combine with the former to determine whether icmp pred range, range
is known to be true or not.
llvm-svn: 75357
inserted to replace that value must dominate all of of the basic
blocks associated with the uses of the value in the PHI, not just
one of them.
llvm-svn: 74376
This helps it avoid reusing an instruction that doesn't dominate all
of the users, in cases where the original instruction was inserted
before all of the users were known. This may result in redundant
expansions of sub-expressions that depend on loop-unpredictable values
in some cases, however this isn't very common, and it primarily impacts
IndVarSimplify, so GVN can be expected to clean these up.
This eliminates the need for IndVarSimplify's FixUsesBeforeDefs,
which fixes several bugs.
llvm-svn: 74352
terminator, instead of after the last phi. This fixes a bug
exposed by ScalarEvolution analyzing more kinds of loops.
This fixes PR4436.
llvm-svn: 74072
trip counts in more cases.
Generalize ScalarEvolution's isLoopGuardedByCond code to recognize
And and Or conditions, splitting the code out into an
isNecessaryCond helper function so that it can evaluate Ands and Ors
recursively, and make SCEVExpander be much more aggressive about
hoisting instructions out of loops.
test/CodeGen/X86/pr3495.ll has an additional instruction now, but
it appears to be due to an arbitrary register allocation difference.
llvm-svn: 74048
now, this hasn't mattered, because ScalarEvolution hasn't been able
to compute trip counts for loops with multiple exits. But it will
soon.
llvm-svn: 73864
as if they were multiple uses of the same instruction. This interacts
well with the existing loadpre that j-t does to open up many new jump
threads earlier.
llvm-svn: 73768
casted induction variables in cases where the cast
isn't foldable. It ended up being a pessimization in
many cases. This could be fixed, but it would require
a bunch of complicated code in IVUsers' clients. The
advantages of this approach aren't visible enough to
justify it at this time.
llvm-svn: 73706
move loads back past a check that the load address
is valid, see new testcase. The test that went
in with 72661 has exactly this case, except that
the conditional it's moving past is checking
something else; I've settled for changing that
test to reference a global, not a pointer. It
may be possible to scan all the tests you pass and
make sure none of them are checking any component
of the address, but it's not trivial and I'm not
trying to do that here.
llvm-svn: 73632
failures.
To support this, add some utility functions to Type to help support
vector/scalar-independent code. Change ConstantInt::get and
ConstantFP::get to support vector types, and add an overload to
ConstantInt::get that uses a static IntegerType type, for
convenience.
Introduce a new getConstant method for ScalarEvolution, to simplify
common use cases.
llvm-svn: 73431
induction variable when the addrec to be expanded does not require
a wider type. This eliminates the need for IndVarSimplify to
micro-manage SCEV expansions, because SCEVExpander now
automatically expands them in the form that IndVarSimplify considers
to be canonical. (LSR still micro-manages its SCEV expansions,
because it's optimizing for the target, rather than for
other optimizations.)
Also, this uses the new getAnyExtendExpr, which has more clever
expression simplification logic than the IndVarSimplify code it
replaces, and this cleans up some ugly expansions in code such as
the included masked-iv.ll testcase.
llvm-svn: 73294
integer and floating-point opcodes, introducing
FAdd, FSub, and FMul.
For now, the AsmParser, BitcodeReader, and IRBuilder all preserve
backwards compatability, and the Core LLVM APIs preserve backwards
compatibility for IR producers. Most front-ends won't need to change
immediately.
This implements the first step of the plan outlined here:
http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt
llvm-svn: 72897
instcombine doesn't know when it's safe. To partially compensate
for this, introduce new code to do this transformation in
dagcombine, which can use UnsafeFPMath.
llvm-svn: 72872
RewriteStoreUserOfWholeAlloca deal with tail padding because
isSafeUseOfBitCastedAllocation expects them to. Otherwise, we crash
trying to erase the bitcast.
llvm-svn: 72688
rewrite the comparison if there is any implicit extension or truncation
on the induction variable. I'm planning for IVUsers to eventually take
over some of the work of this code, and for it to be generalized.
llvm-svn: 72496
in the case where a loop exit value cannot be computed, instead of only in
some cases while using SCEVCouldNotCompute in others. This simplifies
getSCEVAtScope's callers.
llvm-svn: 72375
leave the original comparison in place if it has other uses, since the
other uses won't be dominated by the new comparison instruction.
llvm-svn: 72369
Fix by clearing the rewriter cache before deleting the trivially dead
instructions.
Also make InsertedExpressions use an AssertingVH to catch these
bugs easier.
llvm-svn: 72364
assuming that the use of the value is in a block dominated by the
"normal" destination. LangRef.html and other documentation sources
don't explicitly guarantee this, but it seems to be assumed in
other places in LLVM at least.
This fixes an assertion failure on the included testcase, which
is derived from the Ada testsuite.
FixUsesBeforeDefs is a temporary measure which I'm looking to
replace with a more capable solution.
llvm-svn: 72266
Instcombine to be more aggressive about using SimplifyDemandedBits
on shift nodes. This allows a shift to be simplified to zero in the
included test case.
llvm-svn: 72204
of the comparison is defined inside the loop. This fixes a
use-before-def problem, because the transformation puts a use
of the RHS outside the loop.
llvm-svn: 72149
instructions. It attempts to create high-level multi-operand GEPs,
though in cases where this isn't possible it falls back to casting
the pointer to i8* and emitting a GEP with that. Using GEP instructions
instead of ptrtoint+arithmetic+inttoptr helps pointer analyses that
don't use ScalarEvolution, such as BasicAliasAnalysis.
Also, make the AddrModeMatcher more aggressive in handling GEPs.
Previously it assumed that operand 0 of a GEP would require a register
in almost all cases. It now does extra checking and can do more
matching if operand 0 of the GEP is foldable. This fixes a problem
that was exposed by SCEVExpander using GEPs.
llvm-svn: 72093
without one. Use it where we were using abs on
int64_t objects.
(I strongly suspect the casts to unsigned in the
fragments in LoopStrengthReduce are not doing whatever
the original intent was, but the obvious change to
uint64_t doesn't work. Maybe later.)
llvm-svn: 71612
and generalize it so that it can be used by IndVarSimplify. Implement the
base IndVarSimplify transformation code using IVUsers. This removes
TestOrigIVForWrap and associated code, as ScalarEvolution now has enough
builtin overflow detection and folding logic to handle all the same cases,
and more. Run "opt -iv-users -analyze -disable-output" on your favorite
loop for an example of what IVUsers does.
This lets IndVarSimplify eliminate IV casts and compute trip counts in
more cases. Also, this happens to finally fix the remaining testcases
in PR1301.
Now that IndVarSimplify is being more aggressive, it occasionally runs
into the problem where ScalarEvolutionExpander's code for avoiding
duplicate expansions makes it difficult to ensure that all expanded
instructions dominate all the instructions that will use them. As a
temporary measure, IndVarSimplify now uses a FixUsesBeforeDefs function
to fix up instructions inserted by SCEVExpander. Fortunately, this code
is contained, and can be easily removed once a more comprehensive
solution is available.
llvm-svn: 71535
method, fixing a crash on PR4146. While the store will
ultimately overwrite the "padded size" number of bits in memory,
the stored value may be a subset of this size. This function
only wants to handle the case where all bits are stored.
llvm-svn: 71224
Running /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/
CodeGen/X86/dg.exp ...
FAIL: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/test/
CodeGen/X86/change-compare-stride-1.ll
Failed with exit(1) at line 2
while running: grep {cmpq $-478,} change-compare-stride-1.ll.tmp
child process exited abnormally
llvm-svn: 71013
CallbackVH, with fixes. allUsesReplacedWith need to
walk the def-use chains and invalidate all users of a
value that is replaced. SCEVs of users need to be
recalcualted even if the new value is equivalent. Also,
make forgetLoopPHIs walk def-use chains, since any
SCEV that depends on a PHI should be recalculated when
more information about that PHI becomes available.
llvm-svn: 70927
ThreadEdge directly. This shares the code, but is just a refactoring.
* Make JumpThreading compute the set of loop headers and avoid threading
across them. This prevents jump threading from forming irreducible
loops (goodness) but also prevents it from threading in other cases that
are beneficial (see the comment above FindFunctionBackedges).
llvm-svn: 70820
makes ScalarEvolution::deleteValueFromRecords, and it's code that
subtly needed to be called before ReplaceAllUsesWith, unnecessary.
It also makes ValueDeletionListener unnecessary.
llvm-svn: 70645
of returning a list of pointers to Values that are deleted. This was
unsafe, because the pointers in the list are, by nature of what
RecursivelyDeleteDeadInstructions does, always dangling. Replace this
with a simple callback mechanism. This may eventually be removed if
all clients can reasonably be expected to use CallbackVH.
Use this to factor out the dead-phi-cycle-elimination code from LSR
utility function, and generalize it to use the
RecursivelyDeleteTriviallyDeadInstructions utility function.
This makes LSR more aggressive about eliminating dead PHI cycles;
adjust tests to either be less trivial or to simply expect fewer
instructions.
llvm-svn: 70636
of LSR. This makes the AddUsersIfInteresting phase of LSR a pure
analysis instead of a phase that potentially does CFG modifications.
The conditions where this code would actually perform a split are
rare, and in the cases where it actually would do a split the split
is usually undone by CodeGenPrepare, and in cases where splits
actually survive into codegen, they appear to hurt more often than
they help.
llvm-svn: 70625
target hooks canLosslesslyBitCastTo and isTruncateFree. This allows
targets to avoid worrying about handling all combinations of integer
and pointer types.
llvm-svn: 70555
with the persistent insertion point, and change IndVars to make
use of it. This fixes a bug where IndVars was holding on to a
stale insertion point and forcing the SCEVExpander to continue to
use it.
This fixes PR4038.
llvm-svn: 69892
have pointer types, though in contrast to C pointer types, SCEV
addition is never implicitly scaled. This not only eliminates the
need for special code like IndVars' EliminatePointerRecurrence
and LSR's own GEP expansion code, it also does a better job because
it lets the normal optimizations handle pointer expressions just
like integer expressions.
Also, since LLVM IR GEPs can't directly index into multi-dimensional
VLAs, moving the GEP analysis out of client code and into the SCEV
framework makes it easier for clients to handle multi-dimensional
VLAs the same way as other arrays.
Some existing regression tests show improved optimization.
test/CodeGen/ARM/2007-03-13-InstrSched.ll in particular improved to
the point where if-conversion started kicking in; I turned it off
for this test to preserve the intent of the test.
llvm-svn: 69258
sext around sext(shorter IV + constant), using a
longer IV instead, when it can figure out the
add can't overflow. This comes up a lot in
subscripting; mainly affects 64 bit.
llvm-svn: 69123
strncat :(
strncat(foo, "bar", 99)
would be optimized to
memcpy(foo+strlen(foo), "bar", 100, 1)
instead of
memcpy(foo+strlen(foo), "bar", 4, 1)"
Patch by Benjamin Kramer!
llvm-svn: 68905
integer types, unless they are already strange. This prevents it from
turning the code produced by SROA into crazy libcalls and stuff that
the code generator can't handle. In the attached example, the result
was an i96 multiply that caused the x86 backend to assert.
Note that if TargetData had an idea of what the legal types are for
a target that this could be used to stop instcombine from introducing
i64 muls, as Scott wanted.
llvm-svn: 68598
to/from integer types that are not intptr_t to convert to intptr_t
then do an integer conversion to the dest type. This exposes the
cast to the optimizer.
llvm-svn: 67638
1. Make instcombine always canonicalize trunc x to i1 into an icmp(x&1). This
exposes the AND to other instcombine xforms and is more of what the code
generator expects.
2. Rewrite the remaining trunc pattern match to use 'match', which
simplifies it a lot.
llvm-svn: 67635
linkage: the value may be replaced with something
different at link time. (Frontends that want to
allow values to be loaded out of weak constants can
give their constants weak_odr linkage).
llvm-svn: 67407
and was deleting Instructions without clearing the
corresponding map entry. This led to nondeterministic
behavior if the same address got allocated to another
Instruction within a short time.
llvm-svn: 67306
it is not APInt clean, but even when it is it needs to be evaluated carefully
to determine whether it is actually profitable.
This fixes a crash on PR3806
llvm-svn: 67134
changes.
For InvokeInst now all arguments begin at op_begin().
The Callee, Cont and Fail are now faster to get by
access relative to op_end().
This patch introduces some temporary uglyness in CallSite.
Next I'll bring CallInst up to a similar scheme and then
the uglyness will magically vanish.
This patch also exposes all the reliance of the libraries
on InvokeInst's operand ordering. I am thinking of taking
care of that too.
llvm-svn: 66920
in the Ada testcase. Reverting this only covers up
the real problem, which is a nasty conceptual difficulty
in the phi elimination pass: when eliminating phi nodes
in landing pads, the register copies need to come before
the invoke, not at the end of the basic block which is
too late... See PR3784.
llvm-svn: 66826
allocations. Apparently the assumption is there is an
instruction (terminator?) following the allocation so I
am allowing the same assumption.
llvm-svn: 66716
use, check also for the case where it has two uses,
the other being a llvm.dbg.declare. This is needed so
debug info doesn't affect codegen.
llvm-svn: 65970
testsuite:
Running /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmCore/test/CodeGen/X86/dg.exp ...
FAIL: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmCore/test/CodeGen/X86/nancvt.ll
Failed with exit(1) at line 2
while running: grep 2147027116 nancvt.ll.tmp | count 3
count: expected 3 lines and got 0.
child process exited abnormally
FAIL: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmCore/test/CodeGen/X86/vec_ins_extract.ll
Failed with exit(1) at line 1
while running: llvm-as < /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvmCore/test/CodeGen/X86/vec_ins_extract.ll | opt -scalarrepl -instcombine | llc -march=x86 -mcpu=yonah | not /usr/bin/grep sub.*esp
subl $28, %esp
subl $28, %esp
child process exited abnormally
And more.
llvm-svn: 65758
to more accurately describe what it does. Expand its doxygen comment
to describe what the backedge-taken count is and how it differs
from the actual iteration count of the loop. Adjust names and
comments in associated code accordingly.
llvm-svn: 65382
trip counts that use signed comparisons. It's not obviously the best
approach for preserving trip count information, and at any rate there
isn't anything in the tree right now that makes use of that, so for
now always using zero-extensions is preferable.
llvm-svn: 65347
so that ScalarEvolution doesn't hang onto a dangling Loop*, which
could be a problem if another Loop happens to get allocated at the
same address.
llvm-svn: 65323
memcpy to match the alignment of the destination. It isn't necessary
for making loads and stores handled like the SSE loadu/storeu
intrinsics, and it was causing a performance regression in
MultiSource/Applications/JM/lencod.
The problem appears to have been a memcpy that copies from some
highly aligned array into an alloca; the alloca was then being
assigned a large alignment, which required codegen to perform
dynamic stack-pointer re-alignment, which forced the enclosing
function to have a frame pointer, which led to increased spilling.
llvm-svn: 65289
as legality. Make load sinking and gep sinking more careful: we only
do it when it won't pessimize loads from the stack. This has the added
benefit of not producing code that is unanalyzable to SROA.
llvm-svn: 65209
addresses, part 1. This fixes an obvious logic bug. Previously if the only
in-loop use is a PHI, it would return AllUsesAreAddresses as true.
llvm-svn: 65178
reduction of address calculations down to basic pointer arithmetic.
This is currently off by default, as it needs a few other features
before it becomes generally useful. And even when enabled, full
strength reduction is only performed when it doesn't increase
register pressure, and when several other conditions are true.
This also factors out a bunch of exisiting LSR code out of
StrengthReduceStridedIVUsers into separate functions, and tidies
up IV insertion. This actually decreases register pressure even
in non-superhero mode. The change in iv-users-in-other-loops.ll
is an example of this; there are two more adds because there are
two fewer leas, and there is less spilling.
llvm-svn: 65108
trip count value when the original loop iteration condition is
signed and the canonical induction variable won't undergo signed
overflow. This isn't required for correctness; it just preserves
more information about original loop iteration values.
Add a getTruncateOrSignExtend method to ScalarEvolution,
following getTruncateOrZeroExtend.
llvm-svn: 64918
are multiple IV's in a loop, some of them may under go signed
or unsigned wrapping even if the IV that's used in the loop
exit condition doesn't. Restrict sign-extension-elimination
and zero-extension-elimination to only those that operate on
the original loop-controlling IV.
llvm-svn: 64866
modified in a way that may effect the trip count calculation. Change
IndVars to use this method when it rewrites pointer or floating-point
induction variables instead of using a doInitialization method to
sneak these changes in before ScalarEvolution has a chance to see
the loop. This eliminates the need for LoopPass to depend on
ScalarEvolution.
llvm-svn: 64810
Enhance instcombine to use the preferred field of
GetOrEnforceKnownAlignment in more cases, so that regular IR operations are
optimized in the same way that the intrinsics currently are.
llvm-svn: 64623
when I was looking at functions used by python.
Highlights include, better largefile support (64-bit file sizes on 32-bit
systems), fputs string is nocapture, popen/pclose added (popen being noalias
return), modf and frexp and friends. Also added some missing 'break' statements
and combined identical sections.
llvm-svn: 64615
- Test for signed and unsigned wrapping conditions, instead of just
testing for non-negative induction ranges.
- Handle loops with GT comparisons, in addition to LT comparisons.
- Support more cases of induction variables that don't start at 0.
llvm-svn: 64532
addrec in a different loop to check the value being added to
the accumulated Start value, not the Start value before it has
the new value added to it. This prevents LSR from going crazy
on the included testcase. Dale, please review.
llvm-svn: 64440
loop induction on LP64 targets. When the induction variable is
used in addressing, IndVars now is usually able to inserst a
64-bit induction variable and eliminates the sign-extending cast.
This is also useful for code using C "short" types for
induction variables on targets with 32-bit addressing.
Inserting a wider induction variable is easy; the tricky part is
determining when trunc(sext(i)) expressions are no-ops. This
requires range analysis of the loop trip count. A common case is
when the original loop iteration starts at 0 and exits when the
induction variable is signed-less-than a fixed value; this case
is now handled.
This replaces IndVarSimplify's OptimizeCanonicalIVType. It was
doing the same optimization, but it was limited to loops with
constant trip counts, because it was running after the loop
rewrite, and the information about the original induction
variable is lost by that point.
Rename ScalarEvolution's executesAtLeastOnce to
isLoopGuardedByCond, generalize it to be able to test for
ICMP_NE conditions, and move it to be a public function so that
IndVars can use it.
llvm-svn: 64407
accessed at least once as a vector. This prevents it from
compiling the example in not-a-vector into:
define double @test(double %A, double %B) {
%tmp4 = insertelement <7 x double> undef, double %A, i32 0
%tmp = insertelement <7 x double> %tmp4, double %B, i32 4
%tmp2 = extractelement <7 x double> %tmp, i32 4
ret double %tmp2
}
instead, producing the integer code. Producing vectors when they
aren't otherwise in the program is dangerous because a lot of other
code treats them carefully and doesn't want to break them down.
OTOH, many things want to break down tasty i448's.
llvm-svn: 63638
With the new world order, it can handle cases where the first
store into the alloca is an element of the vector, instead of
requiring the first analyzed store to have the vector type
itself. This allows us to un-xfail
test/CodeGen/X86/vec_ins_extract.ll.
llvm-svn: 63590
turn icmp eq a+x, b+x into icmp eq a, b if a+x or b+x has other uses. This
may have been increasing register pressure leading to the bzip2 slowdown.
llvm-svn: 63487
improvements to the EvaluateInDifferentType code. This code works
by just inserted a bunch of new code and then seeing if it is
useful. Instcombine is not allowed to do this: it can only insert
new code if it is useful, and only when it is converging to a more
canonical fixed point. Now that we iterate when DCE makes progress,
this causes an infinite loop when the code ends up not being used.
llvm-svn: 63483
simplifydemandedbits to simplify instructions with *multiple
uses* in contexts where it can get away with it. This allows
it to simplify the code in multi-use-or.ll into a single 'add
double'.
This change is particularly interesting because it will cover
up for some common codegen bugs with large integers created due
to the recent SROA patch. When working on fixing those bugs,
this should be disabled.
llvm-svn: 63481
Now, if it detects that "V" is the same as some other value,
SimplifyDemandedBits returns the new value instead of RAUW'ing it immediately.
This has two benefits:
1) simpler code in the recursive SimplifyDemandedBits routine.
2) it allows future fun stuff in instcombine where an operation has multiple
uses and can be simplified in one context, but not all.
#2 isn't implemented yet, this patch should have no functionality change.
llvm-svn: 63479
be able to handle *ANY* alloca that is poked by loads and stores of
bitcasts and GEPs with constant offsets. Before the code had a number
of annoying limitations and caused it to miss cases such as storing into
holes in structs and complex casts (as in bitfield-sroa) where we had
unions of bitfields etc. This also handles a number of important cases
that are exposed due to the ABI lowering stuff we do to pass stuff by
value.
One case that is pretty great is that we compile
2006-11-07-InvalidArrayPromote.ll into:
define i32 @func(<4 x float> %v0, <4 x float> %v1) nounwind {
%tmp10 = call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %v1)
%tmp105 = bitcast <4 x i32> %tmp10 to i128
%tmp1056 = zext i128 %tmp105 to i256
%tmp.upgrd.43 = lshr i256 %tmp1056, 96
%tmp.upgrd.44 = trunc i256 %tmp.upgrd.43 to i32
ret i32 %tmp.upgrd.44
}
which turns into:
_func:
subl $28, %esp
cvttps2dq %xmm1, %xmm0
movaps %xmm0, (%esp)
movl 12(%esp), %eax
addl $28, %esp
ret
Which is pretty good code all things considering :).
One effect of this is that SROA will start generating arbitrary bitwidth
integers that are a multiple of 8 bits. In the case above, we got a
256 bit integer, but the codegen guys assure me that it can handle the
simple and/or/shift/zext stuff that we're doing on these operations.
This addresses rdar://6532315
llvm-svn: 63469
There is now a direct way from value-use-iterator to incoming block in PHINode's API.
This way we avoid the iterator->index->iterator trip, and especially the costly
getOperandNo() invocation. Additionally there is now an assertion that the iterator
really refers to one of the PHI's Uses.
llvm-svn: 62869
putc, puts, perror, vscanf and vsscanf from getting annotations.
Add annotations for eight printf functions, memalign, pread and pwrite.
On Linux, llvm-gcc sometimes renames strdup, getc, putc, strtok_r, scanf and
sscanf. Match the alternate function names.
Fix a crash annotating opendir.
Don't mark fsetpos's second parameter as nocapture. It's supposed to be
captured.
Do mark fopen's path and mode strings as nocapture. Mark ferror as readonly,
but not fileno which may set errno.
llvm-svn: 62456
- Looking at the number of sign bits of the a sext instruction to determine whether new trunc + sext pair should be added when its source is being evaluated in a different type.
llvm-svn: 62263
my earlier patch to this file.
The issue there was that all uses of an IV inside a loop
are actually references to Base[IV*2], and there was one
use outside that was the same but LSR didn't see the base
or the scaling because it didn't recurse into uses outside
the loop; thus, it used base+IV*scale mode inside the loop
instead of pulling base out of the loop. This was extra bad
because register pressure later forced both base and IV into
memory. Doing that recursion, at least enough
to figure out addressing modes, is a good idea in general;
the change in AddUsersIfInteresting does this. However,
there were side effects....
It is also possible for recursing outside the loop to
introduce another IV where there was only 1 before (if
the refs inside are not scaled and the ref outside is).
I don't think this is a common case, but it's in the testsuite.
It is right to be very aggressive about getting rid of
such introduced IVs (CheckForIVReuse and the handling of
nonzero RewriteFactor in StrengthReduceStridedIVUsers).
In the testcase in question the new IV produced this way
has both a nonconstant stride and a nonzero base, neither
of which was handled before. And when inserting
new code that feeds into a PHI, it's right to put such
code at the original location rather than in the PHI's
immediate predecessor(s) when the original location is outside
the loop (a case that couldn't happen before)
(RewriteInstructionToUseNewBase); better to avoid making
multiple copies of it in this case.
Also, the mechanism for keeping SCEV's corresponding to GEP's
no longer works, as the GEP might change after its SCEV
is remembered, invalidating the SCEV, and we might get a bad
SCEV value when looking up the GEP again for a later loop.
This also couldn't happen before, as we weren't recursing
into GEP's outside the loop.
Also, when we build an expression that involves a (possibly
non-affine) IV from a different loop as well as an IV from
the one we're interested in (containsAddRecFromDifferentLoop),
don't recurse into that. We can't do much with it and will
get in trouble if we try to create new non-affine IVs or something.
More testcases are coming.
llvm-svn: 62212
canonicalization transform based on duncan's comments:
1) improve the comment about %.
2) within our index loop make sure the offset stays
within the *type size*, instead of within the *abi size*.
This allows us to reason explicitly about landing in tail
padding and means that issues like non-zero offsets into
[0 x foo] types don't occur anymore.
llvm-svn: 62045
loads from allocas that cover the entire aggregate. This handles
some memcpy/byval cases that are produced by llvm-gcc. This triggers
a few times in kc++ (with std::pair<std::_Rb_tree_const_iterator
<kc::impl_abstract_phylum*>,bool>) and once in 176.gcc (with %struct..0anon).
llvm-svn: 61915
integer to a (transitive) bitcast the alloca and if that integer
has the full size of the alloca, then it clobbers the whole thing.
Handle this by extracting pieces out of the stored integer and
filing them away in the SROA'd elements.
This triggers fairly frequently because the CFE uses integers to
pass small structs by value and the inliner exposes these. For
example, in kimwitu++, I see a bunch of these with i64 stores to
"%struct.std::pair<std::_Rb_tree_const_iterator<kc::impl_abstract_phylum*>,bool>"
In 176.gcc I see a few i32 stores to "%struct..0anon".
In the testcase, this is a difference between compiling test1 to:
_test1:
subl $12, %esp
movl 20(%esp), %eax
movl %eax, 4(%esp)
movl 16(%esp), %eax
movl %eax, (%esp)
movl (%esp), %eax
addl 4(%esp), %eax
addl $12, %esp
ret
vs:
_test1:
movl 8(%esp), %eax
addl 4(%esp), %eax
ret
The second half of this will be to handle loads of the same form.
llvm-svn: 61853
Finalization occurs after all the FunctionPasses in the group have run, which
is clearly not what we want.
This also means that we have to make sure that we apply the right param
attributes when creating a new function.
Also, add a missed optimization: strdup and strndup. NoCapture and
NoAlias return!
llvm-svn: 61658
my last patch to this file.
The issue there was that all uses of an IV inside a loop
are actually references to Base[IV*2], and there was one
use outside that was the same but LSR didn't see the base
or the scaling because it didn't recurse into uses outside
the loop; thus, it used base+IV*scale mode inside the loop
instead of pulling base out of the loop. This was extra bad
because register pressure later forced both base and IV into
memory. Doing that recursion, at least enough
to figure out addressing modes, is a good idea in general;
the change in AddUsersIfInteresting does this. However,
there were side effects....
It is also possible for recursing outside the loop to
introduce another IV where there was only 1 before (if
the refs inside are not scaled and the ref outside is).
I don't think this is a common case, but it's in the testsuite.
It is right to be very aggressive about getting rid of
such introduced IVs (CheckForIVReuse and the handling of
nonzero RewriteFactor in StrengthReduceStridedIVUsers).
In the testcase in question the new IV produced this way
has both a nonconstant stride and a nonzero base, neither
of which was handled before. And when inserting
new code that feeds into a PHI, it's right to put such
code at the original location rather than in the PHI's
immediate predecessor(s) when the original location is outside
the loop (a case that couldn't happen before)
(RewriteInstructionToUseNewBase); better to avoid making
multiple copies of it in this case.
Also, the mechanism for keeping SCEV's corresponding to GEP's
no longer works, as the GEP might change after its SCEV
is remembered, invalidating the SCEV, and we might get a bad
SCEV value when looking up the GEP again for a later loop.
This also couldn't happen before, as we weren't recursing
into GEP's outside the loop.
I owe some testcases for this, want to get it in for nightly runs.
llvm-svn: 61362
- Use SplitBlockPredecessors to factor out common predecessors of the critical edge destination. This is disabled for now due to some regressions.
llvm-svn: 61248
my last patch to this file.
The issue there was that all uses of an IV inside a loop
are actually references to Base[IV*2], and there was one
use outside that was the same but LSR didn't see the base
or the scaling because it didn't recurse into uses outside
the loop; thus, it used base+IV*scale mode inside the loop
instead of pulling base out of the loop. This was extra bad
because register pressure later forced both base and IV into
memory. Doing that recursion, at least enough
to figure out addressing modes, is a good idea in general;
the change in AddUsersIfInteresting does this. However,
there were side effects....
It is also possible for recursing outside the loop to
introduce another IV where there was only 1 before (if
the refs inside are not scaled and the ref outside is).
I don't think this is a common case, but it's in the testsuite.
It is right to be very aggressive about getting rid of
such introduced IVs (CheckForIVReuse and the handling of
nonzero RewriteFactor in StrengthReduceStridedIVUsers).
In the testcase in question the new IV produced this way
has both a nonconstant stride and a nonzero base, neither
of which was handled before. (This patch does not handle
all the cases where this can happen.) And when inserting
new code that feeds into a PHI, it's right to put such
code at the original location rather than in the PHI's
immediate predecessor(s) when the original location is outside
the loop (a case that couldn't happen before)
(RewriteInstructionToUseNewBase); better to avoid making
multiple copies of it in this case.
Everything above is exercised in
CodeGen/X86/lsr-negative-stride.ll (and ifcvt4 in ARM which is
the same IR).
llvm-svn: 61178
CFG when there is exactly one predecessor where the load is not available.
This is designed to not increase code size but still eliminate partially
redundant loads. This fires 1765 times on 403.gcc even though it doesn't
do critical edge splitting yet (the most common reason for it to fail).
llvm-svn: 61027
cleans up the generated code a bit. This should have the added benefit of
not randomly renaming functions/globals like my previous patch did. :)
llvm-svn: 61023
llvm[2]: Linking Release executable opt (without symbols)
...
Undefined symbols:
"llvm::APFloat::IEEEsingle", referenced from:
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(Constants.o)
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(AsmWriter.o)
__ZN4llvm7APFloat10IEEEsingleE$non_lazy_ptr in libLLVMCore.a(ConstantFold.o)
"llvm::APFloat::IEEEdouble", referenced from:
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(Constants.o)
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(AsmWriter.o)
__ZN4llvm7APFloat10IEEEdoubleE$non_lazy_ptr in libLLVMCore.a(ConstantFold.o)
ld: symbol(s) not found
This is in release mode. To replicate, compile llvm and llvm-gcc in optimized
mode. Then build llvm, in optimized mode, with the newly created compiler.
llvm-svn: 60977
of a pointer. This allows is to catch more equivalencies. For example,
the type_lists_compatible_p function used to require two iterations of
the gvn pass (!) to delete its 18 redundant loads because the first pass
would CSE all the addressing computation cruft, which would unblock the
second memdep/gvn passes from recognizing them. This change allows
memdep/gvn to catch all 18 when run just once on the function (as is
typical :) instead of just 3.
On all of 403.gcc, this bumps up the # reundandancies found from:
63 gvn - Number of instructions PRE'd
153991 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
to:
63 gvn - Number of instructions PRE'd
154137 gvn - Number of instructions deleted
50185 gvn - Number of loads deleted
+120 loads deleted isn't bad.
llvm-svn: 60799
MemDep::getNonLocalPointerDependency method. There are
some open issues with this (missed optimizations) and
plenty of future work, but this does allow GVN to eliminate
*slightly* more loads (49246 vs 49033).
Switching over now allows simplification of the other code
path in memdep.
llvm-svn: 60780
doesn't do its own local caching, and is slightly more aggressive about
free/store dse (see testcase). This eliminates the last external client
of MemDep::getDependenceFrom().
llvm-svn: 60619
loops when they can be subsumed into addressing modes.
Change X86 addressing mode check to realize that
some PIC references need an extra register.
(I believe this is correct for Linux, if not, I'm sure
someone will tell me.)
llvm-svn: 60608
1. Merge the 'None' result into 'Normal', making loads
and stores return their dependencies on allocations as Normal.
2. Split the 'Normal' result into 'Clobber' and 'Def' to
distinguish between the cases when memdep knows the value is
produced from when we just know if may be changed.
3. Move some of the logic for determining whether readonly calls
are CSEs into memdep instead of it being in GVN. This still
leaves verification that the arguments are hte same to GVN to
let it know about value equivalences in different contexts.
4. Change memdep's call/call dependency analysis to use
getModRefInfo(CallSite,CallSite) instead of doing something
very weak. This only really matters for things like DSA, but
someday maybe we'll have some other decent context sensitive
analyses :)
5. This reimplements the guts of memdep to handle the new results.
6. This simplifies GVN significantly:
a) readonly call CSE is slightly simpler
b) I eliminated the "getDependencyFrom" chaining for load
elimination and load CSE doesn't have to worry about
volatile (they are always clobbers) anymore.
c) GVN no longer does any 'lastLoad' caching, leaving it to
memdep.
7. The logic in DSE is simplified a bit and sped up. A potentially
unsafe case was eliminated.
llvm-svn: 60607
This fixes many bugs. I will add more test cases in a separate check-in.
Some day, the code that manipulates CFG and updates dom. info could use refactoring help.
llvm-svn: 60554