Commit Graph

1910 Commits

Author SHA1 Message Date
Chris Lattner c482a9e31a Implement Transforms/InstCombine/bswap.ll, turning common shift/and/or bswap
idioms into bswap intrinsics.

llvm-svn: 28803
2006-06-15 19:07:26 +00:00
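For illustration, a minimal C sketch of the shift/and/or byte-swap idiom this transform recognizes; the function name and exact masks are illustrative, not taken from the commit.

/* Illustrative 32-bit byte swap written as shifts, ands, and ors.
   Assumes a 32-bit unsigned int; idioms like this can now be
   collapsed into a single bswap intrinsic. */
unsigned bswap32(unsigned x) {
  return ((x & 0x000000FFu) << 24) |
         ((x & 0x0000FF00u) <<  8) |
         ((x & 0x00FF0000u) >>  8) |
         ((x & 0xFF000000u) >> 24);
}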
Chris Lattner 0c4f5a655a Fix Transforms/LoopUnswitch/2006-06-13-SingleEntryPHI.ll, a loop unswitch
bug exposed by the recent lcssa work.

llvm-svn: 28779
2006-06-14 04:46:17 +00:00
Owen Anderson fd0a3d6e5c Reapply my 6/9 changes. The bug Evan saw no longer occurs.
llvm-svn: 28759
2006-06-12 21:49:21 +00:00
Evan Cheng 1b6e310e6f Back out Owen's 6/9 changes. They broke MultiSource/Benchmarks/Prolangs-C/bison (and perhaps others).
llvm-svn: 28747
2006-06-11 09:32:57 +00:00
Owen Anderson b1dc1d44f8 Add LCSSA as a requirement for LoopUnswitch, and assert that LoopUnswitch preserves
LCSSA.

llvm-svn: 28739
2006-06-09 18:40:32 +00:00
Evan Cheng 398f70292c RewriteExpr, either the new PHI node of the induction variable or the
post-increment value, should first be cast to the appropriate type (the
type of the common expr). Otherwise, the rewrite of a use based on (common +
iv) may end up with an incorrect type.

llvm-svn: 28735
2006-06-09 00:12:42 +00:00
Reid Spencer d4b795902c Fix a spello in a comment.
llvm-svn: 28714
2006-06-07 21:24:10 +00:00
Chris Lattner 95cebb082f Fix a bug in a recent patch. This fixes UnitTests/Vector/Altivec/casts.c on
PPC/altivec

llvm-svn: 28698
2006-06-06 22:26:02 +00:00
Chris Lattner 540886f0ae Remove unneeded hook. Patch by Anton K. Thanks!
llvm-svn: 28664
2006-06-02 19:11:46 +00:00
Chris Lattner f905a7b994 Silence a -pedantic warning.
llvm-svn: 28632
2006-06-01 17:16:21 +00:00
Chris Lattner 1df0e98ac2 Swap the order of operands created here. For +&|^, the order doesn't matter,
but for sub, it really does!  This fixes a miscompilation of fibheap_cut in
llvmgcc4.

llvm-svn: 28600
2006-05-31 21:14:00 +00:00
Chris Lattner dab43b2b0e Implement Transforms/InstCombine/store.ll:test2.
llvm-svn: 28503
2006-05-26 19:19:20 +00:00
Chris Lattner 0e47716e69 Transform things like (splat(splat)) -> splat
llvm-svn: 28490
2006-05-26 00:29:06 +00:00
Chris Lattner 12249be286 Introduce a helper function that simplifies interpretation of shuffle masks.
No functionality change.

llvm-svn: 28489
2006-05-25 23:48:38 +00:00
Chris Lattner 99155be33f Turn (cast (shuffle (cast))) -> shuffle (cast) if it reduces the # casts in
the program.  This exposes more opportunities for the instcombiner, and implements
vec_shuffle.ll:test6

llvm-svn: 28487
2006-05-25 23:24:33 +00:00
Chris Lattner 83f6578b0c extract element from a shuffle vector can be trivially turned into an
extractelement from the SV's source.  This implements vec_shuffle.ll:test[45]

llvm-svn: 28485
2006-05-25 22:53:38 +00:00
Chris Lattner d0622b6894 Silence a bogus gcc warning
llvm-svn: 28422
2006-05-20 23:14:03 +00:00
Chris Lattner e4cb4768fa Declare that lowerinvoke doesn't interact with other lowering passes.
Patch written by Domagoj Babic!

llvm-svn: 28367
2006-05-17 21:05:27 +00:00
Evan Cheng 18d0438148 Backing out last check-in for now. It's causing an infinite loop in gccas on lencode.
llvm-svn: 28284
2006-05-14 06:46:03 +00:00
Chris Lattner 3987a8532d Add/Sub/Mul are safe to promote here as well. Incrementing a single-bit
bitfield now gives this code:

_plus:
        lwz r2, 0(r3)
        rlwimi r2, r2, 0, 1, 31
        xoris r2, r2, 32768
        stw r2, 0(r3)
        blr

instead of this:

_plus:
        lwz r2, 0(r3)
        srwi r4, r2, 31
        slwi r4, r4, 31
        addis r4, r4, -32768
        rlwimi r2, r4, 0, 0, 0
        stw r2, 0(r3)
        blr

this can obviously still be improved.

llvm-svn: 28275
2006-05-13 02:16:08 +00:00
Chris Lattner 1ebbe6a22e Implement simple promotion for cast elimination in instcombine. This is
currently very limited, but can be extended in the future.  For example,
we now compile:

uint %test30(uint %c1) {
        %c2 = cast uint %c1 to ubyte
        %c3 = xor ubyte %c2, 1
        %c4 = cast ubyte %c3 to uint
        ret uint %c4
}

to:

_xor:
        movzbl 4(%esp), %eax
        xorl $1, %eax
        ret

instead of:

_xor:
        movb $1, %al
        xorb 4(%esp), %al
        movzbl %al, %eax
        ret

More impressively, we now compile:

struct B { unsigned bit : 1; };
void xor(struct B *b) { b->bit = b->bit ^ 1; }

To (X86/PPC):

_xor:
        movl 4(%esp), %eax
        xorl $-2147483648, (%eax)
        ret
_xor:
        lwz r2, 0(r3)
        xoris r2, r2, 32768
        stw r2, 0(r3)
        blr

instead of (X86/PPC):

_xor:
        movl 4(%esp), %eax
        movl (%eax), %ecx
        movl %ecx, %edx
        shrl $31, %edx
        # TRUNCATE movb %dl, %dl
        xorb $1, %dl
        movzbl %dl, %edx
        andl $2147483647, %ecx
        shll $31, %edx
        orl %ecx, %edx
        movl %edx, (%eax)
        ret

_xor:
        lwz r2, 0(r3)
        srwi r4, r2, 31
        xori r4, r4, 1
        rlwimi r2, r4, 31, 0, 0
        stw r2, 0(r3)
        blr

This implements InstCombine/cast.ll:test30.

llvm-svn: 28273
2006-05-13 02:06:03 +00:00
Chris Lattner 1443bc52be Refactor some code, making it simpler.
When doing the initial pass of constant folding, if we get a constantexpr,
simplify the constant expr like we would do if the constant is folded in the
normal loop.

This fixes the missed-optimization regression in
Transforms/InstCombine/getelementptr.ll last night.

llvm-svn: 28224
2006-05-11 17:11:52 +00:00
Chris Lattner a36ee4ea34 Two changes:
1. Implement InstCombine/deadcode.ll by not adding instructions in unreachable
   blocks (due to constants in conditional branches/switches) to the worklist.
   This causes them to be deleted before instcombine starts up, leading to
   better optimization.

2. In the prepass over instructions, do trivial constprop/dce as we go.  This
   has the effect of improving the effectiveness of #1.  In addition, it
   *significantly* speeds up instcombine on test cases with large amounts of
   constant folding code (for example, that produced by code specialization
   or partial evaluation).  In one example, it speeds up instcombine from
   0.0589s to 0.0224s with a release build (a 2.6x speedup).

llvm-svn: 28215
2006-05-10 19:00:36 +00:00
Chris Lattner 4fe87d67c4 Patch to make some xforms preserve each other. Patch contributed by
Domagoj Babic!

llvm-svn: 28181
2006-05-09 04:13:41 +00:00
Chris Lattner 1d441adfbf Move some code around.
Make the "fold (and (cast A), (cast B)) -> (cast (and A, B))" transformation
only apply when both casts really will cause code to be generated.  If one or
both don't, then this xform doesn't remove a cast.

This fixes Transforms/InstCombine/2006-05-06-Infloop.ll

llvm-svn: 28141
2006-05-06 09:00:16 +00:00
Chris Lattner e745c7de0e Fix an infinite loop compiling oggenc last night.
llvm-svn: 28128
2006-05-05 20:51:30 +00:00
Chris Lattner 3af1053488 Implement InstCombine/cast.ll:test29
llvm-svn: 28126
2006-05-05 06:39:07 +00:00
Chris Lattner fb29692055 Fix Transforms/InstCombine/2006-05-04-DemandedBitCrash.ll
llvm-svn: 28101
2006-05-04 17:33:35 +00:00
Chris Lattner 2d3a02725d Add pass ID's for various passes, so they can be AddRequiredID. Patch by
Domagoj Babic!

llvm-svn: 28048
2006-05-02 04:24:36 +00:00
Chris Lattner 655d08fda8 Fix InstCombine/2006-04-28-ShiftShiftLongLong.ll
llvm-svn: 28019
2006-04-28 22:21:41 +00:00
Chris Lattner e63d808b6e Fix Transforms/Reassociate/2006-04-27-ReassociateVector.ll
llvm-svn: 28007
2006-04-28 04:14:49 +00:00
Chris Lattner b6cb64b7e6 Add support for inserting undef into a vector. This implements
Transforms/InstCombine/vec_insert_to_shuffle.ll

llvm-svn: 27997
2006-04-27 21:14:21 +00:00
Chris Lattner dae49df407 Fix Transforms/ScalarRepl/2006-04-20-PromoteCrash.ll
llvm-svn: 27912
2006-04-20 20:48:50 +00:00
Andrew Lenharth f89e630b2f Make code match cvs commit message :)
llvm-svn: 27881
2006-04-20 15:41:37 +00:00
Andrew Lenharth 61eae29ad6 If we can convert the return pointer type into an integer that IntPtrType
can be converted to losslessly, we can continue the conversion to a direct call.

llvm-svn: 27880
2006-04-20 14:56:47 +00:00
Chris Lattner 36dd7c98d1 Turn x86 unaligned load/store intrinsics into aligned load/store instructions
if the pointer is known aligned.

llvm-svn: 27781
2006-04-17 22:26:56 +00:00
Chris Lattner 9095186deb Fix a bug in the 'shuffle(undef,x,mask) -> shuffle(x, undef,mask')' xform
Make the insert/extract elt -> shuffle code more aggressive.

This fixes CodeGen/PowerPC/vec_shuffle.ll

llvm-svn: 27728
2006-04-16 00:51:47 +00:00
Chris Lattner 34cebe785d Canonicalize shuffle(undef,x,mask) -> shuffle(x, undef,mask').
llvm-svn: 27727
2006-04-16 00:03:56 +00:00
Chris Lattner 39fac448d6 significant cleanups to code that uses insert/extractelt heavily. This builds
maximal shuffles out of them where possible.

llvm-svn: 27717
2006-04-15 01:39:45 +00:00
Chris Lattner 3323ce165d Teach scalarrepl to promote unions of vectors and floats, producing
insert/extractelement operations.  This implements
Transforms/ScalarRepl/vector_promote.ll

llvm-svn: 27710
2006-04-14 21:42:41 +00:00
Reid Spencer 13a1a7a4a6 Get rid of a signed/unsigned compare warning.
llvm-svn: 27625
2006-04-12 19:28:15 +00:00
Chris Lattner b19a5c661b Turn casts into getelementptr's when possible. This enables SROA to be more
aggressive in some cases where LLVMGCC 4 is inserting casts for no reason.

This implements InstCombine/cast.ll:test27/28.

llvm-svn: 27620
2006-04-12 18:09:35 +00:00
Chris Lattner 2d37f920ad Implement vec_shuffle.ll:test3
llvm-svn: 27573
2006-04-10 23:06:36 +00:00
Chris Lattner fbb77a408b Implement InstCombine/vec_shuffle.ll:test[12]
llvm-svn: 27571
2006-04-10 22:45:52 +00:00
Chris Lattner 17bd60588c Add support for shufflevector
llvm-svn: 27513
2006-04-08 01:19:12 +00:00
Chris Lattner e79d249c29 Lower vperm(x,y, mask) -> shuffle(x,y,mask) if mask is constant. This allows
us to compile oh-so-realistic stuff like this:

 vec_vperm(A, B, (vector unsigned char){14});

to:
        vspltb v0, v0, 14

instead of:

        vspltisb v0, 14
        vperm v0, v2, v1, v0

llvm-svn: 27452
2006-04-06 19:19:17 +00:00
Chris Lattner caba72b6ff vector casts of casts are eliminable. Transform this:
%tmp = cast <4 x uint> %tmp to <4 x int>                ; <<4 x int>> [#uses=1]
        %tmp = cast <4 x int> %tmp to <4 x float>               ; <<4 x float>> [#uses=1]

into:

        %tmp = cast <4 x uint> %tmp to <4 x float>              ; <<4 x float>> [#uses=1]

llvm-svn: 27355
2006-04-02 05:43:13 +00:00
Chris Lattner ebca476b27 Allow transforming this:
%tmp = cast <4 x uint>* %testData to <4 x int>*         ; <<4 x int>*> [#uses=1]
        %tmp = load <4 x int>* %tmp             ; <<4 x int>> [#uses=1]

to this:

        %tmp = load <4 x uint>* %testData               ; <<4 x uint>> [#uses=1]
        %tmp = cast <4 x uint> %tmp to <4 x int>                ; <<4 x int>> [#uses=1]

llvm-svn: 27353
2006-04-02 05:37:12 +00:00
Chris Lattner f42d0aeda1 Turn altivec lvx/stvx intrinsics into loads and stores. This allows the
elimination of one load from this:

int AreSecondAndThirdElementsBothNegative( vector float *in ) {
#define QNaN 0x7FC00000
const vector unsigned int testData = (vector unsigned int)( QNaN, 0, 0, QNaN );
vector float test = vec_ld( 0, (float*) &testData );
return ! vec_any_ge( test, *in );
}

Now generating:

_AreSecondAndThirdElementsBothNegative:
        mfspr r2, 256
        oris r4, r2, 49152
        mtspr 256, r4
        li r4, lo16(LCPI1_0)
        lis r5, ha16(LCPI1_0)
        addi r6, r1, -16
        lvx v0, r5, r4
        stvx v0, 0, r6
        lvx v1, 0, r3
        vcmpgefp. v0, v0, v1
        mfcr r3, 2
        rlwinm r3, r3, 27, 31, 31
        xori r3, r3, 1
        cntlzw r3, r3
        srwi r3, r3, 5
        mtspr 256, r2
        blr

llvm-svn: 27352
2006-04-02 05:30:25 +00:00
Chris Lattner 6cf4914fd4 Fix InstCombine/2006-04-01-InfLoop.ll
llvm-svn: 27330
2006-04-01 22:05:01 +00:00
Chris Lattner dcd0792622 Fold A^(B&A) -> (B&A)^A
Fold (B&A)^A == ~B & A

This implements InstCombine/xor.ll:test2[56]

llvm-svn: 27328
2006-04-01 08:03:55 +00:00
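An illustrative C sketch of the second fold (the function name is made up):

/* Where a bit of 'a' is 0, both sides are 0; where it is 1, the result
   is b^1 = ~b.  So (b & a) ^ a simplifies to ~b & a. */
int fold(int a, int b) {
  return (b & a) ^ a;   /* -> ~b & a */
}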
Chris Lattner 8d1d8d364c If we can look through vector operations to find the scalar version of an
extract_element'd value, do so.

llvm-svn: 27323
2006-03-31 23:01:56 +00:00
Chris Lattner 92346c315e extractelement(undef,x) -> undef
llvm-svn: 27300
2006-03-31 18:25:14 +00:00
Chris Lattner 612fa8e6f3 Fix Transforms/InstCombine/2006-03-30-ExtractElement.ll
llvm-svn: 27261
2006-03-30 22:02:40 +00:00
Chris Lattner d70d9f5b24 Don't crash on packed logical ops
llvm-svn: 27125
2006-03-25 21:58:26 +00:00
Chris Lattner f365f5f0c1 Fix spello
llvm-svn: 27052
2006-03-24 07:14:34 +00:00
Chris Lattner 5821a6a17a add the actual cost to the debug info
llvm-svn: 27051
2006-03-24 07:14:00 +00:00
Jim Laskey 83f99115db Can't combine anymore - we don't have a chain through llvm.dbg intrinsics.
llvm-svn: 26992
2006-03-23 18:10:42 +00:00
Chris Lattner 7d80b4f366 silence a bogus gcc warning
llvm-svn: 26953
2006-03-22 17:27:24 +00:00
Chris Lattner d783c76c18 Teach cee to propagate through switch statements. This implements
Transforms/CorrelatedExprs/switch.ll

Patch contributed by Eric Kidd!

llvm-svn: 26872
2006-03-19 19:37:24 +00:00
Evan Cheng c28282bd87 - Fixed a bogus if condition.
- Added more debugging info.
- Allow reuse of IV of negative stride. e.g. -4 stride == 2 * iv of -2 stride.

llvm-svn: 26841
2006-03-18 08:03:12 +00:00
Evan Cheng f09f0ebd48 Sort StrideOrder so we can process the smallest strides first. This allows
for more IV reuses.

llvm-svn: 26837
2006-03-18 00:44:49 +00:00
Evan Cheng 4520698820 Allow users of iv / stride to be rewritten with expression that is a multiply
of a smaller stride even if they have a common loop invariant expression part.

llvm-svn: 26828
2006-03-17 19:52:23 +00:00
Evan Cheng 3df447d354 For each loop, keep track of all the IV expressions inserted indexed by
stride. For a set of uses of the IV of a stride which is a multiple
of another stride, do not insert a new IV expression. Rather, reuse the
previous IV and rewrite the uses as uses of IV expression multiplied by
the factor.

e.g.
x = 0 ...; x ++
y = 0 ...; y += 4
then use of y can be rewritten as use of 4*x for x86.

llvm-svn: 26803
2006-03-16 21:53:05 +00:00
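A hedged C sketch of the same situation (the loop is illustrative, not from the commit):

/* x steps by 1 and y by 4, so y is always 4*x; uses of y can be
   rewritten as 4*x and the second induction variable goes away. */
void store_scaled(int *A, int n) {
  int x, y;
  for (x = 0, y = 0; x < n; ++x, y += 4)
    A[x] = y;   /* y == 4 * x */
}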
Chris Lattner c5f866bb4a Implement a FIXME, recursively reassociating
A*A*B + A*A*C   -->   A*(A*B+A*C)   -->   A*(A*(B+C))

This implements Reassociate/mul-factor3.ll

llvm-svn: 26757
2006-03-14 16:04:29 +00:00
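An illustrative C instance of the factoring (the function name is made up):

/* A*A*B + A*A*C factors to A*(A*B + A*C), then to A*(A*(B+C)),
   trading two multiplies for one add. */
int factor(int a, int b, int c) {
  return a*a*b + a*a*c;   /* -> a * (a * (b + c)) */
}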
Chris Lattner 2fc319d444 extract some code into a method, no functionality change
llvm-svn: 26755
2006-03-14 07:11:11 +00:00
Chris Lattner d6bde46d85 Promote shifts by a constant to multiplies so that we can reassociate
(x<<1)+(y<<1) -> (X+Y)<<1.  This implements
Transforms/Reassociate/shift-factor.ll

llvm-svn: 26753
2006-03-14 06:55:18 +00:00
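The same fold at the C level, as a minimal sketch:

/* Shifts by a constant are treated as multiplies, so reassociation
   can factor them and then turn the result back into a shift. */
int shl_factor(int x, int y) {
  return (x << 1) + (y << 1);   /* -> (x + y) << 1 */
}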
Evan Cheng c567c4efbb Added target lowering hooks which LSR consults to make more intelligent
transformation decisions.

llvm-svn: 26738
2006-03-13 23:14:23 +00:00
Chris Lattner fc34f8bb48 Fix a miscompilation of 188.ammp with the new CFE. 188.ammp is accessing
arrays out of range in a horrible way, but we shouldn't break it anyway.
Details in the comments.

llvm-svn: 26606
2006-03-08 01:05:29 +00:00
Chris Lattner 53ef5a032c Teach the alignment handling code to look through constant expr casts and GEPs
llvm-svn: 26580
2006-03-07 01:28:57 +00:00
Chris Lattner 82f2ef20b6 Teach instcombine to increase the alignment of memset/memcpy/memmove when
the pointer is known to come from either a global variable, alloca or
malloc.  This allows us to compile this:

  P = malloc(28);
  memset(P, 0, 28);

into explicit stores on PPC instead of a memset call.

llvm-svn: 26577
2006-03-06 20:18:44 +00:00
Chris Lattner 6bc98653c2 Make vector narrowing more effective, implementing
Transforms/InstCombine/vec_narrow.ll.  This add support for narrowing
extract_element(insertelement) also.

llvm-svn: 26538
2006-03-05 00:22:33 +00:00
Chris Lattner 4c065091d8 Add factoring of multiplications, e.g. turning A*A+A*B into A*(A+B).
Testcase here: Transforms/Reassociate/mulfactor.ll

llvm-svn: 26524
2006-03-04 09:31:13 +00:00
Chris Lattner 32c01df299 Canonicalize (X+C1)*C2 -> X*C2+C1*C2
This implements Transforms/InstCombine/add.ll:test31

llvm-svn: 26519
2006-03-04 06:04:02 +00:00
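A concrete instance, as an illustrative C sketch:

/* Distributing the multiply moves both constants to the top level,
   where they can fold with neighboring constants. */
int canon(int x) {
  return (x + 3) * 5;   /* -> x*5 + 15 */
}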
Chris Lattner 681ef2f083 Change this to work with renamed intrinsics.
llvm-svn: 26484
2006-03-03 01:34:17 +00:00
Chris Lattner 85dda9a2bd Generalize the REM folding code to handle another case Nick Lewycky
pointed out: realize the AND can provide factors and look through Casts.

llvm-svn: 26469
2006-03-02 06:50:58 +00:00
Chris Lattner c5b6c9a12a Fix a regression in a patch from a couple of days ago. This fixes
Transforms/InstCombine/2006-02-28-Crash.ll

llvm-svn: 26427
2006-02-28 19:47:20 +00:00
Chris Lattner b70f141893 Implement rem.ll:test[7-9] and PR712
llvm-svn: 26415
2006-02-28 05:49:21 +00:00
Chris Lattner 2a7c7b8bab Simplify some code now that the RHS of a rem can't be 0
llvm-svn: 26413
2006-02-28 05:40:55 +00:00
Chris Lattner 0de4a8d7b7 Rearrange some code, fold "rem X, 0", implementing rem.ll:test6
llvm-svn: 26411
2006-02-28 05:30:45 +00:00
Chris Lattner c7bfed0f7b Merge two almost-identical pieces of code.
Make this code more powerful by using ComputeMaskedBits instead of looking
for an AND operand.  This lets us fold this:

int %test23(int %a) {
        %tmp.1 = and int %a, 1
        %tmp.2 = seteq int %tmp.1, 0
        %tmp.3 = cast bool %tmp.2 to int  ;; xor tmp1, 1
        ret int %tmp.3
}

into: xor (and a, 1), 1
llvm-svn: 26396
2006-02-27 02:38:23 +00:00
Chris Lattner f5c8a0b83f Fold (A^B) == A -> B == 0
and  (A-B) == A  ->  B == 0

llvm-svn: 26394
2006-02-27 01:44:11 +00:00
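Both folds in illustrative C (unsigned operands are used so the subtract wraps, matching the IR-level semantics):

/* Each comparison reduces to a test of b against zero. */
int xor_eq(unsigned a, unsigned b) { return (a ^ b) == a; }   /* -> b == 0 */
int sub_eq(unsigned a, unsigned b) { return (a - b) == a; }   /* -> b == 0 */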
Chris Lattner f78df7c14d Fold (X|C1)^C2 -> X^(C1|C2) when possible. This implements
InstCombine/or.ll:test23.

llvm-svn: 26385
2006-02-26 19:57:54 +00:00
Chris Lattner b580d26e7d Fix a problem that Nate noticed that boils down to an overly conservative check
in the code that does "select C, (X+Y), (X-Y) --> (X+(select C, Y, (-Y)))".
We now compile this loop:

LBB1_1: ; no_exit
        add r6, r2, r3
        subf r3, r2, r3
        cmpwi cr0, r2, 0
        addi r7, r5, 4
        lwz r2, 0(r5)
        addi r4, r4, 1
        blt cr0, LBB1_4 ; no_exit
LBB1_3: ; no_exit
        mr r3, r6
LBB1_4: ; no_exit
        cmpwi cr0, r4, 16
        mr r5, r7
        bne cr0, LBB1_1 ; no_exit

into this instead:

LBB1_1: ; no_exit
        srawi r6, r2, 31
        add r2, r2, r6
        xor r6, r2, r6
        addi r7, r5, 4
        lwz r2, 0(r5)
        addi r4, r4, 1
        add r3, r3, r6
        cmpwi cr0, r4, 16
        mr r5, r7
        bne cr0, LBB1_1 ; no_exit

llvm-svn: 26356
2006-02-24 18:05:58 +00:00
Chris Lattner e5521db5bc Fix Regression/Transforms/LoopUnswitch/2006-02-22-UnswitchCrash.ll, which
caused SPASS to fail building last night.

We can't trivially unswitch a loop if the exit block has phi nodes in it,
because we don't know which predecessor to use.

llvm-svn: 26320
2006-02-22 23:55:00 +00:00
Chris Lattner 8a5a324dac Add some comments, simplify some code, and fix a bug that caused rewriting
to rewrite with the wrong value.

llvm-svn: 26311
2006-02-22 06:37:14 +00:00
Chris Lattner c2e3a7a4ce improved support for branch folding, still not enabled.
llvm-svn: 26289
2006-02-18 07:57:38 +00:00
Jeff Cohen 0add83e969 Fix bugs identified by VC++.
llvm-svn: 26287
2006-02-18 03:20:33 +00:00
Chris Lattner 19fa8ac938 Implement deletion of dead blocks, currently disabled.
llvm-svn: 26285
2006-02-18 02:42:34 +00:00
Chris Lattner cb853de534 a previous patch completely disabled trivial unswitching; this fixes it.
Thanks to nate for pointing this out :)

llvm-svn: 26280
2006-02-18 01:32:04 +00:00
Chris Lattner 29f771ba21 initial trivial support for folding branches that have now-constant destinations.
llvm-svn: 26279
2006-02-18 01:27:45 +00:00
Chris Lattner 8e44ff50b0 When unswitching a loop, make sure to update loop info with exit blocks in
the right loop.

llvm-svn: 26277
2006-02-18 00:55:32 +00:00
Chris Lattner baddba41c7 Fix loops where the header has an exit, fixing a loop-unswitch crash on crafty
llvm-svn: 26258
2006-02-17 06:39:56 +00:00
Chris Lattner 6fd136239b start of some new simplification code, not thoroughly tested, use at your own
risk :)

llvm-svn: 26248
2006-02-17 00:31:07 +00:00
Nate Begeman 8a77efe4f7 Rework the SelectionDAG-based implementations of SimplifyDemandedBits
and ComputeMaskedBits to match the new improved versions in instcombine.
Tested against all of multisource/benchmarks on ppc.

llvm-svn: 26238
2006-02-16 21:11:51 +00:00
Chris Lattner fa335f6083 Change SplitBlock to increment a BasicBlock::iterator, not an Instruction*. Apparently they do different things :)
This fixes a testcase that nate reduced from spass.

Also included are a couple minor code changes that don't affect the generated
code at all.

llvm-svn: 26235
2006-02-16 19:36:22 +00:00
Jeff Cohen 55f63f1b53 Fix VC++ warning.
llvm-svn: 26228
2006-02-16 04:07:37 +00:00
Chris Lattner ff42e81028 fix a bug where we unswitched the wrong way
llvm-svn: 26225
2006-02-16 01:24:41 +00:00
Chris Lattner fdff0bb43e Implement trivial unswitching for switch stmts. This allows us to trivially
unswitch this loop on 2 before sweating to unswitch on 1/3.

void test4(int N, int i, int C, int*P, int*Q) {
  int j;
  for (j = 0; j < N; ++j) {
    switch (C) {                // general unswitching.
    default: P[i+j] = 0; break;
    case 1: Q[i+j] = 0; break;
    case 3: P[i+j] = Q[i+j]; break;
    case 2: break;              //  TRIVIAL UNSWITCH on C==2
    }
  }
}

llvm-svn: 26223
2006-02-15 22:52:05 +00:00
Chris Lattner e5cb76d744 make "trivial" unswitching significantly more general. It can now handle
this for example:

  for (j = 0; j < N; ++j) {     // trivial unswitch
    if (C)
      P[i+j] = 0;
  }

turning it into the obvious code without bothering to duplicate an empty loop.

llvm-svn: 26220
2006-02-15 22:03:36 +00:00
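A sketch of what the example above turns into, assuming the trivial unswitch on C (illustrative, not taken from the commit):

void test_unswitched(int N, int i, int C, int *P) {
  /* The loop-invariant test is hoisted out of the loop; the "else"
     copy of the loop has an empty body, so it is simply dropped. */
  if (C) {
    int j;
    for (j = 0; j < N; ++j)
      P[i + j] = 0;
  }
}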
Chris Lattner 65152d80ec Checking the wrong value. This caused us to emit silly code like
Y = seteq bool X, true
instead of just using X :)

llvm-svn: 26215
2006-02-15 19:05:52 +00:00
Chris Lattner 01db04efb0 more refactoring, no functionality change.
llvm-svn: 26194
2006-02-15 01:44:42 +00:00
Chris Lattner b0cbe7106e pull some code out into a function
llvm-svn: 26191
2006-02-15 00:07:43 +00:00
Chris Lattner 0b8ec1a132 Use statistics to keep track of what flavors of loops we are unswitching
llvm-svn: 26157
2006-02-14 01:01:41 +00:00
Chris Lattner 8b10ab3002 Implement Instcombine/and.ll:test34
llvm-svn: 26155
2006-02-13 23:07:23 +00:00
Chris Lattner 7d8522884b If any of the sign extended bits are demanded, the input sign bit is demanded
for a sign extension.

This fixes InstCombine/2006-02-13-DemandedMiscompile.ll and Ptrdist/bc.

llvm-svn: 26152
2006-02-13 22:41:07 +00:00
Chris Lattner 68e7475777 Be careful not to request or look at bits shifted in from outside the size
of the input.  This fixes the mediabench/gsm/toast failure last night.

llvm-svn: 26138
2006-02-13 06:09:08 +00:00
Chris Lattner f5b4ef7f58 remove some more dead special case code
llvm-svn: 26135
2006-02-12 08:07:37 +00:00
Chris Lattner 5b2edb1fca Eliminate special case hacks that are superseded by general purpose hacks
llvm-svn: 26134
2006-02-12 08:02:11 +00:00
Chris Lattner ee0f280743 Three changes:
1. Teach GetConstantInType to handle boolean constants.
2. Teach instcombine to fold (compare X, CST) when X has known 0/1 bits.
   Testcase here: set.ll:test22
3. Improve the "(X >> c1) & C2 == 0" folding code to allow a noop cast
   between the shift and and.  More aggressive bitfolding for other reasons
   was turning signed shr's into unsigned shr's, leaving the noop cast in
   the way.

llvm-svn: 26131
2006-02-12 02:07:56 +00:00
Chris Lattner 0157e7f55b Port the recent innovations in ComputeMaskedBits to SimplifyDemandedBits.
This allows us to simplify on conditions where bits are not known, but they
are not demanded either!  This also fixes a couple of bugs in
ComputeMaskedBits that were exposed during this work.

In the future, swaths of instcombine should be removed, as this code
subsumes a bunch of ad-hockery.

llvm-svn: 26122
2006-02-11 09:31:47 +00:00
Chris Lattner fbadd7e1ee implement unswitching of loops with switch stmts and selects in them
llvm-svn: 26114
2006-02-11 00:43:37 +00:00
Chris Lattner f1b151684d Update PHI nodes in successors of exit blocks.
llvm-svn: 26113
2006-02-10 23:26:14 +00:00
Chris Lattner fe4151efe7 Reform the unswitching code in terms of edge splitting, not block splitting.
llvm-svn: 26112
2006-02-10 23:16:39 +00:00
Chris Lattner ec6b40a093 Fix a case where UnswitchTrivialCondition broke critical edges with
phi's in the successors

llvm-svn: 26108
2006-02-10 19:08:15 +00:00
Chris Lattner 6e263155a6 add some notes, move some code around. Implement unswitching of loops
with branches on partially invariant computations.

llvm-svn: 26104
2006-02-10 02:30:37 +00:00
Chris Lattner 4935417a84 Move code around to be more logical, no functionality change.
llvm-svn: 26103
2006-02-10 02:01:22 +00:00
Chris Lattner 3fc3148b85 When unswitching a trivial loop, do admit we are doing it! :)
llvm-svn: 26102
2006-02-10 01:36:35 +00:00
Chris Lattner ed7a67b0de Implement unconditional unswitching of 'trivial' loops, those loops that contain
branches in their entry block that control whether or not the loop is a noop.

llvm-svn: 26101
2006-02-10 01:24:09 +00:00
Chris Lattner 4f0e66df6a Simplify control flow a bit, note that unswitch preserves canonical loop form
llvm-svn: 26098
2006-02-09 22:15:42 +00:00
Chris Lattner 8976219850 Make the threshold a parameter
llvm-svn: 26093
2006-02-09 20:15:48 +00:00
Chris Lattner 2826e0511b Simplify the loop-unswitch pass, by not even trying to unswitch loops with
uses of loop values outside the loop.  We need loop-closed SSA form to do
this right, or to use SSA rewriting if we really care.

llvm-svn: 26089
2006-02-09 19:14:52 +00:00
Chris Lattner 24cd2fa269 Fix 80-column violations
llvm-svn: 26088
2006-02-09 07:41:14 +00:00
Chris Lattner 4534dd59a3 Enhance MVIZ in three ways:
1. Teach it new tricks: in particular how to propagate through signed shr and sexts.
2. Teach it to return a bitset of known-1 and known-0 bits, instead of just zero.
3. Teach instcombine (AND X, C) to fold when we know all C bits of X.

This implements Regression/Transforms/InstCombine/bittest.ll, and allows
future things to be simplified.

llvm-svn: 26087
2006-02-09 07:38:58 +00:00
Chris Lattner ab2dc4d70d Simplify some code, reducing calls to MaskedValueIsZero. Implement a minor
optimization where we reduce the number of bits in AND masks when possible.

llvm-svn: 26056
2006-02-08 07:34:50 +00:00
Chris Lattner 5997cf9381 Use EraseInstFromFunction in a few cases to put the uses of the removed
instruction onto the worklist (in case they are now dead).

Add a really trivial local DSE implementation to help out bitfield code.
We now fold this:

struct S {
    unsigned char a : 1, b : 1, c : 1, d : 2, e : 3;
    S();
};

S::S() : a(0), b(0), c(1), d(0), e(6) {}

to this:

void %_ZN1SC1Ev(%struct.S* %this) {
entry:
        %tmp.1 = getelementptr %struct.S* %this, int 0, uint 0
        store ubyte 38, ubyte* %tmp.1
        ret void
}

much earlier (in gccas instead of only in gccld after DSE runs).

llvm-svn: 26050
2006-02-08 03:25:32 +00:00
Chris Lattner 06a0ed1ee0 Implement some more interesting select sccp cases. This implements:
test/Regression/Transforms/SCCP/select.ll

llvm-svn: 26049
2006-02-08 02:38:11 +00:00
Chris Lattner ddba3289b5 Fix a problem in my patch yesterday, causing a miscompilation of 176.gcc
llvm-svn: 26045
2006-02-08 01:20:23 +00:00
Chris Lattner 44314827d6 Fix Transforms/InstCombine/2006-02-07-SextZextCrash.ll
llvm-svn: 26040
2006-02-07 19:07:40 +00:00
Chris Lattner 92a6865321 Generalize MaskedValueIsZero into a ComputeMaskedNonZeroBits function, which
is just as efficient as MVIZ and is also more general.

Fix a few minor bugs introduced in recent patches

llvm-svn: 26036
2006-02-07 08:05:22 +00:00
Chris Lattner c3ebf40031 Make MaskedValueIsZero take a uint64_t instead of a ConstantIntegral as a
mask.  This allows the code to be simpler and more efficient.

Also, generalize some of the cases in MVIZ a bit, making it slightly more aggressive.

llvm-svn: 26035
2006-02-07 07:27:52 +00:00
Chris Lattner 77defbae0a Use Type::getIntegralTypeMask() to simplify some code
llvm-svn: 26034
2006-02-07 07:00:41 +00:00
Chris Lattner 2590e511d8 Implement the beginnings of a facility for simplifying expressions based on
'demanded bits', inspired by Nate's work in the dag combiner.  This isn't
complete, but needs two unrelated instcombiner changes to continue.

llvm-svn: 26033
2006-02-07 06:56:34 +00:00
Chris Lattner 2e90b732fa Turn A % (C << N), where C is 2^k, into A & ((C << N)-1) [urem only].
Turn A / (C1 << N), where C1 is "1<<C2" into A >> (N+C2) [udiv only].

Tested with: rem.ll:test5, div.ll:test10

llvm-svn: 26003
2006-02-05 07:54:04 +00:00
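Illustrative C instances (unsigned only, matching the [urem]/[udiv] restriction; function names are made up):

/* 8 is 2^3, so 8 << n is a power of two and the remainder is a mask. */
unsigned rem8(unsigned a, unsigned n) {
  return a % (8u << n);   /* -> a & ((8u << n) - 1) */
}

/* 4 is 1 << 2, so the divide becomes a shift by n + 2. */
unsigned div4(unsigned a, unsigned n) {
  return a / (4u << n);   /* -> a >> (n + 2) */
}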
Chris Lattner d30c4991a1 Use SCEVExpander::InsertCastOfTo instead of our own code. This reduces
#LLVM LOC, and auto-cse's cast instructions.

llvm-svn: 25974
2006-02-04 09:52:43 +00:00
Chris Lattner 2959f0003e Fix two significant bugs in LSR:
1. When rewriting code in outer loops, sometimes we would insert code into
   inner loops that is invariant in that loop.
2. Notice that 4*(2+x) is 8+4*x and use that to simplify expressions.

This is a performance neutral change.

llvm-svn: 25964
2006-02-04 07:36:50 +00:00
Jeff Cohen 15a8c15a1f Improve compatibility with VC2005, patch by Morten Ofstad!
llvm-svn: 25661
2006-01-26 20:41:32 +00:00
Chris Lattner c0f633a598 Fix Regression/Transforms/ScalarRepl/2006-01-24-IllegalUnionPromoteCrash.ll
llvm-svn: 25587
2006-01-24 19:36:27 +00:00
Chris Lattner c597b8a55e Make iostream #inclusion explicit
llvm-svn: 25514
2006-01-22 23:32:06 +00:00
Chris Lattner e154abf9b3 Implement casts.ll:test26: a cast from float -> double -> integer doesn't
need the float->double part.

llvm-svn: 25452
2006-01-19 07:40:22 +00:00
Robert Bocchino 6dce25019d Lowerpacked and SCCP support for the insertelement operation.
llvm-svn: 25406
2006-01-17 20:06:55 +00:00
Chris Lattner 307b7ea15f fix a crash due to missing parens
llvm-svn: 25363
2006-01-16 19:47:21 +00:00
Chris Lattner 0de2c7d3d8 This pass has never worked correctly. Remove.
llvm-svn: 25349
2006-01-16 01:06:00 +00:00
Chris Lattner ef530c24c1 FunctionPass's cannot do IPO things.
llvm-svn: 25315
2006-01-14 19:30:35 +00:00
Robert Bocchino a83529678e Added instcombine support for extractelement.
llvm-svn: 25299
2006-01-13 22:48:06 +00:00
Chris Lattner 503221f5c5 Do a simple instcombine xforms to delete llvm.stackrestore cases.
llvm-svn: 25294
2006-01-13 21:28:09 +00:00
Chris Lattner c66b223b28 Simplify this a tiny bit by using the new IntrinsicInst functionality.
llvm-svn: 25292
2006-01-13 20:11:04 +00:00
Chris Lattner cb36710ff9 Switch these to using ETForest instead of DominatorSet to compute itself.
Patch written by Daniel Berlin!

llvm-svn: 25202
2006-01-11 05:10:20 +00:00
Chris Lattner 48e4a2ebd8 Switch this to using ETForest instead of DominatorSet to compute itself.
Patch written by Daniel Berlin!

llvm-svn: 25201
2006-01-11 05:09:40 +00:00
Robert Bocchino bd518d153b Added lower packed support for the extractelement operation.
llvm-svn: 25180
2006-01-10 19:05:05 +00:00
Chris Lattner 9cbfbc21bb fix some 176.gcc miscompilation from my previous patch.
llvm-svn: 25137
2006-01-07 01:32:28 +00:00
Chris Lattner 330628a6d8 silence some bogus gcc warnings on fenris
llvm-svn: 25130
2006-01-06 17:59:59 +00:00
Chris Lattner eb372a0276 Enhance the shift-shift folding code to allow a no-op cast to occur in between
the shifts.

This allows us to fold this (which is the 'integer add a constant' sequence
from cozmic's scheme compiler):

int %x(uint %anf-temporary776) {
        %anf-temporary777 = shr uint %anf-temporary776, ubyte 1
        %anf-temporary800 = cast uint %anf-temporary777 to int
        %anf-temporary804 = shl int %anf-temporary800, ubyte 1
        %anf-temporary805 = add int %anf-temporary804, -2
        %anf-temporary806 = or int %anf-temporary805, 1
        ret int %anf-temporary806
}

into this:

int %x(uint %anf-temporary776) {
        %anf-temporary776 = cast uint %anf-temporary776 to int
        %anf-temporary776.mask1 = add int %anf-temporary776, -2
        %anf-temporary805 = or int %anf-temporary776.mask1, 1
        ret int %anf-temporary805
}

note that instcombine already knew how to eliminate the AND that the two
shifts fold into.  This is tested by InstCombine/shift.ll:test26

-Chris

llvm-svn: 25128
2006-01-06 07:52:12 +00:00
Chris Lattner b330939d90 Simplify the code a bit more
llvm-svn: 25126
2006-01-06 07:22:22 +00:00
Chris Lattner 145539343f Extract a bunch of code out of visitShiftInst into FoldShiftByConstant. No
functionality changes.

llvm-svn: 25125
2006-01-06 07:12:35 +00:00
Duraid Madina 7a3ad6cae2 getting there...
llvm-svn: 25021
2005-12-26 13:48:44 +00:00
Chris Lattner 8c9e14620f Fix Transforms/ScalarRepl/2005-12-14-UnionPromoteCrash.ll, a crash on undefined
behavior in 126.gcc on big-endian systems.

llvm-svn: 24708
2005-12-14 17:23:59 +00:00
Chris Lattner 3b0a62d8a5 Implement a little hack for parity with GCC on crafty. This speeds up
186.crafty by about 16% (from 15.109s to 13.045s) on my system.

This turns allocas with unions/casts into scalars.  For example crafty has
something like this:

    union doub {
      unsigned short i[4];
      long long d;
    };
int f(long long a) {
  return ((union doub){.d=a}).i[1];
}

Instead of generating loads and stores to an alloca, we now promote the
whole thing to a scalar long value.

This implements: Transforms/ScalarRepl/AggregatePromote.ll

llvm-svn: 24667
2005-12-12 07:19:13 +00:00
Chris Lattner 077200737c getRawValue zero extends for unsigned values; use getSExtValue so that we
know that small negative values fit into the immediate field of addressing
modes.

llvm-svn: 24608
2005-12-05 18:23:57 +00:00
Chris Lattner dc4ffef633 Fix a bug where we didn't realize that vaarg reads memory. This fixes
Transforms/DeadStoreElimination/2005-11-30-vaarg.ll

llvm-svn: 24545
2005-11-30 19:38:22 +00:00
Andrew Lenharth 5fc3794e71 since reg2mem requires it, might as well mention that it preserves it
llvm-svn: 24491
2005-11-25 16:04:54 +00:00
Andrew Lenharth 061029dee2 Reg2Mem is something a pass may depend on, so allow that
llvm-svn: 24488
2005-11-22 22:14:23 +00:00
Andrew Lenharth 71b09bbb07 turns out, demotion and invokes and critical edges don't mix
llvm-svn: 24487
2005-11-22 21:45:19 +00:00
Chris Lattner 9c37f23645 Fix a crash building 176.gcc due to my recent patch, which only fixed
half the problem.

llvm-svn: 24414
2005-11-18 18:30:47 +00:00
Chris Lattner bca0be812d This was checking the wrong GEP expression. Fixing this fixes a gccas crash
compiling mysql reported by Ted Kremenek.

llvm-svn: 24402
2005-11-17 19:35:42 +00:00
Andrew Lenharth d9c13b1336 the pain isn't gone unless the phinodes are spilled too
llvm-svn: 24288
2005-11-10 19:39:09 +00:00
Andrew Lenharth 8e66c0c8a9 this works with backedges to the existing entry block a lot better
llvm-svn: 24270
2005-11-10 17:35:34 +00:00
Andrew Lenharth 4130a4f061 The pass everyone has been waiting for!
Reg2Mem

for fun you can opt -reg2mem -mem2reg

llvm-svn: 24267
2005-11-10 01:58:38 +00:00
Nate Begeman 848622f87f Add support for alignment of allocation instructions.
Add support for specifying alignment and size of setjmp jmpbufs.

No targets currently do anything with this information, nor is it preserved
in the bytecode representation.  That's coming up next.

llvm-svn: 24196
2005-11-05 09:21:28 +00:00
Chris Lattner 16b29e9562 Implement Transforms/TailCallElim/return-undef.ll, a trivial case
that has been sitting in my inbox since May 18. :)

llvm-svn: 24194
2005-11-05 08:21:11 +00:00
Chris Lattner dd0c174082 Turn sdiv into udiv if both operands have a clear sign bit. This occurs
a few times in crafty:

OLD:    %tmp.36 = div int %tmp.35, 8            ; <int> [#uses=1]
NEW:    %tmp.36 = div uint %tmp.35, 8           ; <uint> [#uses=0]
OLD:    %tmp.19 = div int %tmp.18, 8            ; <int> [#uses=1]
NEW:    %tmp.19 = div uint %tmp.18, 8           ; <uint> [#uses=0]
OLD:    %tmp.117 = div int %tmp.116, 8          ; <int> [#uses=1]
NEW:    %tmp.117 = div uint %tmp.116, 8         ; <uint> [#uses=0]
OLD:    %tmp.92 = div int %tmp.91, 8            ; <int> [#uses=1]
NEW:    %tmp.92 = div uint %tmp.91, 8           ; <uint> [#uses=0]

Which all turn into shrs.

llvm-svn: 24190
2005-11-05 07:40:31 +00:00
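A C sketch of when this fires (illustrative; assumes a 32-bit int):

/* The mask clears the sign bit, so the signed divide can become an
   unsigned one -- and a power-of-two udiv is just a shift. */
int div_pos(int x) {
  int t = x & 0x7fffffff;   /* sign bit known zero */
  return t / 8;             /* div int -> div uint -> shr */
}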
Chris Lattner e9ff0eaf5b Turn srem -> urem when neither input has their sign bit set. This triggers
8 times in vortex, allowing the srems to be turned into shrs:

OLD:    %tmp.104 = rem int %tmp.5.i37, 16               ; <int> [#uses=1]
NEW:    %tmp.104 = rem uint %tmp.5.i37, 16              ; <uint> [#uses=0]
OLD:    %tmp.98 = rem int %tmp.5.i24, 16                ; <int> [#uses=1]
NEW:    %tmp.98 = rem uint %tmp.5.i24, 16               ; <uint> [#uses=0]
OLD:    %tmp.91 = rem int %tmp.5.i19, 8         ; <int> [#uses=1]
NEW:    %tmp.91 = rem uint %tmp.5.i19, 8                ; <uint> [#uses=0]
OLD:    %tmp.88 = rem int %tmp.5.i14, 8         ; <int> [#uses=1]
NEW:    %tmp.88 = rem uint %tmp.5.i14, 8                ; <uint> [#uses=0]
OLD:    %tmp.85 = rem int %tmp.5.i9, 1024               ; <int> [#uses=2]
NEW:    %tmp.85 = rem uint %tmp.5.i9, 1024              ; <uint> [#uses=0]
OLD:    %tmp.82 = rem int %tmp.5.i, 512         ; <int> [#uses=2]
NEW:    %tmp.82 = rem uint %tmp.5.i1, 512               ; <uint> [#uses=0]
OLD:    %tmp.48.i = rem int %tmp.5.i.i161, 4            ; <int> [#uses=1]
NEW:    %tmp.48.i = rem uint %tmp.5.i.i161, 4           ; <uint> [#uses=0]
OLD:    %tmp.20.i2 = rem int %tmp.5.i.i, 4              ; <int> [#uses=1]
NEW:    %tmp.20.i2 = rem uint %tmp.5.i.i, 4             ; <uint> [#uses=0]

it also occurs 9 times in gcc, but with odd constant divisors (1009 and 61)
so the payoff isn't as great.

llvm-svn: 24189
2005-11-05 07:28:37 +00:00
Andrew Lenharth 662295587d make this 64 bit clean, fixed test30 of /Regression/Transforms/InstCombine/add.ll
llvm-svn: 24158
2005-11-02 18:35:40 +00:00
Chris Lattner 09efd4e5b6 Limit the search depth of MaskedValueIsZero to 6 instructions, to avoid
bad cases.  This fixes Markus's second testcase in PR639, and should
seal it for good.

llvm-svn: 24123
2005-10-31 18:35:52 +00:00
Chris Lattner 27d351f159 This pass is now obsolete since all targets have moved to the SelectionDAG
infrastructure and the simple isels have been removed.

llvm-svn: 24090
2005-10-29 05:33:46 +00:00
Chris Lattner 8f663e8bbc Pull some code out into a function, give it the ability to see through +.
This allows us to turn code like malloc(4*x+4) -> malloc int, (x+1)

llvm-svn: 24081
2005-10-29 04:36:15 +00:00
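A hedged C sketch of the kind of allocation this now promotes (assumes a 4-byte int; the function name is made up):

#include <stdlib.h>

/* 4*x + 4 bytes is 4*(x + 1), i.e. room for x + 1 ints, so the
   allocation can be rewritten as a typed array of x + 1 ints. */
int *make_ints(unsigned x) {
  return (int *)malloc(4 * x + 4);
}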
Chris Lattner 8270c33606 Remove a special case, allowing the general case to handle it. No functionality
change.

llvm-svn: 24076
2005-10-29 03:19:53 +00:00
Chris Lattner b9d3ca5c3c Fix a bit of backwards logic that broke exptree and smg2000
llvm-svn: 24056
2005-10-28 16:27:35 +00:00
Chris Lattner c4f67e67d2 Do not sink any instruction with side effects, including vaarg. This fixes
PR640

llvm-svn: 24046
2005-10-27 17:13:11 +00:00
Chris Lattner c6372cca78 Fix typo
llvm-svn: 24033
2005-10-27 06:26:26 +00:00
Chris Lattner 0fe7551bc0 Teach instcombine to promote stuff like (cast (malloc sbyte, 8*X) to int*)
into: malloc int, (2*X)

llvm-svn: 24032
2005-10-27 06:24:46 +00:00
Chris Lattner b3ecf96900 Promote cases like cast (malloc sbyte, 100) to int* into
(malloc [25 x int]) directly without having to convert to
(malloc [100 x sbyte]) first.

llvm-svn: 24031
2005-10-27 06:12:00 +00:00
Chris Lattner bb17180a23 Minor change to this file to support obscure cases with constant array amounts
llvm-svn: 24030
2005-10-27 05:53:56 +00:00
Chris Lattner 38a1b00a0f fold nested and's early to avoid inefficiencies in MaskedValueIsZero. This
fixes a very slow compile in PR639.

llvm-svn: 24011
2005-10-26 17:18:16 +00:00
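An illustrative C instance of the early nested-and folding:

/* Stacked masks collapse into one up front, so MaskedValueIsZero
   never has to walk the whole chain. */
int mask(int x) {
  return ((x & 0xFF) & 0x0F) & 0x3;   /* -> x & 0x3 */
}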
Jeff Cohen 2b8cbf319c Update Visual Studio projects to reflect moved file.
llvm-svn: 23998
2005-10-26 05:36:51 +00:00
Chris Lattner 46705b2f2d Handle allocations that, even after removing dead uses, still have more than
one use (but one is a cast).  This handles the very common case of:

 X = alloc [n x byte]
 Y = cast X to somethingbetter
 seteq X, null

In order to avoid infinite looping when there are multiple casts, we only
allow this if the xform is strictly increasing the alignment of the
allocation.

llvm-svn: 23961
2005-10-24 06:35:18 +00:00
Chris Lattner 355ecc09f8 Fix a bug where we would 'promote' an allocation from one type to another
where the second has less alignment required.  If we had explicit alignment
support in the IR, we could handle this case, but we can't until we do.

llvm-svn: 23960
2005-10-24 06:26:18 +00:00
Chris Lattner ac87beb03a Before promoting a malloc type, remove dead uses. This makes instcombine
more effective at promoting these allocations, catching them earlier in the
compile process.

llvm-svn: 23959
2005-10-24 06:22:12 +00:00
Chris Lattner 216be91817 Pull some code out into a function, no functionality change
llvm-svn: 23958
2005-10-24 06:03:58 +00:00
Chris Lattner bde3845548 DONT_BUILD_RELINKED is gone and implied by BUILD_ARCHIVE now
llvm-svn: 23940
2005-10-24 02:26:13 +00:00
Chris Lattner 8c087e962c Only build .a file versions of these libraries, instead of .a and .o versions.
This should speed up build times.

llvm-svn: 23933
2005-10-24 01:59:48 +00:00
Chris Lattner bd77fac034 Make sure that anything using the ADCE pass pulls in the UnifyFunctionExitNodes
code

llvm-svn: 23931
2005-10-24 01:40:23 +00:00
Jeff Cohen 11e26b52b2 When a function takes a variable number of pointer arguments, with a zero
pointer marking the end of the list, the zero *must* be cast to the pointer
type.  An un-cast zero is a 32-bit int, and at least on x86_64, gcc will
not extend the zero to 64 bits, thus allowing the upper 32 bits to be
random junk.

The new END_WITH_NULL macro may be used to annotate a such a function
so that GCC (version 4 or newer) will detect the use of un-casted zero
at compile time.

llvm-svn: 23888
2005-10-23 04:37:20 +00:00
Chris Lattner 5df0e36e98 My previous patch was too conservative. Reject FP and void types, but do
allow pointer types.

llvm-svn: 23859
2005-10-21 05:45:41 +00:00
Chris Lattner 0c0b38bb4c Do NOT touch FP ops with LSR. This fixes a testcase Nate sent me from an
inner loop like this:

LBB_RateConvertMono8AltiVec_2:  ; no_exit
        lis r2, ha16(.CPI_RateConvertMono8AltiVec_0)
        lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2)
        fmr f3, f3
        fadd f0, f2, f0
        fadd f3, f0, f3
        fcmpu cr0, f3, f1
        bge cr0, LBB_RateConvertMono8AltiVec_2  ; no_exit

to an inner loop like this:

LBB_RateConvertMono8AltiVec_1:  ; no_exit
        fsub f2, f2, f1
        fcmpu cr0, f2, f1
        fmr f0, f2
        bge cr0, LBB_RateConvertMono8AltiVec_1  ; no_exit

Doh! good catch!

llvm-svn: 23838
2005-10-20 04:47:10 +00:00
Chris Lattner da1b152c43 Make this work for FP constantexprs
llvm-svn: 23773
2005-10-17 20:18:38 +00:00
Chris Lattner 7fde91e365 Oops, X+0.0 isn't foldable, but X+-0.0 is.
llvm-svn: 23772
2005-10-17 17:56:38 +00:00
Chris Lattner 32979336a7 relax this a bit, as we only support the default rounding mode
llvm-svn: 23771
2005-10-17 17:49:32 +00:00
Chris Lattner 192cd18f53 Fix (hopefully the last) issue where LSR is nondeterministic. When pulling
out CSE's of base expressions it could build a result whose order was
nondet.

llvm-svn: 23698
2005-10-11 18:41:04 +00:00
Chris Lattner 5c9d63da31 Fix another problem where LSR was being nondeterministic. Also remove elements
from the end of a vector instead of the beginning

llvm-svn: 23697
2005-10-11 18:30:57 +00:00
Chris Lattner b7a3894e7c Fix another lsr-is-nondeterministic case
llvm-svn: 23695
2005-10-11 18:17:57 +00:00
Chris Lattner 03b9eb506c Make MaskedValueIsZero a bit more aggressive
llvm-svn: 23677
2005-10-09 22:08:50 +00:00
Chris Lattner 62010c450f Fix funky xcode indentation
llvm-svn: 23674
2005-10-09 06:36:35 +00:00
Chris Lattner eb4be8b942 Hrm, you didn't see this.
llvm-svn: 23673
2005-10-09 06:24:02 +00:00
Chris Lattner 4ea0a3eaac Fix a source of non-determinism in the backend: the order of processing
IV strides depended on the pointer order of the strides in memory.
Non-determinism is bad.

llvm-svn: 23672
2005-10-09 06:20:55 +00:00
Jeff Cohen 572910c9a2 Remove useless variable.
llvm-svn: 23656
2005-10-07 05:28:29 +00:00
Chris Lattner f07a587c79 Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In
particular, it should realize that phi's use their values in the pred block
not the phi block itself.  This change turns our em3d loop from this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_6    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; endif.loopexit.loopexit_crit_edge
        addi r3, r2, 1
        blr
LBB_test_6:     ; loopexit
        or r3, r2, r2
        blr

into:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r6, r6
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        or r2, r6, r6
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r2, r2
        blr


Unfortunately, this is actually worse code, because the register coalescer
is getting confused somehow.  If it were doing its job right, it could turn the
code into this:

_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r6, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r6, r6
        blr

... which I'll work on next. :)

llvm-svn: 23604
2005-10-03 02:50:05 +00:00
Chris Lattner e4ed42a426 Refactor some code into a function
llvm-svn: 23603
2005-10-03 01:04:44 +00:00
Chris Lattner 360928dbed This break is bogus and I have no idea why it was there. Basically it prevents
memoizing code when IV's are used by phinodes outside of loops.  In a simple
example, we were getting this code before (note that r6 and r7 are isomorphic
IV's):

        li r6, 0
        or r7, r6, r6
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r7, r7
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r2, r7, 1
        addi r7, r7, 1
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

Now we get:

        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit

this was noticed in em3d.

llvm-svn: 23602
2005-10-03 00:37:33 +00:00
Chris Lattner 8fcce170cf when checking if we should move a split edge block outside of a loop,
check the presplit pred, not the post-split pred.  This was causing us
to make the wrong decision in some cases, leaving the critical edge block
in the loop.

llvm-svn: 23601
2005-10-03 00:31:52 +00:00
Jeff Cohen f8a5e5ae6e Fix VC++ warnings.
llvm-svn: 23579
2005-10-01 03:57:14 +00:00
Chris Lattner a554c9470b Insert stores after phi nodes in the normal dest. This fixes
LowerInvoke/2005-08-03-InvokeWithPHI.ll

llvm-svn: 23525
2005-09-29 17:44:20 +00:00
Chris Lattner 3b63bb375c add a note about a way to improve this code further, that I won't be getting
to right now.

llvm-svn: 23485
2005-09-27 22:44:59 +00:00
Chris Lattner e285f5ed8f Avoid spilling stack slots... to stack slots.
llvm-svn: 23478
2005-09-27 21:33:12 +00:00
Chris Lattner 87eb249300 Completely rewrite 'correct' eh support. This changes how setjmp insertion
is performed so it happens at most once per function that contains an invoke,
instead of once per invoke in the function.  This patch has the following perks:

1. It fixes PR631, which complains about slowness.
2. It fixes PR240, which complains about non-volatile vars being live across
   setjmp/longjmps.
3. It improves (but does not fix) the jmpbuf alignment issue on itanium by not
   forcing the jmpbufs to always be 8-bytes off the alignment of the structure.
4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!), making us
   now about 4% faster than GCC.

Further improvements are also possible.

llvm-svn: 23477
2005-09-27 21:18:17 +00:00
Chris Lattner 92233d2175 Make the pass name simpler
llvm-svn: 23476
2005-09-27 21:10:32 +00:00
Chris Lattner 02ae21e1e0 Eliminate GetGEPGlobalInitializer in favor of the more powerful
ConstantFoldLoadThroughGEPConstantExpr function in the utils lib.

llvm-svn: 23446
2005-09-26 05:28:52 +00:00
Chris Lattner 0b011ec8e2 Factor the GetGEPGlobalInitializer out of this pass and into Transforms/Utils
as ConstantFoldLoadThroughGEPConstantExpr.

llvm-svn: 23445
2005-09-26 05:28:06 +00:00
Chris Lattner 0b3557f54a Move MaskedValueIsZero up.
Match a bunch of idioms for sign extensions, implementing InstCombine/signext.ll

llvm-svn: 23428
2005-09-24 23:43:33 +00:00
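One classic idiom of the kind now matched, as an illustrative C sketch (assumes a 32-bit int and an arithmetic right shift for signed types):

/* Shifting left then arithmetically right by the same amount
   sign-extends the low byte. */
int sext8(int x) {
  return (x << 24) >> 24;   /* -> sign extension of the low 8 bits */
}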
Chris Lattner b4b2530a1a Refactor this code a bit and make it more general. This now compiles:
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) { b.j += x; }

To:

_plus2:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        slwi r3, r3, 6
        add r3, r4, r3
        rlwimi r3, r4, 0, 26, 14
        stw r3, 0(r2)
        blr


instead of:

_plus2:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        rlwinm r5, r4, 26, 21, 31
        add r3, r5, r3
        rlwimi r4, r3, 6, 15, 25
        stw r4, 0(r2)
        blr

by eliminating an 'and'.

I'm pretty sure this is as small as we can go :)

llvm-svn: 23386
2005-09-18 07:22:02 +00:00
Chris Lattner 797dee7705 Compile
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus2 (unsigned int x) {
  b.j += x;
}

to:

plus2:
        mov %EAX, DWORD PTR [b]
        mov %ECX, %EAX
        and %ECX, 131008
        mov %EDX, DWORD PTR [%ESP + 4]
        shl %EDX, 6
        add %EDX, %ECX
        and %EDX, 131008
        and %EAX, -131009
        or %EDX, %EAX
        mov DWORD PTR [b], %EDX
        ret

instead of:

plus2:
        mov %EAX, DWORD PTR [b]
        mov %ECX, %EAX
        shr %ECX, 6
        and %ECX, 2047
        add %ECX, DWORD PTR [%ESP + 4]
        shl %ECX, 6
        and %ECX, 131008
        and %EAX, -131009
        or %ECX, %EAX
        mov DWORD PTR [b], %ECX
        ret

llvm-svn: 23385
2005-09-18 06:30:59 +00:00
Chris Lattner 01f56c68e9 Generalize this transform, using MaskedValueIsZero, allowing us to compile:
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) { b.k += x; }

To:

plus3:
        mov %EAX, DWORD PTR [%ESP + 4]
        shl %EAX, 17
        add DWORD PTR [b], %EAX
        ret

instead of:

plus3:
        mov %EAX, DWORD PTR [%ESP + 4]
        shl %EAX, 17
        mov %ECX, DWORD PTR [b]
        add %EAX, %ECX
        and %EAX, -131072
        and %ECX, 131071
        or %ECX, %EAX
        mov DWORD PTR [b], %ECX
        ret

llvm-svn: 23384
2005-09-18 06:02:59 +00:00
Chris Lattner 4ebc8ab4e0 fix typo
llvm-svn: 23383
2005-09-18 05:25:20 +00:00
Chris Lattner e5b23a6d67 Remove unintentionally committed code
llvm-svn: 23382
2005-09-18 05:12:51 +00:00
Chris Lattner 27cb9dbd35 implement shift.ll:test25. This compiles:
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus3 (unsigned int x) {
  b.k += x;
}

to:

_plus3:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r3, 0(r2)
        rlwinm r4, r3, 0, 0, 14
        add r4, r4, r3
        rlwimi r4, r3, 0, 15, 31
        stw r4, 0(r2)
        blr

instead of:

_plus3:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        srwi r5, r4, 17
        add r3, r5, r3
        slwi r3, r3, 17
        rlwimi r3, r4, 0, 15, 31
        stw r3, 0(r2)
        blr

llvm-svn: 23381
2005-09-18 05:12:10 +00:00
Chris Lattner af517574ce Implement add.ll:test29. Codegening:
struct S { unsigned int i : 6, j : 11, k : 15; } b;
void plus1 (unsigned int x) {
  b.i += x;
}

as:
_plus1:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        add r3, r4, r3
        rlwimi r3, r4, 0, 0, 25
        stw r3, 0(r2)
        blr

instead of:

_plus1:
        lis r2, ha16(L_b$non_lazy_ptr)
        lwz r2, lo16(L_b$non_lazy_ptr)(r2)
        lwz r4, 0(r2)
        rlwinm r5, r4, 0, 26, 31
        add r3, r5, r3
        rlwimi r3, r4, 0, 0, 25
        stw r3, 0(r2)
        blr

llvm-svn: 23379
2005-09-18 04:24:45 +00:00
Chris Lattner 027eaf01cf remove debug output
llvm-svn: 23377
2005-09-18 03:50:25 +00:00
Chris Lattner 1521298993 Implement or.ll:test21. This teaches instcombine to be able to turn this:
struct {
   unsigned int bit0:1;
   unsigned int ubyte:31;
} sdata;

void foo() {
  sdata.ubyte++;
}

into this:

foo:
        add DWORD PTR [sdata], 2
        ret

instead of this:

foo:
        mov %EAX, DWORD PTR [sdata]
        mov %ECX, %EAX
        add %ECX, 2
        and %ECX, -2
        and %EAX, 1
        or %EAX, %ECX
        mov DWORD PTR [sdata], %EAX
        ret

llvm-svn: 23376
2005-09-18 03:42:07 +00:00
Chris Lattner a393e4d4b3 Fix the regression last night compiling povray
llvm-svn: 23348
2005-09-14 17:32:56 +00:00
Chris Lattner 2a8932960d Add a simple xform to simplify array accesses with casts in the way.
This is useful for 178.galgel where resolution of dope vectors (by the
optimizer) causes the scales to become apparent.

llvm-svn: 23328
2005-09-13 18:36:04 +00:00
Chris Lattner fd018c8dfe Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI.
This fixes up a dot-product loop in galgel, speeding it up from 18.47s to
16.13s.

llvm-svn: 23327
2005-09-13 02:09:55 +00:00
Chris Lattner 567b81f0d2 Add a helper function, allowing us to simplify some code a bit, changing
indentation, no functionality change

llvm-svn: 23325
2005-09-13 00:40:14 +00:00
Chris Lattner 219175c84d Implement a simple xform to turn code like this:
if () { store A -> P; } else { store B -> P; }

into a PHI node with one store, in the most trivial case.  This implements
load.ll:test10.

llvm-svn: 23324
2005-09-12 23:23:25 +00:00
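The trivial case in C, as a sketch (the function name is made up):

/* Both arms store to the same slot, so the value is merged with a
   phi node and a single store is emitted after the branch. */
void pick(int c, int a, int b, int *p) {
  if (c) *p = a;
  else   *p = b;   /* -> *p = c ? a : b;  (one store) */
}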
Chris Lattner e0bfdf1485 Another load-peephole optimization: do gcse when two loads are next to
each other.  This implements InstCombine/load.ll:test9

llvm-svn: 23322
2005-09-12 22:21:03 +00:00
Chris Lattner b990f7d8ed Implement a trivial form of store->load forwarding where the store and the
load are exactly consequtive.  This is picked up by other passes, but this
triggers thousands of times in fortran programs that use static locals
(and is thus a compile-time speedup).

llvm-svn: 23320
2005-09-12 22:00:15 +00:00
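A minimal C sketch of the exactly-consecutive case:

/* The load immediately follows a store to the same address, so it is
   replaced by the stored value. */
int roundtrip(int *p, int v) {
  *p = v;
  return *p;   /* -> return v; */
}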
Chris Lattner 8048b85e8f Fix a regression from last night, which caused this pass to create invalid
code for IV uses outside of loops that are not dominated by the latch block.
We should only convert these uses to use the post-inc value if they ARE
dominated by the latch block.

Also use a new LoopInfo method to simplify some code.

This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll

llvm-svn: 23318
2005-09-12 17:11:27 +00:00
Chris Lattner a67648396a Uses of IV's outside of the loop should use the post-incremented version
which used to generate code like this (this is the code from the
dont-hoist-simple-loop-constants.ll testcase):

_test:
        li r2, 0                 **** IV starts at 0
LBB_test_1:     ; no_exit.2
        or r5, r2, r2            **** Copy for loop exit
        li r2, 0
        stw r2, 0(r3)
        addi r3, r3, 4
        addi r2, r5, 1
        addi r6, r5, 2           **** IV+2
        cmpwi cr0, r6, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r5, 2       ****  IV+2
        stw r2, 0(r4)
        blr

And now generated code like this:

_test:
        li r2, 1               *** IV starts at 1
LBB_test_1:     ; no_exit.2
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        addi r3, r3, 4
        cmpwi cr0, r2, 701     *** IV.postinc + 0
        blt cr0, LBB_test_1
LBB_test_2:     ; loopexit.2.loopexit
        stw r2, 0(r4)          *** IV.postinc + 0
        blr

llvm-svn: 23313
2005-09-12 06:04:47 +00:00
Chris Lattner 530fe6ab30 implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll.
We used to emit this code for it:

_test:
        li r2, 1     ;; Value tying up a register for the whole loop
        li r5, 0
LBB_test_1:     ; no_exit.2
        or r6, r5, r5
        li r5, 0
        stw r5, 0(r3)
        addi r5, r6, 1
        addi r3, r3, 4
        add r7, r2, r5  ;; should be addi r7, r5, 1
        cmpwi cr0, r7, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r6, 2
        stw r2, 0(r4)
        blr

now we emit this:

_test:
        li r2, 0
LBB_test_1:     ; no_exit.2
        or r5, r2, r2
        li r2, 0
        stw r2, 0(r3)
        addi r3, r3, 4
        addi r2, r5, 1
        addi r6, r5, 2   ;; whoa, fold those adds!
        cmpwi cr0, r6, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r5, 2
        stw r2, 0(r4)
        blr

more improvement coming.

llvm-svn: 23306
2005-09-10 01:18:45 +00:00
Chris Lattner b5e381a8cf Fix a problem that Dan Berlin noticed, where reassociation would not succeed
in building maximal expressions before simplifying them.  In particular, in
cases like this:

X-(A+B+X)

the code would consider A+B+X to be a maximal expression (not understanding
that the single-use '-' would be turned into a + later), simplify it (a no-op),
and then simplify it again later.
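
(Once the '-' is distributed, the whole tree is X + -A + -B + -X, and a single
simplification pass reduces it to -A + -B.)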

Each of these simplify steps is where the cost of reassociation comes from,
so this patch should speed up the already fast pass a bit.

Thanks to Dan for noticing this!

llvm-svn: 23214
2005-09-02 07:07:58 +00:00
Chris Lattner 9fe263aa75 Avoid creating garbage instructions, just move the old add instruction
to where we need it when converting -(A+B+C) -> -A + -B + -C.

llvm-svn: 23213
2005-09-02 06:38:04 +00:00
Chris Lattner d1325da091 add some assertions and fix problems where reassociate could access the
Ops vector out of range

llvm-svn: 23211
2005-09-02 05:23:22 +00:00
Chris Lattner 8ca5b2a6d2 Fix Regression/Transforms/Reassociate/2005-08-24-Crash.ll
llvm-svn: 23019
2005-08-24 17:55:32 +00:00
Chris Lattner ea7dfd53d6 Fix Transforms/LoopStrengthReduce/2005-08-17-OutOfLoopVariant.ll, a crash
on 177.mesa

llvm-svn: 22843
2005-08-17 21:22:41 +00:00
Chris Lattner 2bf7cb5213 Use a new helper to split critical edges, making the code simpler.
Do not claim to not change the CFG.  We do change the CFG to split critical
edges.  This isn't causing us a problem now, but could likely do so in the
future.

llvm-svn: 22824
2005-08-17 06:35:16 +00:00
Chris Lattner 5cf983ee0f Fix a bad case in gzip where we put lots of things in registers across the
loop, because an IV-dependent value was used outside of the loop and didn't
have immediate-folding capability

llvm-svn: 22798
2005-08-16 00:38:11 +00:00
Chris Lattner 47d3ec3525 Oops, don't forget to clear this.  The real inner loop is now:
.LBB_foo_3:     ; no_exit.1
        lfd f2, 0(r9)
        lfd f3, 8(r9)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r9)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfd f2, 0(r9)
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22782
2005-08-13 07:42:01 +00:00
Chris Lattner 5949d49032 Recursively scan scev expressions for common subexpressions. This allows us
to handle nested loops much better, for example, by being able to tell that
these two expressions:

{( 8 + ( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

Have the following common part that can be shared:
{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>

This allows us to codegen an important inner loop in 168.wupwise as:

.LBB_foo_4:     ; no_exit.1
        lfd f2, 16(r9)
        fmul f3, f0, f2
        fmul f2, f1, f2
        fadd f4, f3, f2
        stfd f4, 8(r9)
        fsub f2, f3, f2
        stfd f2, 16(r9)
        addi r8, r8, 1
        addi r9, r9, 16
        cmpw cr0, r8, r4
        ble .LBB_foo_4  ; no_exit.1

instead of:

.LBB_foo_3:     ; no_exit.1
        lfdx f2, r6, r9
        add r10, r6, r9
        lfd f3, 8(r10)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r10)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfdx f2, r6, r9
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1

llvm-svn: 22781
2005-08-13 07:27:18 +00:00
Chris Lattner 79396539d3 remove dead code. The exit block list is computed on demand, thus does not
need to be updated.  This code is a relic from when it did.

llvm-svn: 22775
2005-08-13 01:30:36 +00:00
Chris Lattner 8447b49526 When splitting critical edges, make sure not to leave the new block in the
middle of the loop.  This turns a critical loop in gzip into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_8 ; loopentry.loopexit_crit_edge
.LBB_test_2:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
.LBB_test_3:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
.LBB_test_4:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

instead of this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_3:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_5 ; shortcirc_next.1
.LBB_test_4:    ; shortcirc_next.0.loopexit_crit_edge
        add r2, r11, r27
        add r8, r12, r27
        b .LBB_test_9   ; loopexit
.LBB_test_5:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_7 ; shortcirc_next.2
.LBB_test_6:    ; shortcirc_next.1.loopexit_crit_edge
        add r2, r9, r27
        add r8, r10, r27
        b .LBB_test_9   ; loopexit
.LBB_test_7:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry

Next up, improve the code for the loop.

llvm-svn: 22769
2005-08-12 22:22:17 +00:00
Chris Lattner 4fec86d348 Fix a FIXME: if we are inserting code for a PHI argument, split the critical
edge so that the code is not always executed for both operands.  This
prevents LSR from inserting code into loops whose exit blocks contain
PHI uses of IV expressions (which are outside of loops).  On gzip, for
example, we turn this ugly code:

.LBB_test_1:    ; loopentry
        add r27, r3, r28
        lhz r27, 3(r27)
        add r26, r4, r28
        lhz r26, 3(r26)
        add r25, r30, r28    ;; Only live if exiting the loop
        add r24, r29, r28    ;; Only live if exiting the loop
        cmpw cr0, r27, r26
        bne .LBB_test_5 ; loopexit

into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_2:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


into this:

.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_3   ; shortcirc_next.0
.LBB_test_3:    ; shortcirc_next.0
        ...
        blt .LBB_test_1


Next step: get the block out of the loop so that the loop is all
fall-throughs again.

llvm-svn: 22766
2005-08-12 22:06:11 +00:00
Chris Lattner 62df798919 remove some trickiness that broke yacr2 and some other programs last night
llvm-svn: 22751
2005-08-10 17:15:20 +00:00
Chris Lattner f83ce5faee Make loop-simplify produce better loops by turning PHI nodes like X = phi [X, Y]
into just Y.  This often occurs when it separates loops that have collapsed loop
headers.  This implements LoopSimplify/phi-node-simplify.ll

llvm-svn: 22746
2005-08-10 02:07:32 +00:00
Chris Lattner 677d85784a Allow indvar simplify to canonicalize ANY affine IV, not just affine IVs with
constant stride.  This implements Transforms/IndVarsSimplify/variable-stride-ivs.ll

llvm-svn: 22744
2005-08-10 01:12:06 +00:00
Chris Lattner edff91a49a Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride.
For code like this:

void foo(float *a, float *b, int n, int stride_a, int stride_b) {
  int i;
  for (i=0; i<n; i++)
      a[i*stride_a] = b[i*stride_b];
}

we now emit:

.LBB_foo2_2:    ; no_exit
        lfs f0, 0(r4)
        stfs f0, 0(r3)
        addi r7, r7, 1
        add r4, r2, r4
        add r3, r6, r3
        cmpw cr0, r7, r5
        blt .LBB_foo2_2 ; no_exit

instead of:

.LBB_foo_2:     ; no_exit
        mullw r8, r2, r7     ;; multiply!
        slwi r8, r8, 2
        lfsx f0, r4, r8
        mullw r8, r2, r6     ;; multiply!
        slwi r8, r8, 2
        stfsx f0, r3, r8
        addi r2, r2, 1
        cmpw cr0, r2, r5
        blt .LBB_foo_2  ; no_exit

loops with variable strides occur pretty often.  For example, in SPECFP2K
there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
56 in 168.wupwise, 36 in 172.mgrid.

Now we can allow indvars to turn functions written like this:

void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
  int i, ai = 0, bi = 0;
  for (i=0; i<n; i++)
    {
      a[ai] = b[bi];
      ai += stride_a;
      bi += stride_b;
    }
}

into code like the above for better analysis.  With this patch, they generate
identical code.

llvm-svn: 22740
2005-08-10 00:45:21 +00:00
Chris Lattner dde7dc525e Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll
by being more careful about updating PHI nodes

llvm-svn: 22739
2005-08-10 00:35:32 +00:00
Chris Lattner c6c4d99a21 Fix some 80 column violations.
Once we compute the evolution for a GEP, tell SE about it.  This allows users
of the GEP to know it, if the users are not direct.  This allows us to compile
this testcase:

void fbSolidFillmmx(int w, unsigned char *d) {
    while (w >= 64) {
        *(unsigned long long *) (d +  0) = 0;
        *(unsigned long long *) (d +  8) = 0;
        *(unsigned long long *) (d + 16) = 0;
        *(unsigned long long *) (d + 24) = 0;
        *(unsigned long long *) (d + 32) = 0;
        *(unsigned long long *) (d + 40) = 0;
        *(unsigned long long *) (d + 48) = 0;
        *(unsigned long long *) (d + 56) = 0;
        w -= 64;
        d += 64;
    }
}

into:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r2, 0
        stw r2, 0(r4)
        stw r2, 4(r4)
        stw r2, 8(r4)
        stw r2, 12(r4)
        stw r2, 16(r4)
        stw r2, 20(r4)
        stw r2, 24(r4)
        stw r2, 28(r4)
        stw r2, 32(r4)
        stw r2, 36(r4)
        stw r2, 40(r4)
        stw r2, 44(r4)
        stw r2, 48(r4)
        stw r2, 52(r4)
        stw r2, 56(r4)
        stw r2, 60(r4)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

instead of:

.LBB_fbSolidFillmmx_2:  ; no_exit
        li r11, 0
        stw r11, 0(r4)
        stw r11, 4(r4)
        stwx r11, r10, r4
        add r12, r10, r4
        stw r11, 4(r12)
        stwx r11, r9, r4
        add r12, r9, r4
        stw r11, 4(r12)
        stwx r11, r8, r4
        add r12, r8, r4
        stw r11, 4(r12)
        stwx r11, r7, r4
        add r12, r7, r4
        stw r11, 4(r12)
        stwx r11, r6, r4
        add r12, r6, r4
        stw r11, 4(r12)
        stwx r11, r5, r4
        add r12, r5, r4
        stw r11, 4(r12)
        stwx r11, r2, r4
        add r12, r2, r4
        stw r11, 4(r12)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit

llvm-svn: 22737
2005-08-09 23:39:36 +00:00
Chris Lattner 02742710f3 SCEVAddExpr::get() of an empty list is invalid.
llvm-svn: 22724
2005-08-09 01:13:47 +00:00
Chris Lattner a091ff1764 Implement: LoopStrengthReduce/share_ivs.ll
Two changes:
  * Only insert one PHI node for each stride.  Other values are live-in
    values.  This cannot introduce higher register pressure than the
    previous approach, and can take advantage of reg+reg addressing modes.
  * Factor common base values out of uses before moving values from the
    base to the immediate fields.  This improves codegen by starting the
    stride-specific PHI node out at a common place for each IV use.

As an example, we used to generate this for a loop in swim:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfd f0, 0(r8)
        stfd f0, 0(r3)
        lfd f0, 0(r6)
        stfd f0, 0(r7)
        lfd f0, 0(r2)
        stfd f0, 0(r5)
        addi r9, r9, 1
        addi r2, r2, 8
        addi r5, r5, 8
        addi r6, r6, 8
        addi r7, r7, 8
        addi r8, r8, 8
        addi r3, r3, 8
        cmpw cr0, r9, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

now we emit:

.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfdx f0, r8, r2
        stfdx f0, r9, r2
        lfdx f0, r5, r2
        stfdx f0, r7, r2
        lfdx f0, r3, r2
        stfdx f0, r6, r2
        addi r10, r10, 1
        addi r2, r2, 8
        cmpw cr0, r10, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1

As another more dramatic example, we used to emit this:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfd f0, 8(r21)
        lfd f4, 8(r3)
        lfd f5, 8(r27)
        lfd f6, 8(r22)
        lfd f7, 8(r5)
        lfd f8, 8(r6)
        lfd f9, 8(r30)
        lfd f10, 8(r11)
        lfd f11, 8(r12)
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfd f0, 8(r4)
        lfd f0, 8(r25)
        lfd f5, 8(r26)
        lfd f6, 8(r23)
        lfd f9, 8(r28)
        lfd f10, 8(r10)
        lfd f12, 8(r9)
        lfd f13, 8(r29)
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfd f0, 8(r24)
        lfd f0, 8(r8)
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfd f0, 8(r2)
        addi r20, r20, 1
        addi r2, r2, 8
        addi r8, r8, 8
        addi r10, r10, 8
        addi r12, r12, 8
        addi r6, r6, 8
        addi r29, r29, 8
        addi r28, r28, 8
        addi r26, r26, 8
        addi r25, r25, 8
        addi r24, r24, 8
        addi r5, r5, 8
        addi r23, r23, 8
        addi r22, r22, 8
        addi r3, r3, 8
        addi r9, r9, 8
        addi r11, r11, 8
        addi r30, r30, 8
        addi r27, r27, 8
        addi r21, r21, 8
        addi r4, r4, 8
        cmpw cr0, r20, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

we now emit:

.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfdx f0, r21, r20
        lfdx f4, r3, r20
        lfdx f5, r27, r20
        lfdx f6, r22, r20
        lfdx f7, r5, r20
        lfdx f8, r6, r20
        lfdx f9, r30, r20
        lfdx f10, r11, r20
        lfdx f11, r12, r20
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfdx f0, r4, r20
        lfdx f0, r25, r20
        lfdx f5, r26, r20
        lfdx f6, r23, r20
        lfdx f9, r28, r20
        lfdx f10, r10, r20
        lfdx f12, r9, r20
        lfdx f13, r29, r20
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfdx f0, r24, r20
        lfdx f0, r8, r20
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfdx f0, r2, r20
        addi r19, r19, 1
        addi r20, r20, 8
        cmpw cr0, r19, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1

llvm-svn: 22722
2005-08-09 00:18:09 +00:00
Chris Lattner 37c24cc98c Suck the base value out of the UsersToProcess vector into the BasedUser
class to simplify the code.  Fuse two loops.

llvm-svn: 22721
2005-08-08 22:56:21 +00:00
Chris Lattner 37ed895bf1 Split MoveLoopVariantsToImmediateField out from MoveImmediateValues.  The
first is a correctness thing, and the latter is an optzn thing.  This also
is needed to support a future change.

llvm-svn: 22720
2005-08-08 22:32:34 +00:00
Chris Lattner 9f269e40c9 Use the new 'moveBefore' method to simplify some code. Really, which is
easier to understand?  :)

llvm-svn: 22706
2005-08-08 19:11:57 +00:00
Chris Lattner 14203e85b2 Not all constants are legal immediates in load/store instructions.
llvm-svn: 22704
2005-08-08 06:25:50 +00:00
Chris Lattner c70bbc0c41 Implement LoopStrengthReduce/share_code_in_preheader.ll by having one
rewriter for all code inserted into the preheader, which is never flushed.

llvm-svn: 22702
2005-08-08 05:47:49 +00:00
Chris Lattner 9bfa6f8784 Implement a simple optimization for the termination condition of the loop.
The termination condition actually wants to use the post-incremented value
of the loop, not a new indvar with an unusual base.

On PPC, for example, this allows us to compile
LoopStrengthReduce/exit_compare_live_range.ll to:

_foo:
        li r2, 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        cmpw cr0, r2, r4
        bne .LBB_foo_1  ; no_exit
        blr

instead of:

_foo:
        li r2, 1                ;; IV starts at 1, not 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r5, r2, 1
        cmpw cr0, r2, r4
        or r2, r5, r5           ;; Reg-reg copy, extra live range
        bne .LBB_foo_1  ; no_exit
        blr

This implements LoopStrengthReduce/exit_compare_live_range.ll

llvm-svn: 22699
2005-08-08 05:28:22 +00:00
Chris Lattner 2c14cf7b74 Add some simple folds that occur in bitfield cases. Fix a minor bug in
isHighOnes, where it would consider 0 to have high ones.

llvm-svn: 22693
2005-08-07 07:03:10 +00:00
Chris Lattner 134ebd0801 Fix typo
llvm-svn: 22692
2005-08-07 07:00:52 +00:00
Chris Lattner f4dd8c445c * Use the new PHINode::hasConstantValue method to simplify some code
* Teach this code to move allocas out of the loop when tail call eliminating
  a call marked 'tail'.  This implements TailCallElim/move_alloca_for_tail_call.ll
* Do not perform this transformation if a call is marked 'tail' and if there
  are allocas that we cannot move out of the loop in #2.  Doing so would increase
  the stack usage of the function.  This fixes
  PR615 and implements TailCallElim/dont-tce-tail-marked-call.ll.
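
As a hedged C sketch of the situation in #2 ('use' is a hypothetical external
function):

void use(char *p);
int count(int n, int acc) {
  char scratch[16];              /* stack allocation inside the recursion  */
  use(scratch);
  if (n == 0) return acc;
  return count(n - 1, acc + 1);  /* tail call -> loop; 'scratch' must be
                                    hoisted out of the new loop or we bail */
}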

llvm-svn: 22690
2005-08-07 04:27:41 +00:00
Chris Lattner 11e7a5eda7 Make sure to clean CastedPointers after casts are potentially deleted.
This fixes LSR crashes on 301.apsi, 191.fma3d, and 189.lucas

llvm-svn: 22673
2005-08-05 01:30:11 +00:00
Chris Lattner 9f9c260b8c now that hasConstantValue defaults to only returning values that dominate
the PHI node, this ugly code can vanish.

llvm-svn: 22672
2005-08-05 01:04:30 +00:00
Chris Lattner 257efb2ad3 This code can handle non-dominating instructions
llvm-svn: 22667
2005-08-05 00:57:45 +00:00
Nate Begeman b392321cae Fix a fixme in CondPropagate.cpp by moving a PhiNode optimization into
BasicBlock's removePredecessor routine.  This requires shuffling around
the definition and implementation of hasConstantValue from Utils.h,cpp into
Instructions.h,cpp

llvm-svn: 22664
2005-08-04 23:24:19 +00:00
Chris Lattner 45f8b6e7aa Modify how immediates are removed from base expressions to deal with the fact
that the symbolic evaluator is not always able to use subtraction to remove
expressions.  This makes the code faster, and fixes the last crash on 178.galgel.
Finally, add a statistic to see how many phi nodes are inserted.

On 178.galgel, we get the follow stats:

2562 loop-reduce  - Number of PHIs inserted
3927 loop-reduce  - Number of GEPs strength reduced

llvm-svn: 22662
2005-08-04 22:34:05 +00:00
Chris Lattner a6d7c355bc * Refactor some code into a new BasedUser::RewriteInstructionToUseNewBase
method.
* Fix a crash on 178.galgel, where we would insert expressions before PHI
  nodes instead of into the PHI node predecessor blocks.

llvm-svn: 22657
2005-08-04 20:03:32 +00:00
Chris Lattner 0f7c0fa2a7 Fix a case that caused this to crash on 178.galgel
llvm-svn: 22653
2005-08-04 19:26:19 +00:00
Chris Lattner acc42c4df1 Teach LSR about loop-variant expressions, such as loops like this:
for (i = 0; i < N; ++i)
    A[i][foo()] = 0;

here we still want to strength reduce the A[i] part, even though foo() is
loop-variant.

This also simplifies some of the 'CanReduce' logic.

This implements Transforms/LoopStrengthReduce/ops_after_indvar.ll

llvm-svn: 22652
2005-08-04 19:08:16 +00:00
Nate Begeman 456044b724 Remove some more dead code.
llvm-svn: 22650
2005-08-04 18:13:56 +00:00
Chris Lattner eaf24725b2 Refactor this code substantially with the following improvements:
  1. We only analyze instructions once, guaranteed.
  2. AnalyzeGetElementPtrUsers has been ripped apart and replaced with
     something much simpler.

The next step is to handle expressions that are not all indvar+loop-invariant
values (e.g. handling indvar+loopvariant).

llvm-svn: 22649
2005-08-04 17:40:30 +00:00
Chris Lattner 6f286b760f refactor some code
llvm-svn: 22643
2005-08-04 01:19:13 +00:00
Chris Lattner 6510749050 invert two if's to make the logic simpler
llvm-svn: 22641
2005-08-04 00:40:47 +00:00
Chris Lattner a0102fbc4f When processing outer loops and we find uses of an IV in inner loops, make
sure to handle the use, just don't recurse into it.

This permits us to generate this code for a simple nested loop case:

.LBB_foo_0:     ; entry
        stwu r1, -48(r1)
        stw r29, 44(r1)
        stw r30, 40(r1)
        mflr r11
        stw r11, 56(r1)
        lis r2, ha16(L_A$non_lazy_ptr)
        lwz r30, lo16(L_A$non_lazy_ptr)(r2)
        li r29, 1
.LBB_foo_1:     ; no_exit.0
        bl L_bar$stub
        li r2, 1
        or r3, r30, r30
.LBB_foo_2:     ; no_exit.1
        lfd f0, 8(r3)
        stfd f0, 0(r3)
        addi r4, r2, 1
        addi r3, r3, 8
        cmpwi cr0, r2, 100
        or r2, r4, r4
        bne .LBB_foo_2  ; no_exit.1
.LBB_foo_3:     ; loopexit.1
        addi r30, r30, 800
        addi r2, r29, 1
        cmpwi cr0, r29, 100
        or r29, r2, r2
        bne .LBB_foo_1  ; no_exit.0
.LBB_foo_4:     ; return
        lwz r11, 56(r1)
        mtlr r11
        lwz r30, 40(r1)
        lwz r29, 44(r1)
        lwz r1, 0(r1)
        blr

instead of this:

_foo:
.LBB_foo_0:     ; entry
        stwu r1, -48(r1)
        stw r28, 44(r1)                   ;; uses an extra register.
        stw r29, 40(r1)
        stw r30, 36(r1)
        mflr r11
        stw r11, 56(r1)
        li r30, 1
        li r29, 0
        or r28, r29, r29
.LBB_foo_1:     ; no_exit.0
        bl L_bar$stub
        mulli r2, r28, 800           ;; unstrength-reduced multiply
        lis r3, ha16(L_A$non_lazy_ptr)   ;; loop invariant address computation
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        mulli r4, r29, 800           ;; unstrength-reduced multiply
        addi r3, r3, 8
        add r3, r4, r3
        li r4, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 0(r3)
        stfd f0, 0(r2)
        addi r5, r4, 1
        addi r2, r2, 8                 ;; multiple stride 8 IV's
        addi r3, r3, 8
        cmpwi cr0, r4, 100
        or r4, r5, r5
        bne .LBB_foo_2  ; no_exit.1
.LBB_foo_3:     ; loopexit.1
        addi r28, r28, 1               ;;; Many IV's with stride 1
        addi r29, r29, 1
        addi r2, r30, 1
        cmpwi cr0, r30, 100
        or r30, r2, r2
        bne .LBB_foo_1  ; no_exit.0
.LBB_foo_4:     ; return
        lwz r11, 56(r1)
        mtlr r11
        lwz r30, 36(r1)
        lwz r29, 40(r1)
        lwz r28, 44(r1)
        lwz r1, 0(r1)
        blr

llvm-svn: 22640
2005-08-04 00:14:11 +00:00
Chris Lattner fc62470466 Teach loop-reduce to see into nested loops, to pull out immediate values
pushed down by SCEV.

In a nested loop case, this allows us to emit this:

        lis r3, ha16(L_A$non_lazy_ptr)
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        li r3, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 8(r2)        ;; Uses offset of 8 instead of 0
        stfd f0, 0(r2)
        addi r4, r3, 1
        addi r2, r2, 8
        cmpwi cr0, r3, 100
        or r3, r4, r4
        bne .LBB_foo_2  ; no_exit.1

instead of this:

        lis r3, ha16(L_A$non_lazy_ptr)
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        addi r3, r3, 8
        li r4, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 0(r3)
        stfd f0, 0(r2)
        addi r5, r4, 1
        addi r2, r2, 8
        addi r3, r3, 8
        cmpwi cr0, r4, 100
        or r4, r5, r5
        bne .LBB_foo_2  ; no_exit.1

llvm-svn: 22639
2005-08-03 23:44:42 +00:00
Chris Lattner bb78c97e24 improve debug output
llvm-svn: 22638
2005-08-03 23:30:08 +00:00
Chris Lattner db23c74e5e Move from Stage 0 to Stage 1.
Only emit one PHI node for IV uses with identical bases and strides (after
moving foldable immediates to the load/store instruction).

This implements LoopStrengthReduce/dont_insert_redundant_ops.ll, allowing
us to generate this PPC code for test1:

        or r30, r3, r3
.LBB_test1_1:   ; Loop
        li r2, 0
        stw r2, 0(r30)
        stw r2, 4(r30)
        bl L_pred$stub
        addi r30, r30, 8
        cmplwi cr0, r3, 0
        bne .LBB_test1_1        ; Loop

instead of this code:

        or r30, r3, r3
        or r29, r3, r3
.LBB_test1_1:   ; Loop
        li r2, 0
        stw r2, 0(r29)
        stw r2, 4(r30)
        bl L_pred$stub
        addi r30, r30, 8        ;; Two iv's with step of 8
        addi r29, r29, 8
        cmplwi cr0, r3, 0
        bne .LBB_test1_1        ; Loop

llvm-svn: 22635
2005-08-03 22:51:21 +00:00
Chris Lattner 430d0022df Rename IVUse to IVUsersOfOneStride, use a struct instead of a pair to
unify some parallel vectors and get field names more descriptive than
"first" and "second".  This isn't lisp afterall :)

llvm-svn: 22633
2005-08-03 22:21:05 +00:00
Chris Lattner 84e9baa925 Fix a nasty dangling pointer issue. The ScalarEvolution pass would keep a
map from instruction* to SCEVHandles.  When we delete instructions, we have
to tell it about it.  We would run into nasty cases where new instructions
were reallocated at old instruction addresses and get the old map values.
Bad bad bad :(
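
The failure mode, as a hedged standalone C demonstration of address reuse
(the map itself is elided):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int *I = malloc(sizeof *I);   /* imagine a map keyed on this pointer      */
  void *stale_key = I;
  free(I);                      /* the map is never told about the deletion */
  int *J = malloc(sizeof *J);   /* the allocator may reuse I's address...   */
  if ((void *)J == stale_key)
    printf("address reused: a lookup of J would hit the stale entry\n");
  free(J);
  return 0;
}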

llvm-svn: 22632
2005-08-03 21:36:09 +00:00
Chris Lattner 3de05cc930 The correct fix for PR612, which also fixes
Transforms/LowerInvoke/2005-08-03-InvokeWithPHIUse.ll

llvm-svn: 22628
2005-08-03 18:51:44 +00:00
Chris Lattner f8a81a9886 When inserting code, make sure not to insert it before PHI nodes. This
fixes PR612 and Transforms/LowerInvoke/2005-08-03-InvokeWithPHI.ll

llvm-svn: 22626
2005-08-03 18:34:29 +00:00
Chris Lattner 22d00a8e90 Update to use the new MathExtras.h support for log2 computation.
Patch contributed by Jim Laskey!

llvm-svn: 22592
2005-08-02 19:16:58 +00:00
Chris Lattner 351b891cbc Like the comment says, do not insert cast instructions before phi nodes
llvm-svn: 22586
2005-08-02 03:31:14 +00:00
Chris Lattner 75a44e154e add a comment, make a check more lenient
llvm-svn: 22581
2005-08-02 02:52:02 +00:00
Chris Lattner dcce49e006 Simplify for loop, clear a per-loop map after processing each loop
llvm-svn: 22580
2005-08-02 02:44:31 +00:00
Chris Lattner 9ef1294210 Add a comment
Make LSR ignore GEP's that have loop variant base values, as we currently
cannot codegen them

llvm-svn: 22576
2005-08-02 01:32:29 +00:00
Chris Lattner 564900e5e5 Fix an iterator invalidation problem
llvm-svn: 22575
2005-08-02 00:41:11 +00:00
Jeff Cohen 546fd5944e Keep tabs and trailing spaces out.
llvm-svn: 22565
2005-07-30 18:33:25 +00:00
Jeff Cohen c500991055 Fix VC++ build problems.
llvm-svn: 22564
2005-07-30 18:22:27 +00:00
Nate Begeman 17a0e2afea Ack, typo
llvm-svn: 22560
2005-07-30 00:21:31 +00:00
Nate Begeman e68bcd1946 Commit a new LoopStrengthReduce pass that can use scalar evolutions and
target data to decide which loop induction variables to strength reduce
and how to do so.  This work is mostly by Chris Lattner, with tweaks by
me to get it working on some of MultiSource.

llvm-svn: 22558
2005-07-30 00:15:07 +00:00
Nate Begeman 2bca4d9b7b Break SCEVExpander out of IndVarSimplify into its own .h/.cpp file so that
other passes may use it.

llvm-svn: 22557
2005-07-30 00:12:19 +00:00
Jeff Cohen 5f4ef3c5a8 Eliminate all remaining tabs and trailing spaces.
llvm-svn: 22523
2005-07-27 06:12:32 +00:00
Chris Lattner 18aa4d8196 Do not let MaskedValueIsZero consider undef to be zero, for reasons
explained in the comment.

This fixes UnitTests/2003-09-18-BitFieldTest on darwin

llvm-svn: 22483
2005-07-20 18:49:28 +00:00
Chris Lattner 247aef884c When transforming &A[i] < &A[j] -> i < j, make sure to perform the comparison
as a signed compare.  This patch may fix PR597, but is correct in any case.
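
As a hedged C example:

int cmp(int *A, int i, int j) {
  return &A[i] < &A[j];   /* folds to the signed compare 'i < j' */
}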

llvm-svn: 22465
2005-07-18 23:07:33 +00:00
Chris Lattner 4ed40f7c6f Fix a problem that instcombine would hit when dealing with unreachable code.
Because instcombine has to scan the entire function when it starts up anyway,
we might as well do it in DFO so we can nuke unreachable code.

This fixes: Transforms/InstCombine/2005-07-07-DeadPHILoop.ll

llvm-svn: 22348
2005-07-07 20:40:38 +00:00
Andrew Lenharth cf52eb2b99 prevent va_arg from being hoisted from a loop
llvm-svn: 22265
2005-06-20 13:36:33 +00:00
Andrew Lenharth 9144ec4764 core changes for varargs
llvm-svn: 22254
2005-06-18 18:34:52 +00:00
Reid Spencer 4fdd96c4e0 Clean up some uninitialized variables and missing return statements that
the GCC 4.0.0 compiler (sometimes incorrectly) warns about in release builds.

llvm-svn: 22249
2005-06-18 17:37:34 +00:00
Chris Lattner 2ceb6ee576 This is not true: (X != 13 | X < 15) -> X < 15
It is actually always true: any X >= 15 already satisfies X != 13.  This fixes PR586 and
Transforms/InstCombine/2005-06-16-SetCCOrSetCCMiscompile.ll

llvm-svn: 22236
2005-06-17 03:59:17 +00:00
Chris Lattner 73bcba5f61 Don't crash when dealing with INT_MIN.  This fixes PR585 and
Transforms/InstCombine/2005-06-16-RangeCrash.ll

llvm-svn: 22234
2005-06-17 02:05:55 +00:00
Chris Lattner c53cb9d3ff avoid constructing out of range shift amounts.
llvm-svn: 22230
2005-06-17 01:29:28 +00:00
Chris Lattner 89dc4f16f5 Fix PR583 and testcase Transforms/InstCombine/2005-06-15-DivSelectCrash.ll
llvm-svn: 22227
2005-06-16 04:55:52 +00:00
Chris Lattner 252a845e30 Fix PR571, removing code that does just the WRONG thing :)
llvm-svn: 22225
2005-06-16 03:00:08 +00:00
Chris Lattner 104002bee3 Fix a bug in my previous patch.  Do not get the shift amount type (which
is always ubyte); get the type being shifted.  This unbreaks espresso

llvm-svn: 22224
2005-06-16 01:52:07 +00:00
Chris Lattner df81539278 Fix PR582. The rewriter can move casts around, which invalidated the
BB iterator.  This fixes Transforms/IndVarsSimplify/2005-06-15-InstMoveCrash.ll

llvm-svn: 22221
2005-06-15 21:29:31 +00:00
Chris Lattner 19b57f55aa Fix PR577 and testcase InstCombine/2005-06-15-ShiftSetCCCrash.ll.
Do not perform undefined out of range shifts.

llvm-svn: 22217
2005-06-15 20:53:31 +00:00
Reid Spencer a299d6f701 Put the hack back in that removes features, causes regressions to fail, but
allows test programs to succeed. Actual fix for this is forthcoming.

llvm-svn: 22213
2005-06-15 18:25:30 +00:00
Reid Spencer 6d231e55fa Unbreak several InstCombine regression checks introduced by a hack to
fix the bzip2 test. A better hack is needed.

llvm-svn: 22209
2005-06-13 06:41:26 +00:00
Chris Lattner 1609a541cd Fix a 64-bit problem, passing (int)0 through ... instead of (void*)0
llvm-svn: 22206
2005-06-09 03:32:54 +00:00
Andrew Lenharth ffe65458e7 hack to fix bzip2 (bug 571)
llvm-svn: 22192
2005-06-04 12:43:56 +00:00
Chris Lattner 05c703ea85 preserve calling conventions when hacking on code
llvm-svn: 22024
2005-05-14 12:25:32 +00:00
Chris Lattner 61d9d81770 calling a function with the wrong CC is undefined, turn it into an unreachable
instruction.  This is useful for catching optimizers that don't preserve
calling conventions

llvm-svn: 21928
2005-05-13 07:09:09 +00:00
Chris Lattner ca968393ab When lowering invokes to calls, make sure to preserve the calling conv. This
fixes Ptrdist/anagram with x86 llcbeta

llvm-svn: 21925
2005-05-13 06:27:02 +00:00
Chris Lattner ae186e012c Prefer int 0 instead of long 0 for GEP arguments.
llvm-svn: 21924
2005-05-13 06:10:12 +00:00
Chris Lattner 31c667e234 Fix Reassociate/shifttest.ll
llvm-svn: 21839
2005-05-10 03:39:25 +00:00
Chris Lattner bfc796f622 If a function contains no allocas, all of the calls in it are trivially
suitable for tail calls.

llvm-svn: 21836
2005-05-09 23:51:13 +00:00
Chris Lattner b62f5082c5 implement and.ll:test33
llvm-svn: 21809
2005-05-09 04:58:36 +00:00
Chris Lattner df3332660f Implement Reassociate/mul-neg-add.ll
llvm-svn: 21788
2005-05-08 21:41:35 +00:00
Chris Lattner c4f8e2b0ed Bail out earlier
llvm-svn: 21786
2005-05-08 21:33:47 +00:00
Chris Lattner 877b114037 Teach reassociate that 0-X === X*-1
llvm-svn: 21785
2005-05-08 21:28:52 +00:00
Chris Lattner 9f284e0a3c Fix PR557 and basictest[34].ll.
This makes reassociate realize that loads should be treated as unmovable, and
gives distinct ranks to distinct values defined in the same basic block, allowing
reassociate to do its thing.

llvm-svn: 21783
2005-05-08 20:57:04 +00:00
Chris Lattner 9187f3905e Add debugging information
llvm-svn: 21781
2005-05-08 20:09:57 +00:00
Chris Lattner 08582be283 eliminate gotos
llvm-svn: 21780
2005-05-08 19:48:43 +00:00
Chris Lattner 5847e5e10c Improve reassociation handling of inverses, implementing inverses.ll.
llvm-svn: 21778
2005-05-08 18:59:37 +00:00
Chris Lattner 4922118dc4 clean up and modernize this pass.
llvm-svn: 21776
2005-05-08 18:45:26 +00:00
Chris Lattner b18dbbfff5 Strength reduce SAR into SHR if there is no way sign bits could be shifted
in.  This tends to get cases like this:

  X = cast ubyte to int
  Y = shr int X, ...

Tested by: shift.ll:test24

llvm-svn: 21775
2005-05-08 17:34:56 +00:00
Chris Lattner e1850b86b6 Refactor some code
llvm-svn: 21772
2005-05-08 00:19:31 +00:00
Chris Lattner 6e2086d7e4 Handle some simple cases where we can see that values get annihilated.
llvm-svn: 21771
2005-05-08 00:08:33 +00:00
Chris Lattner 4294cec0f1 Fix a miscompilation of crafty caused by clobbering the "A" variable.
llvm-svn: 21770
2005-05-07 23:49:08 +00:00
Chris Lattner 1e5065052a Rewrite the guts of the reassociate pass to be more efficient and logical. Instead
of trying to do local reassociation tweaks at each level, only process an expression
tree once (at its root).  This does not improve the reassociation pass in any real way.

llvm-svn: 21768
2005-05-07 21:59:39 +00:00
Chris Lattner cea579932d Convert shifts to muls to assist reassociation. This implements
Reassociate/shifttest.ll
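
For example, (X << 2)*3 first becomes (X*4)*3, which then reassociates
to X*12.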

llvm-svn: 21761
2005-05-07 04:24:13 +00:00
Chris Lattner f43e974abd Simplify the code and rearrange it. No major functionality changes here.
llvm-svn: 21759
2005-05-07 04:08:02 +00:00
Chris Lattner 6aacb0f9da Preserve tail marker
llvm-svn: 21737
2005-05-06 06:48:21 +00:00
Chris Lattner ef298a3b8a Teach instcombine to propagate zeroness through shl instructions, implementing
and.ll:test31
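
For example, if the top 24 bits of X are known zero, then after 'shl X, 2'
the top 22 bits of the result are also known zero.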

llvm-svn: 21717
2005-05-06 04:53:20 +00:00
Chris Lattner 873804168e Implement shift.ll:test23. If we are shifting right then immediately truncating
the result, turn signed shift rights into unsigned shift rights if possible.

This leads to later simplification and happens *often* in 176.gcc.  For example,
this testcase:

struct xxx { unsigned int code : 8; };
enum codes { A, B, C, D, E, F };
int foo(struct xxx *P) {
  if ((enum codes)P->code == A)
     bar();
}

used to be compiled to:

int %foo(%struct.xxx* %P) {
        %tmp.1 = getelementptr %struct.xxx* %P, int 0, uint 0           ; <uint*> [#uses=1]
        %tmp.2 = load uint* %tmp.1              ; <uint> [#uses=1]
        %tmp.3 = cast uint %tmp.2 to int                ; <int> [#uses=1]
        %tmp.4 = shl int %tmp.3, ubyte 24               ; <int> [#uses=1]
        %tmp.5 = shr int %tmp.4, ubyte 24               ; <int> [#uses=1]
        %tmp.6 = cast int %tmp.5 to sbyte               ; <sbyte> [#uses=1]
        %tmp.8 = seteq sbyte %tmp.6, 0          ; <bool> [#uses=1]
        br bool %tmp.8, label %then, label %UnifiedReturnBlock

Now it is compiled to:

        %tmp.1 = getelementptr %struct.xxx* %P, int 0, uint 0           ; <uint*> [#uses=1]
        %tmp.2 = load uint* %tmp.1              ; <uint> [#uses=1]
        %tmp.2 = cast uint %tmp.2 to sbyte              ; <sbyte> [#uses=1]
        %tmp.8 = seteq sbyte %tmp.2, 0          ; <bool> [#uses=1]
        br bool %tmp.8, label %then, label %UnifiedReturnBlock

which is the difference between this:

foo:
        subl $4, %esp
        movl 8(%esp), %eax
        movl (%eax), %eax
        shll $24, %eax
        sarl $24, %eax
        testb %al, %al
        jne .LBBfoo_2

and this:

foo:
        subl $4, %esp
        movl 8(%esp), %eax
        movl (%eax), %eax
        testb %al, %al
        jne .LBBfoo_2

This occurs 3243 times total in the External tests, 215x in povray,
6x in each f2c'd program, 1451x in 176.gcc, 7x in crafty, 20x in perl,
25x in gap, 3x in m88ksim, 25x in ijpeg.

Maybe this will cause a little jump on gcc tomorrow :)

llvm-svn: 21715
2005-05-06 04:18:52 +00:00
Chris Lattner 7208616ec0 Implement xor.ll:test22
llvm-svn: 21713
2005-05-06 02:07:39 +00:00
Chris Lattner 4c2d3781aa implement and.ll:test30 and set.ll:test21
llvm-svn: 21712
2005-05-06 01:53:19 +00:00
Chris Lattner dd1e562ec3 implement or.ll:test20
llvm-svn: 21709
2005-05-06 00:58:50 +00:00
Chris Lattner 809dfac421 Instcombine: cast (X != 0) to int, cast (X == 1) to int -> X iff X has only the low bit set.
This implements set.ll:test20.

This triggers 2x on povray, 9x on mesa, 11x on gcc, 2x on crafty, 1x on eon,
6x on perlbmk and 11x on m88ksim.

It allows us to compile these two functions into the same code:

struct s { unsigned int bit : 1; };
unsigned foo(struct s *p) {
  if (p->bit)
    return 1;
  else
    return 0;
}
unsigned bar(struct s *p) { return p->bit; }

llvm-svn: 21690
2005-05-04 19:10:26 +00:00
John Criswell f42ed7bdaf Fixed a comment.
llvm-svn: 21653
2005-05-02 14:47:42 +00:00
Chris Lattner a816eee427 Implement getelementptr.ll:test11
llvm-svn: 21647
2005-05-01 04:42:15 +00:00
Chris Lattner a9d84e3388 Check for volatile loads only once.
Implement load.ll:test7

llvm-svn: 21645
2005-05-01 04:24:53 +00:00
Chris Lattner bd43b9db9d Fix the compile failures from last night.
llvm-svn: 21565
2005-04-26 14:40:41 +00:00
Chris Lattner a21bf8d1be implement getelementptr.ll:test10
llvm-svn: 21541
2005-04-25 20:17:30 +00:00
Chris Lattner 2f1457fd83 Eliminate cases where we could << by 64, which is undefined in C.
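
As a hedged C sketch of the guard (names hypothetical):

unsigned long long low_bits_mask(unsigned N) {   /* N may be as large as 64 */
  if (N >= 64)
    return ~0ULL;              /* 1ULL << 64 would be undefined behavior */
  return (1ULL << N) - 1;
}
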
llvm-svn: 21500
2005-04-24 17:46:05 +00:00
Chris Lattner d6f636a340 Implement xor.ll:test21: select (not C), A, B -> select C, B, A
llvm-svn: 21495
2005-04-24 07:30:14 +00:00
Chris Lattner d1f46d3bf9 Use getPrimitiveSizeInBits() instead of getPrimitiveSize()*8
Completely rework the 'setcc (cast x to larger), y' code.  This code has
the advantage of implementing setcc.ll:test19 (being more general than
the previous code) and being correct in all cases.

This allows us to unxfail 2004-11-27-SetCCForCastLargerAndConstant.ll,
and close PR454.

llvm-svn: 21491
2005-04-24 06:59:08 +00:00
Jeff Cohen 82639853c0 Eliminate tabs and trailing spaces
llvm-svn: 21480
2005-04-23 21:38:35 +00:00
Chris Lattner 77c32c34d7 Generalize the setcc -> PHI and Select folding optimizations to work with
any constant RHS, not just a constant integer RHS.  This implements
select.ll:test17

llvm-svn: 21470
2005-04-23 15:31:55 +00:00
Misha Brukman b1c9317bb4 Remove trailing whitespace
llvm-svn: 21427
2005-04-21 23:48:37 +00:00
Chris Lattner 374e659466 Instcombine this:
%shortcirc_val = select bool %tmp.1, bool true, bool %tmp.4             ; <bool> [#uses=1]
        %tmp.6 = cast bool %shortcirc_val to int                ; <int> [#uses=1]

into this:

        %shortcirc_val = or bool %tmp.1, %tmp.4         ; <bool> [#uses=1]
        %tmp.6 = cast bool %shortcirc_val to int                ; <int> [#uses=1]

not this:

        %tmp.4.cast = cast bool %tmp.4 to int           ; <int> [#uses=1]
        %tmp.6 = select bool %tmp.1, int 1, int %tmp.4.cast             ; <int> [#uses=1]

llvm-svn: 21389
2005-04-21 05:43:13 +00:00
Chris Lattner 8cb10a1775 Wrap some long lines.
Make IPSCCP strip off dead constant exprs that are using functions, making
them appear as though their address is taken.  This allows us to propagate
some more pool descriptors, lowering the overhead of pool alloc.

llvm-svn: 21363
2005-04-19 19:16:19 +00:00
Chris Lattner 5c219469a0 Eliminate a broken transformation, fixing PR548
llvm-svn: 21354
2005-04-19 06:04:18 +00:00
Chris Lattner ee84413730 silence a bogus warning
llvm-svn: 21320
2005-04-18 05:26:21 +00:00
Chris Lattner 16a50fd0a0 a new simple pass, which will be extended to be more useful in the future.
This pass forward branches through conditions when it can show that the
conditions is either always true or false for a predecessor.  This currently
only handles the most simple cases of this, but is successful at threading
across 2489 branches and 65 switch instructions in 176.gcc, which isn't bad.

llvm-svn: 21306
2005-04-15 19:28:32 +00:00
Chris Lattner 4236261930 Fix bug: InstCombine/2005-05-07-UDivSelectCrash.ll
llvm-svn: 21152
2005-04-08 04:03:26 +00:00
Chris Lattner 4706046e68 Implement the following xforms:
(X-Y)-X --> -Y
A + (B - A) --> B
(B - A) + A --> B

llvm-svn: 21138
2005-04-07 17:14:51 +00:00
Chris Lattner c7f3c1a00e Implement InstCombine/add.ll:test28, transforming C1-(X+C2) --> (C1-C2)-X.
This occurs several dozen times in specint2k, particularly in crafty and gcc
apparently.

llvm-svn: 21136
2005-04-07 16:28:01 +00:00
Chris Lattner a9be4490d8 Transform X-(X+Y) == -Y and X-(Y+X) == -Y
llvm-svn: 21134
2005-04-07 16:15:25 +00:00
Chris Lattner ecfa9b5810 disable this transformation in the one obscure case that really pessimizes
pointer analysis.

llvm-svn: 20916
2005-03-29 06:37:47 +00:00
Alkis Evlogimenos 9ead0d7b4c Rename createPromoteMemoryToRegister() to
createPromoteMemoryToRegisterPass() to be consistent with other pass
creation functions.

llvm-svn: 20885
2005-03-28 02:01:12 +00:00
Chris Lattner 514e843e89 Enhance loopsimplify to preserve alias analysis instead of clobbering it.
This prevents crashes on some programs when using -ds-aa -licm.

llvm-svn: 20831
2005-03-25 06:37:22 +00:00
Chris Lattner faf7791fea Fix a bug where LICM was not updating AA information properly when sinking
a pointer value out of a loop causing it to be duplicated.

llvm-svn: 20828
2005-03-25 00:22:36 +00:00
Chris Lattner 1c790bf656 enable -debug-only=licm
llvm-svn: 20788
2005-03-23 21:00:12 +00:00
Chris Lattner 531f9e92d4 This mega patch converts us from using Function::a{iterator|begin|end} to
using Function::arg_{iterator|begin|end}.  Likewise Module::g* -> Module::global_*.

This patch is contributed by Gabor Greif, thanks!

llvm-svn: 20597
2005-03-15 04:54:21 +00:00
Chris Lattner 8c79559443 fix a bug where we thought arguments were constants :(
llvm-svn: 20506
2005-03-06 22:52:29 +00:00
Chris Lattner 2ce303b406 Fix Regression/Transforms/LoopStrengthReduce/dont_insert_redundant_ops.ll,
hopefully not breaking too many other things.

llvm-svn: 20505
2005-03-06 22:36:12 +00:00
Chris Lattner 45403e5052 implement Transforms/LoopStrengthReduce/invariant_value_first_arg.ll
llvm-svn: 20501
2005-03-06 22:06:22 +00:00
Chris Lattner d3874fad44 minor simplifications of the code.
llvm-svn: 20497
2005-03-06 21:58:22 +00:00
Chris Lattner dd3ec92085 trivial simplification
llvm-svn: 20494
2005-03-06 21:35:38 +00:00
Chris Lattner 238f6df546 Fix a bug where we could corrupt a parent loop's header info if we unrolled
a nested loop.  This fixes Transforms/LoopUnroll/2005-03-06-BadLoopInfoUpdate.ll
and PR532

llvm-svn: 20493
2005-03-06 20:57:32 +00:00
Jeff Cohen 4abcea3a69 Reformat comments to fix 80 columns.
llvm-svn: 20467
2005-03-05 22:45:40 +00:00
Jeff Cohen be37fa07fd Reuse induction variables created for strength-reduced GEPs by other similar GEPs.
llvm-svn: 20466
2005-03-05 22:40:34 +00:00
Chris Lattner cfe2822cdf Do not compute 1ULL << 64, which is undefined. This fixes Ptrdist/ks on the
sparc, and testcase Regression/Transforms/InstCombine/2005-03-04-ShiftOverflow.ll

llvm-svn: 20445
2005-03-04 23:21:33 +00:00
Jeff Cohen a2c59b7423 Add support for not strength reducing GEPs where the element size is a small
power of two.  This emphatically includes the zeroeth power of two.

llvm-svn: 20429
2005-03-04 04:04:26 +00:00
Chris Lattner ef1e989e4f Add an optional argument to lower to a specific constant value instead of
to a "sizeof" expression.

llvm-svn: 20414
2005-03-03 01:03:43 +00:00
Jeff Cohen 8ea6f9e821 Fixed the following LSR bugs:
  * Loop invariant code does not dominate the loop header, but rather
    the end of the loop preheader.

  * The base for a reduced GEP isn't a constant unless all of its
    operands (preceding the induction variable) are constant.

  * Allow induction variable elimination for the simple case after all.

Also made changes recommended by Chris for properly deleting
instructions.

llvm-svn: 20383
2005-03-01 03:46:11 +00:00
Jeff Cohen dcaa48b5c4 Fix crash in LSR due to attempt to remove original induction variable. However,
for reasons explained in the comments, I also deactivated this code as it needs
more thought.

llvm-svn: 20367
2005-02-28 00:08:56 +00:00
Jeff Cohen fd63d3af0d PHI nodes were incorrectly placed when more than one GEP is reduced in a loop.
llvm-svn: 20360
2005-02-27 21:08:04 +00:00
Jeff Cohen 39751c3b7c First pass at improved Loop Strength Reduction.  Not yet ready for prime time.
llvm-svn: 20358
2005-02-27 19:37:07 +00:00
Chris Lattner 52e931b37d Remove use of bind_obj
llvm-svn: 20276
2005-02-22 23:22:58 +00:00
Chris Lattner 7b5d9e2217 Do not mark obviously unreachable blocks live when processing PHI nodes,
and handle incomplete control dependences correctly.  This fixes:

Regression/Transforms/ADCE/dead-phi-edge.ll
  -> a missed optimization

Regression/Transforms/ADCE/dead-phi-edge.ll
  -> a compiler crash distilled from QT4

llvm-svn: 20227
2005-02-17 19:28:49 +00:00
Chris Lattner 31f3382b3b Fix the second bug attached to PR504.
llvm-svn: 20181
2005-02-14 20:11:45 +00:00
Chris Lattner e616fea3bc Fix for testcase Transforms/IndVarsSimplify/2005-02-11-InvokeCrash.ll
and PR504.

llvm-svn: 20129
2005-02-12 03:26:49 +00:00
Chris Lattner 82b42c5d85 API change.
llvm-svn: 19959
2005-02-01 01:23:49 +00:00
Chris Lattner 72684fecf8 Implement InstCombine/cast.ll:test25, a case that occurs many times
in spec

llvm-svn: 19953
2005-01-31 05:51:45 +00:00
Chris Lattner 31f486c775 Implement the trivial cases in InstCombine/store.ll
llvm-svn: 19950
2005-01-31 05:36:43 +00:00
Chris Lattner fe1b0b8b24 Implement Transforms/InstCombine/cast-load-gep.ll, which allows us to devirtualize
11 indirect calls in perlbmk.

llvm-svn: 19947
2005-01-31 04:50:46 +00:00
Chris Lattner d8e20188c6 Adjust to changes in instruction interfaces.
llvm-svn: 19900
2005-01-29 00:39:08 +00:00
Chris Lattner cd517ff0c7 * add some DEBUG statements
* Properly compile this:

struct a {};
int test() {
  struct a b[2];
  if (&b[0] != &b[1])
    abort ();
  return 0;
}

to 'return 0', not abort().

llvm-svn: 19875
2005-01-28 19:32:01 +00:00
Chris Lattner 9e2c7facb2 Get rid of several dozen more 'and' instructions in specint.
llvm-svn: 19786
2005-01-23 20:26:55 +00:00
Chris Lattner fc4429e7c1 Handle comparisons of gep instructions that have differently typed indices
as long as they are the same size.

llvm-svn: 19734
2005-01-21 23:06:49 +00:00
Chris Lattner 411336fe04 Add two optimizations.  The first folds (X+Y)-X -> Y.
The second folds operations into selects, e.g. (select C, (X+Y), (Y+Z))
-> (Y+(select C, X, Z))

This occurs a few times across spec, e.g.

         select    add/sub
mesa:    83        0
povray:  5         2
gcc      4         2
parser   0         22
perlbmk  13        30
twolf    0         3

llvm-svn: 19706
2005-01-19 21:50:18 +00:00
Chris Lattner 715364364b Delete PHI nodes that are not dead but are locked in a cycle of single
uses.

llvm-svn: 19629
2005-01-17 05:10:15 +00:00
Chris Lattner 03f06f11aa Move code out one indentation level to make it easier to read.
Disable the xform for < > cases.  It turns out that the following is being
miscompiled:

bool %test(sbyte %S) {
        %T = cast sbyte %S to uint
        %V = setgt uint %T, 255
        ret bool %V
}

llvm-svn: 19628
2005-01-17 03:20:02 +00:00
Chris Lattner 51726c47fe Fix some bugs in an xform added yesterday. This fixes Prolangs-C/allroots.
llvm-svn: 19553
2005-01-14 17:35:12 +00:00
Chris Lattner 7aa41cfa88 Fix a compile crash on spiff
llvm-svn: 19552
2005-01-14 17:17:59 +00:00
Chris Lattner 4fa89827e2 if two gep comparisons only differ by one index, compare that index directly.
This allows us to better optimize begin() -> end() comparisons in common cases.
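
As a hedged C analogue of the begin() -> end() case:

int arr[100];
int atEnd(int i) {
  return &arr[i] == &arr[100];   /* only the index differs: folds to i == 100 */
}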

llvm-svn: 19542
2005-01-14 00:20:05 +00:00
Chris Lattner d35d210ea0 Do not overrun iterators. This fixes a 176.gcc crash
llvm-svn: 19541
2005-01-13 23:26:48 +00:00
Chris Lattner a04c904c4c Turn select C, (X+Y), (X-Y) --> (X+(select C, Y, (-Y))). This occurs in
the 'sim' program and probably elsewhere.  In sim, it comes up for cases
like this:

#define round(x) ((x)>0.0 ? (x)+0.5 : (x)-0.5)
double G;
void T(double X) { G = round(X); }

(it uses the round macro a lot).  This changes the LLVM code from:

        %tmp.1 = setgt double %X, 0.000000e+00          ; <bool> [#uses=1]
        %tmp.4 = add double %X, 5.000000e-01            ; <double> [#uses=1]
        %tmp.6 = sub double %X, 5.000000e-01            ; <double> [#uses=1]
        %mem_tmp.0 = select bool %tmp.1, double %tmp.4, double %tmp.6
        store double %mem_tmp.0, double* %G

to:

        %tmp.1 = setgt double %X, 0.000000e+00          ; <bool> [#uses=1]
        %mem_tmp.0.p = select bool %tmp.1, double 5.000000e-01, double -5.000000e-01
        %mem_tmp.0 = add double %mem_tmp.0.p, %X
        store double %mem_tmp.0, double* %G
        ret void

llvm-svn: 19537
2005-01-13 22:52:24 +00:00
Chris Lattner 81e8417614 Implement an optimization for == and != comparisons like this:
_Bool test2(int X, int Y) {
  return &arr[X][Y] == arr;
}

instead of generating this:

bool %test2(int %X, int %Y) {
        %tmp.3.idx = mul int %X, 160            ; <int> [#uses=1]
        %tmp.3.idx1 = shl int %Y, ubyte 2               ; <int> [#uses=1]
        %tmp.3.offs2 = sub int 0, %tmp.3.idx            ; <int> [#uses=1]
        %tmp.7 = seteq int %tmp.3.idx1, %tmp.3.offs2            ; <bool> [#uses=1]
        ret bool %tmp.7
}


generate this:

bool %test2(int %X, int %Y) {
        seteq int %X, 0         ; <bool>:0 [#uses=1]
        seteq int %Y, 0         ; <bool>:1 [#uses=1]
        %tmp.7 = and bool %0, %1                ; <bool> [#uses=1]
        ret bool %tmp.7
}

This idiom occurs in C++ programs when iterating from begin() to end(),
in a vector or array.  For example, we now compile this:

void test(int X, int Y) {
  for (int *i = arr; i != arr+100; ++i)
    foo(*i);
}

to this:

no_exit:                ; preds = %entry, %no_exit
	...
        %exitcond = seteq uint %indvar.next, 100                ; <bool> [#uses=1]
        br bool %exitcond, label %return, label %no_exit



instead of this:

no_exit:                ; preds = %entry, %no_exit
	...
        %inc5 = getelementptr [100 x [40 x int]]* %arr, int 0, int 0, int %inc.rec              ; <int*> [#uses=1]
        %tmp.8 = seteq int* %inc5, getelementptr ([100 x [40 x int]]* %arr, int 0, int 100, int 0)              ; <bool> [#uses=1]
        %indvar.next = add uint %indvar, 1              ; <uint> [#uses=1]
        br bool %tmp.8, label %return, label %no_exit

llvm-svn: 19536
2005-01-13 22:25:21 +00:00
Chris Lattner 4cb9fa373b Fix some bugs in code I didn't mean to check in.
llvm-svn: 19534
2005-01-13 20:40:58 +00:00
Chris Lattner 0798af33a5 Fix a crash compiling 129.compress
llvm-svn: 19533
2005-01-13 20:14:25 +00:00
Chris Lattner fdfe3e49fe Fix uint64_t -> unsigned VS warnings.
llvm-svn: 19381
2005-01-08 19:42:22 +00:00
Chris Lattner 47f395cd85 Silence VS warnings.
llvm-svn: 19380
2005-01-08 19:37:20 +00:00
Chris Lattner ce274ce93d Silence warnings
llvm-svn: 19379
2005-01-08 19:34:41 +00:00
Jeff Cohen 677babc4d4 Add more missing createXxxPass functions.
llvm-svn: 19370
2005-01-08 17:21:40 +00:00
Jeff Cohen eca0d0f2da Put createLoopUnswitchPass() into proper namespace
llvm-svn: 19306
2005-01-06 05:47:18 +00:00
Chris Lattner 86102b8ad5 This is a bulk commit that implements the following primary improvements:
* We can now fold cast instructions into select instructions that
    have at least one constant operand.
  * We now optimize expressions more aggressively based on bits that are
    known to be zero.  These optimizations occur a lot in code that uses
    bitfields even in simple ways.
  * We now turn more cast-cast sequences into AND instructions.  Before we
    would only do this if all types were unsigned.  Now only the
    middle type needs to be unsigned (guaranteeing a zero extend).
  * We transform sign extensions into zero extensions in several cases.

This corresponds to these test/Regression/Transforms/InstCombine testcases:
  2004-11-22-Missed-and-fold.ll
  and.ll: test28-29
  cast.ll: test21-24
  and-or-and.ll
  cast-cast-to-and.ll
  zeroext-and-reduce.ll

llvm-svn: 19220
2005-01-01 16:22:27 +00:00
Chris Lattner 13516fe2e7 Fix PR491 and testcase Transforms/DeadStoreElimination/2004-12-28-PartialStore.ll
llvm-svn: 19180
2004-12-29 04:36:02 +00:00
Chris Lattner b17f3e13ec Adjust to new interfaces
llvm-svn: 18958
2004-12-15 07:22:25 +00:00
Chris Lattner 9ad0d55025 Constant exprs are not efficiently negatable in practice. This disables
turning X - (constantexpr) into X + (-constantexpr) among other things.

llvm-svn: 18935
2004-12-14 20:08:06 +00:00
Chris Lattner 8f430a3b59 Get rid of getSizeOf, using ConstantExpr::getSizeOf instead.
Do not insert a prototype for malloc of: void* malloc(uint); on 64-bit
targets this is not correct.  Instead, prototype it as void *malloc(...),
and pass the correct intptr_t through the "...".

Finally, fix Regression/CodeGen/SparcV9/2004-12-13-MallocCrash.ll, by not
forming constantexpr casts from pointer to uint.

llvm-svn: 18908
2004-12-13 20:00:02 +00:00
Chris Lattner a199e3c1e2 Change indentation of a whole bunch of code, no real changes here.
llvm-svn: 18843
2004-12-12 23:49:37 +00:00
Chris Lattner 14d07db44d More substantial simplifications and speedups. This makes ADCE about 20% faster
in some cases.

llvm-svn: 18842
2004-12-12 23:40:17 +00:00
Chris Lattner 9115eb3024 More minor microoptimizations
llvm-svn: 18841
2004-12-12 22:44:30 +00:00
Chris Lattner d4298781c1 Remove some more set operations
llvm-svn: 18840
2004-12-12 22:22:18 +00:00
Chris Lattner a538439bf0 Reduce number of set operations.
llvm-svn: 18839
2004-12-12 22:16:13 +00:00
Chris Lattner bf5b7cf638 Optimize div/rem + select combinations more.
In particular, implement div.ll:test10 and rem.ll:test4.
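
For illustration (the testcase contents aren't reproduced here), one
shape this kind of fold can take, in C++ terms: dividing by a select of
two constants becomes a select of two divisions, each by a known
constant that the backend can lower cheaply:

bool divSelect(int x, bool c) {
  int before = x / (c ? 16 : 4);        // divide by a select of constants
  int after  = c ? (x / 16) : (x / 4);  // select of two constant divides
  return before == after;               // both forms compute the same value
}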

llvm-svn: 18838
2004-12-12 21:48:58 +00:00
Chris Lattner 88deefa303 Simplify code and do not invalidate iterators.
This fixes a crash compiling TimberWolfMC that was exposed due to recent
optimizer changes.

llvm-svn: 18831
2004-12-12 18:23:20 +00:00
Chris Lattner cbc0161d1f If one side of and/or is known to be 0/-1, it doesn't matter
if the other side is overdefined.

This allows us to fold conditions like:  if (X < Y || Y > Z) in some cases.
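
A minimal sketch of the lattice reasoning (values chosen for
illustration): once one operand of the 'or' is known true,
'true || overdefined' is still true, so the branch folds even though the
other compare stays unknown:

int f(int x) {
  int y = 10, z = 5;     // SCCP proves y > z is the constant true
  if (x < y || y > z)    // x < y is overdefined, but the || is still true
    return 1;
  return 0;              // provably unreachable
}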

llvm-svn: 18807
2004-12-11 23:15:19 +00:00
Chris Lattner 2f687fd9d6 Two bug fixes:
1. Actually increment the Statistic for the GV elim optzn
 2. When resolving undef branches, only resolve branches in executable blocks,
    avoiding marking a bunch of completely dead blocks live.  This has a big
    impact on the quality of the generated code.

With this patch, we positively rip up vortex, compiling Ut_MoveBytes to a
single memcpy call. In vortex we get this:

     12 ipsccp           - Number of globals found to be constant
    986 ipsccp           - Number of arguments constant propagated
   1378 ipsccp           - Number of basic blocks unreachable
   8919 ipsccp           - Number of instructions removed

llvm-svn: 18796
2004-12-11 06:05:53 +00:00
Chris Lattner 8525ebe465 Do not delete the entry block to a function.
llvm-svn: 18795
2004-12-11 05:32:19 +00:00
Chris Lattner 91dbae6fee Implement Transforms/SCCP/ipsccp-gvar.ll, by tracking values stored to
non-address-taken global variables.

llvm-svn: 18790
2004-12-11 05:15:59 +00:00
Chris Lattner 99e1295645 Fix a bug where we could delete dead invoke instructions with uses.
In functions where we fully constant prop the return value, replace all
ret instructions with 'ret undef'.

llvm-svn: 18786
2004-12-11 02:53:57 +00:00
Chris Lattner bae4b64553 Implement SCCP/ipsccp-conditional.ll, by totally deleting dead blocks.
llvm-svn: 18781
2004-12-10 22:29:08 +00:00
Chris Lattner 7285f43836 Fix SCCP/2004-12-10-UndefBranchBug.ll
llvm-svn: 18776
2004-12-10 20:41:50 +00:00
Chris Lattner b439464c61 This is the initial implementation of IPSCCP, as requested by Brian.
This implements SCCP/ipsccp-basic.ll, rips apart Olden/mst (as described in
PR415), and does other nice things.

There is still more to come with this, but it's a start.

llvm-svn: 18752
2004-12-10 08:02:06 +00:00
Chris Lattner 36d39cecb4 note to self: Do not check in debugging code!
llvm-svn: 18693
2004-12-09 07:15:52 +00:00
Chris Lattner f17a2fb849 Implement trivial sinking for load instructions. This causes us to sink 567 loads in spec
llvm-svn: 18692
2004-12-09 07:14:34 +00:00
Chris Lattner 39c98bb31c Do extremely simple sinking of instructions when they are only used in a
successor block.  This turns cases like this:

x = a op b
if (c) {
  use x
}

into:

if (c) {
  x = a op b
  use x
}

This triggers 3965 times in spec, and is tested by
Regression/Transforms/InstCombine/sink_instruction.ll

This appears to expose a bug in the X86 backend for 177.mesa, which I'm
looking in to.

llvm-svn: 18677
2004-12-08 23:43:58 +00:00
Alkis Evlogimenos a1291a0679 Fix this regression and remove the XFAIL from this test.
llvm-svn: 18674
2004-12-08 23:10:30 +00:00
Chris Lattner 8f30caf549 Fix Transforms/InstCombine/2004-12-08-RemInfiniteLoop.ll
llvm-svn: 18670
2004-12-08 22:20:34 +00:00
Reid Spencer 9273d480ad For PR387:
Add doInitialization method to avoid overloaded virtuals

llvm-svn: 18602
2004-12-07 08:11:36 +00:00
Chris Lattner a4c9808603 This pass is moving to lib IPO
llvm-svn: 18439
2004-12-02 21:24:40 +00:00
Chris Lattner 951673a94c This pass is completely broken.
llvm-svn: 18387
2004-11-30 17:09:06 +00:00
Chris Lattner 6e455608e2 Allow hoisting loads of globals and alloca's in conditionals.
llvm-svn: 18363
2004-11-29 21:26:12 +00:00
Reid Spencer 279fa256a2 Fix for PR454:
* Make sure we handle signed to unsigned conversion correctly
* Move this visitSetCondInst case to its own method.

llvm-svn: 18312
2004-11-28 21:31:15 +00:00
Chris Lattner 6ea2888832 Make DSE potentially more aggressive by being more specific about alloca sizes.
llvm-svn: 18309
2004-11-28 20:44:37 +00:00
Chris Lattner 14f3cdc227 Implement Regression/Transforms/InstCombine/getelementptr_cast.ll, which
occurs many times in crafty

llvm-svn: 18273
2004-11-27 17:55:46 +00:00
Chris Lattner b137409926 Provide size information when checking to see if we can LICM a load, this
allows us to hoist more loads in some cases.

llvm-svn: 18265
2004-11-26 21:20:09 +00:00
Chris Lattner 540e5f92b4 Do not count debugger intrinsics in size estimation.
llvm-svn: 18110
2004-11-22 17:23:57 +00:00
Chris Lattner 6d048a0d32 Do not consider debug intrinsics in the size computations for loop unrolling.
Patch contributed by Michael McCracken!

llvm-svn: 18108
2004-11-22 17:18:36 +00:00
Chris Lattner 446948e094 Fix the exposed prototype for the lower packed pass, thanks to
Morten Ofstad.

llvm-svn: 17996
2004-11-19 16:49:34 +00:00
Chris Lattner 953075442d Delete stoppoints that occur for the same source line.
llvm-svn: 17970
2004-11-18 21:41:39 +00:00
Chris Lattner c08ac110df Check in hook that I forgot
llvm-svn: 17956
2004-11-18 17:24:20 +00:00
Chris Lattner 27af257ea0 Do not delete dead invoke instructions!
llvm-svn: 17897
2004-11-16 16:32:28 +00:00
Reid Spencer 9339638e9c Remove unused variable for compilation by VC++.
Patch contributed by Morten Ofstad.

llvm-svn: 17830
2004-11-15 17:29:41 +00:00
Chris Lattner 1890f94413 Minor cleanups. There is no reason for SCCP to derive from instvisitor anymore.
llvm-svn: 17825
2004-11-15 07:15:04 +00:00
Chris Lattner 9a038a3a5e Count more accurately
llvm-svn: 17824
2004-11-15 07:02:42 +00:00
Chris Lattner 97013636cd Quiet warnings on the persephone tester
llvm-svn: 17821
2004-11-15 05:54:07 +00:00
Chris Lattner d18c16b842 Two minor improvements:
1. Speedup getValueState by having it not consider Arguments.  It's better
    to just add them before we start SCCP'ing.
 2. SCCP can delete the contents of dead blocks.  No really, it's ok!  This
    reduces the size of the IR for subsequent passes, even though
    simplifycfg would do the same job.  In practice, simplifycfg does not
    run until much later than sccp in gccas

llvm-svn: 17820
2004-11-15 05:45:33 +00:00
Chris Lattner 4f0316229c rename InstValue to LatticeValue, as it holds for more than instructions.
llvm-svn: 17818
2004-11-15 05:03:30 +00:00
Chris Lattner 074be1f6e4 Substantially refactor the SCCP class into an SCCP pass and an SCCPSolver
class.  The only changes are minor:

 * Do not try to SCCP instructions that return void in the rewrite loop.
   This is silly and foolhardy, wasting a map lookup and adding an entry
   to the map which is never used.
 * If we decide something has an undefined value, rewrite it to undef,
   potentially leading to further simplifications.

llvm-svn: 17816
2004-11-15 04:44:20 +00:00
Chris Lattner 46dd5a6304 This optimization makes MANY phi nodes that all have the same incoming value.
If this happens, detect it early instead of relying on instcombine to notice
it later.  This can be a big speedup, because PHI nodes can have many
incoming values.

llvm-svn: 17741
2004-11-14 19:29:34 +00:00
Chris Lattner 7515cabe2a Implement instcombine/phi.ll:test6 - pulling operations through PHI nodes.
This exposes subsequent optimization possibilities and reduces code size.
This triggers 1423 times in spec.

llvm-svn: 17740
2004-11-14 19:13:23 +00:00
Chris Lattner 15ff1e1885 Transform this:
%X = alloca ...
  %Y = alloca ...
    X == Y

into false.  This allows us to simplify some stuff in eon (and probably
many other C++ programs) where operator= was checking for self assignment.
Folding this allows us to SROA several additional structs.
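
A hedged C++ sketch of the pattern being described (the names are
illustrative, not from the original testcase):

struct Pt {
  int x, y;
  Pt &operator=(const Pt &o) {
    if (this == &o) return *this;  // self-assignment check
    x = o.x; y = o.y;
    return *this;
  }
};

void copyPoints() {
  Pt a, b;   // two distinct allocas: &a == &b folds to false
  a.x = 1; a.y = 2;
  b = a;     // after inlining, the self-assignment check disappears,
             // leaving straight-line stores that SROA can scalarize
}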

llvm-svn: 17735
2004-11-14 07:33:16 +00:00
Chris Lattner fe3f4e6ebd Teach SROA how to promote an array index that is variable, if the dimension
of the array is just two.  This occurs 8 times in gcc, 6 times in crafty, and
12 times in 099.go.

This implements ScalarRepl/sroa_two.ll
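
A minimal sketch of the shape involved (hypothetical, mirroring what
sroa_two.ll presumably tests): with only two elements, a variable index
can be rewritten as a select between the two scalar replacements, so the
array never needs to stay in memory:

int pick(bool i) {
  int a[2] = {10, 20};
  return a[i];   // becomes: i ? 20 : 10, and the alloca goes away
}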

llvm-svn: 17727
2004-11-14 05:00:19 +00:00
Chris Lattner 8881912d71 Rearrange some code, no functionality changes.
llvm-svn: 17724
2004-11-14 04:24:28 +00:00
Chris Lattner 8c3e7b92af Simplify handling of shifts to be the same as we do for adds. Add support
for (X * C1) + (X * C2) (where * can be mul or shl), allowing us to fold:

   Y+Y+Y+Y+Y+Y+Y+Y

into
         %tmp.8 = shl long %Y, ubyte 3           ; <long> [#uses=1]

instead of

        %tmp.4 = shl long %Y, ubyte 2           ; <long> [#uses=1]
        %tmp.12 = shl long %Y, ubyte 2          ; <long> [#uses=1]
        %tmp.8 = add long %tmp.4, %tmp.12               ; <long> [#uses=1]

This implements add.ll:test25

Also add support for (X*C1)-(X*C2) -> X*(C1-C2), implementing sub.ll:test18

llvm-svn: 17704
2004-11-13 19:50:12 +00:00
Chris Lattner 4efe20a103 Fold:
(X + (X << C2)) --> X * ((1 << C2) + 1)
   ((X << C2) + X) --> X * ((1 << C2) + 1)

This means that we now canonicalize "Y+Y+Y" into:

        %tmp.2 = mul long %Y, 3         ; <long> [#uses=1]

instead of:

        %tmp.10 = shl long %Y, ubyte 1          ; <long> [#uses=1]
        %tmp.6 = add long %Y, %tmp.10               ; <long> [#uses=1]

llvm-svn: 17701
2004-11-13 19:31:40 +00:00
Chris Lattner 2858e17538 Lazily create the abort message, so only translation units that use unwind
will actually get it.

llvm-svn: 17700
2004-11-13 19:07:32 +00:00
Chris Lattner 5c1d84c769 Simplify handling of constant initializers
llvm-svn: 17696
2004-11-12 22:42:57 +00:00
Chris Lattner 595016d090 This is V9 specific, move it there.
llvm-svn: 17545
2004-11-07 00:39:26 +00:00
Chris Lattner 33eb909939 Fix some warnings on VC++
llvm-svn: 17481
2004-11-05 04:45:43 +00:00
Chris Lattner 96f6616479 * Rearrange code slightly
* Disable broken transforms for simplifying (setcc (cast X to larger), CI)
  where CC is not != or ==

llvm-svn: 17422
2004-11-02 03:50:32 +00:00
Chris Lattner 8af7424920 Speed up the tail duplication pass on the testcase below from 68.2s to 1.23s:
#define CL0(a) case a: f(); goto c;
 #define CL1(a) CL0(a##0) CL0(a##1) CL0(a##2) CL0(a##3) CL0(a##4) CL0(a##5) \
 CL0(a##6) CL0(a##7) CL0(a##8) CL0(a##9)
 #define CL2(a) CL1(a##0) CL1(a##1) CL1(a##2) CL1(a##3) CL1(a##4) CL1(a##5) \
 CL1(a##6) CL1(a##7) CL1(a##8) CL1(a##9)
 #define CL3(a) CL2(a##0) CL2(a##1) CL2(a##2) CL2(a##3) CL2(a##4) CL2(a##5) \
 CL2(a##6) CL2(a##7) CL2(a##8) CL2(a##9)
 #define CL4(a) CL3(a##0) CL3(a##1) CL3(a##2) CL3(a##3) CL3(a##4) CL3(a##5) \
 CL3(a##6) CL3(a##7) CL3(a##8) CL3(a##9)

 void f();

 void a() {
     int b;
  c: switch (b) {
         CL4(1)
     }
 }

This comes from GCC PR 15524

llvm-svn: 17390
2004-11-01 07:05:07 +00:00
Reid Spencer 57cbe39d1e Change Library Names Not To Conflict With Others When Installed
llvm-svn: 17286
2004-10-27 23:18:45 +00:00
Chris Lattner 7dfc2d29ac Convert 'struct' to 'class' in various places to adhere to the coding standards
and work better with VC++.  Patch contributed by Morten Ofstad!

llvm-svn: 17281
2004-10-27 16:14:51 +00:00
Chris Lattner 70c2039b39 Hrm, this code was severely botched. As it turns out, this patch:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20041018/019708.html

exposed ANOTHER latent bug in this xform, which caused Prolangs-C/bison to fill
the zion nightly tester disk up and make the tester barf.

This is obviously not a good thing, so let's fix this bug, shall we? :)

llvm-svn: 17276
2004-10-27 05:57:15 +00:00
Chris Lattner 845afe9b20 Initialize with the correct constant type
llvm-svn: 17270
2004-10-27 03:55:24 +00:00
Chris Lattner d57638c4a7 Fix compatibility with MSVC, patch by Morten Ofstad
llvm-svn: 17218
2004-10-25 18:45:16 +00:00
Chris Lattner 5c3c21e10a Fix a bug Nate noticed, where we miscompiled a simple testcase
llvm-svn: 17157
2004-10-22 04:53:16 +00:00
Reid Spencer c1c320c335 We won't use automake
llvm-svn: 17155
2004-10-22 03:35:04 +00:00
Chris Lattner 257b284038 Hrm, some people complain when the compiler cheerfully tells them what it's
doing... I guess they're right.

llvm-svn: 17142
2004-10-19 06:33:16 +00:00
Reid Spencer 6a11a75f31 Initial automake generated Makefile template
llvm-svn: 17136
2004-10-18 23:55:41 +00:00
Nate Begeman b18121e6a9 Initial implementation of the strength reduction for GEP instructions in
loops.  This optimization is not turned on by default yet, but may be run
with the opt tool's -loop-reduce flag.  There are many FIXMEs listed in the
code that will make it far more applicable to a wide range of code, but you
have to start somewhere :)

This limited version currently triggers on the following tests in the
MultiSource directory:
pcompress2: 7 times
cfrac: 5 times
anagram: 2 times
ks: 6 times
yacr2: 2 times

llvm-svn: 17134
2004-10-18 21:08:22 +00:00
Reid Spencer ce0783318b Correction to allow compilation with Visual C++.
Patch contributed by Morten Ofstad. Thanks Morten!

llvm-svn: 17123
2004-10-18 14:38:48 +00:00
Chris Lattner a67dd32004 Turn store -> null/undef into the LLVM unreachable instruction! This simple
change hacks off 10K of bytecode from perlbmk (.5%) even though the front-end
is not generating them yet and we are not optimizing the resultant code.
This isn't too bad.

llvm-svn: 17111
2004-10-18 03:00:50 +00:00
Chris Lattner 8ba9ec9bbb Turn things with obviously undefined semantics into 'store -> null'
llvm-svn: 17110
2004-10-18 02:59:09 +00:00
Chris Lattner 3b92f17165 My friend the invoke instruction does not dominate all basic blocks if it
occurs in the entry node of a function

llvm-svn: 17109
2004-10-18 01:48:31 +00:00
Chris Lattner 6a792feb02 Getting ADCE to interact well with unreachable instructions seems like a nontrivial
exercise that I'm not interested in tackling right now.  Just punt and treat them
like unwinds.

This 'fixes' test/Regression/Transforms/ADCE/unreachable-function.ll

llvm-svn: 17106
2004-10-17 23:45:06 +00:00
Chris Lattner 107c15c33d Remove printout, realize that instructions in the entry block dominate all
other blocks.

llvm-svn: 17099
2004-10-17 21:31:34 +00:00
Chris Lattner e29d634a94 hasConstantValue will soon return instructions that don't dominate the PHI node,
so prepare for this.

llvm-svn: 17095
2004-10-17 21:22:38 +00:00
Chris Lattner 67f0545daf Fix a type violation
llvm-svn: 17069
2004-10-16 23:28:04 +00:00
Chris Lattner 684c5c6587 Kill the bogon that slipped into my buffer before I committed.
llvm-svn: 17067
2004-10-16 19:46:33 +00:00
Chris Lattner 6580e09fef Implement InstCombine/getelementptr.ll:test9, which is the source of many
ugly and giant constant exprs in some programs.

llvm-svn: 17066
2004-10-16 19:44:59 +00:00
Chris Lattner 81a7a23494 Optimize instructions involving undef values. For example X+undef == undef.
llvm-svn: 17047
2004-10-16 18:11:37 +00:00
Chris Lattner 646354bae1 Handle undef values as undefined on the constant lattice
ignore unreachable instructions

llvm-svn: 17044
2004-10-16 18:09:41 +00:00
Chris Lattner 6ac3ef950d Add note
llvm-svn: 17043
2004-10-16 18:09:25 +00:00
Reid Spencer ace94df71f Update to reflect changes in Makefile rules.
llvm-svn: 16950
2004-10-13 11:46:52 +00:00
Chris Lattner 00648e1f86 Transform memmove -> memcpy when the source is obviously constant memory.
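
A minimal illustration: a string literal lives in constant memory, so it
can never overlap a writable destination, and the memmove can be relaxed:

#include <cstring>

void greet(char *dst) {
  std::memmove(dst, "hello", 6);  // safe to become std::memcpy(dst, "hello", 6)
}
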
llvm-svn: 16932
2004-10-12 04:52:52 +00:00
Chris Lattner 7cabf6f87a Fix a REALLY obscure bug in my previous checkin, which was splicing the END
marker from one ilist into the middle of another basic block!

llvm-svn: 16925
2004-10-12 01:02:29 +00:00
Chris Lattner 9776f7259b Handle a common case more carefully. In particular, instead of transforming
pointer recurrences into expressions from this:

  %P_addr.0.i.0 = phi sbyte* [ getelementptr ([8 x sbyte]* %.str_1, int 0, int 0), %entry ], [ %inc.0.i, %no_exit.i ]
  %inc.0.i = getelementptr sbyte* %P_addr.0.i.0, int 1            ; <sbyte*> [#uses=2]

into this:

  %inc.0.i = getelementptr sbyte* getelementptr ([8 x sbyte]* %.str_1, int 0, int 0), int %inc.0.i.rec

Actually create something nice, like this:

  %inc.0.i = getelementptr [8 x sbyte]* %.str_1, int 0, int %inc.0.i.rec

llvm-svn: 16924
2004-10-11 23:06:50 +00:00
Chris Lattner a92af96c56 Reenable the transform, turning X/-10 < 1 into X > -10
llvm-svn: 16918
2004-10-11 19:40:04 +00:00
Reid Spencer 97327f05fc Initial version of automake Makefile.am file.
llvm-svn: 16893
2004-10-10 22:20:40 +00:00
Chris Lattner 5c91c8f18b Use DEBUG instead of DebugFlag directly, as DebugFlag does not respect
-debug-only!

llvm-svn: 16868
2004-10-09 19:30:36 +00:00
Chris Lattner 4ad08352b4 Implement sub.ll:test17, -X/C -> X/-C
llvm-svn: 16863
2004-10-09 02:50:40 +00:00
Chris Lattner 0b41e861b6 Temporarily disable a buggy transformation until it can be fixed. This fixes
254.gap.

llvm-svn: 16853
2004-10-08 19:15:44 +00:00
Chris Lattner bff91d9a2e Instcombine (X & FF00) + xx00 -> (X+xx00) & FF00, implementing and.ll:test27
This comes up when doing adds to bitfield elements.

llvm-svn: 16836
2004-10-08 05:07:56 +00:00
Chris Lattner 44bd392cbf Little patch to turn (shl (add X, 123), 4) -> (add (shl X, 4), 123 << 4)
This triggers in cases of bitfield additions, opening opportunities for
future improvements.

llvm-svn: 16834
2004-10-08 03:46:20 +00:00
Chris Lattner 0aee4b7947 Instcombine: -(X sdiv C) -> (X sdiv -C), tested by sub.ll:test16
llvm-svn: 16769
2004-10-06 15:08:25 +00:00
Chris Lattner 2ce32df8b0 Reduce code growth implied by the tail duplication pass by not duplicating
an instruction if it can be hoisted to a common dominator of the block.
This implements: test/Regression/Transforms/TailDup/MergeTest.ll

llvm-svn: 16758
2004-10-06 03:27:37 +00:00
Chris Lattner abae776b18 Hrm, debugging printouts do not need to be in here
llvm-svn: 16598
2004-09-29 21:21:14 +00:00
Chris Lattner 6862fbd2cf * Pull range optimization code out into new InsertRangeTest function.
* SubOne/AddOne functions always return ConstantInt, declare them as such
* Pull code for handling setcc X, cst, where cst is at the end of the range,
  or cc is LE or GE up earlier in visitSetCondInst.  This reduces #iterations
  in some cases.
* Fold: (div X, C1) op C2 -> range check, implementing div.ll:test6 - test9.
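
A concrete instance of the last fold (constants illustrative): for
unsigned X, X/10 == 2 holds exactly when X is in [20, 30), so the divide
becomes a subtract and a single unsigned compare:

bool divEq(unsigned x) {
  return x / 10 == 2;   // folds to: (x - 20) < 10
}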

llvm-svn: 16588
2004-09-29 17:40:11 +00:00
Chris Lattner 6a4adcda4c Fold binary expressions and casts into PHI nodes that have all constant inputs.
This takes something like this:

%A = phi int [ 3, %cond_false.0 ], [ 2, %endif.0.i ], [ 2, %endif.1.i ]
%B = div int %tmp.243, 4

and turns it into:

%A = phi int [ 3/4, %cond_false.0 ], [ 2/4, %endif.0.i ], [ 2/4, %endif.1.i ]

which is later simplified (in this case) into %A = 0.

This triggers thousands of times in spec, for example, 269 times in 176.gcc.

This is tested by InstCombine/add.ll:test23 and set.ll:test18.

llvm-svn: 16582
2004-09-29 05:07:12 +00:00
Chris Lattner c949128b2f Hrm, really, all tests passed without this, but it is scary to think how...
llvm-svn: 16568
2004-09-29 03:16:24 +00:00
Chris Lattner be7a69ebd8 Remove debugging printout
Instcombine (setcc (truncate X), C1).

This occurs THOUSANDS of times in many benchmarks.  Particularly common
seem to be things like (seteq (cast bool X to int), int 0)

This turns it into (seteq bool %X, false), which then becomes (not %X).

llvm-svn: 16567
2004-09-29 03:09:18 +00:00
Chris Lattner dcf756ec22 Fold (X setcc C1) | (X setcc C2)
This implements or.ll:test1[89]

llvm-svn: 16561
2004-09-28 22:33:08 +00:00
Chris Lattner 623826c888 Fold (and (setcc X, C1), (setcc X, C2))
This is important for several reasons:

1. Benchmarks have lots of code that looks like this (perlbmk in particular):

  %tmp.2.i = setne int %tmp.0.i, 128              ; <bool> [#uses=1]
  %tmp.6343 = seteq int %tmp.0.i, 1               ; <bool> [#uses=1]
  %tmp.63 = and bool %tmp.2.i, %tmp.6343          ; <bool> [#uses=1]

   we now fold away the setne, a clear improvement.

2. In the more important cases, such as (X >= 10) & (X < 20), we now produce
   smaller code: (X-10) < 10.

3. Perhaps the nicest effect of this patch is that it really helps out the
   code generators.  In particular, for a 'range test' like the above,
   instead of generating this on X86 (the difference on PPC is even more
   pronounced):

        cmp %EAX, 50
        setge %CL
        cmp %EAX, 100
        setl %AL
        and %CL, %AL
        cmp %CL, 0

   we now generate this:

        add %EAX, -50
        cmp %EAX, 50

   Furthermore, this causes setcc's to be folded into branches more often.

These combinations trigger dozens of times in the spec benchmarks, particularly
in 176.gcc, 186.crafty, 253.perlbmk, 254.gap, & 099.go.

llvm-svn: 16559
2004-09-28 21:48:02 +00:00
Chris Lattner 272d5ca9e0 Implement X / C1 / C2 folding
Implement (setcc (shl X, C1), C2) folding.

The second one occurs several dozen times in spec.  The first was added
just in case.  :)

These are tested by shift.ll:test2[12], and div.ll:test5
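
Illustrative instances of both folds (constants chosen for clarity,
assuming 32-bit unsigned values):

unsigned twoDivs(unsigned x) {
  return (x / 4) / 10;    // -> x / 40: one divide instead of two
}

bool shlEq(unsigned x) {
  return (x << 3) == 64;  // -> (x & 0x1FFFFFFF) == 8: no shift needed
}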

llvm-svn: 16549
2004-09-28 18:22:15 +00:00
Chris Lattner 6afc02f816 shl is always zero extending, so always use a zero extending shift right.
This latent bug was exposed by recent changes, and is tested as:
llvm/test/Regression/Transforms/InstCombine/2004-09-28-BadShiftAndSetCC.llx

llvm-svn: 16546
2004-09-28 17:54:07 +00:00
Alkis Evlogimenos 3ce42ec7ee Pull assignment out of for loop conditional in order for this to
compile under windows. Patch contributed by Paolo Invernizzi!

llvm-svn: 16534
2004-09-28 02:40:37 +00:00
Chris Lattner bfff18a869 Fix two bugs: one where a condition was mistakenly swapped, and another
where we folded (X & 254) -> X < 1 instead of X < 2.  These problems were
latent problems exposed by the latest patch.

llvm-svn: 16528
2004-09-27 19:29:18 +00:00
Chris Lattner 1023b8726e Fold: (setcc (shr X, ShAmt), CI), where 'cc' is eq or ne. This xform
triggers often, for example:

6x in povray, 1x in gzip, 279x in gcc, 1x in crafty, 8x in eon, 11x in perlbmk,
362x in gap, 4x in vortex, 14 in m88ksim, 211x in 126.gcc, 1x in compress,
11x in ijpeg, and 4x in 147.vortex.
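
A concrete instance (constants illustrative, assuming 32-bit unsigned):
the shifted compare becomes a masked compare, with no shift left in the
code:

bool shrEq(unsigned x) {
  return (x >> 4) == 3;   // folds to: (x & 0xFFFFFFF0) == 0x30
}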

llvm-svn: 16521
2004-09-27 16:18:50 +00:00
Chris Lattner 7e794273f5 Implement shift-and combinations, implementing InstCombine/and.ll:test19-21
These combinations trigger 4 times in povray, 7x in gcc, 4x in gap, and 2x in bzip2.

llvm-svn: 16508
2004-09-24 15:21:34 +00:00
Chris Lattner e1b4d2a470 Move LHSI->hasOneUse() into the arms of the conditional, reindenting code.
No functionality changes here.

llvm-svn: 16505
2004-09-23 21:52:49 +00:00
Chris Lattner 8fc5af4da9 Implement Transforms/InstCombine/and.ll:test18, a case that occurs 20 times
in perlbmk

llvm-svn: 16504
2004-09-23 21:46:38 +00:00
Chris Lattner bdcf41a8a2 Implement select.ll:test16: fold load (select C, X, null) -> load X
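
A minimal sketch of why the fold is legal: a load through the null arm
would be undefined, so only the non-null arm can matter:

int deref(bool c, int *p) {
  return *(c ? p : (int *)nullptr);   // folds to: return *p;
}
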
llvm-svn: 16499
2004-09-23 15:46:00 +00:00
Chris Lattner b121ae1cec Do not fold (X + C1 != C2) if there are other users of the add. Doing
this transformation used to take a loop like this:

int Array[1000];
void test(int X) {
  int i;
  for (i = 0; i < 1000; ++i)
    Array[i] += X;
}

Compiled to LLVM is:

no_exit:                ; preds = %entry, %no_exit
        %indvar = phi uint [ 0, %entry ], [ %indvar.next, %no_exit ]            ; <uint> [#uses=2]
        %tmp.4 = getelementptr [1000 x int]* %Array, int 0, uint %indvar                ; <int*> [#uses=2]
        %tmp.7 = load int* %tmp.4               ; <int> [#uses=1]
        %tmp.9 = add int %tmp.7, %X             ; <int> [#uses=1]
        store int %tmp.9, int* %tmp.4
***     %indvar.next = add uint %indvar, 1              ; <uint> [#uses=2]
***     %exitcond = seteq uint %indvar.next, 1000               ; <bool> [#uses=1]
        br bool %exitcond, label %return, label %no_exit

and turn it into a loop like this:

no_exit:                ; preds = %entry, %no_exit
        %indvar = phi uint [ 0, %entry ], [ %indvar.next, %no_exit ]            ; <uint> [#uses=3]
        %tmp.4 = getelementptr [1000 x int]* %Array, int 0, uint %indvar                ; <int*> [#uses=2]
        %tmp.7 = load int* %tmp.4               ; <int> [#uses=1]
        %tmp.9 = add int %tmp.7, %X             ; <int> [#uses=1]
        store int %tmp.9, int* %tmp.4
***     %indvar.next = add uint %indvar, 1              ; <uint> [#uses=1]
***     %exitcond = seteq uint %indvar, 999             ; <bool> [#uses=1]
        br bool %exitcond, label %return, label %no_exit

Note that indvar.next and indvar can no longer be coalesced.  In machine
code terms, this patch changes this code:

.LBBtest_1:     # no_exit
        mov %EDX, OFFSET Array
        mov %ESI, %EAX
        add %ESI, DWORD PTR [%EDX + 4*%ECX]
        mov %EDX, OFFSET Array
        mov DWORD PTR [%EDX + 4*%ECX], %ESI
        mov %EDX, %ECX
        inc %EDX
        cmp %ECX, 999
        mov %ECX, %EDX
        jne .LBBtest_1  # no_exit

into this:

.LBBtest_1:     # no_exit
        mov %EDX, OFFSET Array
        mov %ESI, %EAX
        add %ESI, DWORD PTR [%EDX + 4*%ECX]
        mov %EDX, OFFSET Array
        mov DWORD PTR [%EDX + 4*%ECX], %ESI
        inc %ECX
        cmp %ECX, 1000
        jne .LBBtest_1  # no_exit

We need better instruction selection to get this:

.LBBtest_1:     # no_exit
        add DWORD PTR [Array + 4*%ECX], EAX
        inc %ECX
        cmp %ECX, 1000
        jne .LBBtest_1  # no_exit

... but at least there is less register juggling

llvm-svn: 16473
2004-09-21 21:35:23 +00:00
Chris Lattner 42618551d5 Fix potential miscompilations: InstCombine/2004-09-20-BadLoadCombine*.llx
llvm-svn: 16447
2004-09-20 10:15:10 +00:00
Alkis Evlogimenos d59cebf87a Fix loop condition so that we don't decrement off the beginning of the
list.

llvm-svn: 16440
2004-09-20 06:42:58 +00:00
Chris Lattner 3e86084641 Prototype these functions more accurately
llvm-svn: 16432
2004-09-20 04:43:15 +00:00
Chris Lattner e6f13093e6 Make isSafeToLoadUnconditionally a bit smarter, implementing PR362 and
Regression/Transforms/InstCombine/CPP_min_max.llx
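
A hedged sketch of the C++ pattern presumably behind that testcase:
std::min returns a reference, so the naive IR loads through a select of
two pointers; once both loads are known safe to speculate, this becomes
two loads feeding a select of the values, which then folds to a plain
compare:

#include <algorithm>

int smaller(int a, int b) {
  return std::min(a, b);   // no memory traffic after the fold
}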

llvm-svn: 16409
2004-09-19 19:18:10 +00:00
Chris Lattner f62ea8ef4b Make instruction combining a bit more aggressive in the face of volatile
loads, and implement two new transforms: InstCombine/load.ll:test[56].

llvm-svn: 16404
2004-09-19 18:43:46 +00:00
Reid Spencer 6614946443 Convert code to compile with vc7.1.
Patch contributed by Paolo Invernizzi. Thanks Paolo!

llvm-svn: 16368
2004-09-15 17:06:42 +00:00
Chris Lattner f11216d24f Fix a bug in the previous checkin that broke 255.vortex
llvm-svn: 16355
2004-09-15 02:34:40 +00:00
Chris Lattner a346578d92 Make sure to update alias analysis information as we transform the function.
This fixes PR420 and Regression/Transforms/LICM/2004-09-14-AliasAnalysisInvalidate.llx

llvm-svn: 16348
2004-09-15 01:04:07 +00:00
Chris Lattner f41b80a05f Remove a long-dead pass. Actually, this pass was never used at all.
llvm-svn: 16337
2004-09-14 16:33:01 +00:00
Alkis Evlogimenos a5c04ee50f Fixes to make LLVM compile with vc7.1.
Patch contributed by Paolo Invernizzi!

llvm-svn: 16152
2004-09-03 18:19:51 +00:00
Reid Spencer 7c16caa336 Changes For Bug 352
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.

llvm-svn: 16137
2004-09-01 22:55:40 +00:00
Reid Spencer f39f66e3ef Initial checkin of a pass to lower packed operations to scalar operations.
This also registers the pass with opt with a -lower-packed command line
option.

Patch contributed by Brad Jones.

llvm-svn: 15987
2004-08-21 21:39:24 +00:00
Chris Lattner 4456da6a4c Fix InstCombine/2004-08-10-BoolSetCC.ll, a bug that is miscompiling
176.gcc.  Note that this is apparently not the only bug miscompiling gcc
though. :(

llvm-svn: 15639
2004-08-11 00:50:51 +00:00
Chris Lattner 8e7260652b Fix InstCombine/2004-08-09-RemInfLoop.llx
This should go into the 1.3 branch

llvm-svn: 15593
2004-08-09 21:05:48 +00:00
Alkis Evlogimenos 832437255d Stop using getValues().
llvm-svn: 15487
2004-08-04 08:44:43 +00:00
Chris Lattner 7aa2d4747a Fix a regression in InstCombine/xor.ll
llvm-svn: 15410
2004-08-01 19:42:59 +00:00
Misha Brukman 9c003d8f65 Fix De Morgan's name.
llvm-svn: 15343
2004-07-30 12:50:08 +00:00
Chris Lattner d4252a7c64 Start using the PatternMatcher a bit.
llvm-svn: 15342
2004-07-30 07:50:03 +00:00
Misha Brukman 63b38bd2ed Fix #includes of i*.h => Instructions.h as per PR403.
llvm-svn: 15334
2004-07-29 17:30:56 +00:00
Misha Brukman 2b3387a6d9 Fix #includes of i*.h => Instructions.h as per PR403.
llvm-svn: 15328
2004-07-29 17:05:13 +00:00
Robert Bocchino 7b5b86cd0f This change fixed a bug in the function visitMul. The prior version
assumed that a constant on the RHS of a multiplication was either an
IntConstant or an FPConstant.  It checked for an IntConstant and then,
if it did not find one, did a hard cast to an FPConstant.  That code
would crash if the RHS were a ConstantExpr that was neither an
IntConstant nor an FPConstant.  This version replaces the hard cast
with a dyn_cast.  It performs the same way for IntConstants and
FPConstants but does nothing, instead of crashing, for constant
expressions.

The regression test for this change is 2004-07-27-ConstantExprMul.ll.

llvm-svn: 15291
2004-07-27 21:02:21 +00:00
Brian Gaeke 38b79e8fbc Make the create...() functions for some of these passes return a FunctionPass *.
llvm-svn: 15276
2004-07-27 17:43:21 +00:00
Chris Lattner 50eb771d37 Fix hoisting of void typed values, e.g. calls
llvm-svn: 15263
2004-07-27 07:38:32 +00:00
Chris Lattner f29807169a Implement DeadStoreElim/alloca.llx by observing that allocas are dead at the
end of the function (either return or unwind)
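
A minimal illustration: the store is to a local whose lifetime ends at
the return, so nothing can ever read it back and the store is dead:

void wastedStore() {
  int tmp;
  tmp = 42;   // deleted: 'tmp' dies at function exit without being read
}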

llvm-svn: 15232
2004-07-26 06:14:11 +00:00
Chris Lattner e5ad26dbb3 Throttle back indvar substitution from creating multiplies in loops. This is bad bad bad.
llvm-svn: 15227
2004-07-26 02:47:12 +00:00
Chris Lattner 7b25bcdf52 * Substantially simplify how free instructions are handled (potentially fixing
a bug in DSE).
* Delete dead operand uses iteratively instead of recursively, using a
  SetVector.
* Defer deletion of dead operand uses until the end of processing, which means
  we don't have to bother with updating the AliasSetTracker.  This speeds up
  DSE substantially.

llvm-svn: 15204
2004-07-25 11:09:56 +00:00
Chris Lattner 4c1c1ac7e4 Free instructions kill values too. This implements DeadStoreElim/free.llx
llvm-svn: 15199
2004-07-25 07:58:38 +00:00
Chris Lattner bad6478b00 obvious fix
llvm-svn: 15162
2004-07-24 07:51:27 +00:00
Chris Lattner 3844c300de This is a trivial dead store elimination pass. It is very, very simple and
can be improved in many ways.  But: stop laughing, even with -basicaa it
deletes 15% of the stores in 252.eon :)
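
The classic case such a pass handles, as a sketch:

void storeTwice(int *p) {
  *p = 1;   // dead: overwritten below with no intervening load
  *p = 2;
}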

llvm-svn: 15101
2004-07-22 08:00:28 +00:00
Chris Lattner 51f7c9e56d Update GC intrinsics to take a pointer to the object as well as a pointer
to the field being updated.  Patch contributed by Tobias Nurmiranta

llvm-svn: 15097
2004-07-22 05:51:13 +00:00
Chris Lattner d8f5e2ccac * Further cleanup.
* Test for whether bits are shifted out during the optzn.

If so, the fold is illegal, though it can be handled explicitly for setne/seteq

This fixes the miscompilation of 254.gap last night, which was a latent bug
exposed by other optimizer improvements.

llvm-svn: 15085
2004-07-21 20:14:10 +00:00
Chris Lattner 1638de4499 Make cast-cast code a bit more defensive
"simplify" a bit of code for comparison/and folding

llvm-svn: 15082
2004-07-21 19:50:44 +00:00
Chris Lattner 4fbad968f8 Remove special casing of pointers and treat them generically as integers of
the appropriate size.  This gives us the ability to eliminate int -> ptr -> int

llvm-svn: 15063
2004-07-21 04:27:24 +00:00
Chris Lattner 11ffd59e37 Implement Transforms/InstCombine/IntPtrCast.ll
llvm-svn: 15029
2004-07-20 05:21:00 +00:00
Chris Lattner 44d0b9502a Implement InstCombine/GEPIdxCanon.ll
llvm-svn: 15024
2004-07-20 01:48:15 +00:00
Chris Lattner 4e2dbc6b4a Rewrite cast->cast elimination code completely based on the information we
actually care about.  Someday when the cast instruction is gone, we can do
better here, but this will do for now.  This implements
instcombine/cast.ll:test17/18 as well.

llvm-svn: 15018
2004-07-20 00:59:32 +00:00
Chris Lattner f3edc49ae2 Minor cleanup, no functionality change
llvm-svn: 14972
2004-07-18 18:59:44 +00:00
Reid Spencer f0a5bcaae4 Delete a redundant if branch.
llvm-svn: 14967
2004-07-18 08:34:52 +00:00
Reid Spencer c44cb6bd9f Expand the coercion of constants to include the newly constant Globals.
llvm-svn: 14966
2004-07-18 08:34:19 +00:00
Reid Spencer 539429d9b5 Delete a no-op loop.
llvm-svn: 14965
2004-07-18 08:32:43 +00:00
Reid Spencer 6c2b627e23 Expand the scope to include global values because they are now constants
too.

llvm-svn: 14964
2004-07-18 08:32:10 +00:00
Reid Spencer cb3fb5d4f5 bug 122:
- Replace ConstantPointerRef usage with GlobalValue usage

llvm-svn: 14953
2004-07-18 00:44:37 +00:00
Reid Spencer 874368790f bug 122:
- Replace ConstantPointerRef usage with GlobalValue usage
- Minimize redundant isa<GlobalValue> usage
- Correct isa<Constant> for GlobalValue subclass

llvm-svn: 14950
2004-07-18 00:38:32 +00:00
Reid Spencer ef784f01dd bug 122:
- Minimize redundant isa<GlobalValue> usage

llvm-svn: 14948
2004-07-18 00:32:14 +00:00
Reid Spencer c5afc9512b bug 122:
- Replace ConstantPointerRef usage with GlobalValue usage
- Correct isa<Constant> for GlobalValue subclass

llvm-svn: 14947
2004-07-18 00:31:05 +00:00
Reid Spencer 9e855c6832 bug 122:
- Minimize redundant isa<GlobalValue> usage
- Correct isa<Constant> for GlobalValue subclass

llvm-svn: 14946
2004-07-18 00:29:57 +00:00
Chris Lattner d79334df33 This patch was contributed by Daniel Berlin!
Speed up SCCP substantially by processing overdefined values quickly.  This
patch speeds up SCCP by about 30-40% on large testcases.

llvm-svn: 14861
2004-07-15 23:36:43 +00:00
Chris Lattner f2c018c0c1 Fix PR404 try #2
This version takes about 1s longer than the previous one (down to 2.35s),
but on the positive side, it actually works :)

llvm-svn: 14856
2004-07-15 08:20:22 +00:00
Chris Lattner daa12135da Revert previous patch until I get a bug fixed
llvm-svn: 14853
2004-07-15 05:36:31 +00:00
Chris Lattner 70177e402d Fix PR404: Loop simplify is really slow on 252.eon
This eliminates an N*N*logN algorithm from the loop simplify pass, replacing
it with a much simpler and faster alternative.  In a debug build, this reduces
gccas time on eon from 85s to 42s.

llvm-svn: 14851
2004-07-15 04:27:04 +00:00
Chris Lattner 9a63520b1a Fixes working towards PR341
llvm-svn: 14839
2004-07-15 01:50:47 +00:00
Chris Lattner ba7aef39fd Now that we codegen the portable "sizeof" efficiently, we can use it for
malloc lowering.  This means that lowerallocations doesn't need targetdata
anymore.  yaay.

llvm-svn: 14835
2004-07-15 01:08:08 +00:00
Chris Lattner 35e24774eb Factor some code to handle "load (constantexpr cast foo)" just like
"load (cast foo)".  This allows us to compile C++ code like this:

class Bclass {
  public: virtual int operator()() { return 666; }
};

class Dclass: public Bclass {
  public: virtual int operator()() { return 667; }
} ;

int main(int argc, char** argv) {
  Dclass x;
  return x();
}

Into this:

int %main(int %argc, sbyte** %argv) {
entry:
        call void %__main( )
        ret int 667
}

Instead of this:

int %main(int %argc, sbyte** %argv) {
entry:
        %x = alloca "struct.std::bad_typeid"            ; <"struct.std::bad_typeid"*> [#uses=3]
        call void %__main( )
        %tmp.1.i.i = getelementptr "struct.std::bad_typeid"* %x, uint 0, uint 0, uint 0         ; <int (...)***> [#uses=1]
        store int (...)** getelementptr ([3 x int (...)*]*  %vtable for Bclass, int 0, long 2), int (...)*** %tmp.1.i.i
        %tmp.3.i = getelementptr "struct.std::bad_typeid"* %x, int 0, uint 0, uint 0            ; <int (...)***> [#uses=1]
        store int (...)** getelementptr ([3 x int (...)*]*  %vtable for Dclass, int 0, long 2), int (...)*** %tmp.3.i
        %tmp.5 = load int ("struct.std::bad_typeid"*)** cast (int (...)** getelementptr ([3 x int (...)*]*  %vtable for Dclass, int 0, long 2) to int
("struct.std::bad_typeid"*)**)          ; <int ("struct.std::bad_typeid"*)*> [#uses=1]
        %tmp.6 = call int %tmp.5( "struct.std::bad_typeid"* %x )                ; <int> [#uses=1]
	ret int %tmp.6
        ret int 0
}

In other words, we now resolve the virtual function call.

llvm-svn: 14783
2004-07-13 01:49:43 +00:00
Chris Lattner 9eb9ccd9f6 Check to make sure types are sized before calling getTypeSize on them.
llvm-svn: 14649
2004-07-06 19:28:42 +00:00
Brian Gaeke a501be556f It doesn't matter what the 2nd operand is; if the GEP has 2 operands and
the first is a zero, we should leave it alone.

llvm-svn: 14648
2004-07-06 19:24:47 +00:00
Brian Gaeke 0e0fe8a2e9 Add helper function.
Don't touch GEPs for which DecomposeArrayRef is not going to do anything
special (e.g., < 2 indices, or 2 indices and the last one is a constant.)

llvm-svn: 14647
2004-07-06 18:15:39 +00:00
Chris Lattner 23b47b6af9 Implement rem.ll:test3
llvm-svn: 14640
2004-07-06 07:38:18 +00:00
Chris Lattner 98c6bdf251 Fix a minor bug where we would go into infinite loops on some constants
llvm-svn: 14638
2004-07-06 07:11:42 +00:00
Chris Lattner 7fd5f0745a Implement InstCombine/sub.ll:test15: X % -Y === X % Y
Also, remove X % -1 = 0, because it's not true for unsigneds, and the
signed case is superseded by this new handling.

llvm-svn: 14637
2004-07-06 07:01:22 +00:00
Reid Spencer eb04d9bcb4 Add #include <iostream> since Value.h does not #include it any more.
llvm-svn: 14622
2004-07-04 12:19:56 +00:00
Chris Lattner 4c9c20af28 Implement add.ll:test22, a common case in MSIL files
llvm-svn: 14587
2004-07-03 00:26:11 +00:00
Chris Lattner 49df6cefa5 Do not call getTypeSize on a type that has no size
llvm-svn: 14584
2004-07-02 22:55:47 +00:00
Brian Gaeke e1a136fb4b Get rid of a dead variable, and fix a typo in a comment.
llvm-svn: 14560
2004-07-02 05:30:01 +00:00
Brian Gaeke 163c87fc32 Make this pass use a more specific debug message than "Processing:".
llvm-svn: 14541
2004-07-01 19:27:10 +00:00
Chris Lattner 6e07936ed2 Implement InstCombine/add.ll:test21
llvm-svn: 14443
2004-06-27 22:51:36 +00:00
Chris Lattner 7f4222237d New constant expression lowering pass to simplify your instruction selection needs.
Contributed by Vladimir Prus!

llvm-svn: 14399
2004-06-25 07:48:09 +00:00
Chris Lattner 7a002d6010 Two fixes. First, stop using the ugly shouldSubstituteIndVar method.
Second, disable substitution of quadratic addrec expressions to avoid putting
multiplies in loops!

llvm-svn: 14358
2004-06-24 06:49:18 +00:00
Chris Lattner c9e06336ab Make use of BinaryOperator::create* methods to shrinkify code.
llvm-svn: 14262
2004-06-20 05:04:01 +00:00
Chris Lattner 42ad646104 Now that dominator tree children are built in determinstic order, this horrible code
can go away

llvm-svn: 14254
2004-06-19 20:23:35 +00:00
Chris Lattner 4027500e1c Fix a nasty bug, noticed by Reid
llvm-svn: 14249
2004-06-19 18:15:50 +00:00
Chris Lattner ec2d34cc19 Fix one source of nondeterminism in the -licm pass: the hoist pass
was processing blocks in whatever order they happened to end up in the
dominator tree data structure.  Force an ordering.

llvm-svn: 14248
2004-06-19 08:56:43 +00:00
Chris Lattner b5f8eb8315 Do not loop over uses as we delete them. This causes iterators to be
invalidated out from under us.  This bug goes back to revision 1.1: scary.

llvm-svn: 14242
2004-06-19 02:02:22 +00:00
Chris Lattner 023a483c76 Implement Transforms/InstCombine/and.ll:test17, a common case that
occurs due to unordered comparison macros in math.h

llvm-svn: 14221
2004-06-18 06:07:51 +00:00
Chris Lattner 97bfcea262 Rename Type::PrimitiveID to TypeId and ::getPrimitiveID() to ::getTypeID()
Delete two functions that are now methods on the Type class

llvm-svn: 14200
2004-06-17 18:16:02 +00:00
Brian Gaeke 661963c63f Fix typo in DEBUG printout.
llvm-svn: 14196
2004-06-17 07:26:52 +00:00
Chris Lattner ee59d4bf04 Fix a bug in my checkin from last night that caused miscompilations of
186.crafty, fhourstones and 132.ijpeg.

Bugpoint makes really nasty miscompilations embarrassingly easy to find.  It
narrowed it down to the instcombiner and this testcase (from fhourstones):

bool %l7153_l4706_htstat_loopentry_2E_4_no_exit_2E_4(int* %i, [32 x int]* %works, int* %tmp.98.out) {
newFuncRoot:
        %tmp.96 = load int* %i          ; <int> [#uses=1]
        %tmp.97 = getelementptr [32 x int]* %works, long 0, int %tmp.96         ; <int*> [#uses=1]
        %tmp.98 = load int* %tmp.97             ; <int> [#uses=2]
        %tmp.99 = load int* %i          ; <int> [#uses=1]
        %tmp.100 = and int %tmp.99, 7           ; <int> [#uses=1]
        %tmp.101 = seteq int %tmp.100, 7                ; <bool> [#uses=2]
        %tmp.102 = cast bool %tmp.101 to int            ; <int> [#uses=0]
        br bool %tmp.101, label %codeRepl4.exitStub, label %codeRepl3.exitStub

codeRepl4.exitStub:             ; preds = %newFuncRoot
        store int %tmp.98, int* %tmp.98.out
        ret bool true

codeRepl3.exitStub:             ; preds = %newFuncRoot
        store int %tmp.98, int* %tmp.98.out
        ret bool false
}

... which only has one combination performed on it:

$ llvm-as < t.ll | opt -instcombine -debug | llvm-dis
IC: Old =       %tmp.101 = seteq int %tmp.100, 7                ; <bool> [#uses=1]
    New =       setne int %tmp.100, 0           ; <bool>:<badref> [#uses=0]
IC: MOD =       br bool %tmp.101, label %codeRepl3.exitStub, label %codeRepl4.exitStub
IC: MOD =       %tmp.97 = getelementptr [32 x int]* %works, uint 0, int %tmp.96         ; <int*> [#uses=1]

It doesn't get much better than this.  :)

llvm-svn: 14109
2004-06-10 02:33:20 +00:00
Chris Lattner c8e7e298c1 More minor cleanups
llvm-svn: 14108
2004-06-10 02:12:35 +00:00
Chris Lattner df20a4d589 Eliminate many occurrences of Instruction::
llvm-svn: 14107
2004-06-10 02:07:29 +00:00
Chris Lattner 35167c3087 Implement InstCombine/select.ll:test15*
llvm-svn: 14095
2004-06-09 07:59:58 +00:00
Chris Lattner 396dbfe327 Be more careful about the order we put stuff onto the worklist. This allows us to
collapse this:
bool %le(int %A, int %B) {
        %c1 = setgt int %A, %B
        %tmp = select bool %c1, int 1, int 0
        %c2 = setlt int %A, %B
        %result = select bool %c2, int -1, int %tmp
        %c3 = setle int %result, 0
        ret bool %c3
}

into:

bool %le(int %A, int %B) {
        %c3 = setle int %A, %B          ; <bool> [#uses=1]
        ret bool %c3
}

which is handy, because the Java FE makes these sequences all over the place.

This is tested as: test/Regression/Transforms/InstCombine/JavaCompare.ll

llvm-svn: 14086
2004-06-09 05:08:07 +00:00
Chris Lattner 2dd017402b Implement select.ll:test14*
llvm-svn: 14083
2004-06-09 04:24:29 +00:00
Chris Lattner 523d3e6674 Fix one of the major things that is causing the C Backend to infinite loop
llvm-svn: 13872
2004-05-28 05:02:13 +00:00
Chris Lattner ed79d8af53 Fix InstCombine/load.ll & PR347.
This code hadn't been updated after the "structs with more than 256 elements"
related changes to the GEP instruction.  Also it was not handling the
ConstantAggregateZero class.

Now it does!

llvm-svn: 13834
2004-05-27 17:30:27 +00:00
Reid Spencer 297d7fe7e6 Remove unused header file.
llvm-svn: 13750
2004-05-25 08:51:36 +00:00
Reid Spencer 1cc31f264f Make this pass simply invoke SymbolTable::strip().
llvm-svn: 13749
2004-05-25 08:51:25 +00:00
Chris Lattner e1e10e1883 Implement InstCombine:shift.ll:test16, which turns (X >> C1) & C2 != C3
into (X & (C2 << C1)) != (C3 << C1), where the shift may be either left or
right and the compare may be any one.

This triggers 1546 times in 176.gcc alone, as it is a common pattern that
occurs for bitfield accesses.
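
A concrete instance with illustrative constants, the shape a bitfield
test takes (C1 = 8, C2 = 0xF, C3 = 5 in the formula above):

bool fieldIs5(unsigned x) {
  return ((x >> 8) & 0xF) == 5;   // -> (x & 0xF00) == 0x500
}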

llvm-svn: 13740
2004-05-25 06:32:08 +00:00
Chris Lattner 03841659a4 Implement instcombine/cast.ll:test16:
Canonicalize cast X to bool into a setne instruction

llvm-svn: 13736
2004-05-25 04:29:21 +00:00
Chris Lattner 99173879ad Spelling people's names right is kinda important
llvm-svn: 13702
2004-05-23 21:27:29 +00:00
Chris Lattner 289ba2ac4d Adjust to the changes in the AliasSetTracker interface
llvm-svn: 13690
2004-05-23 21:20:19 +00:00
Chris Lattner e67dbc2ae2 Add support for replacement of formal arguments with simpler expressions.
llvm-svn: 13689
2004-05-23 21:19:55 +00:00
Chris Lattner 099c8cfe90 Implement the -lowergc pass which is used by code generators (like the CBE)
that do not have builtin support for garbage collection.

llvm-svn: 13688
2004-05-23 21:19:22 +00:00
Chris Lattner 0026512bac This was not meant to be committed
llvm-svn: 13565
2004-05-13 20:56:34 +00:00
Chris Lattner c12c945cc4 Fix a nasty bug that caused us to unroll EXTREMELY large loops due to overflow
in the size calculation.

This is not something you want to see:
Loop Unroll: F[main] Loop %no_exit Loop Size = 2 Trip Count = 2147483648 - UNROLLING!

The problem was that 2*2147483648 == 0.

Now we get:
Loop Unroll: F[main] Loop %no_exit Loop Size = 2 Trip Count = 2147483648 - TOO LARGE: 4294967296>100

Thanks to some anonymous person playing with the demo page that repeatedly
caused zion to go into swapping land.  That's one way to ensure you'll get
a quick bugfix.  :)

Testcase here: Transforms/LoopUnroll/2004-05-13-DontUnrollTooMuch.ll

llvm-svn: 13564
2004-05-13 20:43:31 +00:00
Chris Lattner 8ec5f88c79 Fix stupid bug in my checkin yesterday
llvm-svn: 13429
2004-05-08 22:41:42 +00:00
Chris Lattner 5f667a6f58 Implement folding of GEP's like:
%tmp.0 = getelementptr [50 x sbyte]* %ar, uint 0, int 5         ; <sbyte*> [#uses=2]
        %tmp.7 = getelementptr sbyte* %tmp.0, int 8             ; <sbyte*> [#uses=1]

together.  This patch actually allows us to simplify and generalize the code.

llvm-svn: 13415
2004-05-07 22:09:22 +00:00
Chris Lattner d9e5813821 Fix PR336: The instcombine pass asserts when visiting load instruction
llvm-svn: 13400
2004-05-07 15:35:56 +00:00
Chris Lattner 9490849028 Do not mark instructions in unreachable sections of the function as live.
This fixes PR332 and ADCE/2004-05-04-UnreachableBlock.llx

llvm-svn: 13349
2004-05-04 17:00:46 +00:00
Chris Lattner dd1a86d858 Minor efficiency tweak, suggested by Patrick Meredith
llvm-svn: 13341
2004-05-04 15:19:33 +00:00
Chris Lattner 63d75af920 Make sure to reprocess instructions used by deleted instructions to avoid
missing opportunities for combination.

llvm-svn: 13309
2004-05-01 23:27:23 +00:00
Chris Lattner b643a9e675 Make sure the instruction combiner doesn't lose track of instructions
when replacing them, missing the opportunity to do simplifications

llvm-svn: 13308
2004-05-01 23:19:52 +00:00
Chris Lattner 652064e3b8 Fix a major pessimization in the instcombiner. If an allocation instruction
is only used by a cast, and the casted type is the same size as the original
allocation, it would eliminate the cast by folding it into the allocation.

Unfortunately, it was placing the new allocation instruction right before
the cast, which could pull (for example) alloca instructions into the body
of a function.  This turns statically allocatable allocas into expensive
dynamically allocated allocas, which is bad bad bad.

This fixes the problem by placing the new allocation instruction at the same
place the old one was, duh. :)

llvm-svn: 13289
2004-04-30 04:37:52 +00:00
Chris Lattner 2d3a7a6ff0 Changes to fix up the inst_iterator to pass the boost iterator checks. This
patch was graciously contributed by Vladimir Prus.

llvm-svn: 13185
2004-04-27 15:13:33 +00:00
Chris Lattner e20c334e65 Instcombine X/-1 --> 0-X
llvm-svn: 13172
2004-04-26 14:01:59 +00:00
Chris Lattner 83cd87efcd Move the scev expansion code into this pass, where it belongs. There is
still room for cleanup, but at least the code modification is out of the
analysis now.

llvm-svn: 13135
2004-04-23 21:29:48 +00:00
Chris Lattner c27302c79f Disable a previous patch that was causing indvars to loop infinitely :(
llvm-svn: 13108
2004-04-22 15:12:36 +00:00
Chris Lattner c1a682dda0 Fix an extremely serious thinko I made in revision 1.60 of this file.
llvm-svn: 13106
2004-04-22 14:59:40 +00:00
Chris Lattner af532f27e7 Implement a todo, rewriting all possible scev expressions inside of the
loop.  This eliminates the extra add from the previous case, but it's
not clear that this will be a performance win overall.  Tomorrow's test
results will tell. :)

llvm-svn: 13103
2004-04-21 23:36:08 +00:00
Chris Lattner fb9a299f68 This code really wants to iterate over the OPERANDS of an instruction, not
over its USES.  If it's dead it doesn't have any uses!  :)

Thanks to the fabulous and mysterious Bill Wendling for pointing this out.  :)

llvm-svn: 13102
2004-04-21 22:29:37 +00:00
Chris Lattner dc7cc35088 Implement a fixme. This helps loops that have induction variables of different
types in them.  Instead of creating an induction variable for all types, it
creates a single induction variable and casts to the other sizes.  This generates
this code:

no_exit:                ; preds = %entry, %no_exit
        %indvar = phi uint [ %indvar.next, %no_exit ], [ 0, %entry ]            ; <uint> [#uses=4]
***     %j.0.0 = cast uint %indvar to short             ; <short> [#uses=1]
        %indvar = cast uint %indvar to int              ; <int> [#uses=1]
        %tmp.7 = getelementptr short* %P, uint %indvar          ; <short*> [#uses=1]
        store short %j.0.0, short* %tmp.7
        %inc.0 = add int %indvar, 1             ; <int> [#uses=2]
        %tmp.2 = setlt int %inc.0, %N           ; <bool> [#uses=1]
        %indvar.next = add uint %indvar, 1              ; <uint> [#uses=1]
        br bool %tmp.2, label %no_exit, label %loopexit

instead of:

no_exit:                ; preds = %entry, %no_exit
        %indvar = phi ushort [ %indvar.next, %no_exit ], [ 0, %entry ]          ; <ushort> [#uses=2]
***     %indvar = phi uint [ %indvar.next, %no_exit ], [ 0, %entry ]            ; <uint> [#uses=3]
        %indvar = cast uint %indvar to int              ; <int> [#uses=1]
        %indvar = cast ushort %indvar to short          ; <short> [#uses=1]
        %tmp.7 = getelementptr short* %P, uint %indvar          ; <short*> [#uses=1]
        store short %indvar, short* %tmp.7
        %inc.0 = add int %indvar, 1             ; <int> [#uses=2]
        %tmp.2 = setlt int %inc.0, %N           ; <bool> [#uses=1]
        %indvar.next = add uint %indvar, 1
***     %indvar.next = add ushort %indvar, 1
        br bool %tmp.2, label %no_exit, label %loopexit

This is an improvement in register pressure, but probably doesn't happen that
often.

The more important fix will be to get rid of the redundant add.

llvm-svn: 13101
2004-04-21 22:22:01 +00:00
Chris Lattner c1aa21f5a7 Fix PR325
llvm-svn: 13081
2004-04-20 20:26:03 +00:00
Chris Lattner f48f777d4c Initial checkin of a simple loop unswitching pass. It still needs work,
but it's a start, and seems to do its basic job.

llvm-svn: 13068
2004-04-19 18:07:02 +00:00
Chris Lattner bc02177fdc Add #include
llvm-svn: 13057
2004-04-19 03:01:23 +00:00
Chris Lattner fc44a25bcb Move isLoopInvariant to the Loop class
llvm-svn: 13051
2004-04-18 22:46:08 +00:00
Chris Lattner 827826320d Correct rewriting of exit blocks after my last patch
llvm-svn: 13048
2004-04-18 22:27:10 +00:00
Chris Lattner 35eaa55cfc Loop exit sets are no longer explicitly held, they are dynamically computed on demand.
llvm-svn: 13046
2004-04-18 22:15:13 +00:00
Chris Lattner d72c3eb54e Change the ExitBlocks list from being explicitly contained in the Loop
structure to being dynamically computed on demand.  This makes updating
loop information MUCH easier.

llvm-svn: 13045
2004-04-18 22:14:10 +00:00
Chris Lattner d15250240c Reduce the unrolling limit
llvm-svn: 13040
2004-04-18 18:06:14 +00:00
Chris Lattner 30ae18155d If the preheader of the loop was the entry block of the function, make sure
that the exit block of the loop becomes the new entry block of the function.

This was causing a verifier assertion on 252.eon.

llvm-svn: 13039
2004-04-18 17:38:42 +00:00
Chris Lattner 230bcb6b35 Be much more careful about how we update instructions outside of the loop
using instructions inside of the loop.  This should fix the MishaTest failure
from last night.

llvm-svn: 13038
2004-04-18 17:32:39 +00:00
Chris Lattner 4d52e1e401 After unrolling our single basic block loop, fold it into the preheader and exit
block.  The primary motivation for doing this is that we can now unroll nested loops.

This makes a pretty big difference in some cases.  For example, in 183.equake,
we are now beating the native compiler with the CBE, and we are a lot closer
with LLC.

I'm now going to play around a bit with the unroll factor and see what effect
it really has.

llvm-svn: 13034
2004-04-18 06:27:43 +00:00
Chris Lattner f2cc841619 Fix a bug: this does not preserve the CFG!
While we're at it, add support for updating loop information correctly.

llvm-svn: 13033
2004-04-18 05:38:37 +00:00
Chris Lattner 946b255977 Initial checkin of a simple loop unroller. This pass is extremely basic and
limited.  Even in its extremely simple state (it can only *fully* unroll single
basic block loops that execute a constant number of times), it already helps improve
performance a LOT on some benchmarks, particularly with the native code generators.

llvm-svn: 13028
2004-04-18 05:20:17 +00:00
Chris Lattner c14da9600b Make the tail duplication threshold accessible from the command line instead of hardcoded
llvm-svn: 13025
2004-04-18 00:52:43 +00:00
Chris Lattner a814080025 If the loop executes a constant number of times, try a bit harder to replace
exit values.
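
A minimal sketch: the trip count is a known constant, so the value of
the induction variable after the loop can be rewritten to that constant:

int exitValue() {
  int i = 0;
  while (i < 100)
    ++i;
  return i;   // rewritten to: return 100;
}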

llvm-svn: 13018
2004-04-17 18:44:09 +00:00
Chris Lattner 1e9ac1a45e Fix a HUGE pessimization on X86. The indvars pass was taking this
(familiar) function:

int _strlen(const char *str) {
    int len = 0;
    while (*str++) len++;
    return len;
}

And transforming it to use a ulong induction variable, because the type of
the pointer index was left as a constant long.  This is obviously very bad.

The fix is to shrink long constants in getelementptr instructions to intptr_t,
making the indvars pass insert a uint induction variable, which is much more
efficient.

Here's the before code for this function:

int %_strlen(sbyte* %str) {
entry:
        %tmp.13 = load sbyte* %str              ; <sbyte> [#uses=1]
        %tmp.24 = seteq sbyte %tmp.13, 0                ; <bool> [#uses=1]
        br bool %tmp.24, label %loopexit, label %no_exit

no_exit:                ; preds = %entry, %no_exit
***     %indvar = phi uint [ %indvar.next, %no_exit ], [ 0, %entry ]            ; <uint> [#uses=2]
***     %indvar = phi ulong [ %indvar.next, %no_exit ], [ 0, %entry ]           ; <ulong> [#uses=2]
        %indvar1 = cast ulong %indvar to uint           ; <uint> [#uses=1]
        %inc.02.sum = add uint %indvar1, 1              ; <uint> [#uses=1]
        %inc.0.0 = getelementptr sbyte* %str, uint %inc.02.sum          ; <sbyte*> [#uses=1]
        %tmp.1 = load sbyte* %inc.0.0           ; <sbyte> [#uses=1]
        %tmp.2 = seteq sbyte %tmp.1, 0          ; <bool> [#uses=1]
        %indvar.next = add ulong %indvar, 1             ; <ulong> [#uses=1]
        %indvar.next = add uint %indvar, 1              ; <uint> [#uses=1]
        br bool %tmp.2, label %loopexit.loopexit, label %no_exit

loopexit.loopexit:              ; preds = %no_exit
        %indvar = cast uint %indvar to int              ; <int> [#uses=1]
        %inc.1 = add int %indvar, 1             ; <int> [#uses=1]
        ret int %inc.1

loopexit:               ; preds = %entry
        ret int 0
}


Here's the after code:

int %_strlen(sbyte* %str) {
entry:
        %inc.02 = getelementptr sbyte* %str, uint 1             ; <sbyte*> [#uses=1]
        %tmp.13 = load sbyte* %str              ; <sbyte> [#uses=1]
        %tmp.24 = seteq sbyte %tmp.13, 0                ; <bool> [#uses=1]
        br bool %tmp.24, label %loopexit, label %no_exit

no_exit:                ; preds = %entry, %no_exit
***     %indvar = phi uint [ %indvar.next, %no_exit ], [ 0, %entry ]            ; <uint> [#uses=3]
        %indvar = cast uint %indvar to int              ; <int> [#uses=1]
        %inc.0.0 = getelementptr sbyte* %inc.02, uint %indvar           ; <sbyte*> [#uses=1]
        %inc.1 = add int %indvar, 1             ; <int> [#uses=1]
        %tmp.1 = load sbyte* %inc.0.0           ; <sbyte> [#uses=1]
        %tmp.2 = seteq sbyte %tmp.1, 0          ; <bool> [#uses=1]
        %indvar.next = add uint %indvar, 1              ; <uint> [#uses=1]
        br bool %tmp.2, label %loopexit, label %no_exit

loopexit:               ; preds = %entry, %no_exit
        %len.0.1 = phi int [ 0, %entry ], [ %inc.1, %no_exit ]          ; <int> [#uses=1]
        ret int %len.0.1
}

llvm-svn: 13016
2004-04-17 18:16:10 +00:00
Chris Lattner 885a6eb74d Even if there are no induction variables in the loop, if we can compute
the trip count for the loop, insert one so that we can canonicalize the exit
condition.

llvm-svn: 13015
2004-04-17 18:08:33 +00:00
Chris Lattner 284d3b0311 Fix some really nasty dominance bugs that were exposed by my patch to
make the verifier more strict.  This fixes building zlib

llvm-svn: 13002
2004-04-16 18:08:07 +00:00
Chris Lattner 9e9b2b7474 Fix some of the strange CBE-only failures that happened last night.
llvm-svn: 12980
2004-04-16 06:03:17 +00:00
Chris Lattner d7a559e353 Fix a bug in the previous checkin: if the exit block is not the same as
the back-edge block, we must check the preincremented value.

llvm-svn: 12968
2004-04-15 20:26:22 +00:00
Chris Lattner 0cec5cb92c Change the canonical induction variable that we insert.
Instead of producing code like this:

Loop:
  X = phi 0, X2
  ...

  X2 = X + 1
  if (X != N-1) goto Loop

We now generate code that looks like this:

Loop:
  X = phi 0, X2
  ...

  X2 = X + 1
  if (X2 != N) goto Loop

This has two big advantages:
  1. The trip count of the loop is now explicit in the code, allowing
     the direct implementation of Loop::getTripCount()
  2. This reduces register pressure in the loop, and allows X and X2 to be
     put into the same register.

As a consequence of the second point, the code we generate for loops went
from:

.LBB2:  # no_exit.1
	...
        mov %EDI, %ESI
        inc %EDI
        cmp %ESI, 2
        mov %ESI, %EDI
        jne .LBB2 # PC rel: no_exit.1

To:

.LBB2:  # no_exit.1
	...
        inc %ESI
        cmp %ESI, 3
        jne .LBB2 # PC rel: no_exit.1

... which has two fewer moves, and uses one less register.

llvm-svn: 12961
2004-04-15 15:21:43 +00:00
Chris Lattner 6679e46b59 Add a trivial instcombine: load null -> null
llvm-svn: 12940
2004-04-14 03:28:36 +00:00
Chris Lattner ff9362a8da Add SCCP support for constant folding calls, implementing:
test/Regression/Transforms/SCCP/calltest.ll

llvm-svn: 12921
2004-04-13 19:43:54 +00:00
Chris Lattner d0dc6d5295 Constant propagation should remove the dead instructions
llvm-svn: 12917
2004-04-13 19:28:20 +00:00
Chris Lattner 89e959bb1f Fix LoopSimplify/2004-04-13-LoopSimplifyUpdateDomFrontier.ll
LoopSimplify was not updating dominator frontiers correctly in some cases.

llvm-svn: 12890
2004-04-13 16:23:25 +00:00
Chris Lattner a6e22814ab Refactor code a bit to make it simpler and eliminate the goto
llvm-svn: 12888
2004-04-13 15:21:18 +00:00
Chris Lattner 8417052938 This patch addresses PR35: Loop simplify should reconstruct nested loops.
This is fairly straightforward, but was a real nightmare to get just
perfect.  aarg.  :)

llvm-svn: 12884
2004-04-13 05:05:33 +00:00
Chris Lattner 494a685449 Add support for removing invoke instructions
llvm-svn: 12858
2004-04-12 05:15:13 +00:00
Chris Lattner 24cf0200c7 Fix a bug in my select transformation
llvm-svn: 12826
2004-04-11 01:39:19 +00:00
Chris Lattner f16fe7206c Update the value numbering interface.
llvm-svn: 12824
2004-04-10 22:33:34 +00:00
Chris Lattner 623fba1107 Implement InstCombine/select.ll:test13*
llvm-svn: 12821
2004-04-10 22:21:27 +00:00
Chris Lattner cf4a996cba Implement InstCombine/add.ll:test20
Canonicalize add of sign bit constant into a xor
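The identity behind this fold: adding the sign-bit constant can never
carry into any other bit, so it is equivalent to flipping the sign bit
with an xor.  A minimal C sketch of the check (illustrative only, using
unsigned arithmetic so the wraparound is well defined):

---
#include <assert.h>
#include <stdint.h>

int main(void) {
  uint32_t tests[] = { 0u, 1u, 0x7fffffffu, 0x80000000u, 0xffffffffu };
  for (int i = 0; i < 5; i++) {
    uint32_t x = tests[i];
    /* x + 0x80000000 and x ^ 0x80000000 agree for every x */
    assert(x + 0x80000000u == (x ^ 0x80000000u));
  }
  return 0;
}
---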

llvm-svn: 12819
2004-04-10 22:01:55 +00:00
Chris Lattner 69c4900512 Rewrite the GCSE pass to be *substantially* simpler, a bit more efficient,
and a bit more powerful

llvm-svn: 12817
2004-04-10 21:11:11 +00:00
Chris Lattner f9d9665138 Fix spurious warning in release mode
llvm-svn: 12816
2004-04-10 19:15:56 +00:00
Chris Lattner d95ef7eff0 Simplify code a bit, and fix a bug that was breaking perlbmk
llvm-svn: 12814
2004-04-10 18:06:21 +00:00
Chris Lattner 7ebfe61dc1 Fix a bug in my checkin last night that was breaking programs using invoke.
llvm-svn: 12813
2004-04-10 16:53:29 +00:00
Chris Lattner 5093213c40 Fix previous patch
llvm-svn: 12811
2004-04-10 07:27:48 +00:00
Chris Lattner 6149ac8991 Correctly update counters
llvm-svn: 12810
2004-04-10 07:02:02 +00:00
Chris Lattner cfa1adcdb8 Simplify code a bit, and use alias analysis to allow us to delete unused
call and invoke instructions that are known to not write to memory.

llvm-svn: 12807
2004-04-10 06:53:09 +00:00
Chris Lattner 56e4d3d8ad Implement select.ll:test12*
This transforms code like this:

   %C = or %A, %B
   %D = select %cond, %C, %A
into:
   %C = select %cond, %B, 0
   %D = or %A, %C

Since B is often a constant, the select can often be eliminated.  In any case,
this reduces the usage count of A, allowing subsequent optimizations to happen.

This xform applies when the operator is any of:
  add, sub, mul, or, xor, and, shl, shr

llvm-svn: 12800
2004-04-09 23:46:01 +00:00
Chris Lattner 183b336a54 Fold binary operators with a constant operand into select instructions
that have a constant operand.  This implements
add.ll:test19, shift.ll:test15*, and others that are not tested
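
For illustration (a hypothetical source-level analogue, not the actual
testcase): when both the select and the binary operator have a constant
operand, the operation can be folded into the select arms:

---
/* before: the add is applied after the select */
int before(int cond) { return (cond ? 4 : 8) + 1; }

/* after: the add has been folded into both constant arms */
int after(int cond)  { return  cond ? 5 : 9; }
---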

llvm-svn: 12794
2004-04-09 19:05:30 +00:00
Chris Lattner cf7baf3519 Implement select.ll:test11
llvm-svn: 12793
2004-04-09 18:19:44 +00:00
Chris Lattner e228ee5870 Implement InstCombine/cast-propagate.ll
llvm-svn: 12784
2004-04-08 20:39:49 +00:00
Chris Lattner 1c631e813d Implement InstCombine/select.ll:test[7-10]
llvm-svn: 12769
2004-04-08 04:43:23 +00:00
Chris Lattner 2b2412d0c8 Implement test/Regression/Transforms/InstCombine/getelementptr_index.ll
llvm-svn: 12762
2004-04-07 18:38:20 +00:00
Chris Lattner 4d1fcf1dcd Fix a bug in yesterday's checkins which broke siod. siod is a great testcase! :)
llvm-svn: 12659
2004-04-05 16:02:41 +00:00
Chris Lattner 8953b90aaa Fix InstCombine/2004-04-04-InstCombineReplaceAllUsesWith.ll
llvm-svn: 12658
2004-04-05 02:10:19 +00:00
Chris Lattner 69193f93b6 Support getelementptr instructions which use uints to index into structure
types and can have arbitrary 32- and 64-bit integer types indexing into
sequential types.

llvm-svn: 12653
2004-04-05 01:30:19 +00:00
Chris Lattner e61b67d7d5 Rewrite the indvars pass to use the ScalarEvolution analysis.
This also implements some new features for the indvars pass, including
linear function test replacement, exit value substitution, and it works with
a much more general class of induction variables and loops.

llvm-svn: 12620
2004-04-02 20:24:31 +00:00
Chris Lattner 59fdf74968 Remove some assertions that are now bogus with the last patch I put in
llvm-svn: 12595
2004-04-01 19:21:46 +00:00
Chris Lattner 146d0df5e4 Fix PR306: Loop simplify incorrectly updates dominator information
Testcase: LoopSimplify/2004-04-01-IncorrectDomUpdate.ll

llvm-svn: 12592
2004-04-01 19:06:07 +00:00
Chris Lattner 61fab1409d Add warning
llvm-svn: 12573
2004-03-31 22:00:30 +00:00
Chris Lattner 533bc49775 Implement select.ll:test[3-6]
llvm-svn: 12544
2004-03-30 19:37:13 +00:00
Chris Lattner 059f390257 Add a simple select instruction lowering pass
llvm-svn: 12540
2004-03-30 18:41:10 +00:00
Chris Lattner 56b5051428 X % -1 == X % 1 == 0
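Division by 1 or by -1 leaves no remainder, so both expressions fold to
the constant 0.  A small C check of the identity (a sketch; note that
INT_MIN % -1 overflows in C, so the loop deliberately avoids it):

---
#include <assert.h>

int main(void) {
  for (int x = -1000; x <= 1000; x++) {
    assert(x % 1 == 0);   /* remainder of dividing by 1 is always 0 */
    assert(x % -1 == 0);  /* likewise for -1 */
  }
  return 0;
}
---
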
llvm-svn: 12520
2004-03-26 16:11:24 +00:00
Chris Lattner 57c67b06e9 Two changes:
#1 is to unconditionally strip constantpointerrefs out of
instruction operands where they are absolutely pointless and inhibit
optimization.  GRRR!

#2 is to implement InstCombine/getelementptr_const.ll

llvm-svn: 12519
2004-03-25 22:59:29 +00:00
Chris Lattner abb77c9959 Teach the optimizer to delete zero-sized allocas (but not mallocs!)
llvm-svn: 12507
2004-03-19 06:08:10 +00:00
Chris Lattner 684fa5ac64 Be more accurate
llvm-svn: 12464
2004-03-17 01:59:27 +00:00
Chris Lattner a3783a577e Fix bug in previous checkin
llvm-svn: 12458
2004-03-16 23:36:49 +00:00
Chris Lattner 95057f6ad1 Okay, so there is no reasonable way for tail duplication to update SSA form,
as it is making effectively arbitrary modifications to the CFG and we don't
have domset/domfrontier implementations that can handle the dynamic updates.
Instead of having a bunch of code that doesn't actually work in practice,
just demote any potentially tricky values to the stack (causing the problem
to go away entirely).  Later invocations of mem2reg will rebuild SSA for us.

This fixes all of the major performance regressions with tail duplication
from LLVM 1.1.  For example, this loop:

---
int popcount(int x) {
  int result = 0;
  while (x != 0) {
    result = result + (x & 0x1);
    x = x >> 1;
  }
  return result;
}
---
Used to be compiled into:

int %popcount(int %X) {
entry:
	br label %loopentry

loopentry:		; preds = %entry, %no_exit
	%x.0 = phi int [ %X, %entry ], [ %tmp.9, %no_exit ]		; <int> [#uses=3]
	%result.1.0 = phi int [ 0, %entry ], [ %tmp.6, %no_exit ]		; <int> [#uses=2]
	%tmp.1 = seteq int %x.0, 0		; <bool> [#uses=1]
	br bool %tmp.1, label %loopexit, label %no_exit

no_exit:		; preds = %loopentry
	%tmp.4 = and int %x.0, 1		; <int> [#uses=1]
	%tmp.6 = add int %tmp.4, %result.1.0		; <int> [#uses=1]
	%tmp.9 = shr int %x.0, ubyte 1		; <int> [#uses=1]
	br label %loopentry

loopexit:		; preds = %loopentry
	ret int %result.1.0
}

And is now compiled into:

int %popcount(int %X) {
entry:
        br label %no_exit

no_exit:                ; preds = %entry, %no_exit
        %x.0.0 = phi int [ %X, %entry ], [ %tmp.9, %no_exit ]          ; <int> [#uses=2]
        %result.1.0.0 = phi int [ 0, %entry ], [ %tmp.6, %no_exit ]             ; <int> [#uses=1]
        %tmp.4 = and int %x.0.0, 1              ; <int> [#uses=1]
        %tmp.6 = add int %tmp.4, %result.1.0.0          ; <int> [#uses=2]
        %tmp.9 = shr int %x.0.0, ubyte 1                ; <int> [#uses=2]
        %tmp.1 = seteq int %tmp.9, 0            ; <bool> [#uses=1]
        br bool %tmp.1, label %loopexit, label %no_exit

loopexit:               ; preds = %no_exit
        ret int %tmp.6
}

llvm-svn: 12457
2004-03-16 23:29:09 +00:00
Chris Lattner 7a7b114871 Do not try to optimize PHI nodes with incredibly high degree. This reduces SCCP
time from 615s to 1.49s on a large testcase that has a gigantic switch statement
that all of the blocks in the function go to (an interpreter).

llvm-svn: 12442
2004-03-16 19:49:59 +00:00
Chris Lattner a64923ad26 Do not copy gigantic switch instructions
llvm-svn: 12441
2004-03-16 19:45:22 +00:00
Chris Lattner db5b8f4d6b Fix a regression from this patch:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040308/013095.html

Basically, this patch only updated the immediate dominatees of the header node
to tell them that the preheader also dominated them.  In practice, ALL
dominatees of the header node are also dominated by the preheader: the
preheader dominates the header, and dominance is transitive.

This fixes: LoopSimplify/2004-03-15-IncorrectDomUpdate.
and PR293

llvm-svn: 12434
2004-03-16 06:00:15 +00:00
Chris Lattner cd83282df1 Add counters for the number of calls eliminated
llvm-svn: 12420
2004-03-15 05:46:59 +00:00