bf96e02a54 
								
							 
						 
						
							
							
								
								For PR1097:  
							
							
							
							
							Enable complex addressing modes on 64-bit platforms involving two induction
variables by keeping a size and scale in 64 bits, not 32.
Patch by Dan Gohman.
llvm-svn: 33011 
							
						 
						
							2007-01-08 16:17:51 +00:00  
				
					
						
							
							
								 
						
							
								3fe98ae10a 
								
							 
						 
						
							
							
								
								no need to worry about int vs uint any more.  
							
							
							
							
							llvm-svn: 32946 
							
						 
						
							2007-01-06 01:37:35 +00:00  
				
					
						
							
							
								 
						
							
								c635f47d9a 
								
							 
						 
						
							
							
								
								For PR950:  
							
							
							
							
							This patch replaces signed integer types with signless ones:
1. [US]Byte -> Int8
2. [U]Short -> Int16
3. [U]Int   -> Int32
4. [U]Long  -> Int64.
5. Removal of isSigned, isUnsigned, getSignedVersion, getUnsignedVersion
   and other methods related to signedness. In a few places this warranted
   identifying the signedness information from other sources.
llvm-svn: 32785 
							
						 
						
							2006-12-31 05:48:39 +00:00  
				
					
						
							
							
								 
						
							
								266e42b312 
								
							 
						 
						
							
							
								
								For PR950:  
							
							
							
							
							This patch removes the SetCC instructions and replaces them with the ICmp
and FCmp instructions. The SetCondInst instruction has been removed and
replaced with ICmpInst and FCmpInst.
llvm-svn: 32751 
							
						 
						
							2006-12-23 06:05:41 +00:00  
				
					
						
							
							
								 
						
							
								79a42ac941 
								
							 
						 
						
							
							
								
								Switch over Transforms/Scalar to use the STATISTIC macro.  For each statistic  
							
							
							
							
							converted, we lose a static initializer.  This also allows GCC to emit warnings
about unused statistics.
llvm-svn: 32690 
							
						 
						
							2006-12-19 21:40:18 +00:00  
				
					
						
							
							
								 
						
							
								df1f19a8ef 
								
							 
						 
						
							
							
								
								Change the interface to SCEVExpander::InsertCastOfTo to take a cast opcode  
							
							
							
							
							so the decision of which opcode to use is pushed upward to the caller.
Adjust the callers to pass the expected opcode.
llvm-svn: 32535 
							
						 
						
							2006-12-13 08:06:42 +00:00  
				
					
						
							
							
								 
						
							
								b341b0861d 
								
							 
						 
						
							
							
								
								Change inferred getCast into specific getCast. Passes all tests.  
							
							
							
							
							llvm-svn: 32469 
							
						 
						
							2006-12-12 05:05:00 +00:00  
				
					
						
							
							
								 
						
							
								f3baad3ee1 
								
							 
						 
						
							
							
								
								Changed llvm_ostream et al. to OStream. llvm_cerr, llvm_cout, llvm_null, are
							
							
							
							
							now cerr, cout, and NullStream resp.
llvm-svn: 32298 
							
						 
						
							2006-12-07 01:30:32 +00:00  
				
					
						
							
							
								 
						
							
								700b873130 
								
							 
						 
						
							
							
								
								Detemplatize the Statistic class.  The only type it is instantiated with  
							
							
							
							
							is 'unsigned'.
llvm-svn: 32279 
							
						 
						
							2006-12-06 17:46:33 +00:00  
				
					
						
							
							
								 
						
							
								6c38f0bb07 
								
							 
						 
						
							
							
								
								For PR950:  
							
							
							
							
							The long awaited CAST patch. This introduces 12 new instructions into LLVM
to replace the cast instruction. Corresponding changes throughout LLVM are
provided. This passes llvm-test, llvm/test, and SPEC CPUINT2000 with the
exception of 175.vpr which fails only on a slight floating point output
difference.
llvm-svn: 31931 
							
						 
						
							2006-11-27 01:05:10 +00:00  
				
					
						
							
							
								 
						
							
								5dbf43c983 
								
							 
						 
						
							
							
								
								Removed #include <iostream> and replaced with llvm_* streams.  
							
							
							
							
							llvm-svn: 31923 
							
						 
						
							2006-11-26 09:46:52 +00:00  
				
					
						
							
							
								 
						
							
								21eba2da26 
								
							 
						 
						
							
							
								
								If an indvar with a variable stride is used by the exit condition, go ahead  
							
							
							
							
							and handle it like constant stride vars.  This fixes some bad codegen in
variable stride cases.  For example, it compiles this:
void foo(int k, int i) {
  for (k=i+i; k <= 8192; k+=i)
    flags2[k] = 0;
}
to:
LBB1_1: #bb.preheader
        movl %eax, %ecx
        addl %ecx, %ecx
        movl L_flags2$non_lazy_ptr, %edx
LBB1_2: #bb
        movb $0, (%edx,%ecx)
        addl %eax, %ecx
        cmpl $8192, %ecx
        jle LBB1_2      #bb
LBB1_5: #return
        ret
or (if the array is local and we are in dynamic-nonpic or static mode):
LBB3_2: #bb
        movb $0, _flags2(%ecx)
        addl %eax, %ecx
        cmpl $8192, %ecx
        jle LBB3_2      #bb
and:
        lis r2, ha16(L_flags2$non_lazy_ptr)
        lwz r2, lo16(L_flags2$non_lazy_ptr)(r2)
        slwi r3, r4, 1
LBB1_2: ;bb
        li r5, 0
        add r6, r4, r3
        stbx r5, r2, r3
        cmpwi cr0, r6, 8192
        bgt cr0, LBB1_5 ;return
instead of:
        leal (%eax,%eax,2), %ecx
        movl %eax, %edx
        addl %edx, %edx
        addl L_flags2$non_lazy_ptr, %edx
        xorl %esi, %esi
LBB1_2: #bb
        movb $0, (%edx,%esi)
        movl %eax, %edi
        addl %esi, %edi
        addl %ecx, %esi
        cmpl $8192, %esi
        jg LBB1_5       #return
and:
        lis r2, ha16(L_flags2$non_lazy_ptr)
        lwz r2, lo16(L_flags2$non_lazy_ptr)(r2)
        mulli r3, r4, 3
        slwi r5, r4, 1
        li r6, 0
        add r2, r2, r5
LBB1_2: ;bb
        li r5, 0
        add r7, r3, r6
        stbx r5, r2, r6
        add r6, r4, r6
        cmpwi cr0, r7, 8192
        ble cr0, LBB1_2 ;bb
This speeds up Benchmarks/Shootout/sieve from 8.533s to 6.464s and
implements LoopStrengthReduce/var_stride_used_by_compare.ll
llvm-svn: 31809 
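The C fragment in this message does not compile on its own because `flags2` is never declared. A minimal self-contained sketch follows, assuming a global byte array of 8193 entries (the size and the `reset_flags` helper are assumptions made for illustration, not part of the commit):

```c
#include <assert.h>
#include <string.h>

/* Assumed array size: large enough for the highest index the loop can touch. */
enum { FLAGS_SIZE = 8192 + 1 };
char flags2[FLAGS_SIZE];

/* Hypothetical helper: mark every entry as set before running the loop. */
void reset_flags(void) {
    memset(flags2, 1, sizeof flags2);
}

/* The loop from the commit message: k is the induction variable and the
 * stride i is only known at run time (a "variable stride"). */
void foo(int k, int i) {
    assert(i > 0); /* a non-positive stride would not terminate safely */
    for (k = i + i; k <= 8192; k += i)
        flags2[k] = 0;
}
```

Calling `reset_flags()` and then `foo(0, 3)` clears every multiple of 3 from 6 up to 8190 and leaves the other entries untouched.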
							
						 
						
							2006-11-17 06:17:33 +00:00  
				
					
						
							
							
								 
						
							
								de46e48420 
								
							 
						 
						
							
							
								
								For PR786:  
							
							
							
							
							Turn on -Wunused and -Wno-unused-parameter. Clean up most of the resulting
fall out by removing unused variables. Remaining warnings have to do with
unused functions (I didn't want to delete code without review) and unused
variables in generated code. Maintainers should clean up the remaining
issues when they see them. All changes pass DejaGnu tests and Olden.
llvm-svn: 31380 
							
						 
						
							2006-11-02 20:25:50 +00:00  
				
					
						
							
							
								 
						
							
								a6eb7e0803 
								
							 
						 
						
							
							
								
								break edges more intelligently  
							
							
							
							
							llvm-svn: 31257 
							
						 
						
							2006-10-28 06:45:33 +00:00  
				
					
						
							
							
								 
						
							
								5191c65485 
								
							 
						 
						
							
							
								
								prepare for a change I'm about to make  
							
							
							
							
							llvm-svn: 31248 
							
						 
						
							2006-10-28 00:59:20 +00:00  
				
					
						
							
							
								 
						
							
								e0fc4dfc22 
								
							 
						 
						
							
							
								
								For PR950:  
							
							
							
							
							This patch implements the first increment for the Signless Types feature.
All changes pertain to removing the ConstantSInt and ConstantUInt classes
in favor of just using ConstantInt.
llvm-svn: 31063 
							
						 
						
							2006-10-20 07:07:24 +00:00  
				
					
						
							
							
								 
						
							
								c2d3d3112e 
								
							 
						 
						
							
							
								
								eliminate RegisterOpt.  It does the same thing as RegisterPass.  
							
							
							
							
							llvm-svn: 29925 
							
						 
						
							2006-08-27 22:42:52 +00:00  
				
					
						
							
							
								 
						
							
								3d27be1333 
								
							 
						 
						
							
							
								
								s|llvm/Support/Visibility.h|llvm/Support/Compiler.h|  
							
							
							
							
							llvm-svn: 29911 
							
						 
						
							2006-08-27 12:54:02 +00:00  
				
					
						
							
							
								 
						
							
								3ff620178b 
								
							 
						 
						
							
							
								
								Changes:  
							
							
							
							
							1. Update an obsolete comment.
  2. Make the sorting by base an explicit (though still N^2) step, so
     that the code is more clear on what it is doing.
  3. Partition uses so that uses inside the loop are handled before uses
     outside the loop.
Note that none of these changes currently changes the code inserted by LSR,
but they are a stepping stone to getting there.
This code is the result of some crazy pair programming with Nate. :)
llvm-svn: 29493 
							
						 
						
							2006-08-03 06:34:50 +00:00  
				
					
						
							
							
								 
						
							
								e9c68f52e1 
								
							 
						 
						
							
							
								
								Only reuse a previous IV if it would not require a type conversion.  
							
							
							
							
							llvm-svn: 29186 
							
						 
						
							2006-07-18 19:07:58 +00:00  
				
					
						
							
							
								 
						
							
								996795b0dd 
								
							 
						 
						
							
							
								
								Use hidden visibility to make symbols in an anonymous namespace get  
							
							
							
							
							dropped.  This shrinks libllvmgcc.dylib another 67K
llvm-svn: 28975 
							
						 
						
							2006-06-28 23:17:24 +00:00  
				
					
						
							
							
								 
						
							
								398f70292c 
								
							 
						 
						
							
							
								
								RewriteExpr, either the new PHI node of induction variable or the  
							
							
							
							
							post-increment value, should be first cast to the appropriate type (to the
type of the common expr). Otherwise, the rewrite of a use based on (common +
iv) may end up with an incorrect type.
llvm-svn: 28735 
							
						 
						
							2006-06-09 00:12:42 +00:00  
				
					
						
							
							
								 
						
							
								13a1a7a4a6 
								
							 
						 
						
							
							
								
								Get rid of a signed/unsigned compare warning.  
							
							
							
							
							llvm-svn: 27625 
							
						 
						
							2006-04-12 19:28:15 +00:00  
				
					
						
							
							
								 
						
							
								f365f5f0c1 
								
							 
						 
						
							
							
								
								Fix spello  
							
							
							
							
							llvm-svn: 27052 
							
						 
						
							2006-03-24 07:14:34 +00:00  
				
					
						
							
							
								 
						
							
								7d80b4f366 
								
							 
						 
						
							
							
								
								silence a bogus gcc warning  
							
							
							
							
							llvm-svn: 26953 
							
						 
						
							2006-03-22 17:27:24 +00:00  
				
					
						
							
							
								 
						
							
								c28282bd87 
								
							 
						 
						
							
							
								
								- Fixed a bogus if condition.  
							
							
							
							
							- Added more debugging info.
- Allow reuse of IV of negative stride. e.g. -4 stride == 2 * iv of -2 stride.
llvm-svn: 26841 
							
						 
						
							2006-03-18 08:03:12 +00:00  
				
					
						
							
							
								 
						
							
								f09f0ebd48 
								
							 
						 
						
							
							
								
								Sort StrideOrder so we can process the smallest strides first. This allows  
							
							
							
							
							for more IV reuses.
llvm-svn: 26837 
							
						 
						
							2006-03-18 00:44:49 +00:00  
				
					
						
							
							
								 
						
							
								4520698820 
								
							 
						 
						
							
							
								
								Allow users of iv / stride to be rewritten with an expression that is a multiple
							
							
							
							
							of a smaller stride even if they have a common loop invariant expression part.
llvm-svn: 26828 
							
						 
						
							2006-03-17 19:52:23 +00:00  
				
					
						
							
							
								 
						
							
								3df447d354 
								
							 
						 
						
							
							
								
								For each loop, keep track of all the IV expressions inserted indexed by  
							
							
							
							
							stride. For a set of uses of the IV of a stride which is a multiple
of another stride, do not insert a new IV expression. Rather, reuse the
previous IV and rewrite the uses as uses of IV expression multiplied by
the factor.
e.g.
x = 0 ...; x ++
y = 0 ...; y += 4
then use of y can be rewritten as use of 4*x for x86.
llvm-svn: 26803 
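The reuse described in this message can be sketched at the source level. The two functions below are hypothetical illustrations, not LSR output: the first keeps separate induction variables `x` (stride 1) and `y` (stride 4), and the second drops `y` and rewrites its use as `4 * x`. On x86 a multiply by 4 can fold into the scale field of an addressing mode.

```c
#include <assert.h>
#include <stddef.h>

/* Before: two induction variables, x with stride 1 and y with stride 4. */
long sum_two_ivs(const int *a, const char *b, size_t n) {
    long sum = 0;
    size_t x = 0, y = 0;
    for (; x < n; x++, y += 4)
        sum += a[x] + b[y];
    return sum;
}

/* After: y is not materialized; its use becomes 4 * x, reusing the
 * stride-1 induction variable. */
long sum_one_iv(const int *a, const char *b, size_t n) {
    long sum = 0;
    for (size_t x = 0; x < n; x++)
        sum += a[x] + b[4 * x];
    return sum;
}
```

Both functions read the same elements and return the same sum. The second form keeps only one value live across loop iterations.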
							
						 
						
							2006-03-16 21:53:05 +00:00  
				
					
						
							
							
								 
						
							
								c567c4efbb 
								
							 
						 
						
							
							
								
								Added target lowering hooks which LSR consults to make more intelligent  
							
							
							
							
							transformation decisions.
llvm-svn: 26738 
							
						 
						
							2006-03-13 23:14:23 +00:00  
				
					
						
							
							
								 
						
							
								d30c4991a1 
								
							 
						 
						
							
							
								
								Use SCEVExpander::InsertCastOfTo instead of our own code.  This reduces  
							
							
							
							
							#LLVM LOC, and auto-cse's cast instructions.
llvm-svn: 25974 
							
						 
						
							2006-02-04 09:52:43 +00:00  
				
					
						
							
							
								 
						
							
								2959f0003e 
								
							 
						 
						
							
							
								
								Fix two significant bugs in LSR:  
							
							
							
							
							1. When rewriting code in outer loops, sometimes we would insert code into
   inner loops that is invariant in that loop.
2. Notice that 4*(2+x) is 8+4*x and use that to simplify expressions.
This is a performance neutral change.
llvm-svn: 25964 
							
						 
						
							2006-02-04 07:36:50 +00:00  
				
					
						
							
							
								 
						
							
								c597b8a55e 
								
							 
						 
						
							
							
								
								Make iostream #inclusion explicit  
							
							
							
							
							llvm-svn: 25514 
							
						 
						
							2006-01-22 23:32:06 +00:00  
				
					
						
							
							
								 
						
							
								cb36710ff9 
								
							 
						 
						
							
							
								
								Switch these to using ETForest instead of DominatorSet to compute itself.  
							
							
							
							
							Patch written by Daniel Berlin!
llvm-svn: 25202 
							
						 
						
							2006-01-11 05:10:20 +00:00  
				
					
						
							
							
								 
						
							
								077200737c 
								
							 
						 
						
							
							
								
								getRawValue zero extends for unsigned values, use getsextvalue so that we
							
							
							
							
							know that small negative values fit into the immediate field of addressing
modes.
llvm-svn: 24608 
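A small sketch of why the choice of extension matters here. The helper names and the signed 16-bit immediate width are assumptions for illustration only; they are not LLVM's API, and the commit does not state a field width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target check: does the offset fit a signed 16-bit immediate? */
int fits_simm16(int64_t v) {
    return v >= -32768 && v <= 32767;
}

/* Widen a 32-bit constant by zero extension: the upper bits become 0. */
int64_t widen_zext(uint32_t bits) {
    return (int64_t)bits;
}

/* Widen the same bits by sign extension: the upper bits copy bit 31. */
int64_t widen_sext(uint32_t bits) {
    int64_t v = (int64_t)bits;
    return (bits & 0x80000000u) ? v - 4294967296LL : v;
}
```

With the bit pattern `0xFFFFFFFC`, zero extension yields 4294967292, which fails the immediate check. Sign extension yields -4, which passes it.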
							
						 
						
							2005-12-05 18:23:57 +00:00  
				
					
						
							
							
								 
						
							
								5df0e36e98 
								
							 
						 
						
							
							
								
								My previous patch was too conservative.  Reject FP and void types, but do  
							
							
							
							
							allow pointer types.
llvm-svn: 23859 
							
						 
						
							2005-10-21 05:45:41 +00:00  
				
					
						
							
							
								 
						
							
								0c0b38bb4c 
								
							 
						 
						
							
							
								
								Do NOT touch FP ops with LSR.  This fixes a testcase Nate sent me from an  
							
							
							
							
							inner loop like this:
LBB_RateConvertMono8AltiVec_2:  ; no_exit
        lis r2, ha16(.CPI_RateConvertMono8AltiVec_0)
        lfs f3, lo16(.CPI_RateConvertMono8AltiVec_0)(r2)
        fmr f3, f3
        fadd f0, f2, f0
        fadd f3, f0, f3
        fcmpu cr0, f3, f1
        bge cr0, LBB_RateConvertMono8AltiVec_2  ; no_exit
to an inner loop like this:
LBB_RateConvertMono8AltiVec_1:  ; no_exit
        fsub f2, f2, f1
        fcmpu cr0, f2, f1
        fmr f0, f2
        bge cr0, LBB_RateConvertMono8AltiVec_1  ; no_exit
Doh! good catch!
llvm-svn: 23838 
							
						 
						
							2005-10-20 04:47:10 +00:00  
				
					
						
							
							
								 
						
							
								192cd18f53 
								
							 
						 
						
							
							
								
								Fix (hopefully the last) issue where LSR is nondeterministic.  When pulling
							
							
							
							
							out CSE's of base expressions it could build a result whose order was
nondet.
llvm-svn: 23698 
							
						 
						
							2005-10-11 18:41:04 +00:00  
				
					
						
							
							
								 
						
							
								5c9d63da31 
								
							 
						 
						
							
							
								
								Fix another problem where LSR was being nondeterministic.  Also remove elements
							
							
							
							
							from the end of a vector instead of the beginning
llvm-svn: 23697 
							
						 
						
							2005-10-11 18:30:57 +00:00  
				
					
						
							
							
								 
						
							
								b7a3894e7c 
								
							 
						 
						
							
							
								
								Fix another lsr-is-nondeterministic case  
							
							
							
							
							llvm-svn: 23695 
							
						 
						
							2005-10-11 18:17:57 +00:00  
				
					
						
							
							
								 
						
							
								eb4be8b942 
								
							 
						 
						
							
							
								
								Hrm, you didn't see this.  
							
							
							
							
							llvm-svn: 23673 
							
						 
						
							2005-10-09 06:24:02 +00:00  
				
					
						
							
							
								 
						
							
								4ea0a3eaac 
								
							 
						 
						
							
							
								
								Fix a source of non-determinism in the backend: the order of processing  
							
							
							
							
							IV strides depended on the pointer order of the strides in memory.
Non-determinism is bad.
llvm-svn: 23672 
							
						 
						
							2005-10-09 06:20:55 +00:00  
				
					
						
							
							
								 
						
							
								f07a587c79 
								
							 
						 
						
							
							
								
								Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI.  In  
							
							
							
							
							particular, it should realize that phi's use their values in the pred block
not the phi block itself.  This change turns our em3d loop from this:
_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_6    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; endif.loopexit.loopexit_crit_edge
        addi r3, r2, 1
        blr
LBB_test_6:     ; loopexit
        or r3, r2, r2
        blr
into:
_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r2, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r6, r6
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        or r2, r6, r6
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r2, r2
        blr
Unfortunately, this is actually worse code, because the register coalescer
is getting confused somehow.  If it were doing its job right, it could turn the
code into this:
_test:
        cmpwi cr0, r4, 0
        bgt cr0, LBB_test_2     ; entry.no_exit_crit_edge
LBB_test_1:     ; entry.loopexit_crit_edge
        li r6, 0
        b LBB_test_5    ; loopexit
LBB_test_2:     ; entry.no_exit_crit_edge
        li r6, 0
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
LBB_test_5:     ; loopexit
        or r3, r6, r6
        blr
... which I'll work on next. :)
llvm-svn: 23604 
							
						 
						
							2005-10-03 02:50:05 +00:00  
				
					
						
							
							
								 
						
							
								e4ed42a426 
								
							 
						 
						
							
							
								
								Refactor some code into a function  
							
							
							
							
							llvm-svn: 23603 
							
						 
						
							2005-10-03 01:04:44 +00:00  
				
					
						
							
							
								 
						
							
								360928dbed 
								
							 
						 
						
							
							
								
								This break is bogus and I have no idea why it was there.  Basically it prevents  
							
							
							
							
							memoizing code when IV's are used by phinodes outside of loops.  In a simple
example, we were getting this code before (note that r6 and r7 are isomorphic
IV's):
        li r6, 0
        or r7, r6, r6
LBB_test_3:     ; no_exit
        lwz r2, 0(r3)
        cmpw cr0, r2, r5
        or r2, r7, r7
        beq cr0, LBB_test_5     ; loopexit
LBB_test_4:     ; endif
        addi r2, r7, 1
        addi r7, r7, 1
        addi r3, r3, 4
        addi r6, r6, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
Now we get:
        li r6, 0
LBB_test_3:     ; no_exit
        or r2, r6, r6
        lwz r6, 0(r3)
        cmpw cr0, r6, r5
        beq cr0, LBB_test_6     ; loopexit
LBB_test_4:     ; endif
        addi r3, r3, 4
        addi r6, r2, 1
        cmpw cr0, r6, r4
        blt cr0, LBB_test_3     ; no_exit
this was noticed in em3d.
llvm-svn: 23602 
							
						 
						
							2005-10-03 00:37:33 +00:00  
				
					
						
							
							
								 
						
							
								8fcce170cf 
								
							 
						 
						
							
							
								
								when checking if we should move a split edge block outside of a loop,  
							
							
							
							
							check the presplit pred, not the post-split pred.  This was causing us
to make the wrong decision in some cases, leaving the critical edge block
in the loop.
llvm-svn: 23601 
							
						 
						
							2005-10-03 00:31:52 +00:00  
				
					
						
							
							
								 
						
							
								92233d2175 
								
							 
						 
						
							
							
								
								Make the pass name simpler  
							
							
							
							
							llvm-svn: 23476 
							
						 
						
							2005-09-27 21:10:32 +00:00  
				
					
						
							
							
								 
						
							
								fd018c8dfe 
								
							 
						 
						
							
							
								
								Fix an issue where LSR would miss rewriting a use of an IV expression by a PHI node that is not the original PHI.  
							
							
							
							
							This fixes up a dot-product loop in galgel, speeding it up from 18.47s to
16.13s.
llvm-svn: 23327 
							
						 
						
							2005-09-13 02:09:55 +00:00  
				
					
						
							
							
								 
						
							
								8048b85e8f 
								
							 
						 
						
							
							
								
								Fix a regression from last night, which caused this pass to create invalid  
							
							
							
							
							code for IV uses outside of loops that are not dominated by the latch block.
We should only convert these uses to use the post-inc value if they ARE
dominated by the latch block.
Also use a new LoopInfo method to simplify some code.
This fixes Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll
llvm-svn: 23318 
							
						 
						
							2005-09-12 17:11:27 +00:00  
				
					
						
							
							
								 
						
							
								a67648396a 
								
							 
						 
						
							
							
								
								_test:  
							
							
							
							
							li r2, 0
LBB_test_1:     ; no_exit.2
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        addi r3, r3, 4
        cmpwi cr0, r2, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r2, 1
        stw r2, 0(r4)
        blr
Uses of IV's outside of the loop should use the post-incremented version
of the IV, not the preincremented version.  This helps many loops (e.g. in sixtrack)
which used to generate code like this (this is the code from the
dont-hoist-simple-loop-constants.ll testcase):
_test:
        li r2, 0                 **** IV starts at 0
LBB_test_1:     ; no_exit.2
        or r5, r2, r2            **** Copy for loop exit
        li r2, 0
        stw r2, 0(r3)
        addi r3, r3, 4
        addi r2, r5, 1
        addi r6, r5, 2           **** IV+2
        cmpwi cr0, r6, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r5, 2       ****  IV+2
        stw r2, 0(r4)
        blr
And now we generate code like this:
_test:
        li r2, 1               *** IV starts at 1
LBB_test_1:     ; no_exit.2
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        addi r3, r3, 4
        cmpwi cr0, r2, 701     *** IV.postinc + 0
        blt cr0, LBB_test_1
LBB_test_2:     ; loopexit.2.loopexit
        stw r2, 0(r4)          *** IV.postinc + 0
        blr
llvm-svn: 23313 
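The two assembly listings above correspond roughly to the following pair of C functions. This is a hand-written sketch of the control flow, not the test case itself; the function names and the `LIMIT` constant are chosen here for illustration.

```c
#include <assert.h>

enum { LIMIT = 701 }; /* the bound compared against in the listings */

/* Pre-increment form: a copy of the old IV stays live across the loop so
 * that old + 2 can be computed both for the exit test and after the loop. */
int fill_preinc(int *p) {
    int iv = 0, old;
    do {
        old = iv;       /* copy kept for the loop exit */
        *p++ = 0;
        iv = old + 1;
    } while (old + 2 < LIMIT);
    return old + 2;     /* recomputed outside the loop */
}

/* Post-increment form: the IV starts at 1, and the incremented value is
 * used directly by both the exit test and the use after the loop. */
int fill_postinc(int *p) {
    int iv = 1;
    do {
        *p++ = 0;
        iv++;
    } while (iv < LIMIT);
    return iv;
}
```

Both versions store 700 zeros and return 701. The second needs no extra copy and no second add.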
							
						 
						
							2005-09-12 06:04:47 +00:00  
				
					
						
							
							
								 
						
							
								530fe6ab30 
								
							 
						 
						
							
							
								
								implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll.  
							
							
							
							
							We used to emit this code for it:
_test:
        li r2, 1     ;; Value tying up a register for the whole loop
        li r5, 0
LBB_test_1:     ; no_exit.2
        or r6, r5, r5
        li r5, 0
        stw r5, 0(r3)
        addi r5, r6, 1
        addi r3, r3, 4
        add r7, r2, r5  ;; should be addi r7, r5, 1
        cmpwi cr0, r7, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r6, 2
        stw r2, 0(r4)
        blr
now we emit this:
_test:
        li r2, 0
LBB_test_1:     ; no_exit.2
        or r5, r2, r2
        li r2, 0
        stw r2, 0(r3)
        addi r3, r3, 4
        addi r2, r5, 1
        addi r6, r5, 2   ;; whoa, fold those adds!
        cmpwi cr0, r6, 701
        blt cr0, LBB_test_1     ; no_exit.2
LBB_test_2:     ; loopexit.2.loopexit
        addi r2, r5, 2
        stw r2, 0(r4)
        blr
more improvement coming.
llvm-svn: 23306 
							
						 
						
							2005-09-10 01:18:45 +00:00  
				
					
						
							
							
								 
						
							
								ea7dfd53d6 
								
							 
						 
						
							
							
								
								Fix Transforms/LoopStrengthReduce/2005-08-17-OutOfLoopVariant.ll, a crash  
							
							
							
							
							on 177.mesa
llvm-svn: 22843 
							
						 
						
							2005-08-17 21:22:41 +00:00  
				
					
						
							
							
								 
						
							
								2bf7cb5213 
								
							 
						 
						
							
							
								
								Use a new helper to split critical edges, making the code simpler.  
							
							
							
							
							Do not claim to not change the CFG.  We do change the cfg to split critical
edges.  This isn't causing us a problem now, but could likely do so in the
future.
llvm-svn: 22824 
							
						 
						
							2005-08-17 06:35:16 +00:00  
				
					
						
							
							
								 
						
							
								5cf983ee0f 
								
							 
						 
						
							
							
								
								Fix a bad case in gzip where we put lots of things in registers across the  
							
							
							
							
							loop, because an IV-dependent value was used outside of the loop and didn't
have immediate-folding capability
llvm-svn: 22798 
							
						 
						
							2005-08-16 00:38:11 +00:00  
				
					
						
							
							
								 
						
							
								47d3ec3525 
								
							 
						 
						
							
							
								
								Ooops, don't forget to clear this.  The real inner loop is now:  
							
							
							
							
							.LBB_foo_3:     ; no_exit.1
        lfd f2, 0(r9)
        lfd f3, 8(r9)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r9)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfd f2, 0(r9)
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1
llvm-svn: 22782 
							
						 
						
							2005-08-13 07:42:01 +00:00  
				
					
						
							
							
								 
						
							
								5949d49032 
								
							 
						 
						
							
							
								
								Recursively scan scev expressions for common subexpressions.  This allows us  
							
							
							
							
							to handle nested loops much better, for example, by being able to tell that
these two expressions:
{( 8 + ( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
Have the following common part that can be shared:
{(( 16 * ( 1 +  %Tmp11 +  %Tmp12)) +  %c_),+,( 16 *  %Tmp12)}<loopentry.1>
This allows us to codegen an important inner loop in 168.wupwise as:
.LBB_foo_4:     ; no_exit.1
        lfd f2, 16(r9)
        fmul f3, f0, f2
        fmul f2, f1, f2
        fadd f4, f3, f2
        stfd f4, 8(r9)
        fsub f2, f3, f2
        stfd f2, 16(r9)
        addi r8, r8, 1
        addi r9, r9, 16
        cmpw cr0, r8, r4
        ble .LBB_foo_4  ; no_exit.1
instead of:
.LBB_foo_3:     ; no_exit.1
        lfdx f2, r6, r9
        add r10, r6, r9
        lfd f3, 8(r10)
        fmul f4, f1, f2
        fmadd f4, f0, f3, f4
        stfd f4, 8(r10)
        fmul f3, f1, f3
        fmsub f2, f0, f2, f3
        stfdx f2, r6, r9
        addi r9, r9, 16
        addi r8, r8, 1
        cmpw cr0, r8, r4
        ble .LBB_foo_3  ; no_exit.1
llvm-svn: 22781 
							
						 
						
							2005-08-13 07:27:18 +00:00  
				
					
						
							
							
								 
						
							
								8447b49526 
								
							 
						 
						
							
							
								
								When splitting critical edges, make sure not to leave the new block in the  
							
							... 
							
							
							
							middle of the loop.  This turns a critical loop in gzip into this:
.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_8 ; loopentry.loopexit_crit_edge
.LBB_test_2:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_7 ; shortcirc_next.0.loopexit_crit_edge
.LBB_test_3:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        bne .LBB_test_6 ; shortcirc_next.1.loopexit_crit_edge
.LBB_test_4:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry
instead of this:
.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_3:    ; shortcirc_next.0
        add r28, r3, r27
        lhz r28, 5(r28)
        add r26, r4, r27
        lhz r26, 5(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_5 ; shortcirc_next.1
.LBB_test_4:    ; shortcirc_next.0.loopexit_crit_edge
        add r2, r11, r27
        add r8, r12, r27
        b .LBB_test_9   ; loopexit
.LBB_test_5:    ; shortcirc_next.1
        add r28, r3, r27
        lhz r28, 7(r28)
        add r26, r4, r27
        lhz r26, 7(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_7 ; shortcirc_next.2
.LBB_test_6:    ; shortcirc_next.1.loopexit_crit_edge
        add r2, r9, r27
        add r8, r10, r27
        b .LBB_test_9   ; loopexit
.LBB_test_7:    ; shortcirc_next.2
        add r28, r3, r27
        lhz r26, 9(r28)
        add r28, r4, r27
        lhz r25, 9(r28)
        addi r28, r27, 8
        cmpw cr7, r26, r25
        mfcr r26, 1
        rlwinm r26, r26, 31, 31, 31
        add r25, r8, r27
        cmpw cr7, r25, r7
        mfcr r25, 1
        rlwinm r25, r25, 29, 31, 31
        and. r26, r26, r25
        bne .LBB_test_1 ; loopentry
Next up, improve the code for the loop.
llvm-svn: 22769 
							
						 
						
							2005-08-12 22:22:17 +00:00  
				
					
						
							
							
								 
						
							
								4fec86d348 
								
							 
						 
						
							
							
								
								Fix a FIXME: if we are inserting code for a PHI argument, split the critical  
							
							... 
							
							
							
							edge so that the code is not always executed for both operands.  This
prevents LSR from inserting code into loops whose exit blocks contain
PHI uses of IV expressions (which are outside of loops).  On gzip, for
example, we turn this ugly code:
.LBB_test_1:    ; loopentry
        add r27, r3, r28
        lhz r27, 3(r27)
        add r26, r4, r28
        lhz r26, 3(r26)
        add r25, r30, r28    ;; Only live if exiting the loop
        add r24, r29, r28    ;; Only live if exiting the loop
        cmpw cr0, r27, r26
        bne .LBB_test_5 ; loopexit
into this:
.LBB_test_1:    ; loopentry
        or r27, r28, r28
        add r28, r3, r27
        lhz r28, 3(r28)
        add r26, r4, r27
        lhz r26, 3(r26)
        cmpw cr0, r28, r26
        beq .LBB_test_3 ; shortcirc_next.0
.LBB_test_2:    ; loopentry.loopexit_crit_edge
        add r2, r30, r27
        add r8, r29, r27
        b .LBB_test_9   ; loopexit
.LBB_test_3:    ; shortcirc_next.0
        ...
        blt .LBB_test_1
Next step: get the block out of the loop so that the loop is all
fall-throughs again.
llvm-svn: 22766 
							
						 
						
							2005-08-12 22:06:11 +00:00  
				
					
						
							
							
								 
						
							
								edff91a49a 
								
							 
						 
						
							
							
								
								Teach LSR to strength reduce IVs that have a loop-invariant but non-constant stride.  
							
							... 
							
							
							
							For code like this:
void foo(float *a, float *b, int n, int stride_a, int stride_b) {
  int i;
  for (i=0; i<n; i++)
      a[i*stride_a] = b[i*stride_b];
}
we now emit:
.LBB_foo2_2:    ; no_exit
        lfs f0, 0(r4)
        stfs f0, 0(r3)
        addi r7, r7, 1
        add r4, r2, r4
        add r3, r6, r3
        cmpw cr0, r7, r5
        blt .LBB_foo2_2 ; no_exit
instead of:
.LBB_foo_2:     ; no_exit
        mullw r8, r2, r7     ;; multiply!
        slwi r8, r8, 2
        lfsx f0, r4, r8
        mullw r8, r2, r6     ;; multiply!
        slwi r8, r8, 2
        stfsx f0, r3, r8
        addi r2, r2, 1
        cmpw cr0, r2, r5
        blt .LBB_foo_2  ; no_exit
Loops with variable strides occur pretty often.  For example, in SPECFP2K
there are 317 variable strides in 177.mesa, 3 in 179.art, 14 in 188.ammp,
56 in 168.wupwise, 36 in 172.mgrid.
Now we can allow indvars to turn functions written like this:
void foo2(float *a, float *b, int n, int stride_a, int stride_b) {
  int i, ai = 0, bi = 0;
  for (i=0; i<n; i++)
    {
      a[ai] = b[bi];
      ai += stride_a;
      bi += stride_b;
    }
}
into code like the above for better analysis.  With this patch, they generate
identical code.
llvm-svn: 22740 
							
						 
						
							2005-08-10 00:45:21 +00:00  
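As a source-level picture of the variable-stride change in llvm-svn 22740, the per-access multiplies in foo can be thought of as pointers bumped by the loop-invariant stride. This is a minimal hand-written sketch, not output of the pass; foo_reduced and check_strides are hypothetical names introduced here.

```c
#include <assert.h>

/* Original form from the commit message: one multiply per access. */
void foo(float *a, float *b, int n, int stride_a, int stride_b) {
  int i;
  for (i = 0; i < n; i++)
    a[i * stride_a] = b[i * stride_b];
}

/* Hypothetical hand-reduced form: each i*stride product becomes a
   pointer that advances by the loop-invariant stride. */
void foo_reduced(float *a, float *b, int n, int stride_a, int stride_b) {
  float *pa = a, *pb = b;
  int i;
  for (i = 0; i < n; i++) {
    *pa = *pb;
    pa += stride_a; /* an add replaces the multiply */
    pb += stride_b;
  }
}

/* Runs both versions on the same input and compares the results.
   Buffers are sized so the final pointer bump stays in range. */
void check_strides(void) {
  float b[12], a1[8] = {0}, a2[8] = {0};
  int i;
  for (i = 0; i < 12; i++)
    b[i] = (float)(i + 1);
  foo(a1, b, 4, 2, 3);
  foo_reduced(a2, b, 4, 2, 3);
  for (i = 0; i < 8; i++)
    assert(a1[i] == a2[i]);
  assert(a1[6] == 10.0f); /* a[3*2] = b[3*3] */
}
```

Calling check_strides() exercises both loops with strides 2 and 3; the reduced loop has the same shape as the lfs/stfs/add sequence shown in the commit message.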
				
					
						
							
							
								 
						
							
								dde7dc525e 
								
							 
						 
						
							
							
								
								Fix Regression/Transforms/LoopStrengthReduce/phi_node_update_multiple_preds.ll  
							
							... 
							
							
							
							by being more careful about updating PHI nodes
llvm-svn: 22739 
							
						 
						
							2005-08-10 00:35:32 +00:00  
				
					
						
							
							
								 
						
							
								c6c4d99a21 
								
							 
						 
						
							
							
								
								Fix some 80 column violations.  
							
							... 
							
							
							
							Once we compute the evolution for a GEP, tell SE about it.  This allows users
of the GEP to know it, if the users are not direct.  This allows us to compile
this testcase:
void fbSolidFillmmx(int w, unsigned char *d) {
    while (w >= 64) {
        *(unsigned long long *) (d +  0) = 0;
        *(unsigned long long *) (d +  8) = 0;
        *(unsigned long long *) (d + 16) = 0;
        *(unsigned long long *) (d + 24) = 0;
        *(unsigned long long *) (d + 32) = 0;
        *(unsigned long long *) (d + 40) = 0;
        *(unsigned long long *) (d + 48) = 0;
        *(unsigned long long *) (d + 56) = 0;
        w -= 64;
        d += 64;
    }
}
into:
.LBB_fbSolidFillmmx_2:  ; no_exit
        li r2, 0
        stw r2, 0(r4)
        stw r2, 4(r4)
        stw r2, 8(r4)
        stw r2, 12(r4)
        stw r2, 16(r4)
        stw r2, 20(r4)
        stw r2, 24(r4)
        stw r2, 28(r4)
        stw r2, 32(r4)
        stw r2, 36(r4)
        stw r2, 40(r4)
        stw r2, 44(r4)
        stw r2, 48(r4)
        stw r2, 52(r4)
        stw r2, 56(r4)
        stw r2, 60(r4)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit
instead of:
.LBB_fbSolidFillmmx_2:  ; no_exit
        li r11, 0
        stw r11, 0(r4)
        stw r11, 4(r4)
        stwx r11, r10, r4
        add r12, r10, r4
        stw r11, 4(r12)
        stwx r11, r9, r4
        add r12, r9, r4
        stw r11, 4(r12)
        stwx r11, r8, r4
        add r12, r8, r4
        stw r11, 4(r12)
        stwx r11, r7, r4
        add r12, r7, r4
        stw r11, 4(r12)
        stwx r11, r6, r4
        add r12, r6, r4
        stw r11, 4(r12)
        stwx r11, r5, r4
        add r12, r5, r4
        stw r11, 4(r12)
        stwx r11, r2, r4
        add r12, r2, r4
        stw r11, 4(r12)
        addi r4, r4, 64
        addi r3, r3, -64
        cmpwi cr0, r3, 63
        bgt .LBB_fbSolidFillmmx_2       ; no_exit
llvm-svn: 22737 
							
						 
						
							2005-08-09 23:39:36 +00:00  
				
					
						
							
							
								 
						
							
								02742710f3 
								
							 
						 
						
							
							
								
								SCEVAddExpr::get() of an empty list is invalid.  
							
							... 
							
							
							
							llvm-svn: 22724 
							
						 
						
							2005-08-09 01:13:47 +00:00  
				
					
						
							
							
								 
						
							
								a091ff1764 
								
							 
						 
						
							
							
								
								Implement: LoopStrengthReduce/share_ivs.ll  
							
							... 
							
							
							
							Two changes:
  * Only insert one PHI node for each stride.  Other values are live in
    values.  This cannot introduce higher register pressure than the
    previous approach, and can take advantage of reg+reg addressing modes.
  * Factor common base values out of uses before moving values from the
    base to the immediate fields.  This improves codegen by starting the
    stride-specific PHI node out at a common place for each IV use.
As an example, we used to generate this for a loop in swim:
.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfd f0, 0(r8)
        stfd f0, 0(r3)
        lfd f0, 0(r6)
        stfd f0, 0(r7)
        lfd f0, 0(r2)
        stfd f0, 0(r5)
        addi r9, r9, 1
        addi r2, r2, 8
        addi r5, r5, 8
        addi r6, r6, 8
        addi r7, r7, 8
        addi r8, r8, 8
        addi r3, r3, 8
        cmpw cr0, r9, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1
now we emit:
.LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_2:        ; no_exit.7.i
        lfdx f0, r8, r2
        stfdx f0, r9, r2
        lfdx f0, r5, r2
        stfdx f0, r7, r2
        lfdx f0, r3, r2
        stfdx f0, r6, r2
        addi r10, r10, 1
        addi r2, r2, 8
        cmpw cr0, r10, r4
        bgt .LBB_main_no_exit_2E_6_2E_i_no_exit_2E_7_2E_i_1
As another more dramatic example, we used to emit this:
.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfd f0, 8(r21)
        lfd f4, 8(r3)
        lfd f5, 8(r27)
        lfd f6, 8(r22)
        lfd f7, 8(r5)
        lfd f8, 8(r6)
        lfd f9, 8(r30)
        lfd f10, 8(r11)
        lfd f11, 8(r12)
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfd f0, 8(r4)
        lfd f0, 8(r25)
        lfd f5, 8(r26)
        lfd f6, 8(r23)
        lfd f9, 8(r28)
        lfd f10, 8(r10)
        lfd f12, 8(r9)
        lfd f13, 8(r29)
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfd f0, 8(r24)
        lfd f0, 8(r8)
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfd f0, 8(r2)
        addi r20, r20, 1
        addi r2, r2, 8
        addi r8, r8, 8
        addi r10, r10, 8
        addi r12, r12, 8
        addi r6, r6, 8
        addi r29, r29, 8
        addi r28, r28, 8
        addi r26, r26, 8
        addi r25, r25, 8
        addi r24, r24, 8
        addi r5, r5, 8
        addi r23, r23, 8
        addi r22, r22, 8
        addi r3, r3, 8
        addi r9, r9, 8
        addi r11, r11, 8
        addi r30, r30, 8
        addi r27, r27, 8
        addi r21, r21, 8
        addi r4, r4, 8
        cmpw cr0, r20, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1
we now emit:
.LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_2:       ; no_exit.1.i19
        lfdx f0, r21, r20
        lfdx f4, r3, r20
        lfdx f5, r27, r20
        lfdx f6, r22, r20
        lfdx f7, r5, r20
        lfdx f8, r6, r20
        lfdx f9, r30, r20
        lfdx f10, r11, r20
        lfdx f11, r12, r20
        fsub f10, f10, f11
        fadd f5, f4, f5
        fmul f5, f5, f1
        fadd f6, f6, f7
        fadd f6, f6, f8
        fadd f6, f6, f9
        fmadd f0, f5, f6, f0
        fnmsub f0, f10, f2, f0
        stfdx f0, r4, r20
        lfdx f0, r25, r20
        lfdx f5, r26, r20
        lfdx f6, r23, r20
        lfdx f9, r28, r20
        lfdx f10, r10, r20
        lfdx f12, r9, r20
        lfdx f13, r29, r20
        fsub f11, f13, f11
        fadd f4, f4, f5
        fmul f4, f4, f1
        fadd f5, f6, f9
        fadd f5, f5, f10
        fadd f5, f5, f12
        fnmsub f0, f4, f5, f0
        fnmsub f0, f11, f3, f0
        stfdx f0, r24, r20
        lfdx f0, r8, r20
        fsub f4, f7, f8
        fsub f5, f12, f10
        fnmsub f0, f5, f2, f0
        fnmsub f0, f4, f3, f0
        stfdx f0, r2, r20
        addi r19, r19, 1
        addi r20, r20, 8
        cmpw cr0, r19, r7
        bgt .LBB_main_L_90_no_exit_2E_0_2E_i16_no_exit_2E_1_2E_i19_1
llvm-svn: 22722 
							
						 
						
							2005-08-09 00:18:09 +00:00  
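The first change in llvm-svn 22722 (one PHI node per stride) can be pictured at source level as follows. This is a rough sketch under the assumption that all accesses share one stride; the function names are hypothetical and the loops are hand-written, not compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Before: each array has its own pointer induction variable, so the
   loop carries one increment per array plus the trip counter. */
void copy_separate_ivs(double *d0, const double *s0,
                       double *d1, const double *s1, size_t n) {
  size_t i;
  for (i = 0; i < n; i++) {
    *d0++ = *s0++;
    *d1++ = *s1++;
  }
}

/* After: one IV for the stride.  Every access is base + shared
   offset, which maps onto a reg+reg addressing mode (lfdx/stfdx). */
void copy_shared_iv(double *d0, const double *s0,
                    double *d1, const double *s1, size_t n) {
  size_t off;
  for (off = 0; off < n; off++) {
    d0[off] = s0[off];
    d1[off] = s1[off];
  }
}

/* Both forms must copy the same data. */
void check_shared_iv(void) {
  double s0[4] = {1, 2, 3, 4}, s1[4] = {5, 6, 7, 8};
  double a0[4], a1[4], b0[4], b1[4];
  size_t i;
  copy_separate_ivs(a0, s0, a1, s1, 4);
  copy_shared_iv(b0, s0, b1, s1, 4);
  for (i = 0; i < 4; i++) {
    assert(a0[i] == b0[i]);
    assert(a1[i] == b1[i]);
  }
}
```

The base pointers stay loop-invariant in the second form, which is why the long run of addi instructions in the swim example collapses to a single increment.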
				
					
						
							
							
								 
						
							
								37c24cc98c 
								
							 
						 
						
							
							
								
								Suck the base value out of the UsersToProcess vector into the BasedUser  
							
							... 
							
							
							
							class to simplify the code.  Fuse two loops.
llvm-svn: 22721 
							
						 
						
							2005-08-08 22:56:21 +00:00  
				
					
						
							
							
								 
						
							
								37ed895bf1 
								
							 
						 
						
							
							
								
								Split MoveLoopVariantsToImediateField out from MoveImmediateValues.  The  
							
							... 
							
							
							
							first is a correctness thing, and the latter is an optimization thing.  This also
is needed to support a future change.
llvm-svn: 22720 
							
						 
						
							2005-08-08 22:32:34 +00:00  
				
					
						
							
							
								 
						
							
								14203e85b2 
								
							 
						 
						
							
							
								
								Not all constants are legal immediates in load/store instructions.  
							
							... 
							
							
							
							llvm-svn: 22704 
							
						 
						
							2005-08-08 06:25:50 +00:00  
				
					
						
							
							
								 
						
							
								c70bbc0c41 
								
							 
						 
						
							
							
								
								Implement LoopStrengthReduce/share_code_in_preheader.ll by having one  
							
							... 
							
							
							
							rewriter for all code inserted into the preheader, which is never flushed.
llvm-svn: 22702 
							
						 
						
							2005-08-08 05:47:49 +00:00  
				
					
						
							
							
								 
						
							
								9bfa6f8784 
								
							 
						 
						
							
							
								
								Implement a simple optimization for the termination condition of the loop.  
							
							... 
							
							
							
							The termination condition actually wants to use the post-incremented value
of the loop, not a new indvar with an unusual base.
On PPC, for example, this allows us to compile
LoopStrengthReduce/exit_compare_live_range.ll to:
_foo:
        li r2, 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r2, r2, 1
        cmpw cr0, r2, r4
        bne .LBB_foo_1  ; no_exit
        blr
instead of:
_foo:
        li r2, 1                ;; IV starts at 1, not 0
.LBB_foo_1:     ; no_exit
        li r5, 0
        stw r5, 0(r3)
        addi r5, r2, 1
        cmpw cr0, r2, r4
        or r2, r5, r5           ;; Reg-reg copy, extra live range
        bne .LBB_foo_1  ; no_exit
        blr
This implements LoopStrengthReduce/exit_compare_live_range.ll
llvm-svn: 22699 
							
						 
						
							2005-08-08 05:28:22 +00:00  
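The effect of llvm-svn 22699 on the exit test can be mimicked in C. The sketch below only counts loop trips; trips_pre_increment and trips_post_increment are hypothetical names, and n is assumed to be at least 1 so both loops terminate.

```c
#include <assert.h>

/* Before: the IV starts at 1 and the exit compare reads the value
   from before the increment, so old and new values are both live. */
int trips_pre_increment(int n) {
  int iv = 1, trips = 0;
  assert(n >= 1);
  for (;;) {
    int next = iv + 1;
    trips++;
    if (iv == n)
      break;
    iv = next; /* the extra copy, seen as "or r2, r5, r5" */
  }
  return trips;
}

/* After: the IV starts at 0 and the compare uses the incremented
   value directly, so only one value is live across the branch. */
int trips_post_increment(int n) {
  int iv = 0, trips = 0;
  assert(n >= 1);
  do {
    trips++;
    iv++;
  } while (iv != n);
  return trips;
}
```

Both functions run the body n times; the second form needs one register fewer for the counter, matching the shorter PPC loop in the commit message.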
				
					
						
							
							
								 
						
							
								11e7a5eda7 
								
							 
						 
						
							
							
								
								Make sure to clean CastedPointers after casts are potentially deleted.  
							
							... 
							
							
							
							This fixes LSR crashes on 301.apsi, 191.fma3d, and 189.lucas
llvm-svn: 22673 
							
						 
						
							2005-08-05 01:30:11 +00:00  
				
					
						
							
							
								 
						
							
								45f8b6e7aa 
								
							 
						 
						
							
							
								
								Modify how immediates are removed from base expressions to deal with the fact  
							
							... 
							
							
							
							that the symbolic evaluator is not always able to use subtraction to remove
expressions.  This makes the code faster, and fixes the last crash on 178.galgel.
Finally, add a statistic to see how many phi nodes are inserted.
On 178.galgel, we get the following stats:
2562 loop-reduce  - Number of PHIs inserted
3927 loop-reduce  - Number of GEPs strength reduced
llvm-svn: 22662 
							
						 
						
							2005-08-04 22:34:05 +00:00  
				
					
						
							
							
								 
						
							
								a6d7c355bc 
								
							 
						 
						
							
							
								
								* Refactor some code into a new BasedUser::RewriteInstructionToUseNewBase  
							
							... 
							
							
							
							method.
* Fix a crash on 178.galgel, where we would insert expressions before PHI
  nodes instead of into the PHI node predecessor blocks.
llvm-svn: 22657 
							
						 
						
							2005-08-04 20:03:32 +00:00  
				
					
						
							
							
								 
						
							
								0f7c0fa2a7 
								
							 
						 
						
							
							
								
								Fix a case that caused this to crash on 178.galgel  
							
							... 
							
							
							
							llvm-svn: 22653 
							
						 
						
							2005-08-04 19:26:19 +00:00  
				
					
						
							
							
								 
						
							
								acc42c4df1 
								
							 
						 
						
							
							
								
								Teach LSR about loop-variant expressions, such as loops like this:  
							
							... 
							
							
							
							for (i = 0; i < N; ++i)
    A[i][foo()] = 0;
Here we still want to strength reduce the A[i] part, even though foo() is
loop-variant.
This also simplifies some of the 'CanReduce' logic.
This implements Transforms/LoopStrengthReduce/ops_after_indvar.ll
llvm-svn: 22652 
							
						 
						
							2005-08-04 19:08:16 +00:00  
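A hand-written C analogue of the A[i][foo()] case from llvm-svn 22652 may help. It assumes a fixed 4x8 array and a stand-in foo() whose result changes on every call; those choices and the helper names are illustrative assumptions.

```c
#include <assert.h>

#define N 4
#define M 8

static int calls; /* state that makes foo() loop-variant */

/* Hypothetical stand-in for the foo() in the commit message. */
static int foo(void) { return calls++ % M; }

/* As written in the commit message: A[i] implies a multiply by the
   row size on every iteration. */
void clear_naive(int A[N][M]) {
  int i;
  for (i = 0; i < N; ++i)
    A[i][foo()] = 0;
}

/* Hand-reduced sketch: the A[i] part becomes a row pointer stepped
   by one row per trip; the foo() index is still added each time. */
void clear_reduced(int A[N][M]) {
  int (*row)[M] = A;
  int i;
  for (i = 0; i < N; ++i, ++row)
    (*row)[foo()] = 0;
}

/* Both versions must clear the same elements. */
void check_loop_variant(void) {
  int X[N][M], Y[N][M], i, j;
  for (i = 0; i < N; ++i)
    for (j = 0; j < M; ++j)
      X[i][j] = Y[i][j] = 1;
  calls = 0;
  clear_naive(X);
  calls = 0;
  clear_reduced(Y);
  for (i = 0; i < N; ++i)
    for (j = 0; j < M; ++j)
      assert(X[i][j] == Y[i][j]);
  assert(X[2][2] == 0 && X[2][3] == 1);
}
```

Only the induction-variable part of the address is reduced; the loop-variant index stays a per-iteration add.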
				
					
						
							
							
								 
						
							
								456044b724 
								
							 
						 
						
							
							
								
								Remove some more dead code.  
							
							... 
							
							
							
							llvm-svn: 22650 
							
						 
						
							2005-08-04 18:13:56 +00:00  
				
					
						
							
							
								 
						
							
								eaf24725b2 
								
							 
						 
						
							
							
								
								Refactor this code substantially with the following improvements:  
							
							... 
							
							
							
							1. We only analyze instructions once, guaranteed
2. AnalyzeGetElementPtrUsers has been ripped apart and replaced with
   something much simpler.
The next step is to handle expressions that are not all indvar+loop-invariant
values (e.g. handling indvar+loopvariant).
llvm-svn: 22649 
							
						 
						
							2005-08-04 17:40:30 +00:00  
				
					
						
							
							
								 
						
							
								6f286b760f 
								
							 
						 
						
							
							
								
								refactor some code  
							
							... 
							
							
							
							llvm-svn: 22643 
							
						 
						
							2005-08-04 01:19:13 +00:00  
				
					
						
							
							
								 
						
							
								6510749050 
								
							 
						 
						
							
							
								
								invert two if's to make the logic simpler  
							
							... 
							
							
							
							llvm-svn: 22641 
							
						 
						
							2005-08-04 00:40:47 +00:00  
				
					
						
							
							
								 
						
							
								a0102fbc4f 
								
							 
						 
						
							
							
								
								When processing outer loops and we find uses of an IV in inner loops, make  
							
							... 
							
							
							
							sure to handle the use, just don't recurse into it.
This permits us to generate this code for a simple nested loop case:
.LBB_foo_0:     ; entry
        stwu r1, -48(r1)
        stw r29, 44(r1)
        stw r30, 40(r1)
        mflr r11
        stw r11, 56(r1)
        lis r2, ha16(L_A$non_lazy_ptr)
        lwz r30, lo16(L_A$non_lazy_ptr)(r2)
        li r29, 1
.LBB_foo_1:     ; no_exit.0
        bl L_bar$stub
        li r2, 1
        or r3, r30, r30
.LBB_foo_2:     ; no_exit.1
        lfd f0, 8(r3)
        stfd f0, 0(r3)
        addi r4, r2, 1
        addi r3, r3, 8
        cmpwi cr0, r2, 100
        or r2, r4, r4
        bne .LBB_foo_2  ; no_exit.1
.LBB_foo_3:     ; loopexit.1
        addi r30, r30, 800
        addi r2, r29, 1
        cmpwi cr0, r29, 100
        or r29, r2, r2
        bne .LBB_foo_1  ; no_exit.0
.LBB_foo_4:     ; return
        lwz r11, 56(r1)
        mtlr r11
        lwz r30, 40(r1)
        lwz r29, 44(r1)
        lwz r1, 0(r1)
        blr
instead of this:
_foo:
.LBB_foo_0:     ; entry
        stwu r1, -48(r1)
        stw r28, 44(r1)                   ;; uses an extra register.
        stw r29, 40(r1)
        stw r30, 36(r1)
        mflr r11
        stw r11, 56(r1)
        li r30, 1
        li r29, 0
        or r28, r29, r29
.LBB_foo_1:     ; no_exit.0
        bl L_bar$stub
        mulli r2, r28, 800           ;; unstrength-reduced multiply
        lis r3, ha16(L_A$non_lazy_ptr)   ;; loop invariant address computation
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        mulli r4, r29, 800           ;; unstrength-reduced multiply
        addi r3, r3, 8
        add r3, r4, r3
        li r4, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 0(r3)
        stfd f0, 0(r2)
        addi r5, r4, 1
        addi r2, r2, 8                 ;; multiple stride 8 IV's
        addi r3, r3, 8
        cmpwi cr0, r4, 100
        or r4, r5, r5
        bne .LBB_foo_2  ; no_exit.1
.LBB_foo_3:     ; loopexit.1
        addi r28, r28, 1               ;;; Many IV's with stride 1
        addi r29, r29, 1
        addi r2, r30, 1
        cmpwi cr0, r30, 100
        or r30, r2, r2
        bne .LBB_foo_1  ; no_exit.0
.LBB_foo_4:     ; return
        lwz r11, 56(r1)
        mtlr r11
        lwz r30, 36(r1)
        lwz r29, 40(r1)
        lwz r28, 44(r1)
        lwz r1, 0(r1)
        blr
llvm-svn: 22640 
							
						 
						
							2005-08-04 00:14:11 +00:00  
				
					
						
							
							
								 
						
							
								fc62470466 
								
							 
						 
						
							
							
								
								Teach loop-reduce to see into nested loops, to pull out immediate values  
							
							... 
							
							
							
							pushed down by SCEV.
In a nested loop case, this allows us to emit this:
        lis r3, ha16(L_A$non_lazy_ptr)
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        li r3, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 8(r2)        ;; Uses offset of 8 instead of 0
        stfd f0, 0(r2)
        addi r4, r3, 1
        addi r2, r2, 8
        cmpwi cr0, r3, 100
        or r3, r4, r4
        bne .LBB_foo_2  ; no_exit.1
instead of this:
        lis r3, ha16(L_A$non_lazy_ptr)
        lwz r3, lo16(L_A$non_lazy_ptr)(r3)
        add r2, r2, r3
        addi r3, r3, 8
        li r4, 1
.LBB_foo_2:     ; no_exit.1
        lfd f0, 0(r3)
        stfd f0, 0(r2)
        addi r5, r4, 1
        addi r2, r2, 8
        addi r3, r3, 8
        cmpwi cr0, r4, 100
        or r4, r5, r5
        bne .LBB_foo_2  ; no_exit.1
llvm-svn: 22639 
							
						 
						
							2005-08-03 23:44:42 +00:00  
				
					
						
							
							
								 
						
							
								bb78c97e24 
								
							 
						 
						
							
							
								
								improve debug output  
							
							... 
							
							
							
							llvm-svn: 22638 
							
						 
						
							2005-08-03 23:30:08 +00:00  
				
					
						
							
							
								 
						
							
								db23c74e5e 
								
							 
						 
						
							
							
								
								Move from Stage 0 to Stage 1.  
							
							... 
							
							
							
							Only emit one PHI node for IV uses with identical bases and strides (after
moving foldable immediates to the load/store instruction).
This implements LoopStrengthReduce/dont_insert_redundant_ops.ll, allowing
us to generate this PPC code for test1:
        or r30, r3, r3
.LBB_test1_1:   ; Loop
        li r2, 0
        stw r2, 0(r30)
        stw r2, 4(r30)
        bl L_pred$stub
        addi r30, r30, 8
        cmplwi cr0, r3, 0
        bne .LBB_test1_1        ; Loop
instead of this code:
        or r30, r3, r3
        or r29, r3, r3
.LBB_test1_1:   ; Loop
        li r2, 0
        stw r2, 0(r29)
        stw r2, 4(r30)
        bl L_pred$stub
        addi r30, r30, 8        ;; Two iv's with step of 8
        addi r29, r29, 8
        cmplwi cr0, r3, 0
        bne .LBB_test1_1        ; Loop
llvm-svn: 22635 
							
						 
						
							2005-08-03 22:51:21 +00:00  
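The test1 example from llvm-svn 22635 can be approximated in C. The sketch assumes a 4-byte int and a fixed trip count in place of the call to pred; all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the two stores use separate pointers that both step by the
   same stride, so two induction variables are updated per trip. */
void zero_pairs_two_ivs(int *p, size_t trips) {
  int *lo = p;     /* addresses p[0], p[2], ... */
  int *hi = p + 1; /* addresses p[1], p[3], ... */
  size_t i;
  for (i = 0; i < trips; i++) {
    *lo = 0;
    *hi = 0;
    lo += 2;
    hi += 2;
  }
}

/* After: the constant offset moves into the store's immediate field,
   leaving identical base and stride, so one IV is enough. */
void zero_pairs_one_iv(int *p, size_t trips) {
  size_t i;
  for (i = 0; i < trips; i++) {
    p[0] = 0; /* stw r2, 0(r30) */
    p[1] = 0; /* stw r2, 4(r30), assuming 4-byte int */
    p += 2;   /* addi r30, r30, 8 */
  }
}

/* The buffer has one spare element so the stepped pointers in the
   two-IV version never move past one-past-the-end. */
void check_one_iv(void) {
  int a[7], b[7], i;
  for (i = 0; i < 7; i++)
    a[i] = b[i] = 9;
  zero_pairs_two_ivs(a, 3);
  zero_pairs_one_iv(b, 3);
  for (i = 0; i < 7; i++)
    assert(a[i] == b[i]);
  assert(a[5] == 0 && a[6] == 9); /* the spare element is untouched */
}
```

Once the offsets 0 and 4 are peeled off as immediates, both uses reduce to the same base and stride, which is the condition the commit message gives for sharing one PHI node.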
				
					
						
							
							
								 
						
							
								430d0022df 
								
							 
						 
						
							
							
								
								Rename IVUse to IVUsersOfOneStride, use a struct instead of a pair to  
							
							... 
							
							
							
							unify some parallel vectors and get field names more descriptive than
"first" and "second".  This isn't lisp after all :)
llvm-svn: 22633 
							
						 
						
							2005-08-03 22:21:05 +00:00  
				
					
						
							
							
								 
						
							
								84e9baa925 
								
							 
						 
						
							
							
								
								Fix a nasty dangling pointer issue.  The ScalarEvolution pass would keep a  
							
							... 
							
							
							
							map from instruction* to SCEVHandles.  When we delete instructions, we have
to tell it about it.  We would run into nasty cases where new instructions
were reallocated at old instruction addresses and get the old map values.
Bad bad bad :(
llvm-svn: 22632 
							
						 
						
							2005-08-03 21:36:09 +00:00  
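The hazard described in llvm-svn 22632 is general to any cache keyed by object address. The following is a small C analogy, not the ScalarEvolution code: the fixed-size table, the function names, and the use of a single local variable to stand in for reused memory are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 8

struct entry { const void *key; int value; };
static struct entry cache[SLOTS]; /* key == NULL marks a free slot */

static struct entry *find(const void *key) {
  int i;
  for (i = 0; i < SLOTS; i++)
    if (cache[i].key == key)
      return &cache[i];
  return NULL;
}

void cache_put(const void *key, int value) {
  struct entry *e;
  assert(key != NULL);
  e = find(key);
  if (e == NULL)
    e = find(NULL); /* take the first free slot */
  assert(e != NULL); /* a full table is out of scope for this sketch */
  e->key = key;
  e->value = value;
}

/* Returns 1 and stores the value in *out when key is cached. */
int cache_get(const void *key, int *out) {
  struct entry *e;
  assert(key != NULL);
  e = find(key);
  if (e == NULL)
    return 0;
  *out = e->value;
  return 1;
}

/* The fix in miniature: drop the entry when its object is deleted. */
void cache_erase(const void *key) {
  struct entry *e = find(key);
  if (e != NULL && key != NULL)
    e->key = NULL;
}

void check_stale_entry(void) {
  int storage; /* stands in for memory the allocator hands out twice */
  int v = 0;
  cache_put(&storage, 1); /* result cached for the old object */
  /* Old object deleted, new object created at the same address:
     without an erase the lookup still returns the old result. */
  assert(cache_get(&storage, &v) == 1 && v == 1);
  cache_erase(&storage);
  assert(cache_get(&storage, &v) == 0);
}
```

The first assertion shows the stale hit a new object at a recycled address would see; after cache_erase the lookup misses, which is the behavior the commit restores by notifying the analysis about deleted instructions.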
				
					
						
							
							
								 
						
							
								351b891cbc 
								
							 
						 
						
							
							
								
								Like the comment says, do not insert cast instructions before phi nodes  
							
							... 
							
							
							
							llvm-svn: 22586 
							
						 
						
							2005-08-02 03:31:14 +00:00  
				
					
						
							
							
								 
						
							
								75a44e154e 
								
							 
						 
						
							
							
								
								add a comment, make a check more lenient  
							
							... 
							
							
							
							llvm-svn: 22581 
							
						 
						
							2005-08-02 02:52:02 +00:00  
				
					
						
							
							
								 
						
							
								dcce49e006 
								
							 
						 
						
							
							
								
								Simplify for loop, clear a per-loop map after processing each loop  
							
							... 
							
							
							
							llvm-svn: 22580 
							
						 
						
							2005-08-02 02:44:31 +00:00  
				
					
						
							
							
								 
						
							
								9ef1294210 
								
							 
						 
						
							
							
								
								Add a comment  
							
							... 
							
							
							
							Make LSR ignore GEP's that have loop variant base values, as we currently
cannot codegen them
llvm-svn: 22576 
							
						 
						
							2005-08-02 01:32:29 +00:00  
				
					
						
							
							
								 
						
							
								564900e5e5 
								
							 
						 
						
							
							
								
								Fix an iterator invalidation problem  
							
							... 
							
							
							
							llvm-svn: 22575 
							
						 
						
							2005-08-02 00:41:11 +00:00  
				
					
						
							
							
								 
						
							
								546fd5944e 
								
							 
						 
						
							
							
								
								Keep tabs and trailing spaces out.  
							
							... 
							
							
							
							llvm-svn: 22565 
							
						 
						
							2005-07-30 18:33:25 +00:00  
				
					
						
							
							
								 
						
							
								c500991055 
								
							 
						 
						
							
							
								
								Fix VC++ build problems.  
							
							... 
							
							
							
							llvm-svn: 22564 
							
						 
						
							2005-07-30 18:22:27 +00:00  
				
					
						
							
							
								 
						
							
								17a0e2afea 
								
							 
						 
						
							
							
								
								Ack, typo  
							
							... 
							
							
							
							llvm-svn: 22560 
							
						 
						
							2005-07-30 00:21:31 +00:00  
				
					
						
							
							
								 
						
							
								e68bcd1946 
								
							 
						 
						
							
							
								
								Commit a new LoopStrengthReduce pass that can use scalar evolutions and  
							
							... 
							
							
							
							target data to decide which loop induction variables to strength reduce
and how to do so.  This work is mostly by Chris Lattner, with tweaks by
me to get it working on some of MultiSource.
llvm-svn: 22558 
							
						 
						
							2005-07-30 00:15:07 +00:00  
				
					
						
							
							
								 
						
							
								b1c9317bb4 
								
							 
						 
						
							
							
								
								Remove trailing whitespace  
							
							... 
							
							
							
							llvm-svn: 21427 
							
						 
						
							2005-04-21 23:48:37 +00:00  
				
					
						
							
							
								 
						
							
								8c79559443 
								
							 
						 
						
							
							
								
								fix a bug where we thought arguments were constants :(  
							
							... 
							
							
							
							llvm-svn: 20506 
							
						 
						
							2005-03-06 22:52:29 +00:00  
				
					
						
							
							
								 
						
							
								2ce303b406 
								
							 
						 
						
							
							
								
								Fix Regression/Transforms/LoopStrengthReduce/dont_insert_redundant_ops.ll,  
							
							... 
							
							
							
							hopefully not breaking too many other things.
llvm-svn: 20505 
							
						 
						
							2005-03-06 22:36:12 +00:00  
				
					
						
							
							
								 
						
							
								45403e5052 
								
							 
						 
						
							
							
								
								implement Transforms/LoopStrengthReduce/invariant_value_first_arg.ll  
							
							... 
							
							
							
							llvm-svn: 20501 
							
						 
						
							2005-03-06 22:06:22 +00:00  
				
					
						
							
							
								 
						
							
								d3874fad44 
								
							 
						 
						
							
							
								
								minor simplifications of the code.  
							
							... 
							
							
							
							llvm-svn: 20497 
							
						 
						
							2005-03-06 21:58:22 +00:00  
				
					
						
							
							
								 
						
							
								4abcea3a69 
								
							 
						 
						
							
							
								
								Reformat comments to fix 80 columns.  
							
							
							
							
							llvm-svn: 20467 
							
						 
						
							2005-03-05 22:45:40 +00:00  
				
					
						
							
							
								 
						
							
								be37fa07fd 
								
							 
						 
						
							
							
								
								Reuse induction variables created for strength-reduced GEPs by other similar GEPs.  
							
							
							
							
							llvm-svn: 20466 
							
						 
						
							2005-03-05 22:40:34 +00:00  
				
					
						
							
							
								 
						
							
								a2c59b7423 
								
							 
						 
						
							
							
								
								Add support for not strength reducing GEPs where the element size is a small
								power of two.  This emphatically includes the zeroeth power of two.
								llvm-svn: 20429 
							
						 
						
							2005-03-04 04:04:26 +00:00  
				
					
						
							
							
								 
						
							
								8ea6f9e821 
								
							 
						 
						
							
							
								
								Fixed the following LSR bugs:
								  * Loop invariant code does not dominate the loop header, but rather
								    the end of the loop preheader.
								  * The base for a reduced GEP isn't a constant unless all of its
								    operands (preceding the induction variable) are constant.
								  * Allow induction variable elimination for the simple case after all.
								Also made changes recommended by Chris for properly deleting
								instructions.
								llvm-svn: 20383 
							
						 
						
							2005-03-01 03:46:11 +00:00  
				
					
						
							
							
								 
						
							
								dcaa48b5c4 
								
							 
						 
						
							
							
								
								Fix crash in LSR due to attempt to remove original induction variable.  However,
								for reasons explained in the comments, I also deactivated this code as it needs
								more thought.
								llvm-svn: 20367 
							
						 
						
							2005-02-28 00:08:56 +00:00  
				
					
						
							
							
								 
						
							
								fd63d3af0d 
								
							 
						 
						
							
							
								
								PHI nodes were incorrectly placed when more than one GEP is reduced in a loop.  
							
							
							
							
							llvm-svn: 20360 
							
						 
						
							2005-02-27 21:08:04 +00:00  
				
					
						
							
							
								 
						
							
								39751c3b7c 
								
							 
						 
						
							
							
								
								First pass at improved Loop Strength Reduction.  Still not yet ready for prime time.  
							
							
							
							
							llvm-svn: 20358 
							
						 
						
							2005-02-27 19:37:07 +00:00  
				
					
						
							
							
								 
						
							
								b18121e6a9 
								
							 
						 
						
							
							
								
								Initial implementation of the strength reduction for GEP instructions in
								loops.  This optimization is not turned on by default yet, but may be run
								with the opt tool's -loop-reduce flag.  There are many FIXMEs listed in the
								code that will make it far more applicable to a wide range of code, but you
								have to start somewhere :)
								This limited version currently triggers on the following tests in the
								MultiSource directory:
								  pcompress2: 7 times
								  cfrac: 5 times
								  anagram: 2 times
								  ks: 6 times
								  yacr2: 2 times
								llvm-svn: 17134 
							
						 
						
							2004-10-18 21:08:22 +00:00