Commit Graph

4774 Commits

Author SHA1 Message Date
Chris Lattner dc3f6f2c12 Factor some code into a new FoldSingleEntryPHINodes method.
llvm-svn: 60501
2008-12-03 19:44:02 +00:00
Dale Johannesen d49ceff6ba Fix a really wrong comment.
llvm-svn: 60494
2008-12-03 19:25:46 +00:00
Chris Lattner 595c7279bd Teach jump threading some more simple tricks:
1) have it fold "br undef", which does occur with
   surprising frequency as jump threading iterates.
2) teach j-t to delete dead blocks.  This removes the successor
   edges, reducing the in-edges of other blocks, allowing 
   recursive simplification.
3) Fold things like:
     br COND, BBX, BBY
  BBX:
     br COND, BBZ, BBW

   which also happens because jump threading iterates.

llvm-svn: 60470
2008-12-03 07:48:08 +00:00
Chris Lattner 37e0136fef third time is the charm.
llvm-svn: 60469
2008-12-03 07:45:15 +00:00
Chris Lattner c04a1ffa9a fix assertion.
llvm-svn: 60468
2008-12-03 07:43:05 +00:00
Chris Lattner 7eb270ed03 Rename DeleteBlockIfDead to DeleteDeadBlock and make it
unconditionally delete the block.  All likely clients will
do the checking anyway.

llvm-svn: 60464
2008-12-03 06:40:52 +00:00
Chris Lattner bcc904a67c Factor some code out of SimplifyCFG, forming a new
DeleteBlockIfDead method.

llvm-svn: 60463
2008-12-03 06:37:44 +00:00
Dale Johannesen 4d2ecb8f68 Minor rewrite per review feedback.
llvm-svn: 60442
2008-12-02 21:17:11 +00:00
Dale Johannesen 70060013d2 Make the code do what the comment says it does.
llvm-svn: 60431
2008-12-02 18:40:09 +00:00
Chris Lattner 1db9bbe802 Implement PRE of loads in the GVN pass with a pretty cheap and
straight-forward implementation.  This does not require any extra
alias analysis queries beyond what we already do for non-local loads.

Some programs really really like load PRE.  For example, SPASS triggers
this ~1000 times, ~300 times in 255.vortex, and ~1500 times on 403.gcc.

The biggest limitation to the implementation is that it does not split
critical edges.  This is a huge killer on many programs and should be
addressed after the initial patch is enabled by default.

The implementation of this should incidentally speed up rejection of 
non-local loads because it avoids creating the repl densemap in cases 
when it won't be used for fully redundant loads.

This is currently disabled by default.
Before I turn this on, I need to fix a couple of miscompilations in
the testsuite, look at compile time performance numbers, and look at
perf impact.  This is pretty close to ready though.

llvm-svn: 60408
2008-12-02 08:16:11 +00:00
Bill Wendling 87beb9b909 Remove some errors that crept in. No functionality change.
llvm-svn: 60403
2008-12-02 06:24:20 +00:00
Bill Wendling 790b4bf9a9 Merge two if-statements into one.
llvm-svn: 60402
2008-12-02 06:22:04 +00:00
Bill Wendling 5635295266 More styalistic changes. No functionality change.
llvm-svn: 60401
2008-12-02 06:18:11 +00:00
Bill Wendling 85de4b35ca - Remove the buggy -X/C -> X/-C transform. This isn't valid when X isn't a
constant. If X is a constant, then this is folded elsewhere.

- Added a note to Target/README.txt to indicate that we'd like to implement
  this when we're able.

llvm-svn: 60399
2008-12-02 05:12:47 +00:00
Bill Wendling 5369db5917 Improve comment.
llvm-svn: 60398
2008-12-02 05:09:00 +00:00
Bill Wendling 21716dff5e - Reduce nesting.
- No need to do a swap on a canonicalized pattern.

No functionality change.

llvm-svn: 60397
2008-12-02 05:06:43 +00:00
Chris Lattner ead1a61b47 some random comment improvements.
llvm-svn: 60395
2008-12-02 04:52:26 +00:00
Owen Anderson d930420ccf Fix an issue that Chris noticed, where local PRE was not properly instantiating
a new value numbering set after splitting a critical edge.  This increases
the number of instances of PRE on 403.gcc from ~60 to ~570.

llvm-svn: 60393
2008-12-02 04:09:22 +00:00
Dale Johannesen 069a4eee55 Consider only references to an IV within the loop when
figuring out the base of the IV.  This produces better
code in the example.  (Addresses use (IV) instead of 
(BASE,IV) - a significant improvement on low-register
machines like x86).

llvm-svn: 60374
2008-12-01 22:00:01 +00:00
Bill Wendling 6f71bce4cf Don't rebuild RHSNeg. Just use the one that's already there.
llvm-svn: 60370
2008-12-01 21:06:30 +00:00
Bill Wendling 84f6f2539f Document what this check is doing. Also, no need to cast to ConstantInt.
llvm-svn: 60369
2008-12-01 21:03:43 +00:00
Bill Wendling e6c87a4952 Use a simple comparison. Overflow on integer negation can only occur when the
integer is "minint".

llvm-svn: 60366
2008-12-01 19:46:27 +00:00
Bill Wendling 47f733e4ea Generalize the FoldOrWithConstant method to fold for any two constants which
don't have overlapping bits.

llvm-svn: 60344
2008-12-01 08:32:40 +00:00
Bill Wendling 22e761b302 Reduce copy-and-paste code by splitting out the code into its own function.
llvm-svn: 60343
2008-12-01 08:23:25 +00:00
Bill Wendling 582fe6b0ca Use m_Specific() instead of double matching.
llvm-svn: 60341
2008-12-01 08:09:47 +00:00
Bill Wendling 4eecfb655b Move pattern check outside of the if-then statement. This prevents us from fiddling with constants unless we have to.
llvm-svn: 60340
2008-12-01 07:47:02 +00:00
Chris Lattner 6f5bf6a718 Rename some variables, only increment BI once at the start of the loop instead of throughout it.
llvm-svn: 60339
2008-12-01 07:35:54 +00:00
Chris Lattner f00aae4968 pull the predMap densemap out of the inner loop of performPRE, so
that it isn't reallocated all the time.  This is a tiny speedup for
GVN: 3.90->3.88s

llvm-svn: 60338
2008-12-01 07:29:03 +00:00
Chris Lattner 2b07d3ccde switch a couple more calls to use array_pod_sort.
llvm-svn: 60337
2008-12-01 06:52:57 +00:00
Chris Lattner 2c2dd15a85 Introduce a new array_pod_sort function and switch LSR to use it
instead of std::sort.  This shrinks the release-asserts LSR.o file
by 1100 bytes of code on my system.

We should start using array_pod_sort where possible.

llvm-svn: 60335
2008-12-01 06:49:59 +00:00
Chris Lattner 2aebea5735 Eliminate use of setvector for the DeadInsts set, just use a smallvector.
This is a lot cheaper and conceptually simpler.

llvm-svn: 60332
2008-12-01 06:27:41 +00:00
Chris Lattner 4da78e3774 DeleteTriviallyDeadInstructions is always passed the
DeadInsts ivar, just use it directly.

llvm-svn: 60330
2008-12-01 06:14:28 +00:00
Chris Lattner a68a5a4784 simplify DeleteTriviallyDeadInstructions again, unlike my previous
buggy rewrite, this notifies ScalarEvolution of a pending instruction
about to be removed and then erases it, instead of erasing it then 
notifying.

llvm-svn: 60329
2008-12-01 06:11:32 +00:00
Chris Lattner 9e6b243428 simplify these patterns using m_Specific. No need to grep for
xor in testcase (or is a substring).

llvm-svn: 60328
2008-12-01 05:16:26 +00:00
Chris Lattner 88a1f0213d Teach jump threading to clean up after itself, DCE and constfolding the
new instructions it simplifies.  Because we're threading jumps on edges
with constants coming in from PHI's, we inherently are exposing a lot more
constants to the new block.  Folding them and deleting dead conditions
allows the cost model in jump threading to be more accurate as it iterates.

llvm-svn: 60327
2008-12-01 04:48:07 +00:00
Chris Lattner 084b3a47d3 Change instcombine to use FoldPHIArgGEPIntoPHI to fold two operand PHIs
instead of using FoldPHIArgBinOpIntoPHI.  In addition to being more
obvious, this also fixes a problem where instcombine wouldn't merge two
phis that had different variable indices.  This prevented instcombine
from factoring big chunks of code in 403.gcc.  For example:

 insn_cuid.exit:                
-       %tmp336 = load i32** @uid_cuid, align 4      
-       %tmp337 = getelementptr %struct.rtx_def* %insn_addr.0.ph.i, i32 0, i32 3    
-       %tmp338 = bitcast [1 x %struct.rtunion]* %tmp337 to i32*               
-       %tmp339 = load i32* %tmp338, align 4           
-       %tmp340 = getelementptr i32* %tmp336, i32 %tmp339     
        br label %bb62
 
 bb61:       
-       %tmp341 = load i32** @uid_cuid, align 4     
-       %tmp342 = getelementptr %struct.rtx_def* %insn, i32 0, i32 3        
-       %tmp343 = bitcast [1 x %struct.rtunion]* %tmp342 to i32*           
-       %tmp344 = load i32* %tmp343, align 4        
-       %tmp345 = getelementptr i32* %tmp341, i32 %tmp344          
        br label %bb62
 
 bb62:      
-       %iftmp.62.0.in = phi i32* [ %tmp345, %bb61 ], [ %tmp340, %insn_cuid.exit ]         
+       %insn.pn2 = phi %struct.rtx_def* [ %insn, %bb61 ], [ %insn_addr.0.ph.i, %insn_cuid.exit ]         
+       %tmp344.pn.in.in = getelementptr %struct.rtx_def* %insn.pn2, i32 0, i32 3     
+       %tmp344.pn.in = bitcast [1 x %struct.rtunion]* %tmp344.pn.in.in to i32*  
+       %tmp341.pn = load i32** @uid_cuid     
+       %tmp344.pn = load i32* %tmp344.pn.in 
+       %iftmp.62.0.in = getelementptr i32* %tmp341.pn, i32 %tmp344.pn   
        %iftmp.62.0 = load i32* %iftmp.62.0.in     

llvm-svn: 60325
2008-12-01 03:42:51 +00:00
Chris Lattner 9d02a70a7d Teach inst combine to merge GEPs through PHIs. This is really
important because it is sinking the loads using the GEPs, but
not the GEPs themselves.  This triggers 647 times on 403.gcc
and makes the .s file much much nicer.  For example before:

        je      LBB1_87 ## bb78
LBB1_62:        ## bb77
        leal    84(%esi), %eax
LBB1_63:        ## bb79
        movl    (%eax), %eax
...
LBB1_87:        ## bb78
        movl    $0, 4(%esp)
        movl    %esi, (%esp)
        call    L_make_decl_rtl$stub
        jmp     LBB1_62 ## bb77


after:

        jne     LBB1_63 ## bb79
LBB1_62:        ## bb78
        movl    $0, 4(%esp)
        movl    %esi, (%esp)
        call    L_make_decl_rtl$stub
LBB1_63:        ## bb79
        movl    84(%esi), %eax

The input code was (and the GEPs are merged and
the PHI is now eliminated by instcombine):

        br i1 %tmp233, label %bb78, label %bb77
bb77:           
        %tmp234 = getelementptr %struct.tree_node* %t_addr.3, i32 0, i32 0, i32 22              
        br label %bb79
bb78:           
        call void @make_decl_rtl(%struct.tree_node* %t_addr.3, i8* null) nounwind
        %tmp235 = getelementptr %struct.tree_node* %t_addr.3, i32 0, i32 0, i32 22              
        br label %bb79
bb79:           
        %iftmp.12.0.in = phi %struct.rtx_def** [ %tmp235, %bb78 ], [ %tmp234, %bb77 ]           
        %iftmp.12.0 = load %struct.rtx_def** %iftmp.12.0.in             

llvm-svn: 60322
2008-12-01 02:34:36 +00:00
Chris Lattner 9ce8995d24 Make GVN be more intelligent about redundant load
elimination: when finding dependent load/stores, realize that
they are the same if aliasing claims must alias instead of relying
on the pointers to be exactly equal.  This makes load elimination
more aggressive.  For example, on 403.gcc, we had:

<     68 gvn    - Number of instructions PRE'd
< 152718 gvn    - Number of instructions deleted
<  49699 gvn    - Number of loads deleted
<   6153 memdep - Number of dirty cached non-local responses
< 169336 memdep - Number of fully cached non-local responses
< 162428 memdep - Number of uncached non-local responses

now we have:

>     64 gvn    - Number of instructions PRE'd
> 153623 gvn    - Number of instructions deleted
>  49856 gvn    - Number of loads deleted
>   5022 memdep - Number of dirty cached non-local responses
> 159030 memdep - Number of fully cached non-local responses
> 162443 memdep - Number of uncached non-local responses

That's an extra 157 loads deleted and extra 905 other instructions nuked.

This slows down GVN very slightly, from 3.91 to 3.96s.

llvm-svn: 60314
2008-12-01 01:31:36 +00:00
Chris Lattner 7e61dafc95 Reimplement the non-local dependency data structure in terms of a sorted
vector instead of a densemap.  This shrinks the memory usage of this thing
substantially (the high water mark) as well as making operations like
scanning it faster.  This speeds up memdep slightly, gvn goes from
3.9376 to 3.9118s on 403.gcc

This also splits out the statistics for the cached non-local case to
differentiate between the dirty and clean cached case.  Here's the stats
for 403.gcc:

  6153 memdep - Number of dirty cached non-local responses
169336 memdep - Number of fully cached non-local responses
162428 memdep - Number of uncached non-local responses

yay for caching :)

llvm-svn: 60313
2008-12-01 01:15:42 +00:00
Bill Wendling 5b902c5b1e Implement ((A|B)&1)|(B&-2) -> (A&1) | B transformation. This also takes care of
permutations of this pattern.

llvm-svn: 60312
2008-12-01 01:07:11 +00:00
Chris Lattner 8541edec44 Cache analyses in ivars and add some useful DEBUG output.
This speeds up GVN from 4.0386s to 3.9376s.

llvm-svn: 60310
2008-12-01 00:40:32 +00:00
Chris Lattner 80c7d81e81 improve indentation, do cheap checks before expensive ones,
remove some fixme's.  This speeds up GVN very slightly on 403.gcc 
(4.06->4.03s)

llvm-svn: 60309
2008-11-30 23:39:23 +00:00
Eli Friedman 11c15a5de7 Minor cleanup: use getTrue and getFalse where appropriate. No
functional change.

llvm-svn: 60307
2008-11-30 22:48:49 +00:00
Eli Friedman 55e4becba9 Some minor cleanups to instcombine; no functionality change.
Note that the FoldOpIntoPhi call is dead because it's impossible for the 
first operand of a subtraction to be both a ConstantInt and a PHINode.

llvm-svn: 60306
2008-11-30 21:09:11 +00:00
Bill Wendling de89bc275c Add instruction combining for ((A&~B)|(~A&B)) -> A^B and all permutations.
llvm-svn: 60291
2008-11-30 13:52:49 +00:00
Bill Wendling 9eef421e12 Implement (A&((~A)|B)) -> A&B transformation in the instruction combiner. This
takes care of all permutations of this pattern.

llvm-svn: 60290
2008-11-30 13:08:13 +00:00
Bill Wendling 2fe3229824 Forgot one remaining call to getSExtValue().
llvm-svn: 60289
2008-11-30 12:41:09 +00:00
Bill Wendling 2d2e7861b5 getSExtValue() doesn't work for ConstantInts with bitwidth > 64 bits. Use all
APInt calls instead.

This fixes PR3144.

llvm-svn: 60288
2008-11-30 12:38:24 +00:00
Eli Friedman 09bc610945 Optimize memmove and memset into the LLVM builtins. Note that these
only show up in code from front-ends besides llvm-gcc, like clang.

llvm-svn: 60287
2008-11-30 08:32:11 +00:00
Bill Wendling 7abf352f44 Don't make TwoToExp signed by default.
llvm-svn: 60279
2008-11-30 05:29:33 +00:00
Bill Wendling af200e9237 From Hacker's Delight:
"For signed integers, the determination of overflow of x*y is not so simple. If
x and y have the same sign, then overflow occurs iff xy > 2**31 - 1. If they
have opposite signs, then overflow occurs iff xy < -2**31."

In this case, x == -1.

llvm-svn: 60278
2008-11-30 05:01:05 +00:00
Bill Wendling 70635adea3 Instcombine was illegally transforming -X/C into X/-C when either X or C
overflowed on negation. This commit checks to make sure that neithe C nor X
overflows. This requires that the RHS of X (a subtract instruction) be a
constant integer.

llvm-svn: 60275
2008-11-30 03:42:12 +00:00
Chris Lattner 3ff6d01586 Fix a fixme by making memdep's handling of allocations more logical.
If we see that a load depends on the allocation of its memory with no
intervening stores, we now return a 'None' depedency instead of "Normal".
This tweaks GVN to do its optimization with the new result.

llvm-svn: 60267
2008-11-30 01:39:32 +00:00
Chris Lattner 63bd586d35 Eliminate the dropInstruction method, which is not needed any more.
Fix a subtle iterator invalidation bug I introduced in the last commit.

llvm-svn: 60258
2008-11-29 23:30:39 +00:00
Chris Lattner 1c6b62eb4d Change MemDep::getNonLocalDependency to return its results as
a smallvector instead of a DenseMap.  This speeds up GVN by 5%
on 403.gcc.

llvm-svn: 60255
2008-11-29 21:33:22 +00:00
Chris Lattner f280b0c729 reimplement getNonLocalDependency with a simpler worklist
formulation that is faster and doesn't require nonLazyHelper.
Much less code.

llvm-svn: 60253
2008-11-29 21:22:42 +00:00
Chris Lattner 8c5ff516c6 Fix a thinko that manifested as a crash on clamav last night.
llvm-svn: 60251
2008-11-29 20:29:04 +00:00
Chris Lattner 51ba8d0630 Split getDependency into getDependency and getDependencyFrom, the
former does caching, the later doesn't.  This dramatically simplifies
the logic in getDependency and getDependencyFrom.

llvm-svn: 60234
2008-11-29 03:47:00 +00:00
Bill Wendling 469e3aa696 Temporarily revert r60195. It's causing an optimized bootstrap of llvm-gcc to fail.
llvm-svn: 60233
2008-11-29 03:43:04 +00:00
Chris Lattner 7f9c8a0f05 Introduce and use a new MemDepResult class to hold the results of a memdep
query.  This makes it crystal clear what cases can escape from MemDep that
the clients have to handle.  This also gives the clients a nice simplified
interface to it that is easy to poke at.

This patch also makes DepResultTy and MemoryDependenceAnalysis::DepType
private, yay.

llvm-svn: 60231
2008-11-29 02:29:27 +00:00
Chris Lattner de04e1173a Reimplement the internal abstraction used by MemDep in terms
of a pointer/int pair instead of a manually bitmangled pointer.
This forces clients to think a little more about checking the 
appropriate pieces and will be useful for internal 
implementation improvements later.

I'm not particularly happy with this.  After going through this
I don't think that the clients of memdep should be exposed to
the internal type at all.  I'll fix this in a subsequent commit.

This has no functionality change.

llvm-svn: 60230
2008-11-29 01:43:36 +00:00
Chris Lattner f3f6a801cc don't revisit instructions off the beginning of the block.
llvm-svn: 60221
2008-11-28 22:50:08 +00:00
Chris Lattner f2a8ba4cf0 simplify some code, remove escaped newline.
llvm-svn: 60213
2008-11-28 21:29:52 +00:00
Chris Lattner 8a172daa55 don't call MergeBasicBlockIntoOnlyPred on a block whose only
predecessor is itself.  This doesn't make sense, and this is
a dead infinite loop anyway.

llvm-svn: 60210
2008-11-28 19:54:49 +00:00
Chris Lattner e9f6c355bf rewrite RecursivelyDeleteTriviallyDeadInstructions to use a more efficient
formulation that doesn't require set lookups or scanning a set.

llvm-svn: 60203
2008-11-28 01:20:46 +00:00
Chris Lattner d4b5ba615e remove some weirdness that came from the LSR code that has
nothing to do with dead instruction elimination.  No tests in
dejagnu depend on this, so I don't know what it was needed for.

llvm-svn: 60202
2008-11-28 00:58:15 +00:00
Chris Lattner 1adb6759ef rewrite a big chunk of how DSE does recursive dead operand
elimination to use more modern infrastructure.  Also do a bunch
of small cleanups.

llvm-svn: 60201
2008-11-28 00:27:14 +00:00
Chris Lattner 8e84c129ce delete ErasePossiblyDeadInstructionTree, replacing uses of it with
RecursivelyDeleteTriviallyDeadInstructions.

llvm-svn: 60196
2008-11-27 23:25:44 +00:00
Chris Lattner c077a2a535 Simplify LoopStrengthReduce::DeleteTriviallyDeadInstructions by
making it use RecursivelyDeleteTriviallyDeadInstructions to do
the heavy lifting.

llvm-svn: 60195
2008-11-27 23:23:35 +00:00
Chris Lattner a1bbdff933 enhance RecursivelyDeleteTriviallyDeadInstructions to make
PHIs dead if they are single-value.

llvm-svn: 60194
2008-11-27 23:18:11 +00:00
Chris Lattner 1cb4f72706 Enhance RecursivelyDeleteTriviallyDeadInstructions to optionally
return a list of deleted instructions.

llvm-svn: 60193
2008-11-27 23:14:34 +00:00
Chris Lattner 96e2dbe008 use continue to reduce indentation
llvm-svn: 60192
2008-11-27 23:00:20 +00:00
Chris Lattner c6c481cdfc remove doConstantPropagation and dceInstruction, they are just
wrappers around the interesting code and use an obscure iterator
abstraction that dates back many many years.

Move EraseDeadInstructions to Transforms/Utils and name it
RecursivelyDeleteTriviallyDeadInstructions.

llvm-svn: 60191
2008-11-27 22:57:53 +00:00
Chris Lattner 5ef9ebf787 simplify code.
llvm-svn: 60190
2008-11-27 22:56:14 +00:00
Chris Lattner c92fa42ddd simplify this logic.
llvm-svn: 60189
2008-11-27 22:46:09 +00:00
Nick Lewycky 4ab50b93c8 Chris prefers icmp/select over udiv!
llvm-svn: 60187
2008-11-27 22:41:10 +00:00
Nick Lewycky 69941fd0a0 Add a couple of missed optimizations on integer vectors. Multiply and divide
by 1, as well as multiply by -1.

llvm-svn: 60182
2008-11-27 20:21:08 +00:00
Chris Lattner 4059f43b74 defensive patch: if CGP is merging a block with the entry block, make sure
it ends up being the entry block.

llvm-svn: 60180
2008-11-27 19:29:14 +00:00
Chris Lattner 5dfbfcd80d Fix PR3138: if we merge the entry block into another block, make sure to
move the other block back up into the entry position!

llvm-svn: 60179
2008-11-27 19:25:19 +00:00
Chris Lattner e0d019def6 switch InstCombine::visitLoadInst to use
FindAvailableLoadedValue

llvm-svn: 60169
2008-11-27 08:56:30 +00:00
Chris Lattner c6ae56d23f enhance FindAvailableLoadedValue to make use of AliasAnalysis
if it has it.

llvm-svn: 60167
2008-11-27 08:18:12 +00:00
Chris Lattner 72f16e70f0 move FindAvailableLoadedValue from JumpThreading to Transforms/Utils.
llvm-svn: 60166
2008-11-27 08:10:05 +00:00
Chris Lattner d6204bed3d simplify this code a bit.
llvm-svn: 60164
2008-11-27 07:54:38 +00:00
Chris Lattner 206250284d Use the new MergeBasicBlockIntoOnlyPred function.
llvm-svn: 60163
2008-11-27 07:54:12 +00:00
Chris Lattner 99d6809ac1 move MergeBasicBlockIntoOnlyPred to Transforms/Utils.
llvm-svn: 60162
2008-11-27 07:43:12 +00:00
Chris Lattner 240051aace rename ThreadBlock to ProcessBlock, since it does other things than
just simple threading.

llvm-svn: 60157
2008-11-27 07:20:04 +00:00
Chris Lattner 98d89d1b1b Make jump threading substantially more powerful, in the following ways:
1. Make it fold blocks separated by an unconditional branch.  This enables
   jump threading to see a broader scope.
2. Make jump threading able to eliminate locally redundant loads when they
   feed the branch condition of a block.  This frequently occurs due to
   reg2mem running.
3. Make jump threading able to eliminate *partially redundant* loads when
   they feed the branch condition of a block.  This is common in code with
   lots of loads and stores like C++ code and 255.vortex.

This implements thread-loads.ll and rdar://6402033.

Per the fixme's, several pieces of this should be moved into Transforms/Utils.

llvm-svn: 60148
2008-11-27 05:07:53 +00:00
Chris Lattner 397a11ccd8 Turn on my codegen prepare heuristic by default. It doesn't affect
performance in most cases on the Grawp tester, but does speed some 
things up (like shootout/hash by 15%).  This also doesn't impact 
compile time in a noticable way on the Grawp tester.

It also, of course, gets the testcase it was designed for right :)

llvm-svn: 60120
2008-11-26 22:16:44 +00:00
Chris Lattner fef04acc50 teach the new heuristic how to handle inline asm.
llvm-svn: 60088
2008-11-26 04:59:11 +00:00
Chris Lattner 6d71b7fb95 Improve ValueAlreadyLiveAtInst with a cheap and dirty, but effective
heuristic: the value is already live at the new memory operation if
it is used by some other instruction in the memop's block.  This is
cheap and simple to compute (moreso than full liveness).

This improves the new heuristic even more.  For example, it cuts two
out of three new instructions out of 255.vortex:DbmFileInGrpHdr, 
which is one of the functions that the heuristic regressed.  This
overall eliminates another 40 instructions from 403.gcc and visibly
reduces register pressure in 255.vortex (though this only actually
ends up saving the 2 instructions from the whole program).

llvm-svn: 60084
2008-11-26 03:20:37 +00:00
Chris Lattner e34fe2c52d Start rewroking a subpiece of the profitability heuristic to be
phrased in terms of liveness instead of as a horrible hack.  :)

In pratice, this doesn't change the generated code for either 
255.vortex or 403.gcc, but it could cause minor code changes in 
theory.  This is framework for coming changes.

llvm-svn: 60082
2008-11-26 03:02:41 +00:00
Chris Lattner 383a797f42 add a comment, make save/restore logic more obvious.
llvm-svn: 60076
2008-11-26 02:11:11 +00:00
Chris Lattner eb3e4fb6fb This adds in some code (currently disabled unless you pass
-enable-smarter-addr-folding to llc) that gives CGP a better
cost model for when to sink computations into addressing modes.
The basic observation is that sinking increases register 
pressure when part of the addr computation has to be available
for other reasons, such as having a use that is a non-memory
operation.  In cases where it works, it can substantially reduce
register pressure.

This code is currently an overall win on 403.gcc and 255.vortex
(the two things I've been looking at), but there are several 
things I want to do before enabling it by default:

1. This isn't doing any caching of results, so it is much slower 
   than it could be.  It currently slows down release-asserts llc 
   by 1.7% on 176.gcc: 27.12s -> 27.60s.
2. This doesn't think about inline asm memory operands yet.
3. The cost model botches the case when the needed value is live
   across the computation for other reasons.

I'll continue poking at this, and eventually turn it on as llcbeta.

llvm-svn: 60074
2008-11-26 02:00:14 +00:00
Evan Cheng 496b042e20 Revert r60042. IndVarSimplify should check if APFloat is PPCDoubleDouble first before trying to convert it to an integer.
llvm-svn: 60072
2008-11-26 01:11:57 +00:00
Chris Lattner a9ab165b08 Teach CodeGenPrepare to look through Bitcast instructions when attempting to
optimize addressing modes.  This allows us to optimize things like isel-sink2.ll
into:

	movl	4(%esp), %eax
	cmpb	$0, 4(%eax)
	jne	LBB1_2	## F
LBB1_1:	## TB
	movl	$4, %eax
	ret
LBB1_2:	## F
	movzbl	7(%eax), %eax
	ret

instead of:

_test:
	movl	4(%esp), %eax
	cmpb	$0, 4(%eax)
	leal	4(%eax), %eax
	jne	LBB1_2	## F
LBB1_1:	## TB
	movl	$4, %eax
	ret
LBB1_2:	## F
	movzbl	3(%eax), %eax
	ret

This shrinks (e.g.) 403.gcc from 1133510 to 1128345 lines of .s.

Note that the 2008-10-16-SpillerBug.ll testcase is dubious at best, I doubt
it is really testing what it thinks it is.

llvm-svn: 60068
2008-11-26 00:26:16 +00:00
Chris Lattner f3e95505c5 Teach MatchScaledValue to handle Scales by 1 with MatchAddr (which
can recursively match things) and scales by 0 by ignoring them.
This triggers once in 403.gcc, saving 1 (!!!!) instruction in the 
whole huge app.

llvm-svn: 60013
2008-11-25 07:25:26 +00:00
Chris Lattner 728f90220a significantly refactor all the addressing mode matching logic
into a new AddressingModeMatcher class.  This makes it easier
to reason about and reduces passing around of stuff, but has
no functionality change.

llvm-svn: 60012
2008-11-25 07:09:13 +00:00
Chris Lattner 58f49d2916 refactor all the constantexpr/instruction handling code out into a
new FindMaximalLegalAddressingModeForOperation helper method.

llvm-svn: 60011
2008-11-25 05:15:49 +00:00
Chris Lattner a3fbff15b9 another minor tweak
llvm-svn: 60010
2008-11-25 04:47:41 +00:00
Chris Lattner d616ef5683 minor cleanups no functionality change.
llvm-svn: 60009
2008-11-25 04:42:10 +00:00