Commit Graph

5044 Commits

Author SHA1 Message Date
Benjamin Kramer 2fd48f2730 Implement mulo x, 2 -> addo x, x in DAGCombiner.
llvm-svn: 131800
2011-05-21 18:31:55 +00:00
Cameron Zwarich 2af60abad8 Fix PR9955 by only attaching load memory operands to load instructions and
similarly for stores. Now "make check" passes with the MachineVerifier forced
on with the VerifyCoalescing option!

llvm-svn: 131705
2011-05-19 23:44:34 +00:00
Stuart Hastings 4a4e5a2b55 Update some currently-disabled code, preparing for eventual use.
llvm-svn: 131663
2011-05-19 18:48:20 +00:00
Duncan Sands 3d9407f4eb Revert commit 131534 since it seems to have broken several buildbots.
Original log entry:
Refactor getActionType and getTypeToTransformTo ; place all of the 'decision'
code in one place.

llvm-svn: 131536
2011-05-18 14:57:56 +00:00
Nadav Rotem c5c27ede55 Refactor getActionType and getTypeToTransformTo ; place all of the 'decision'
code in one place.

llvm-svn: 131534
2011-05-18 12:26:38 +00:00
Eli Friedman e9692808b7 Make fast-isel miss counting in -stats and -fast-isel-verbose take terminators into account; since there are many fewer isel misses with recent changes, misses caused by terminators are more significant.
llvm-svn: 131502
2011-05-17 23:02:10 +00:00
Dan Gohman abffc991dc Misc. code cleanups.
llvm-svn: 131497
2011-05-17 22:22:52 +00:00
Dan Gohman 4298df6d86 Misc. code cleanups.
llvm-svn: 131495
2011-05-17 22:20:36 +00:00
Dan Gohman d282f46c6b Delete unused variables.
llvm-svn: 131430
2011-05-16 22:19:54 +00:00
Dan Gohman d4d12d14b5 Trim #includes.
llvm-svn: 131429
2011-05-16 22:14:50 +00:00
Dan Gohman ae9b1685a8 Fix whitespace and 80-column violations.
llvm-svn: 131428
2011-05-16 22:09:53 +00:00
Jim Grosbach e85c0dde7a Track how many insns fast-isel successfully selects as well as how many it
misses.

llvm-svn: 131426
2011-05-16 21:51:07 +00:00
Devang Patel 8e60ff11db Preserve debug info for unused zero extended boolean argument.
Radar 9422775.

llvm-svn: 131422
2011-05-16 21:24:05 +00:00
Eli Friedman a4d4a0162d Make fast-isel work correctly s/uadd.with.overflow intrinsics.
llvm-svn: 131420
2011-05-16 21:06:17 +00:00
Eli Friedman 4c08bb450a Fix silly typo.
llvm-svn: 131419
2011-05-16 20:34:53 +00:00
Eli Friedman 9ac944774f Basic fast-isel of extractvalue. Not too helpful on its own, given the IR clang generates for cases like this, but it should become more useful soon.
llvm-svn: 131417
2011-05-16 20:27:46 +00:00
Rafael Espindola 2050af838d Don't do tail calls in a function that call setjmp. The stack might be
corrupted when setjmp returns again.

llvm-svn: 131399
2011-05-16 03:05:33 +00:00
Eli Friedman 8f1e11cde9 Fix a FIXME by moving the fast-isel implementation of the objectsize intrinsic from the x86 code to the generic code.
llvm-svn: 131332
2011-05-14 00:47:51 +00:00
Rafael Espindola e53b7d1a11 Make codegen able to handle values of empty types. This is one way
to fix PR9900. I will keep it open until sable is able to comment on it.

llvm-svn: 131294
2011-05-13 15:18:06 +00:00
Stuart Hastings aa02c0847d Since I can't reproduce the failures from 131261, re-trying with a
simplified version.  <rdar://problem/9298790>

llvm-svn: 131274
2011-05-13 00:51:54 +00:00
Stuart Hastings 8d57d8ea64 Revert 131266 and 131261 due to buildbot complaints.
rdar://problem/9298790

llvm-svn: 131269
2011-05-13 00:15:17 +00:00
Stuart Hastings 89f1b47e3a Non-fast-isel followup to 129634; correctly handle branches controlled
by non-CMP expressions.  The executable test case (129821) would test
this as well, if we had an "-O0 -disable-arm-fast-isel" LLVM-GCC
tester.  Alas, the ARM assembly would be very difficult to check with
FileCheck.

The thumb2-cbnz.ll test is affected; it generates larger code (tst.w
vs. cmp #0), but I believe the new version is correct.
rdar://problem/9298790

llvm-svn: 131261
2011-05-12 23:36:41 +00:00
Nadav Rotem 8a7beb80f0 Fixes a bug in the DAGCombiner. LoadSDNodes have two values (data, chain).
If there is a store after the load node, then there is a chain, which means
that there is another user. Thus, asking hasOneUser would fail. Instead we
ask hasNUsesOfValue on the 'data' value.

llvm-svn: 131183
2011-05-11 14:40:50 +00:00
Bill Wendling 50117f8186 Give the 'eh.sjlj.dispatchsetup' intrinsic call the value coming from the setjmp
intrinsic call. This prevents it from being reordered so that it appears
*before* the setjmp intrinsic (thus making it completely useless).
<rdar://problem/9409683>

llvm-svn: 131174
2011-05-11 01:11:55 +00:00
Eli Friedman 768de0a0f8 Disable my little CopyToReg argument hack with fast-isel. rdar://problem/9413587 .
llvm-svn: 131156
2011-05-10 21:50:58 +00:00
Stuart Hastings 999fa3bf1f Correctly walk through nested and adjacent CALLSEQ_START nodes. No
test case; I've only seen this on a release branch, and I can't get it
to reproduce on trunk.  rdar://problem/7662569

llvm-svn: 131152
2011-05-10 21:20:03 +00:00
Eric Christopher 4480428474 Look through struct wrapped types for inline asm statments.
Patch by Evan Cheng.

llvm-svn: 131093
2011-05-09 20:04:43 +00:00
Duncan Sands 6be291a2cd Indent properly, no functionality change.
llvm-svn: 131082
2011-05-09 08:03:33 +00:00
Evan Cheng d26fc5e013 80 col violations.
llvm-svn: 131015
2011-05-06 20:52:23 +00:00
Eli Friedman 2518f8376d Make the logic for determining function alignment more explicit. No functionality change.
llvm-svn: 131012
2011-05-06 20:34:06 +00:00
Eli Friedman 7a78f66145 Use array_lengthof. No functional change.
llvm-svn: 131008
2011-05-06 19:50:10 +00:00
Owen Anderson 68b6b0efb0 Allow FastISel of three-register-operand instructions.
llvm-svn: 130934
2011-05-05 17:59:04 +00:00
Eli Friedman 441a01a2b8 Avoid extra vreg copies for arguments passed in registers. Specifically, this can make MachineCSE more effective in some cases (especially in small functions). PR8361 / part of rdar://problem/8259436 .
llvm-svn: 130928
2011-05-05 16:53:34 +00:00
Eli Friedman fd8c6adffb Small syntax cleanup; we don't need to #define constants in C++. No functionality change intended.
llvm-svn: 130926
2011-05-05 16:25:23 +00:00
Owen Anderson 66fd073974 Other parts of the SelectionDAG framework assume that targets use their pointer type for vector indices. Make the vector unrolling code respect that.
llvm-svn: 130733
2011-05-02 22:25:45 +00:00
Eli Friedman 4105ed1523 Make FastEmit_ri_ try a bit harder to succeed for supported operations; FastEmit_i can fail for non-Thumb2 ARM. Makes ARMSimplifyAddress work correctly, and reduces the number of fast-isel bailouts on non-Thumb ARM.
llvm-svn: 130560
2011-04-29 23:34:52 +00:00
Eli Friedman 33c133919a Fix a silly mistake in r130338.
llvm-svn: 130360
2011-04-28 00:42:03 +00:00
Eli Friedman 406c471b69 Make the fast-isel code for literal 0.0 a bit shorter/faster, since 0.0 is common. rdar://problem/9303592 .
llvm-svn: 130338
2011-04-27 22:41:55 +00:00
Eli Friedman 121d27e9e4 Remove unused function.
llvm-svn: 130337
2011-04-27 22:21:02 +00:00
Evan Cheng 1355bbdd11 Be careful about scheduling nodes above previous calls. It increase usages of
more callee-saved registers and introduce copies. Only allows it if scheduling
a node above calls would end up lessen register pressure.

Call operands also has added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.

rdar://9329627

llvm-svn: 130245
2011-04-26 21:31:35 +00:00
Dan Gohman 7da91aee83 Fast-isel support for simple inline asms.
llvm-svn: 130205
2011-04-26 17:18:34 +00:00
Evan Cheng 2f64754031 Fix typo
llvm-svn: 130190
2011-04-26 04:57:37 +00:00
Devang Patel 734f2218ac A dbg.declare may not be in entry block, even if it is referring to an incoming argument. However, It is appropriate to emit DBG_VALUE referring to this incoming argument in entry block in MachineFunction.
llvm-svn: 130129
2011-04-25 16:33:52 +00:00
Jay Foad 1a180156b6 Remove unused STL header includes.
llvm-svn: 130068
2011-04-23 19:53:52 +00:00
Owen Anderson dd450b86cf Teach FastISel to deal with instructions that have two immediate operands.
llvm-svn: 130033
2011-04-22 23:38:06 +00:00
Chris Lattner 6d277517d1 Recommit the fix for rdar://9289512 with a couple tweaks to
fix bugs exposed by the gcc dejagnu testsuite:
1. The load may actually be used by a dead instruction, which
   would cause an assert.
2. The load may not be used by the current chain of instructions,
   and we could move it past a side-effecting instruction. Change
   how we process uses to define the problem away.

llvm-svn: 130018
2011-04-22 21:59:37 +00:00
Benjamin Kramer 341c11da3b DAGCombine: fold "(zext x) == C" into "x == (trunc C)" if the trunc is lossless.
On x86 this allows to fold a load into the cmp, greatly reducing register pressure.
  movzbl	(%rdi), %eax
  cmpl	$47, %eax
->
  cmpb	$47, (%rdi)

This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :)

llvm-svn: 130005
2011-04-22 18:47:44 +00:00
Daniel Dunbar 6309828206 Revert r1296656, "Fix rdar://9289512 - not folding load into compare at -O0...",
which broke a couple GCC test suite tests at -O0.

llvm-svn: 129914
2011-04-21 16:14:46 +00:00
Eric Christopher bcaedb5ce0 Rewrite the expander for umulo/smulo to remember to sign extend the input
manually and pass all (now) 4 arguments to the mul libcall. Add a new
ExpandLibCall for just this (copied gratuitously from type legalization).

Fixes rdar://9292577

llvm-svn: 129842
2011-04-20 01:19:45 +00:00
Stuart Hastings 468086d5e1 Delete unnecessary variable. <rdar://problem/7662569>
llvm-svn: 129796
2011-04-19 20:09:38 +00:00
Eli Friedman bcd09b3a7f SelectBasicBlock is rather slow even when it doesn't do anything; skip the
unnecessary work where possible.

llvm-svn: 129763
2011-04-19 17:01:08 +00:00
Stuart Hastings 0b68c1219f Support nested CALLSEQ_BEGIN/END; necessary for ARM byval support. <rdar://problem/7662569>
llvm-svn: 129761
2011-04-19 16:16:58 +00:00
Chris Lattner 91328b317b Implement support for x86 fastisel of small fixed-sized memcpys, which are generated
en-mass for C++ PODs.  On my c++ test file, this cuts the fast isel rejects by 10x 
and shrinks the generated .s file by 5%

llvm-svn: 129755
2011-04-19 05:52:03 +00:00
Chris Lattner 48f75ad678 while we're at it, handle 'sdiv exact' of a power of 2 also,
this fixes a few rejects on c++ iterator loops.

llvm-svn: 129694
2011-04-18 07:00:40 +00:00
Chris Lattner 562d6e82bd fix rdar://9297011 - udiv by power of two causing fast-isel rejects
llvm-svn: 129693
2011-04-18 06:55:51 +00:00
Chris Lattner b53ccb8e36 1. merge fast-isel-shift-imm.ll into fast-isel-x86-64.ll
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the 
   shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
   instead of FastEmit_ri to simplify code.

llvm-svn: 129666
2011-04-17 20:23:29 +00:00
Chris Lattner 4832660b4d fix an oversight which caused us to compile the testcase (and other
less trivial things) into a dummy lea.  Before we generated:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	leaq	(%rax), %rax
	ret

now we produce:

_test:                                  ## @test
	movq	_G@GOTPCREL(%rip), %rax
	ret

This is part of rdar://9289558

llvm-svn: 129662
2011-04-17 17:12:08 +00:00
Chris Lattner 045c43855c Fix rdar://9289512 - not folding load into compare at -O0
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo.  Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:


cmpb    $0, 52(%rax)
je      LBB4_2

instead of:

movb    52(%rax), %cl
cmpb    $0, %cl
je      LBB4_2

This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and 
generating less code.

This was one of the biggest classes of missing load folding.  Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.

llvm-svn: 129656
2011-04-17 06:35:44 +00:00
Chris Lattner d70ff0d807 split a complex predicate out to a helper function. Simplify two for loops,
which don't need to check for falling off the end of a block *and* end of phi
nodes, since terminators are never phis.

llvm-svn: 129655
2011-04-17 06:03:19 +00:00
Chris Lattner fba7ca63cc fix rdar://9289583 - fast isel should handle non-canonical commutative binops
allowing us to fold the immediate into the 'and' in this case:

int test1(int i) {
  return 8&i;
}

llvm-svn: 129653
2011-04-17 01:16:47 +00:00
Eli Friedman 55b0acd624 PR9055: extend the fix to PR4050 (r70179) to apply to zext and anyext.
Returning a new node makes the code try to replace the old node, which
in the included testcase is killed by CSE.

llvm-svn: 129650
2011-04-16 23:25:34 +00:00
Evan Cheng b14ce09fca Fix divmod libcall lowering. Convert to {S|U}DIVREM first and then expand the node to a libcall. rdar://9280991
llvm-svn: 129633
2011-04-16 03:08:26 +00:00
Chris Lattner 0ab5e2cded Fix a ton of comment typos found by codespell. Patch by
Luis Felipe Strano Moraes!

llvm-svn: 129558
2011-04-15 05:18:47 +00:00
Owen Anderson a519284fec Fix another instance of the DAG combiner not using the correct type for the RHS of a shift.
llvm-svn: 129522
2011-04-14 17:30:49 +00:00
Andrew Trick bfbd972b1f In the pre-RA scheduler, maintain cmp+br proximity.
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.

The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.

Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump

Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion

llvm-svn: 129508
2011-04-14 05:15:06 +00:00
Chris Lattner 493b3e72f2 sink a call into its only use.
llvm-svn: 129503
2011-04-14 04:12:47 +00:00
Owen Anderson 9c12834eed During post-legalization DAG combining, be careful to only create shifts where the RHS is of the legal type for the new operation.
llvm-svn: 129484
2011-04-13 23:22:23 +00:00
Andrew Trick b53a00d2cb Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
Additional fixes:
Do something reasonable for subtargets with generic
itineraries by handle node latency the same as for an empty
itinerary. Now nodes default to unit latency unless an itinerary
explicitly specifies a zero cycle stage or it is a TokenFactor chain.

Original fixes:
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make the ndoe latency adjustments work, I also
needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129421
2011-04-13 00:38:32 +00:00
Andrew Trick 1b60ad6644 Revert 129383. It causes some targets to hit a scheduler assert.
llvm-svn: 129385
2011-04-12 20:14:07 +00:00
Andrew Trick c5dd24a542 PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency.
UnitsSharePred was a source of randomness in the scheduler: node
priority depended on the queue data structure. I rewrote the recent
VRegCycle heuristics to completely replace the old heuristic without
any randomness. To make these heuristic adjustments to node latency work,
I also needed to do something a little more reasonable with TokenFactor. I
gave it zero latency to its consumers and always schedule it as low as
possible.

llvm-svn: 129383
2011-04-12 19:54:36 +00:00
Jay Foad 7c14a558fe Don't include Operator.h from InstrTypes.h.
llvm-svn: 129271
2011-04-11 09:35:34 +00:00
Chris Lattner cfe5aa65d2 Avoid excess precision issues that lead to generating host-compiler-specific code.
Switch lowering probably shouldn't be using FP for this.  This resolves PR9581.

llvm-svn: 129199
2011-04-09 06:57:13 +00:00
Chris Lattner 41c80e89f3 have dag combine zap "store undef", which can be formed during call lowering
with undef arguments.

llvm-svn: 129185
2011-04-09 02:32:02 +00:00
Evan Cheng 74d92c1924 Change -arm-trap-func= into a non-arm specific option. Now Intrinsic::trap is lowered into a call to the specified trap function at sdisel time.
llvm-svn: 129152
2011-04-08 21:37:21 +00:00
Andrew Trick 2ad0b37318 Added a check in the preRA scheduler for potential interference on a
induction variable. The preRA scheduler is unaware of induction vars,
so we look for potential "virtual register cycles" instead.

Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing

llvm-svn: 129100
2011-04-07 19:54:57 +00:00
Bill Wendling dd4dcd549b Revamp the SjLj "dispatch setup" intrinsic.
It needed to be moved closer to the setjmp statement, because the code directly
after the setjmp needs to know about values that are on the stack. Also, the
'bitcast' of the function context was causing a dead load. This wouldn't be too
horrible, except that at -O0 it wasn't optimized out, and because it wasn't
using the correct base pointer (if there is a VLA), it would try to access a
value from a garbage address.
<rdar://problem/9130540>

llvm-svn: 128873
2011-04-05 01:37:43 +00:00
Stuart Hastings ad68c93a2d Revert 123704; it broke threaded LLVM.
llvm-svn: 128868
2011-04-05 00:37:28 +00:00
Cameron Zwarich 8c7bbc09e2 Add a RemoveFromWorklist method to DCI. This is needed to do some complicated
transformations in target-specific DAG combines without causing DAGCombiner to
delete the same node twice. If you know of a better way to avoid this (see my
next patch for an example), please let me know.

llvm-svn: 128758
2011-04-02 02:40:26 +00:00
Evan Cheng 8b1bca1998 Add comments.
llvm-svn: 128730
2011-04-01 19:57:01 +00:00
Evan Cheng 8d68ebd42a Assign node order numbers to results of call instruction lowering. This should improve src line debug info when sdisel is used. rdar://9199118
llvm-svn: 128728
2011-04-01 19:42:22 +00:00
Evan Cheng bd76679700 Issue libcalls __udivmod*i4 / __divmod*i4 for div / rem pairs.
rdar://8911343

llvm-svn: 128696
2011-04-01 00:42:02 +00:00
Benjamin Kramer 355ce07425 Turn SelectionDAGBuilder::GetRegistersForValue into a local function.
It couldn't be used outside of the file because SDISelAsmOperandInfo
is local to SelectionDAGBuilder.cpp. Making it a static function avoids
a weird linkage dance.

llvm-svn: 128342
2011-03-26 16:35:10 +00:00
Andrew Trick 3bd8b7a388 Fix for -pre-RA-sched=source.
Yet another case of unchecked NULL node (for physreg copy).
May fix PR9509.

llvm-svn: 128266
2011-03-25 06:40:55 +00:00
Eli Friedman 4c192305bf PR9535: add support for splitting and scalarizing vector ISD::FP_ROUND.
Also cleaning up some duplicated code while I'm here.

llvm-svn: 128176
2011-03-23 22:18:48 +00:00
Andrew Trick 13acae040c Ensure that def-side physreg copies are scheduled above any other uses
so the scheduler can't create new interferences on the copies
themselves. Prior to this fix the scheduler could get stuck in a loop
creating copies.
Fixes PR9509.

llvm-svn: 128164
2011-03-23 20:42:39 +00:00
Andrew Trick a8846e0540 whitespace
llvm-svn: 128163
2011-03-23 20:40:18 +00:00
Andrew Trick b1fd328581 Added block number and name to isel debug output.
I'm tired of doing this manually for each checkout.
If anyone knows a better way debug isel for non-trivial tests feel
free to revert and let me know how to do it.

llvm-svn: 128132
2011-03-23 01:38:28 +00:00
Eric Christopher 1b4b1e559a Grammar-o.
llvm-svn: 128004
2011-03-21 18:06:21 +00:00
Nadav Rotem e7a101ccab Add support for legalizing UINT_TO_FP of vectors on platforms which do
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.

llvm-svn: 127951
2011-03-19 13:09:10 +00:00
Benjamin Kramer cfcea12fe2 BuildUDIV: If the divisor is even we can simplify the fixup of the multiplied value by introducing an early shift.
This allows us to compile "unsigned foo(unsigned x) { return x/28; }" into
	shrl	$2, %edi
	imulq	$613566757, %rdi, %rax
	shrq	$32, %rax
	ret

instead of
	movl    %edi, %eax
	imulq   $613566757, %rax, %rcx
	shrq    $32, %rcx
	subl    %ecx, %eax
	shrl    %eax
	addl    %ecx, %eax
	shrl    $4, %eax

on x86_64

llvm-svn: 127829
2011-03-17 20:39:14 +00:00
Cameron Zwarich 2ef0c69df1 Move more logic into getTypeForExtArgOrReturn.
llvm-svn: 127809
2011-03-17 14:53:37 +00:00
Cameron Zwarich 34e7b3f77e Rename getTypeForExtendedInteger() to getTypeForExtArgOrReturn().
llvm-svn: 127807
2011-03-17 14:21:56 +00:00
Cameron Zwarich ac106273d4 The x86-64 ABI says that a bool is only guaranteed to be sign-extended to a byte
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.

This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.

llvm-svn: 127766
2011-03-16 22:20:18 +00:00
Cameron Zwarich d1ad9bc277 Don't recompute something that we already have in a local variable.
llvm-svn: 127764
2011-03-16 22:20:07 +00:00
Evan Cheng c5c2cfa381 sext(undef) = 0, because the top bits will all be the same.
zext(undef) = 0, because the top bits will be zero.

llvm-svn: 127649
2011-03-15 02:22:10 +00:00
Evan Cheng 37139edc8c BIT_CONVERT has been renamed to BITCAST.
llvm-svn: 127600
2011-03-14 18:19:52 +00:00
Evan Cheng d2f3b01797 Minor optimization. sign-ext/anyext of undef is still undef.
llvm-svn: 127598
2011-03-14 18:15:55 +00:00
Owen Anderson 66443c034d Teach FastISel to support register-immediate-immediate instructions.
llvm-svn: 127496
2011-03-11 21:33:55 +00:00
Andrew Trick 710d5da306 Replace -dag-chain-limit flag with constant. It has survived a release cycle without being touched, so no longer needs to pollute the hidden-help text.
llvm-svn: 127468
2011-03-11 17:46:59 +00:00
Evan Cheng adb9c03e41 Avoid replacing the value of a directly stored load with the stored value if the load is indexed. rdar://9117613.
llvm-svn: 127440
2011-03-11 00:48:56 +00:00
Evan Cheng b4c6a34415 Re-commit 127368 and 127371. They are exonerated.
llvm-svn: 127380
2011-03-10 00:16:32 +00:00
Evan Cheng d4b3f8e009 Revert 127368 and 127371 for now.
llvm-svn: 127376
2011-03-09 23:53:17 +00:00
Evan Cheng ca9a936332 Change the definition of TargetRegisterInfo::getCrossCopyRegClass to be more
flexible.

If it returns a register class that's different from the input, then that's the
register class used for cross-register class copies.
If it returns a register class that's the same as the input, then no cross-
register class copies are needed (normal copies would do).
If it returns null, then it's not at all possible to copy registers of the
specified register class.

llvm-svn: 127368
2011-03-09 22:47:38 +00:00
Andrew Trick 072ed2ee0d Improve pre-RA-sched register pressure tracking for duplicate operands.
This helps cases like 2008-07-19-movups-spills.ll, but doesn't have an obvious impact on benchmarks

llvm-svn: 127347
2011-03-09 19:12:43 +00:00
Benjamin Kramer b2e4d84305 Fix typo, make helper static.
llvm-svn: 127335
2011-03-09 16:19:12 +00:00
Eric Christopher 7238cba180 Fix some latent bugs if the nodes are unschedulable. We'd gotten away
with this before since none of the register tracking or nightly tests
had unschedulable nodes.

This should probably be refixed with a special default Node that just
returns some "don't touch me" values.

Fixes PR9427

llvm-svn: 127263
2011-03-08 19:35:47 +00:00
Andrew Trick 52b3e38a1f Further improvements to pre-RA-sched=list-ilp.
This change uses the MaxReorderWindow for both height and depth, which
tends to limit the negative effects of high register pressure.

llvm-svn: 127203
2011-03-08 01:51:56 +00:00
Cameron Zwarich df61694417 Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo.
llvm-svn: 127175
2011-03-07 21:56:36 +00:00
Owen Anderson cd526fa15e Use the correct LHS type when determining the legalization of a shift's RHS type.
llvm-svn: 127163
2011-03-07 18:29:47 +00:00
Eric Christopher 9cb33deebf Typo.
llvm-svn: 127131
2011-03-06 21:13:45 +00:00
Andrew Trick dd01732e63 Disable a couple of experimental heuristics to get the best results from the current implementation of -pre-RA-sched=list-ilp.
llvm-svn: 127113
2011-03-06 00:03:32 +00:00
Andrew Trick 25cedf3fe4 Be explicit with abs(). Visual Studio workaround.
llvm-svn: 127075
2011-03-05 10:29:25 +00:00
Andrew Trick d7f4c21684 Fix for -sched-high-latency-cycles in sched=list-ilp mode.
llvm-svn: 127071
2011-03-05 09:18:16 +00:00
Andrew Trick b8390b7a25 Missing comment.
llvm-svn: 127068
2011-03-05 08:04:11 +00:00
Andrew Trick 641e2d4f8c Increased the register pressure limit on x86_64 from 8 to 12
regs. This is the only change in this checkin that may affects the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.

Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It is currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.

Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.

llvm-svn: 127067
2011-03-05 08:00:22 +00:00
Duncan Sands 6bd1044222 Revert commit 126684 "Use the correct shift amount type". It is only the correct
type after type legalization has completed.  Before then it may simply not be big
enough to hold the shift amount, particularly on x86 which uses a very small type
for shifts (this issue broke stuff in the past which is why LegalizeTypes carefully
uses a large type for shift amounts).

llvm-svn: 127000
2011-03-04 14:28:59 +00:00
Andrew Trick c88b7ecb88 Minor pre-RA-sched fixes and cleanup.
Fix the PendingQueue, then disable it because it's not required for
the current schedulers' heuristics.
Fix the logic for the unused list-ilp scheduler.

llvm-svn: 126981
2011-03-04 02:03:45 +00:00
Bill Wendling f3658f3872 There are times when the landing pad won't have a call to 'eh.selector' in
it. It's been assumed up til now that it would be in its immediate
successor. However, this isn't necessarily the case. It could be in one of its
successor's successors.

Modify the code to more thoroughly check for an 'eh.selector' call in
successors. It only looks at a successor if we get there as a result of an
unconditional branch.

Testcase ObjC/exceptions-4.m in r126968.

llvm-svn: 126969
2011-03-03 23:14:05 +00:00
Eli Friedman d8a555bb3b Revert r123908; the code in question is completely untested and wrong.
llvm-svn: 126964
2011-03-03 22:33:23 +00:00
Bob Wilson 24b3ba5990 Avoid exponential blow-up when printing DAGs.
David Greene changed CannotYetSelect() to print the full DAG including multiple
copies of operands reached through different paths in the DAG.  Unfortunately
this blows up exponentially in some cases.  The depth limit of 100 is way too
high to prevent this -- I'm seeing a message string of 150MB with a depth of
only 40 in one particularly bad case, even though the DAG has less than 200
nodes.  Part of the problem is that the printing code is following chain
operands, so if you fail to select an operation with a chain, the printer will
follow all the chained operations back to the entry node.

llvm-svn: 126899
2011-03-02 23:38:06 +00:00
Stuart Hastings 6b4007dec6 Can't introduce floating-point immediate constants after legalization.
Radar 9056407.

llvm-svn: 126864
2011-03-02 19:36:30 +00:00
Duncan Sands cb95eeecc6 Add a few missed unary cases when legalizing vector results. Put some cases
in alphabetical order.

llvm-svn: 126745
2011-03-01 15:15:43 +00:00
Jim Grosbach 621818ab1a trailing whitespace.
llvm-svn: 126733
2011-03-01 01:39:05 +00:00
Jim Grosbach 1d479dbc55 Generalize the register matching code in DAGISel a bit.
llvm-svn: 126731
2011-03-01 01:37:19 +00:00
Owen Anderson 0dc63104c6 Use the correct shift amount type.
llvm-svn: 126684
2011-02-28 21:10:10 +00:00
Owen Anderson 4f4df81861 Clean whitespace.
llvm-svn: 126683
2011-02-28 20:57:56 +00:00
Duncan Sands f571290d1e Legalize support for fpextend of vector. PR9309.
llvm-svn: 126574
2011-02-27 14:41:27 +00:00
Nadav Rotem b00913028f Fix typos in the comments.
llvm-svn: 126565
2011-02-27 07:40:43 +00:00
Tobias Grosser 3ac8689fa3 Pass the graph to the DOTGraphTraits.getEdgeAttributes().
This follows the interface of getNodeAttributes.

llvm-svn: 126562
2011-02-27 04:11:03 +00:00
Benjamin Kramer 26691d9660 Add some DAGCombines for (adde 0, 0, glue), which are useful to optimize legalized code for large integer arithmetic.
1. Inform users of ADDEs with two 0 operands that it never sets carry
2. Fold other ADDs or ADDCs into the ADDE if possible

It would be neat if we could do the same thing for SETCC+ADD eventually, but we can't do that in target independent code.

llvm-svn: 126557
2011-02-26 22:48:07 +00:00
Owen Anderson b2c80da4ae Allow targets to specify a the type of the RHS of a shift parameterized on the type of the LHS.
llvm-svn: 126518
2011-02-25 21:41:48 +00:00
Jim Grosbach 14a07365cb Fix formatting of debug helper string.
llvm-svn: 126471
2011-02-25 03:59:03 +00:00
Cameron Zwarich 4c82cd21ed Set NumSignBits to 1 if KnownZero/KnownOne are being zero extended. In theory it
is possible to do better if the high bit is set in either KnownZero/KnownOne, but
in practice NumSignBits is always 1 when we are zero extending because nothing
is known about that register.

llvm-svn: 126465
2011-02-25 01:11:01 +00:00
Cameron Zwarich d2f3041c7f We only want to zero extend the existing information if the bit width is
actually larger.

llvm-svn: 126464
2011-02-25 01:10:55 +00:00
Nadav Rotem 502f1b943f Enable support for vector sext and trunc:
Limit the folding of any_ext and sext  into the load operation to scalars.
Limit the active-bits trunc optimization to scalars.
Document vector trunc and vector sext in LangRef.

Similar to commit 126080 (for enabling zext).

llvm-svn: 126424
2011-02-24 21:01:34 +00:00
Cameron Zwarich a62fc89a04 Merge information about the number of zero, one, and sign bits of live-out
registers at phis. This enables us to eliminate a lot of pointless zexts during
the DAGCombine phase. This fixes <rdar://problem/8760114>.

llvm-svn: 126380
2011-02-24 10:00:25 +00:00
Cameron Zwarich 3cf9280214 Add a getNumSignBits() method to APInt.
llvm-svn: 126379
2011-02-24 10:00:20 +00:00
Cameron Zwarich 97eb52da7b Add a mechanism for invalidating the LiveOutInfo of a PHI, and use it whenever
a block is visited before all of its predecessors.

llvm-svn: 126378
2011-02-24 10:00:16 +00:00
Cameron Zwarich 988faf91bd Track blocks visited in reverse postorder.
llvm-svn: 126377
2011-02-24 10:00:13 +00:00
Cameron Zwarich 6470647383 Refactor the LiveOutInfo interface into a few methods on FunctionLoweringInfo
and make the actual map private.

llvm-svn: 126376
2011-02-24 10:00:08 +00:00
Cameron Zwarich b670d512e9 Have isel visit blocks in reverse postorder rather than an undefined order. This
allows for the information propagated across basic blocks to be merged at phis.

llvm-svn: 126375
2011-02-24 10:00:04 +00:00
Cameron Zwarich f8b22b3483 Roll out r126169 and r126170 in an attempt to fix the selfhost bot.
llvm-svn: 126185
2011-02-22 03:24:52 +00:00
Cameron Zwarich 800f85baf9 Merge information about the number of zero, one, and sign bits of live-out registers
at phis. This enables us to eliminate a lot of pointless zexts during the DAGCombine
phase. This fixes <rdar://problem/8760114>.

llvm-svn: 126170
2011-02-22 00:46:27 +00:00
Cameron Zwarich f248f945c8 Have isel visit blocks in reverse postorder rather than an undefined order. This
allows for the information propagated across basic blocks to be merged at phis.

llvm-svn: 126169
2011-02-22 00:46:22 +00:00
Devang Patel f3292b2196 Revert r124611 - "Keep track of incoming argument's location while emitting LiveIns."
In other words, do not keep track of argument's location.  The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, "second" line table entry marks beginning of function body.
This requires some coordination with debugger to get this working. 
 - The debugger needs to be aware of prolog_end attribute attached with line table entries.
 - The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+)

llvm-svn: 126155
2011-02-21 23:21:26 +00:00
Nadav Rotem 25f2ac948b Fix 9267; Add vector zext support.
The DAGCombiner folds the zext into complex load instructions. This patch
prevents this optimization on vectors since none of the supported targets
knows how to perform load+vector_zext in one instruction.

llvm-svn: 126080
2011-02-20 12:37:50 +00:00
Devang Patel b7ae3ccb84 Do not lose debug info of an inlined function argument even if the argument is only used through GEPs.
This time with a fix that avoids using invalidated DenseMap iterator.

llvm-svn: 125984
2011-02-18 22:43:42 +00:00
Cameron Zwarich 0a1a36dc46 Roll out r125794 to help diagnose the llvm-gcc-i386-linux-selfhost failure.
llvm-svn: 125830
2011-02-18 04:58:10 +00:00
Devang Patel f922a431ee Do not lose debug info of an inlined function argument even if the argument is only used through GEPs.
llvm-svn: 125794
2011-02-17 23:33:27 +00:00
Duncan Sands c6196aa481 Fix wrong logic in promotion of signed mul-with-overflow (I pointed this out at
the time but presumably my email got lost).  Examples where the previous logic
got it wrong: (1) a signed i8 multiply of 64 by 2 overflows, but the high part is
zero; (2) a signed i8 multiple of -128 by 2 overflows, but the high part is all
ones. 

llvm-svn: 125748
2011-02-17 12:42:48 +00:00