Commit Graph

1089 Commits

Author SHA1 Message Date
Evan Cheng ef0b7cc2d5 On x86, if the only use of a i64 load is a i64 store, generate a pair of double load and store instead.
llvm-svn: 66776
2009-03-12 05:59:15 +00:00
Bill Wendling 42adc73a2b Add a -no-implicit-float flag. This acts like -soft-float, but may generate
floating point instructions that are explicitly specified by the user.

llvm-svn: 66719
2009-03-11 22:30:01 +00:00
Mon P Wang 25c6a46a81 For yonah, fix a vector shuffle case for v16i8 where we didn't properly clear some bits.
llvm-svn: 66684
2009-03-11 18:47:57 +00:00
Mon P Wang ce6a26cb1a Fixed a v8i16 shuffle case that should generate a pshufb instead of a pshuflw/hw.
llvm-svn: 66645
2009-03-11 06:35:11 +00:00
Chris Lattner 248ad00afd formatting change, reduce indentation. No functionality change.
llvm-svn: 66642
2009-03-11 05:48:52 +00:00
Dan Gohman ff659b5b86 Arithmetic instructions don't set EFLAGS bits OF and CF bits
the same say the "test" instruction does in overflow cases,
so eliminating the test is only safe when those bits aren't
needed, as is the case for COND_E and COND_NE, or if it
can be proven that no overflow will occur. For now, just
restrict the optimization to COND_E and COND_NE and don't
do any overflow analysis.

llvm-svn: 66318
2009-03-07 01:58:32 +00:00
Dan Gohman e014b193c9 When creating X86ISD::INC and X86ISD::DEC nodes, only add one operand.
The extra operand didn't appear to cause any trouble, but it was
erroneous regardless.

llvm-svn: 66206
2009-03-05 21:29:28 +00:00
Dan Gohman 2c2f192c74 Fix the "test" optimization to recognize "dec" as an add of
negative one, as subtracts of immediates are canonicalized
to adds.

llvm-svn: 66180
2009-03-05 19:32:48 +00:00
Dan Gohman 55d7b2ac4f Re-apply 66008, now that the unfoldMemoryOperand bug is fixed.
llvm-svn: 66058
2009-03-04 19:44:21 +00:00
Dan Gohman 6728f892be Revert r66004 for now; it's causing a variety of test failures.
llvm-svn: 66008
2009-03-04 03:54:19 +00:00
Dan Gohman fe8d71f42a Teach the x86 backend to eliminate "test" instructions by using the EFLAGS
result from add, sub, inc, and dec instructions in simple cases.

llvm-svn: 66004
2009-03-04 02:33:24 +00:00
Rafael Espindola 000421eade Refactor TLS code and add some tests. The tests and expected results are:
pic |  declaration | linkage  | visibility |

!pic |  declaration | external | default    | tls1.ll     tls2.ll     | local exec
 pic |  declaration | external | default    | tls1-pic.ll tls2-pic.ll | general dynamic
!pic | !declaration | external | default    | tls3.ll     tls4.ll     | initial exec
 pic | !declaration | external | default    | tls3-pic.ll tls4-pic.ll | general dynamic

!pic |  declaration | external | hidden     | tls7.ll     tls8.ll     | local exec
 pic |  declaration | external | hidden     | X                       | local dynamic
!pic | !declaration | external | hidden     | tls9.ll     tls10.ll    | local exec
 pic | !declaration | external | hidden     | X                       | local dynamic

!pic |  declaration | internal | default    | tls5.ll     tls6.ll     | local exec
 pic |  declaration | internal | default    | X                       | local dynamic

The ones marked with an X have not been implemented since local dynamic is not implemented.

llvm-svn: 65632
2009-02-27 13:37:18 +00:00
Evan Cheng a49de9de2e Revert BuildVectorSDNode related patches: 65426, 65427, and 65296.
llvm-svn: 65482
2009-02-25 22:49:59 +00:00
Evan Cheng 9f8fddeed8 Only v1i16 (i.e. _m64) is returned via RAX / RDX.
llvm-svn: 65313
2009-02-23 09:03:22 +00:00
Nate Begeman e684da3e5d Generate better code for v8i16 shuffles on SSE2
Generate better code for v16i8 shuffles on SSE2 (avoids stack)
Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops.
Document the shuffle matching logic and add some FIXMEs for later further
  cleanups.
New tests that test the above.

Examples:

New:
_shuf2:
	pextrw	$7, %xmm0, %eax
	punpcklqdq	%xmm1, %xmm0
	pshuflw	$128, %xmm0, %xmm0
	pinsrw	$2, %eax, %xmm0

Old:
_shuf2:
	pextrw	$2, %xmm0, %eax
	pextrw	$7, %xmm0, %ecx
	pinsrw	$2, %ecx, %xmm0
	pinsrw	$3, %eax, %xmm0
	movd	%xmm1, %eax
	pinsrw	$4, %eax, %xmm0
	ret

=========

New:
_shuf4:
	punpcklqdq	%xmm1, %xmm0
	pshufb	LCPI1_0, %xmm0

Old:
_shuf4:
	pextrw	$3, %xmm0, %eax
	movsd	%xmm1, %xmm0
	pextrw	$3, %xmm1, %ecx
	pinsrw	$4, %ecx, %xmm0
	pinsrw	$5, %eax, %xmm0

========

New:
_shuf1:
	pushl	%ebx
	pushl	%edi
	pushl	%esi
	pextrw	$1, %xmm0, %eax
	rolw	$8, %ax
	movd	%xmm0, %ecx
	rolw	$8, %cx
	pextrw	$5, %xmm0, %edx
	pextrw	$4, %xmm0, %esi
	pextrw	$3, %xmm0, %edi
	pextrw	$2, %xmm0, %ebx
	movaps	%xmm0, %xmm1
	pinsrw	$0, %ecx, %xmm1
	pinsrw	$1, %eax, %xmm1
	rolw	$8, %bx
	pinsrw	$2, %ebx, %xmm1
	rolw	$8, %di
	pinsrw	$3, %edi, %xmm1
	rolw	$8, %si
	pinsrw	$4, %esi, %xmm1
	rolw	$8, %dx
	pinsrw	$5, %edx, %xmm1
	pextrw	$7, %xmm0, %eax
	rolw	$8, %ax
	movaps	%xmm1, %xmm0
	pinsrw	$7, %eax, %xmm0
	popl	%esi
	popl	%edi
	popl	%ebx
	ret

Old:
_shuf1:
	subl	$252, %esp
	movaps	%xmm0, (%esp)
	movaps	%xmm0, 16(%esp)
	movaps	%xmm0, 32(%esp)
	movaps	%xmm0, 48(%esp)
	movaps	%xmm0, 64(%esp)
	movaps	%xmm0, 80(%esp)
	movaps	%xmm0, 96(%esp)
	movaps	%xmm0, 224(%esp)
	movaps	%xmm0, 208(%esp)
	movaps	%xmm0, 192(%esp)
	movaps	%xmm0, 176(%esp)
	movaps	%xmm0, 160(%esp)
	movaps	%xmm0, 144(%esp)
	movaps	%xmm0, 128(%esp)
	movaps	%xmm0, 112(%esp)
	movzbl	14(%esp), %eax
	movd	%eax, %xmm1
	movzbl	22(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	42(%esp), %eax
	movd	%eax, %xmm1
	movzbl	50(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm1, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	77(%esp), %eax
	movd	%eax, %xmm1
	movzbl	84(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	104(%esp), %eax
	movd	%eax, %xmm1
	punpcklbw	%xmm1, %xmm0
	punpcklbw	%xmm2, %xmm0
	movaps	%xmm0, %xmm1
	punpcklbw	%xmm3, %xmm1
	movzbl	127(%esp), %eax
	movd	%eax, %xmm0
	movzbl	135(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	155(%esp), %eax
	movd	%eax, %xmm0
	movzbl	163(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm0, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	188(%esp), %eax
	movd	%eax, %xmm0
	movzbl	197(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	217(%esp), %eax
	movd	%eax, %xmm4
	movzbl	225(%esp), %eax
	movd	%eax, %xmm0
	punpcklbw	%xmm4, %xmm0
	punpcklbw	%xmm2, %xmm0
	punpcklbw	%xmm3, %xmm0
	punpcklbw	%xmm1, %xmm0
	addl	$252, %esp
	ret

llvm-svn: 65311
2009-02-23 08:49:38 +00:00
Scott Michel 9d31aca679 Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR
instruction. The class also consolidates the code for detecting constant
splats that's shared across PowerPC and the CellSPU backends (and might be
useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for
generating new BUILD_VECTOR nodes.

llvm-svn: 65296
2009-02-22 23:36:09 +00:00
Evan Cheng e4ffc030e2 Be bug compatible with gcc by returning MMX values in RAX.
llvm-svn: 65274
2009-02-22 08:05:12 +00:00
Evan Cheng 2a9bad5ac1 Support return of MMX values in 64-bit mode.
llvm-svn: 65152
2009-02-20 20:43:02 +00:00
Scott Michel cf0da6c597 Remove trailing whitespace to reduce later commit patch noise.
(Note: Eventually, commits like this will be handled via a pre-commit hook that
 does this automagically, as well as expand tabs to spaces and look for 80-col
 violations.)

llvm-svn: 64827
2009-02-17 22:15:04 +00:00
Evan Cheng c2fde91703 Teach x86 target -soft-float.
llvm-svn: 64496
2009-02-13 22:36:38 +00:00
Dale Johannesen 655775293f Arrange to print constants that match "n" and "i" constraints
in inline asm as signed (what gcc does).  Add partial support
for x86-specific "e" and "Z" constraints, with appropriate
signedness for printing.

llvm-svn: 64400
2009-02-12 20:58:09 +00:00
Dale Johannesen 9c310711bb Use getDebugLoc forwarder instead of getNode()->getDebugLoc.
No functional change.

llvm-svn: 64026
2009-02-07 19:59:05 +00:00
Dan Gohman 747e55bc9a Constify TargetInstrInfo::EmitInstrWithCustomInserter, allowing
ScheduleDAG's TLI member to use const.

llvm-svn: 64018
2009-02-07 16:15:20 +00:00
Dale Johannesen 62fd95d6ec Get rid of the last non-DebugLoc versions of getNode!
Many targets build placeholder nodes for special operands, e.g.
GlobalBaseReg on X86 and PPC for the PIC base.  There's no
sensible way to associate debug info with these.  I've left
them built with getNode calls with explicit DebugLoc::getUnknownLoc operands. 
I'm not too happy about this but don't see a good improvement;
I considered adding a getPseudoOperand or something, but it
seems to me that'll just make it harder to read.

llvm-svn: 63992
2009-02-07 00:55:49 +00:00
Dale Johannesen 84935759d5 Remove more non-DebugLoc getNode variants. Use
getCALLSEQ_{END,START} to permit passing no DebugLoc
there.  UNDEF doesn't logically have DebugLoc; add
getUNDEF to encapsulate this.

llvm-svn: 63978
2009-02-06 23:05:02 +00:00
Dale Johannesen 400dc2e2e4 Remove more non-DebugLoc versions of getNode.
llvm-svn: 63969
2009-02-06 21:50:26 +00:00
Dale Johannesen 9f3f72f144 Get rid of one more non-DebugLoc getNode and
its corresponding getTargetNode.  Lots of
caller changes.

llvm-svn: 63904
2009-02-06 01:31:28 +00:00
Dale Johannesen 021052a705 Remove non-DebugLoc versions of getLoad and getStore.
Adjust the many callers of those versions.

llvm-svn: 63767
2009-02-04 20:06:27 +00:00
Dan Gohman 556d14d483 Minor code cleanups; no functionality change.
llvm-svn: 63740
2009-02-04 17:28:58 +00:00
Mon P Wang 4379a795fe Fixes a case where we generate an incorrect mask for pshfhw in the presence
of undefs and incorrectly determining if we have punpckldq.

llvm-svn: 63702
2009-02-04 01:16:59 +00:00
Dale Johannesen bbf13f54e0 Patch up omissions in DebugLoc propagation.
llvm-svn: 63693
2009-02-04 00:33:20 +00:00
Dale Johannesen abf66b8343 Add some DL propagation to places that didn't
have it yet.  More coming.

llvm-svn: 63673
2009-02-03 22:26:09 +00:00
Dale Johannesen 1eb1ef2cfd DebugLoc propagation. done with file.
llvm-svn: 63656
2009-02-03 20:21:25 +00:00
Dale Johannesen 66e03e6f7b DebugLoc propagation. 2/3 through file.
llvm-svn: 63650
2009-02-03 19:33:06 +00:00
Evan Cheng dc636c4080 ADD / SUB / SMUL / UMUL with overflow second result top bits must be zero.
llvm-svn: 63509
2009-02-02 09:15:04 +00:00
Evan Cheng 4988c597b3 Add comment.
llvm-svn: 63506
2009-02-02 08:19:07 +00:00
Evan Cheng 50e15bdf81 Teach LowerBRCOND to recognize (xor (setcc x), 1). The xor inverts the condition. It's normally transformed by the dag combiner, unless the condition is set by a arithmetic op with overflow.
llvm-svn: 63505
2009-02-02 08:07:36 +00:00
Torok Edwin a2d1f35e9a Implement -mno-sse: if SSE is disabled on x86-64, don't store XMM on stack for
var-args, and don't allow FP return values

llvm-svn: 63495
2009-02-01 18:15:56 +00:00
Duncan Sands 3ed768868d Fix PR3453 and probably a bunch of other potential
crashes or wrong code with codegen of large integers:
eliminate the legacy getIntegerVTBitMask and
getIntegerVTSignBit methods, which returned their
value as a uint64_t, so couldn't handle huge types.

llvm-svn: 63494
2009-02-01 18:06:53 +00:00
Dale Johannesen 555a375bb6 Make LowerCallTo and LowerArguments take a DebugLoc
argument.  Adjust all callers and overloaded versions.

llvm-svn: 63444
2009-01-30 23:10:59 +00:00
Bill Wendling 8fb81f1b3d Get rid of the non-DebugLoc-ified getNOT() method.
llvm-svn: 63442
2009-01-30 23:03:19 +00:00
Mon P Wang cbb20a6ee1 When PerformBuildVectorCombine, avoid creating a X86ISD::VZEXT_LOAD of
an illegal type.

llvm-svn: 63380
2009-01-30 07:07:40 +00:00
Dan Gohman e58ab79f33 Make x86's BT instruction matching more thorough, and add some
dagcombines that help it match in several more cases. Add
several more cases to test/CodeGen/X86/bt.ll. This doesn't
yet include matching for BT with an immediate operand, it
just covers more register+register cases.

llvm-svn: 63266
2009-01-29 01:59:02 +00:00
Mon P Wang 9150f735fa Fixed lowering of v816 shuffles.
llvm-svn: 63252
2009-01-28 23:11:14 +00:00
Mon P Wang 5a685a52c1 Add shuffle splat pattern for x86 sse shifts.
llvm-svn: 63193
2009-01-28 08:12:05 +00:00
Dan Gohman 8e4ac9b71a Take the next steps in making SDUse more consistent with LLVM Use, and
tidy up SDUse and related code.
 - Replace the operator= member functions with a set method, like
   LLVM Use has, and variants setInitial and setNode, which take
   care up updating use lists, like LLVM Use's does. This simplifies
   code that calls these functions.
 - getSDValue() is renamed to get(), as in LLVM Use, though most
   places can either use the implicit conversion to SDValue or the
   convenience functions instead.
 - Fix some more node vs. value terminology issues.

Also, eliminate the one remaining use of SDOperandPtr, and
SDOperandPtr itself.

llvm-svn: 62995
2009-01-26 04:35:06 +00:00
Nate Begeman a2550a8e96 De-identifying per sabre review
llvm-svn: 62988
2009-01-26 03:15:31 +00:00
Nate Begeman 8a51d8c8f7 Support pattern matching various x86 sse shifts.
llvm-svn: 62979
2009-01-26 00:52:55 +00:00
Bob Wilson c58900504b Add SelectionDAG::getNOT method to construct bitwise NOT operations,
corresponding to the "not" and "vnot" PatFrags.  Use the new method
in some places where it seems appropriate.

llvm-svn: 62768
2009-01-22 17:39:32 +00:00
Evan Cheng 8f367e53c7 Minor tweak to LowerUINT_TO_FP_i32. Bias (after scalar_to_vector) has two uses so we should make it the second source operand of ISD::OR so 2-address pass won't have to be smart about commuting.
%reg1024<def> = MOVSDrm %reg0, 1, %reg0, <cp#0>, Mem:LD(8,8) [ConstantPool + 0]
%reg1025<def> = MOVSD2PDrr %reg1024
%reg1026<def> = MOVDI2PDIrm <fi#-1>, 1, %reg0, 0, Mem:LD(4,16) [FixedStack-1 + 0]
%reg1027<def> = ORPSrr %reg1025<kill>, %reg1026<kill>
%reg1028<def> = MOVPD2SDrr %reg1027<kill>
%reg1029<def> = SUBSDrr %reg1028<kill>, %reg1024<kill>
%reg1030<def> = CVTSD2SSrr %reg1029<kill>
MOVSSmr <fi#0>, 1, %reg0, 0, %reg1030<kill>, Mem:ST(4,4) [FixedStack0 + 0]
%reg1031<def> = LD_Fp32m80 <fi#0>, 1, %reg0, 0, Mem:LD(4,16) [FixedStack0 + 0]
RET %reg1031<kill>, %ST0<imp-use,kill>

The reason 2-addr pass isn't smart enough to commute the ORPSrr is because it can't look pass the MOVSD2PDrr instruction.

llvm-svn: 62505
2009-01-19 08:19:57 +00:00
Evan Cheng 7e9ef4d776 Now not UINT_TO_FP is legal (it's marked custom), dag combiner won't
optimize it to a SINT_TO_FP when the sign bit is known zero. X86 isel should perform the optimization itself.

llvm-svn: 62504
2009-01-19 08:08:22 +00:00
Bill Wendling f9291cf43c Extend thi
llvm-svn: 62415
2009-01-17 07:40:19 +00:00
Bill Wendling dd40f26877 Temporarily revert my last change. It is causing a bootstrap failure.
llvm-svn: 62405
2009-01-17 04:23:51 +00:00
Bill Wendling 4d5275905e Implement a special algorithm for converting uint_to_fp for i32 values on
X86. This code:

void f() {
  uint32_t x;
  float y = (float)x;
}

used to be:

     movl     %eax, -8(%ebp)
     movl     [2^52 double], -4(%ebp)
     movsd    -8(%ebp), %xmm0
     subsd    [2^52 double], %xmm0
     cvtsd2ss %xmm0, %xmm0

Is now:

   movsd        [2^52 double], %xmm0
   movsd        %xmm0, %xmm1
   movd         %ecx, %xmm2
   orps         %xmm2, %xmm1
   subsd        %xmm0, %xmm1
   cvtsd2ss     %xmm1, %xmm0

This is faster on X86. Note that there's an extra load of %xmm0 into %xmm1. That
will be fixed in a later coalescer fix.

llvm-svn: 62404
2009-01-17 03:56:04 +00:00
Bill Wendling e04334730e Add support for non-zero __builtin_return_address values on X86.
llvm-svn: 62338
2009-01-16 19:25:27 +00:00
Mon P Wang ebfafee903 Expand insert/extract of a <4 x i32> with a variable index.
llvm-svn: 62281
2009-01-15 21:10:20 +00:00
Dan Gohman 0ad43ca6e5 Make getWidenVectorType const.
llvm-svn: 62265
2009-01-15 17:34:08 +00:00
Dan Gohman a63bede3c6 BT appears to be available on all >= i386 chips.
llvm-svn: 62196
2009-01-13 23:27:15 +00:00
Dan Gohman d3942af5cb Don't use a BT instruction if the AND has multiple uses.
llvm-svn: 62195
2009-01-13 23:25:30 +00:00
Devang Patel 5c6e1e3b7d Use DebugInfo interface to lower dbg_* intrinsics.
llvm-svn: 62127
2009-01-13 00:35:13 +00:00
Dan Gohman 33e6fcd56f X86_COND_C and X86_COND_NC are alternate mnemonics for
X86_COND_B and X86_COND_AE, respectively.

llvm-svn: 61835
2009-01-07 00:15:08 +00:00
Devang Patel 56a8bb670f squash warnings.
llvm-svn: 61707
2009-01-05 17:31:22 +00:00
Evan Cheng 1671a309fd Use movaps / movd to extract vector element 0 even with sse4.1. It's still cheaper than pextrw especially if the value is in memory.
llvm-svn: 61555
2009-01-02 05:29:08 +00:00
Duncan Sands 8feb694e8f Fix PR3274: when promoting the condition of a BRCOND node,
promote from i1 all the way up to the canonical SetCC type.
In order to discover an appropriate type to use, pass
MVT::Other to getSetCCResultType.  In order to be able to
do this, change getSetCCResultType to take a type as an
argument, not a value (this is also more logical).

llvm-svn: 61542
2009-01-01 15:52:00 +00:00
Chris Lattner 2a7c988627 Add a simple pattern for matching 'bt'.
llvm-svn: 61426
2008-12-25 05:34:37 +00:00
Chris Lattner 8175f27d3f translateX86CC can never fail. Simplify it based on this.
llvm-svn: 61423
2008-12-24 23:53:05 +00:00
Chris Lattner 4b46b74ece indentation
llvm-svn: 61407
2008-12-24 00:11:37 +00:00
Chris Lattner e9988b661d simplify some control flow and reduce indentation, no functionality change.
llvm-svn: 61404
2008-12-23 23:42:27 +00:00
Dan Gohman 25a767d7f4 Add instruction patterns and encodings for the x86 bt instructions.
llvm-svn: 61400
2008-12-23 22:45:23 +00:00
Dan Gohman 12f2490489 Clean up the atomic opcodes in SelectionDAG.
This removes all the _8, _16, _32, and _64 opcodes and replaces each
group with an unsuffixed opcode. The MemoryVT field of the AtomicSDNode
is now used to carry the size information. In tablegen, the size-specific
opcodes are replaced by size-independent opcodes that utilize the
ability to compose them with predicates.

This shrinks the per-opcode tables and makes the code that handles
atomics much more concise.

llvm-svn: 61389
2008-12-23 21:37:04 +00:00
Mon P Wang ec95070ca3 Fixed code generation for v8i16 and v16i8 splats on X86.
Fixed lowering of v8i16 shuffles for v8i16 when we fall back to extract/insert.

llvm-svn: 61365
2008-12-23 04:03:27 +00:00
Mon P Wang 998fd29ce1 Fixed x86 code generation of multiple for v2i64. It was incorrect for SSE4.1.
llvm-svn: 61211
2008-12-18 21:42:19 +00:00
Bill Wendling c4499feb1a - Use patterns instead of creating completely new instruction matching patterns,
which are identical to the original patterns.

- Change the multiply with overflow so that we distinguish between signed and
  unsigned multiplication. Currently, unsigned multiplication with overflow
  isn't working!

llvm-svn: 60963
2008-12-12 21:15:41 +00:00
Mon P Wang 9c2d26d208 Added support for SELECT v8i8 v4i16 for X86 (MMX)
Added support for TRUNC v8i16 to v8i8 for X86 (MMX)

llvm-svn: 60916
2008-12-12 01:25:51 +00:00
Bill Wendling 1a317678bc Redo the arithmetic with overflow architecture. I was changing the semantics of
ISD::ADD to emit an implicit EFLAGS. This was horribly broken. Instead, replace
the intrinsic with an ISD::SADDO node. Then custom lower that into an
X86ISD::ADD node with a associated SETCC that checks the correct condition code
(overflow or carry). Then that gets lowered into the correct X86::ADDOvf
instruction.

Similar for SUB and MUL instructions.

llvm-svn: 60915
2008-12-12 00:56:36 +00:00
Bill Wendling f482f379ef Whitespace changes.
llvm-svn: 60826
2008-12-10 02:01:32 +00:00
Bill Wendling db8ec2d75a Add sub/mul overflow intrinsics. This currently doesn't have a
target-independent way of determining overflow on multiplication. It's very
tricky. Patch by Zoltan Varga!

llvm-svn: 60800
2008-12-09 22:08:41 +00:00
Dale Johannesen 9efd2ce55b Make LoopStrengthReduce smarter about hoisting things out of
loops when they can be subsumed into addressing modes.

Change X86 addressing mode check to realize that
some PIC references need an extra register.
(I believe this is correct for Linux, if not, I'm sure
someone will tell me.)

llvm-svn: 60608
2008-12-05 21:47:27 +00:00
Evan Cheng 501089f6f4 Refactor code. No functionality change.
llvm-svn: 60478
2008-12-03 08:38:43 +00:00
Bill Wendling f8d1ef9842 CC should only be a ConstantSDNode at this point. Just use 'cast' instead of 'dyn_cast'.
llvm-svn: 60477
2008-12-03 08:32:02 +00:00
Bill Wendling 30e9dc81c8 Second stab at target-dependent lowering of everyone's favorite nodes: [SU]ADDO
- LowerXADDO lowers [SU]ADDO into an ADD with an implicit EFLAGS define. The
  EFLAGS are fed into a SETCC node which has the conditional COND_O or COND_C,
  depending on the type of ADDO requested.

- LowerBRCOND now recognizes if it's coming from a SETCC node with COND_O or
  COND_C set.

llvm-svn: 60388
2008-12-02 01:06:39 +00:00
Duncan Sands 3d960941b1 There are no longer any places that require a
MERGE_VALUES node with only one operand, so get
rid of special code that only existed to handle
that possibility.

llvm-svn: 60349
2008-12-01 11:41:29 +00:00
Duncan Sands 6ed40141f7 Change the interface to the type legalization method
ReplaceNodeResults: rather than returning a node which
must have the same number of results as the original
node (which means mucking around with MERGE_VALUES,
and which is also easy to get wrong since SelectionDAG
folding may mean you don't get the node you expect),
return the results in a vector.

llvm-svn: 60348
2008-12-01 11:39:25 +00:00
Bill Wendling 128f032cc8 Comment out code that isn't entirely correct.
llvm-svn: 60156
2008-11-27 07:18:35 +00:00
Bill Wendling 751a694ad3 Generate something sensible for an [SU]ADDO op when the overflow/carry flag is
the conditional for the BRCOND statement. For instance, it will generate:

    addl %eax, %ecx
    jo LOF

instead of

    addl %eax, %ecx
    ; About 10 instructions to compare the signs of LHS, RHS, and sum.
    jl LOF

llvm-svn: 60123
2008-11-26 22:37:40 +00:00
Bill Wendling 66835479d7 - Make lowering of "add with overflow" customizable by back-ends.
- Mark "add with overflow" as having a custom lowering for X86. Give it a null
  lowering representation for now.

llvm-svn: 59971
2008-11-24 19:21:46 +00:00
Mon P Wang 35a70ec131 Added missing description for -disable-mmx option.
llvm-svn: 59929
2008-11-24 02:10:43 +00:00
Duncan Sands 8d6e2e13d5 Rename SetCCResultContents to BooleanContents. In
practice these booleans are mostly produced by SetCC,
however the concept is more general.

llvm-svn: 59911
2008-11-23 15:47:28 +00:00
Mon P Wang 0aa8f0a549 Added -disable-mmx using a patch from Preston Gurd.
llvm-svn: 59901
2008-11-23 04:37:22 +00:00
Dale Johannesen bee1ad9707 Extend InlineAsm::C_Register to allow multiple specific registers
(actually, code already all worked, only the comment
changed).  Use this to implement 'A' constraint on x86.
Fixes PR 1779.

llvm-svn: 59266
2008-11-13 21:52:36 +00:00
Mon P Wang 9a8d60a7c0 Widening cleanup
llvm-svn: 58796
2008-11-06 05:31:54 +00:00
Evan Cheng 3cd5e8c97b Indentation.
llvm-svn: 58750
2008-11-05 06:03:38 +00:00
Dan Gohman 99cdf8893e Use MOVSSmr instead of EXTRACTPSmr in the case of extracting
vector element 0 for a store, as it's smaller and faster.

llvm-svn: 58483
2008-10-31 00:57:24 +00:00
Mon P Wang 58c3794c27 Add initial support for vector widening. Logic is set to widen for X86.
One will only see an effect if legalizetype is not active.  Will move
support to LegalizeType soon.

llvm-svn: 58426
2008-10-30 08:01:45 +00:00
Chris Lattner 38461f6b2f Fix a nasty miscompilation of 176.gcc on linux/x86 where we synthesized
a memset using 16-byte XMM stores, but where the stack realignment code
didn't work.  Until it does (PR2962) disable use of xmm regs in memcpy
and memset formation for linux and other targets with insufficiently
aligned stacks.

This is part of PR2888

llvm-svn: 58317
2008-10-28 05:49:35 +00:00
Duncan Sands 014f5bbaad Fix translateX86CC: if SetCCOpcode is SETULE and
LHS is a foldable load, then LHS and RHS are swapped
and SetCCOpcode is changed to SETUGT.  But the later
code is expecting operands to be the wrong way round
for SETUGT, but they are not in this case, resulting
in an inverted compare.  The solution is to move the
load normalization before the correction for SETUGT.
This bug was tickled by LegalizeTypes which happened
to legalize the testcase slightly differently to
LegalizeDAG.

llvm-svn: 58092
2008-10-24 13:03:10 +00:00
Dale Johannesen f6655a9e79 Remove allocation of unused stack slot.
llvm-svn: 57987
2008-10-22 17:26:06 +00:00
Duncan Sands 5ee1dde8fa Get this working with LegalizeTypes: (1) don't
assume that i64 has been turned into a BUILD_PAIR
node (when called from LegalizeTypes this hasn't
happened yet) and don't use a vector shuffle mask
with an illegal element type.

llvm-svn: 57972
2008-10-22 11:24:12 +00:00
Dale Johannesen cf4607fcce Adjust comments for pedantic satisfaction.
llvm-svn: 57940
2008-10-22 00:02:32 +00:00
Dale Johannesen 3d7ece1acb Add comments to explain uint64->f64 algorithm,
well, sort of.  (Algorithm by Ian Ollmann.)

llvm-svn: 57932
2008-10-21 23:07:49 +00:00
Dale Johannesen 28929589e7 Add an SSE2 algorithm for uint64->f64 conversion.
The same one Apple gcc uses, faster.  Also gets the
extreme case in gcc.c-torture/execute/ieee/rbug.c
correct which we weren't before; this is not
sufficient to get the test to pass though, there
is another bug.

llvm-svn: 57926
2008-10-21 20:50:01 +00:00
Dan Gohman 269246b034 Don't create TargetGlobalAddress nodes with offsets that don't fit
in the 32-bit signed offset field of addresses. Even though this
may be intended, some linkers refuse to relocate code where the
relocated address computation overflows.

Also, fix the sign-extension of constant offsets to use the
actual pointer size, rather than the size of the GlobalAddress
node, which may be different, for example on x86-64 where MVT::i32
is used when the address is being fit into the 32-bit displacement
field.

llvm-svn: 57885
2008-10-21 03:38:42 +00:00
Dan Gohman 97d95d6d85 Optimized FCMP_OEQ and FCMP_UNE for x86.
Where previously LLVM might emit code like this:

        ucomisd %xmm1, %xmm0
        setne   %al
        setp    %cl
        orb     %al, %cl
        jne     .LBB4_2

it now emits this:

        ucomisd %xmm1, %xmm0
        jne     .LBB4_2
        jp      .LBB4_2

It has fewer instructions and uses fewer registers, but it does
have more branches. And in the case that this code is followed by
a non-fallthrough edge, it may be followed by a jmp instruction,
resulting in three branch instructions in sequence. Some effort
is made to avoid this situation.

To achieve this, X86ISelLowering.cpp now recognizes FCMP_OEQ and
FCMP_UNE in lowered form, and replace them with code that emits
two branches, except in the case where it would require converting
a fall-through edge to an explicit branch.

Also, X86InstrInfo.cpp's branch analysis and transform code now
knows now to handle blocks with multiple conditional branches. It
uses loops instead of having fixed checks for up to two
instructions. It can now analyze and transform code generated
from FCMP_OEQ and FCMP_UNE.

llvm-svn: 57873
2008-10-21 03:29:32 +00:00
Duncan Sands 1d20ab5784 Have X86 custom lowering for LegalizeTypes use
LowerOperation if it doesn't know what else to do.
This methods should probably be factorized some,
but this is good enough for the moment.  Have
LowerATOMIC_BINARY_64 use EXTRACT_ELEMENT rather
than assuming the operand is a BUILD_PAIR (if it
is then getNode will automagically simplify the
EXTRACT_ELEMENT).  This way LowerATOMIC_BINARY_64
usable from LegalizeTypes.

llvm-svn: 57831
2008-10-20 15:56:33 +00:00
Dan Gohman 2fe6bee5b6 Teach DAGCombine to fold constant offsets into GlobalAddress nodes,
and add a TargetLowering hook for it to use to determine when this
is legal (i.e. not in PIC mode, etc.)

This allows instruction selection to emit folded constant offsets
in more cases, such as the included testcase, eliminating the need
for explicit arithmetic instructions.

This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp
that attempted to achieve the same effect, but wasn't as effective.

Also, fix handling of offsets in GlobalAddressSDNodes in several
places, including changing GlobalAddressSDNode's offset from
int to int64_t.

The Mips, Alpha, Sparc, and CellSPU targets appear to be
unaware of GlobalAddress offsets currently, so set the hook to
false on those targets.

llvm-svn: 57748
2008-10-18 02:06:02 +00:00
Chris Lattner 8e2ef196ae add support for 128 bit inputs on both x86-64 and x86-32.
llvm-svn: 57709
2008-10-17 18:15:05 +00:00
Chris Lattner c7e65f4377 Fix a bug where the x86 backend would reject 64-bit r constraints when
in 32-bit mode instead of assigning a register pair.  This has nothing to
do with PR2356, but I happened to notice it while working on it.

llvm-svn: 57704
2008-10-17 17:59:52 +00:00
Dan Gohman 4a87660127 Remove an unused variable.
llvm-svn: 57621
2008-10-16 01:47:47 +00:00
Evan Cheng 3b0f5e4d61 - Add target lowering hooks that specify which setcc conditions are illegal,
i.e. conditions that cannot be checked with a single instruction. For example,
SETONE and SETUEQ on x86.
- Teach legalizer to implement *illegal* setcc as a and / or of a number of
legal setcc nodes. For now, only implement FP conditions. e.g. SETONE is
implemented as SETO & SETNE, SETUEQ is SETUO | SETEQ.
- Move x86 target over.

llvm-svn: 57542
2008-10-15 02:05:31 +00:00
Dan Gohman e7ced74558 FastISel support for exception-handling constructs.
- Move the EH landing-pad code and adjust it so that it works
   with FastISel as well as with SDISel.
 - Add FastISel support for @llvm.eh.exception and
   @llvm.eh.selector.

llvm-svn: 57539
2008-10-14 23:54:11 +00:00
Evan Cheng 07d53b1d33 Rename LoadX to LoadExt.
llvm-svn: 57526
2008-10-14 21:26:46 +00:00
Chris Lattner 2753955fc0 Change CALLSEQ_BEGIN and CALLSEQ_END to take TargetConstant's as
parameters instead of raw Constants.  This prevents the constants from
being selected by the isel pass, fixing PR2735.

llvm-svn: 57385
2008-10-11 22:08:30 +00:00
Dale Johannesen 4f0bd68cfe Add a "loses information" return value to APFloat::convert
and APFloat::convertToInteger.  Restore return value to
IEEE754.  Adjust all users accordingly.

llvm-svn: 57329
2008-10-09 23:00:39 +00:00
Evan Cheng 94d14f2d45 Fix PR2850 and PR2863. Only generate movddup for 128-bit SSE vector shuffles.
llvm-svn: 57210
2008-10-06 21:13:08 +00:00
Dale Johannesen 8c36a1c09c Make atomic Swap work, 64-bit on x86-32.
Make it all work in non-pic mode.

llvm-svn: 57034
2008-10-03 22:25:52 +00:00
Dale Johannesen 5d60c1ebb1 Pass MemOperand through for 64-bit atomics on 32-bit,
incidentally making the case where the memop is a
pointer deref work.  Fix cmp-and-swap regression.

llvm-svn: 57027
2008-10-03 19:41:08 +00:00
Dan Gohman 0d1e9a8e04 Switch the MachineOperand accessors back to the short names like
isReg, etc., from isRegister, etc.

llvm-svn: 57006
2008-10-03 15:45:36 +00:00
Dale Johannesen 867d549fce Handle some 64-bit atomics on x86-32, some of the time.
llvm-svn: 56963
2008-10-02 18:53:47 +00:00
Bill Wendling 68f12ee567 Implement the -fno-builtin option in the front-end, not in the back-end.
llvm-svn: 56900
2008-10-01 00:59:58 +00:00
Bill Wendling 1782584f56 Just don't transform this memset into "bzero" if no-builtin is specified.
llvm-svn: 56888
2008-09-30 22:05:33 +00:00
Bill Wendling bd09262e97 Add the new `-no-builtin' flag. This flag is meant to mimic the GCC
`-fno-builtin' flag. Currently, it's used to replace "memset" with "_bzero"
instead of "__bzero" on Darwin10+. This arguably violates the meaning of this
flag, but is currently sufficient. The meaning of this flag should become more
specific over time.

llvm-svn: 56885
2008-09-30 21:22:07 +00:00
Dale Johannesen f61a84ec43 Remove misuse of ReplaceNodeResults for atomics with
valid types.  No functional change.

llvm-svn: 56808
2008-09-29 22:25:26 +00:00
Evan Cheng 3774b2f292 Re-apply 56683 with fixes.
llvm-svn: 56748
2008-09-27 01:56:22 +00:00
Bill Wendling c966a737c5 Temporarily reverting r56683. This is causing a failure during the build of llvm-gcc:
/Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/xgcc -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/bin/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/lib/ -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/include -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/sys-include -mmacosx-version-min=10.4 -O2  -O2 -g -O2  -DIN_GCC    -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../llvm-gcc.src/gcc -I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include -I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include  -I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber -I/Volumes/Gir/devel/llvm/clean/llvm.obj/include -I/Volumes/Gir/devel/llvm/clean/llvm.src/include -fexceptions -fvisibility=hidden -DHIDE_EXPORTS -c ../../llvm-gcc.src/gcc/unwind-dw2-fde-darwin.c -o libgcc/./unwind-dw2-fde-darwin.o
Assertion failed: (TargetRegisterInfo::isVirtualRegister(regA) && TargetRegisterInfo::isVirtualRegister(regB) && "cannot update physical register live information"), function runOnMachineFunction, file /Volumes/Gir/devel/llvm/clean/llvm.src/lib/CodeGen/TwoAddressInstructionPass.cpp, line 311.
../../llvm-gcc.src/gcc/unwind-dw2.c:1527: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
{standard input}:3521:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb"
{standard input}:3521:symbol: "_dwarf_reg_size_table" can't be undefined in a subtraction expression
{standard input}:3520:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb"
...

llvm-svn: 56703
2008-09-26 22:10:44 +00:00
Dan Gohman 6e0548336a Rename ConstantSDNode's getSignExtended to getSExtValue, for
consistancy with ConstantInt, and re-implement it in terms
of ConstantInt's getSExtValue.

llvm-svn: 56700
2008-09-26 21:54:37 +00:00
Evan Cheng d77cbe8947 Fix @llvm.frameaddress codegen. FP elimination optimization should be disabled when frame address is desired. Also add support for depth > 0.
llvm-svn: 56683
2008-09-26 19:48:35 +00:00
Dale Johannesen 0e32a2c935 Add "inreg" field to CallSDNode (doesn't increase
its size).  Adjust various lowering functions to
pass this info through from CallInst.  Use it to
implement sseregparm returns on X86.  Remove
X86_ssecall calling convention.

llvm-svn: 56677
2008-09-26 19:31:26 +00:00
Evan Cheng 9dbe45c000 Prefer movlhps over punpcklqdq, etc. in more cases.
llvm-svn: 56627
2008-09-25 23:35:16 +00:00
Devang Patel 4c758ea3e0 Large mechanical patch.
s/ParamAttr/Attribute/g
s/PAList/AttrList/g
s/FnAttributeWithIndex/AttributeWithIndex/g
s/FnAttr/Attribute/g

This sets the stage 
- to implement function notes as function attributes and 
- to distinguish between function attributes and return value attributes.

This requires corresponding changes in llvm-gcc and clang.

llvm-svn: 56622
2008-09-25 21:00:45 +00:00
Evan Cheng 74c9ed91b0 With sse3 and when the source is a load or has multiple uses, favors movddup over shuffp*, pshufd, etc. Without sse3 or when the source is from a register, make use of movlhps
llvm-svn: 56620
2008-09-25 20:50:48 +00:00
Evan Cheng 4751549f9b X86ISD::VZEXT_LOAD should produce and fold a chain.
llvm-svn: 56593
2008-09-24 23:26:36 +00:00
Evan Cheng e0add20c1b Properly handle 'm' inline asm constraints. If a GV is being selected for the addressing mode, it requires the same logic for PIC relative addressing, etc.
llvm-svn: 56526
2008-09-24 00:05:32 +00:00
Dan Gohman 918fe08a56 Arrange for FastISel code to have access to the MachineModuleInfo
object. This will be needed to support debug info.

llvm-svn: 56508
2008-09-23 21:53:34 +00:00
Evan Cheng 9e9426cb82 Support x86 specific inline asm modifier 'J'.
llvm-svn: 56483
2008-09-22 23:57:37 +00:00
Dale Johannesen 7a74e71489 Make log, log2, log10, exp, exp2 use Expand by
default.

llvm-svn: 56471
2008-09-22 21:57:32 +00:00
Arnold Schwaighofer 796a271c5f Change the calling convention used when tail call optimization is enabled from CC_X86_32_TailCall to CC_X86_32_FastCC.
llvm-svn: 56436
2008-09-22 14:50:07 +00:00
Bill Wendling 24c79f28b1 Reverting r56249. On further investigation, this functionality isn't needed.
Apologies for the thrashing.

llvm-svn: 56251
2008-09-16 21:48:12 +00:00
Bill Wendling 8bc392fb1d - Change "ExternalSymbolSDNode" to "SymbolSDNode".
- Add linkage to SymbolSDNode (default to external).
- Change ISD::ExternalSymbol to ISD::Symbol.
- Change ISD::TargetExternalSymbol to ISD::TargetSymbol

These changes pave the way to allowing SymbolSDNodes with non-external linkage.

llvm-svn: 56249
2008-09-16 21:12:30 +00:00
Dan Gohman 38453eebdc Remove isImm(), isReg(), and friends, in favor of
isImmediate(), isRegister(), and friends, to avoid confusion
about having two different names with the same meaning. I'm
not attached to the longer names, and would be ok with
changing to the shorter names if others prefer it.

llvm-svn: 56189
2008-09-13 17:58:21 +00:00
Dan Gohman d3fe174c53 Define CallSDNode, an SDNode subclass for use with ISD::CALL.
Currently it just holds the calling convention and flags
for isVarArgs and isTailCall.

And it has several utility methods, which eliminate magic
5+2*i and similar index computations in several places.

CallSDNodes are not CSE'd. Teach UpdateNodeOperands to handle
nodes that are not CSE'd gracefully.

llvm-svn: 56183
2008-09-13 01:54:27 +00:00
Dan Gohman effb894453 Rename ConstantSDNode::getValue to getZExtValue, for consistency
with ConstantInt. This led to fixing a bug in TargetLowering.cpp
using getValue instead of getAPIntValue.

llvm-svn: 56159
2008-09-12 16:56:44 +00:00
Arnold Schwaighofer dd45bc25ac When tailcallopt is enabled all fastcc calls must have an aligned argument stack size. Add a test case.
llvm-svn: 56119
2008-09-11 20:28:43 +00:00
Dale Johannesen 58d084c05b The version of AtomicSDNode::AtomicSDNode used (only) for
cmp-and-swap reversed the Cmp and Swap arguments; comments
make it clear this is unintentional.  Unfortunately, the
x86 BE had a compensating reversal, which is removed here.
PPC is OK.

From inspection of the Alpha code I think it is OK, but
if somebody has that platform please check it out.  I
cannot test on that platform.

llvm-svn: 56091
2008-09-11 03:12:59 +00:00
Dan Gohman 39d82f902a Add X86FastISel support for static allocas, and refences
to static allocas. As part of this change, refactor the
address mode code for laods and stores.

llvm-svn: 56066
2008-09-10 20:11:02 +00:00
Evan Cheng 710c3cf36a Fix a fastcc + sret bug. If fastcc and sret, callee doesn't need to pop the hidden struct ptr; Re-enable fastcc.
llvm-svn: 56061
2008-09-10 18:25:29 +00:00
Dale Johannesen 4cc893bab6 Handle new intrinsics with vector arguments.
Patch by Paul Redmond.

llvm-svn: 56059
2008-09-10 17:31:40 +00:00
Duncan Sands 6d6a65310b Fix name.
llvm-svn: 56055
2008-09-10 13:22:10 +00:00
Duncan Sands 83e45acc25 Add trampoline support for the new FastCC calling
convention (not related to recent Ada testsuite
failures).

llvm-svn: 56054
2008-09-10 13:11:09 +00:00
Duncan Sands 536c399579 Turn off the new FastCC for the moment. It causes
a slew of Ada testsuite failures on x86-32 linux.
Seems to be related to the use of float.

llvm-svn: 56053
2008-09-10 13:09:24 +00:00
Anton Korobeynikov 6acb2219b6 Replace explicit pointer-size constants to TargetData query.
No functionality change.

llvm-svn: 55996
2008-09-09 18:22:57 +00:00
Anton Korobeynikov 2fd24e7713 Reapply 55899: First draft of EH support on x86/64-linux
Now with fix, which prevents subtle codegen bug to trigger on darwin.
No fix for bug though, it's still there.

llvm-svn: 55955
2008-09-08 21:12:47 +00:00
Anton Korobeynikov 4112634ca6 Reapply blindly reverted 55898: Implement FRAME_TO_ARGS_OFFSET for x86-64
llvm-svn: 55954
2008-09-08 21:12:11 +00:00
Bill Wendling 3871441861 Reverting r55898 as well. This wasn't reverted in the original revert...
llvm-svn: 55938
2008-09-08 19:42:32 +00:00
Bill Wendling 99b83712f3 Reverting r55898 to r55909. One of these patches was causing an ICE during the full bootstrap on Darwin:
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/bin/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/lib/
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/include
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/sys-include
-O2  -O2 -g -O2  -DIN_GCC    -W -Wall -Wwrite-strings
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition
-isystem ./include  -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2
-D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../llvm-gcc.src/gcc
-I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include
-I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include
-I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.obj/include
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/include
-DSHARED -m64 -DL_negdi2 -c ../../llvm-gcc.src/gcc/libgcc2.c -o
libgcc/x86_64/_negdi2_s.o
Assertion failed: (TargetRegisterInfo::isVirtualRegister(regA) &&
TargetRegisterInfo::isVirtualRegister(regB) && "cannot update physical
register live information"), function runOnMachineFunction, file
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/lib/CodeGen/TwoAddressInstructionPass.cpp,
line 311.
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/bin/
-B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/lib/
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/include
-isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/sys-include
-O2  -O2 -g -O2  -DIN_GCC    -W -Wall -Wwrite-strings
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition
-isystem ./include  -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2
-D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../llvm-gcc.src/gcc
-I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include
-I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include
-I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.obj/include
-I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/include
-DSHARED -m64 -DL_lshrdi3 -c ../../llvm-gcc.src/gcc/libgcc2.c -o
libgcc/x86_64/_lshrdi3_s.o
../../llvm-gcc.src/gcc/unwind-dw2.c:1527: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
{standard input}:unknown:Undefined local symbol LBB21_11
{standard input}:unknown:Undefined local symbol LBB21_12
{standard input}:unknown:Undefined local symbol LBB21_13
{standard input}:unknown:Undefined local symbol LBB21_8

llvm-svn: 55928
2008-09-08 17:59:12 +00:00
Anton Korobeynikov 82b9540046 First draft of EH support on x86/64-linux
llvm-svn: 55899
2008-09-08 14:21:53 +00:00
Anton Korobeynikov cb0655d626 Implement FRAME_TO_ARGS_OFFSET for x86-64
llvm-svn: 55898
2008-09-08 14:21:10 +00:00
Evan Cheng 6f343bd543 Some code clean up.
llvm-svn: 55881
2008-09-07 09:07:23 +00:00
Evan Cheng 6c94b99c62 For whatever the reason, x86 CallingConv::Fast (i.e. fastcc) was not passing scalar arguments in registers. This patch defines a new fastcc CC which is slightly different from the FastCall CC. In addition to passing integer arguments in ECX and EDX, it also specify doubles are passed in 8-byte slots which are 8-byte aligned (instead of 4-byte aligned). This avoids a potential performance hazard where doubles span cacheline boundaries.
llvm-svn: 55807
2008-09-04 22:59:58 +00:00
Evan Cheng 3152edf474 Remove code that pad number of bytes to pop for X86_FastCall CC. The code doesn't do the "aligning" for Cygwin, Mingw, and Windows. But aligning it on Darwin and Linux breaks gcc compatibility. That ruled out all the platforms we support!
llvm-svn: 55756
2008-09-04 01:04:15 +00:00
Dale Johannesen da2d80688b Add intrinsics for log, log2, log10, exp, exp2.
No functional change (and no FE change to generate them).

llvm-svn: 55753
2008-09-04 00:47:13 +00:00
Dan Gohman 7bda51f5a4 Create HandlePHINodesInSuccessorBlocksFast, a version of
HandlePHINodesInSuccessorBlocks that works FastISel-style. This
allows PHI nodes to be updated correctly while using FastISel.

This also involves some code reorganization; ValueMap and
MBBMap are now members of the FastISel class, so they needn't
be passed around explicitly anymore. Also, SelectInstructions
is changed to SelectInstruction, and only does one instruction
at a time.

llvm-svn: 55746
2008-09-03 23:12:08 +00:00
Evan Cheng 24422d4928 Let tblgen only generate fastisel routines, not the class definition. This makes it easier for targets to define its own fastisel class.
llvm-svn: 55679
2008-09-03 00:03:49 +00:00
Evan Cheng 3fddc7e906 Swap fp comparison operands and change predicate to allow load folding (safely this time).
llvm-svn: 55553
2008-08-29 23:22:12 +00:00
Evan Cheng b3ed09703c Backing out 55521. Not safe.
llvm-svn: 55548
2008-08-29 22:13:21 +00:00
Evan Cheng 960b17a3c2 Swap fp comparison operands and change predicate to allow load folding.
llvm-svn: 55521
2008-08-28 23:48:31 +00:00
Gabor Greif 95d77f5466 remove tabs, fix > 80 cols
llvm-svn: 55511
2008-08-28 23:19:51 +00:00
Gabor Greif f304a7aa4d erect abstraction boundaries for accessing SDValue members, rename Val -> Node to reflect semantics
llvm-svn: 55504
2008-08-28 21:40:38 +00:00
Rafael Espindola 26d54b3ef3 Use resize instead of reserve. Reserve doesn't change size().
llvm-svn: 55486
2008-08-28 18:32:53 +00:00
Dale Johannesen 41be0d4445 Split the ATOMIC NodeType's to include the size, e.g.
ATOMIC_LOAD_ADD_{8,16,32,64} instead of ATOMIC_LOAD_ADD.
Increased the Hardcoded Constant OpActionsCapacity to match.
Large but boring; no functional change.

This is to support partial-word atomics on ppc; i8 is
not a valid type there, so by the time we get to lowering, the
ATOMIC_LOAD nodes looks the same whether the type was i8 or i32.
The information can be added to the AtomicSDNode, but that is the
largest SDNode; I don't fully understand the SDNode allocation,
but it is sensitive to the largest node size, so increasing
that must be bad.  This is the alternative.

llvm-svn: 55457
2008-08-28 02:44:49 +00:00
Gabor Greif abfdf928d8 disallow direct access to SDValue::ResNo, provide a getter instead
llvm-svn: 55394
2008-08-26 22:36:50 +00:00
Chris Lattner 09f8cef571 If an xmm register is referenced explicitly in an inline asm, make sure to
assign it to a version of the xmm register with the regclass that matches its
type.  This fixes PR2715, a bug handling some crazy xpcom case in mozilla.

llvm-svn: 55358
2008-08-26 06:19:02 +00:00
Evan Cheng f00f1e50b5 Try approach to moving call address load inside of callseq_start. Now it's done during the preprocess of x86 isel. callseq_start's chain is changed to load's chain node; while load's chain is the last of callseq_start or the loads or copytoreg nodes inserted to move arguments to the right spot.
llvm-svn: 55338
2008-08-25 21:27:18 +00:00
Bill Wendling 5b836c5f77 Temporarily reverting r55292. It's causing a bootstraping failure:
/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc ... src/libiberty/make-temp-file.c -o make-temp-file.o
Assertion failed: (Node2Index[SU->NodeNum] > Node2Index[I->Dep->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp, line 508.
../../../../llvm-gcc.src/libiberty/hashtab.c:955: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.
make[4]: *** [hashtab.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [multi-do] Error 1
make[2]: *** [all] Error 2
make[1]: *** [all-target-libiberty] Error 2
make: *** [all] Error 2

llvm-svn: 55295
2008-08-24 21:45:30 +00:00
Evan Cheng 8fa17424f7 Move callseq_start above the call address load to allow load to be folded into the call node.
llvm-svn: 55292
2008-08-24 19:19:55 +00:00
Bill Wendling 2fd7dbaf1d If part of the mask is "undef", then ignore it as we don't care what goes into it.
llvm-svn: 55147
2008-08-21 22:36:36 +00:00
Bill Wendling 765d3e0013 Fix whitespace. No functionality change.
llvm-svn: 55146
2008-08-21 22:35:37 +00:00
Evan Cheng 9534ea03e8 Fix a number of byval / memcpy / memset related codegen issues.
1. x86-64 byval alignment should be max of 8 and alignment of type. Previously the code was not doing what the commit message was saying.
2. Do not use byte repeat move and store operations. These are slow.

llvm-svn: 55139
2008-08-21 21:00:15 +00:00
Mon P Wang 5c2ac4a5e0 Treat floating point ST1 the same as ST0 when lowering for a call result
llvm-svn: 55135
2008-08-21 19:54:16 +00:00
Dan Gohman 02c84b8910 Simplify FastISel's constructor argument list, make the FastISel
class hold a MachineRegisterInfo member, and make the
MachineBasicBlock be passed in to SelectInstructions rather
than the FastISel constructor.

llvm-svn: 55076
2008-08-20 21:05:57 +00:00
Dale Johannesen 6f765f392c Add remaining 64-bit atomic patterns for x86-64.
llvm-svn: 55029
2008-08-20 00:48:50 +00:00
Bill Wendling f00f3055d8 Revert r55018 and apply the correct "fix" for the 64-bit sub_and_fetch atomic.
Just expand it like the other X-bit sub_and_fetches.

llvm-svn: 55023
2008-08-20 00:28:16 +00:00
Dan Gohman daef7f43af Instantiate FastISel for X86.
llvm-svn: 55011
2008-08-19 21:45:35 +00:00
Dan Gohman 4619e93bd3 The X86 target will soon have an implementation of createFastISel.
llvm-svn: 55010
2008-08-19 21:32:53 +00:00
Dale Johannesen 5afbf510aa Add support for 8 and 16 bit forms of __sync
builtins on X86.

Change "lock" instructions to be on a separate line.
This is needed to work around a bug in the Darwin
assembler.

llvm-svn: 54999
2008-08-19 18:47:28 +00:00
Evan Cheng ab35bfdf18 Fix a (u)comiss intrinsic lowering bug. It was using anyext which can return junk in higher bits. Patch by Nate Begeman.
llvm-svn: 54903
2008-08-17 19:22:34 +00:00
Anton Korobeynikov 93584cd5a0 Use correct name for TLS address resolution routine on x86-64
llvm-svn: 54845
2008-08-16 12:58:29 +00:00
Dan Gohman 7c2bf62b14 Also avoid pinsrw and pinsrb with a variable insertelement index.
llvm-svn: 54803
2008-08-14 22:53:18 +00:00
Dan Gohman 65d83ccf26 Don't try to use the insertps instruction for vector
element inserts with non-constant indices. This fixes
CodeGen/X86/vector-variable-idx.ll on machines that
have SSE4.1.

llvm-svn: 54801
2008-08-14 22:43:26 +00:00
Evan Cheng 7823a411d5 Fix PR2620: Fix X86cmppd selection code so it expects operands to be v2f64.
llvm-svn: 54376
2008-08-05 22:19:15 +00:00
Dan Gohman 8ef79ebd5f Add an assert to catch invalid VECTOR_SHUFFLE mask indices.
llvm-svn: 54329
2008-08-04 23:09:15 +00:00
Andrew Lenharth 77e3e86e70 Add atomic sub for other sizes
llvm-svn: 54314
2008-08-03 20:17:34 +00:00
Dan Gohman 2ce6f2ad5e Rename SDOperand to SDValue.
llvm-svn: 54128
2008-07-27 21:46:04 +00:00
Dan Gohman 91e5dcb680 Tidy SDNode::use_iterator, and complete the transition to have it
parallel its analogue, Value::value_use_iterator. The operator* method
now returns the user, rather than the use.

llvm-svn: 54127
2008-07-27 20:43:25 +00:00
Nate Begeman 283b2da27a Disable mov{L, LP, HP, HLP, *DUP} shuffles for mmx
mmx needs its own fancy shuffle logic based on unpack; for now we get correct but awful code.

Also commit Mon Ping's VSETCC patch

llvm-svn: 54039
2008-07-25 19:05:58 +00:00
Evan Cheng a2b4b4ad99 Fix PR2485: do all 4-element SSE shuffles in max. of 2 shuffle instructions.
Based on patch by Nicolas Capens.

llvm-svn: 53939
2008-07-23 00:22:17 +00:00
Evan Cheng 0c23ed6364 Factor out SSE 4 wide shuffle lowering code into its own function. No functionality changes.
llvm-svn: 53933
2008-07-22 21:13:36 +00:00
Evan Cheng 0384670141 Fix PR2574: implement v2f32 scalar_to_vector.
llvm-svn: 53927
2008-07-22 18:39:19 +00:00
Duncan Sands b0e3938651 Add VerifyNode, a place to put sanity checks on
generic SDNode's (nodes with their own constructors
should do sanity checking in the constructor).  Add
sanity checks for BUILD_VECTOR and fix all the places
that were producing bogus BUILD_VECTORs, as found by
"make check".  My favorite is the BUILD_VECTOR with
only two operands that was being used to build a
vector with four elements!

llvm-svn: 53850
2008-07-21 10:20:31 +00:00
Bill Wendling 75840e6435 Fix for first part of PR2562. Generate the "pinsrw" instruction for inserts
into v4i16 vectors.

llvm-svn: 53807
2008-07-20 02:32:23 +00:00
Nate Begeman 55b7becb29 SSE codegen for vsetcc nodes
llvm-svn: 53719
2008-07-17 16:51:19 +00:00
Mon P Wang 1e2c6bfa41 When lowering certain atomics, we need to copy the memoperand from the old
atomic operation to the new one.

llvm-svn: 53714
2008-07-17 04:54:06 +00:00
Evan Cheng cf06fe476f x86-64 PIC JIT fixes: do not generate the extra load for external GV's.
llvm-svn: 53661
2008-07-16 01:34:02 +00:00
Dan Gohman 02c7c6cb33 Include a frame index in the "fixed stack" pseudo source value
instead of using the frame index for the SVOffset, which was
inconsistent.

llvm-svn: 53486
2008-07-11 22:44:52 +00:00
Bill Wendling 5774466a33 The frame address on an x86-64 box needs to be offset by -8, not -4.
llvm-svn: 53450
2008-07-11 07:18:52 +00:00
Dan Gohman 3b46030375 Pool-allocation for MachineInstrs, MachineBasicBlocks, and
MachineMemOperands. The pools are owned by MachineFunctions.

This drastically reduces the number of calls to malloc/free made
during the "Emit" phase of scheduling, as well as later phases
in CodeGen. Combined with other changes, this speeds up the
"instruction selection" phase of CodeGen by 10% in some cases.

llvm-svn: 53212
2008-07-07 23:14:23 +00:00
Duncan Sands 93e180342a Rather than having a different custom legalization
hook for each way in which a result type can be
legalized (promotion, expansion, softening etc),
just use one: ReplaceNodeResults, which returns
a node with exactly the same result types as the
node passed to it, but presumably with a bunch of
custom code behind the scenes.  No change if the
new LegalizeTypes infrastructure is not turned on.

llvm-svn: 53137
2008-07-04 11:47:58 +00:00
Duncan Sands 739a0548c4 Add a new getMergeValues method that does not need
to be passed the list of value types, and use this
where appropriate.  Inappropriate places are where
the value type list is already known and may be
long, in which case the existing method is more
efficient.

llvm-svn: 53035
2008-07-02 17:40:58 +00:00
Duncan Sands b55e5ece96 Highlight that getMergeValues optimization is
being suppressed here.

llvm-svn: 52952
2008-07-01 08:00:49 +00:00
Dan Gohman fb19f9402b Split ISD::LABEL into ISD::DBG_LABEL and ISD::EH_LABEL, eliminating
the need for a flavor operand, and add a new SDNode subclass,
LabelSDNode, for use with them to eliminate the need for a label id
operand.

Change instruction selection to let these label nodes through
unmodified instead of creating copies of them. Teach the MachineInstr
emitter how to emit a MachineInstr directly from an ISD label node.

This avoids the need for allocating SDNodes for the label id and
flavor value, as well as SDNodes for each of the post-isel label,
label id, and label flavor.

llvm-svn: 52943
2008-07-01 00:05:16 +00:00
Dan Gohman 4246cf8eea Update comments to new-style syntax.
llvm-svn: 52925
2008-06-30 21:00:56 +00:00
Dan Gohman 5c73a886b4 Rename ISD::LOCATION to ISD::DBG_STOPPOINT to better reflect its
purpose, and give it a custom SDNode subclass so that it doesn't
need to have line number, column number, filename string, and
directory string, all existing as individual SDNodes to be the
operands.

This was the only user of ISD::STRING, StringSDNode, etc., so
remove those and some associated code.

This makes stop-points considerably easier to read in
-view-legalize-dags output, and reduces overhead (creating new
nodes and copying std::strings into them) on code containing
debugging information.

llvm-svn: 52924
2008-06-30 20:59:49 +00:00
Duncan Sands 1ae6ef83ee Revert the SelectionDAG optimization that makes
it impossible to create a MERGE_VALUES node with
only one result: sometimes it is useful to be able
to create a node with only one result out of one of
the results of a node with more than one result, for
example because the new node will eventually be used
to replace a one-result node using ReplaceAllUsesWith,
cf X86TargetLowering::ExpandFP_TO_SINT.  On the other
hand, most users of MERGE_VALUES don't need this and
for them the optimization was valuable.  So add a new
utility method getMergeValues for creating MERGE_VALUES
nodes which by default performs the optimization.
Change almost everywhere to use getMergeValues (and
tidy some stuff up at the same time).

llvm-svn: 52893
2008-06-30 10:19:09 +00:00
Evan Cheng 3fc2372d3a - Fix a x86 vector isel bug: illegal transformation of a vector_shuffle into a
shift.
- Add a readme entry for a missing vector_shuffle optimization that results in
  awful codegen.

llvm-svn: 52740
2008-06-25 20:52:59 +00:00
Dan Gohman aa01afd47c Remove the OrigVT member from AtomicSDNode, as it is redundant with
the base SDNode's VTList.

llvm-svn: 52722
2008-06-25 16:07:49 +00:00
Mon P Wang 6a490371c9 Added MemOperands to Atomic operations since Atomics touches memory.
Added abstract class MemSDNode for any Node that have an associated MemOperand
Changed atomic.lcs => atomic.cmp.swap, atomic.las => atomic.load.add, and
atomic.lss => atomic.load.sub

llvm-svn: 52706
2008-06-25 08:15:39 +00:00
Dale Johannesen e5f4ffbdf1 Add v2f32 (MMX) type to X86. Support is primitive:
load,store,call,return,bitcast.  This is enough to
make call and return work.

llvm-svn: 52691
2008-06-24 22:01:44 +00:00
Dan Gohman 1f2b2a4abe Remove unnecessary #includes.
llvm-svn: 52613
2008-06-22 19:21:26 +00:00
Eli Friedman 8d66e98c92 Fix a bug with <8 x i16> shuffle lowering on X86 where parts of the
shuffle could be skipped.  The check is invalid because the loop index i 
doesn't correspond to the element actually inserted. The correct check is
already done a few lines earlier, for whether the element is already in 
the right spot, so this shouldn't have any effect on the codegen for 
code that was already correct.

llvm-svn: 52486
2008-06-19 06:09:51 +00:00
Evan Cheng e47ca0940f Rather than avoiding to wrap ISD::DECLARE GV operand in X86ISD::Wrapper, simply handle it at dagisel time with x86 specific isel code.
llvm-svn: 52377
2008-06-17 02:01:22 +00:00
Andrew Lenharth f88d50bfcc add missing atomic intrinsic from gcc
llvm-svn: 52270
2008-06-14 05:48:15 +00:00
Anton Korobeynikov 729c4e95e2 Properly lower DYNAMIC_STACKALLOC - bracket all black magic with
CALLSEQ_BEGIN & CALLSEQ_END.

llvm-svn: 52225
2008-06-11 20:16:42 +00:00
Duncan Sands 11dd424539 Remove comparison methods for MVT. The main cause
of apint codegen failure is the DAG combiner doing
the wrong thing because it was comparing MVT's using
< rather than comparing the number of bits.  Removing
the < method makes this mistake impossible to commit.
Instead, add helper methods for comparing bits and use
them.

llvm-svn: 52098
2008-06-08 20:54:56 +00:00
Duncan Sands 13237ac3b9 Wrap MVT::ValueType in a struct to get type safety
and better control the abstraction.  Rename the type
to MVT.  To update out-of-tree patches, the main
thing to do is to rename MVT::ValueType to MVT, and
rewrite expressions like MVT::getSizeInBits(VT) in
the form VT.getSizeInBits().  Use VT.getSimpleVT()
to extract a MVT::SimpleValueType for use in switch
statements (you will get an assert failure if VT is
an extended value type - these shouldn't exist after
type legalization).
This results in a small speedup of codegen and no
new testsuite failures (x86-64 linux).

llvm-svn: 52044
2008-06-06 12:08:01 +00:00
Dan Gohman 714663ab94 Expand small memmovs using inline code. Set the X86 threshold for expanding
memmove to a more plausible value, now that it's actually being used.

llvm-svn: 51696
2008-05-29 19:42:22 +00:00
Evan Cheng 5e28227dbd Implement vector shift up / down and insert zero with ps{rl}lq / ps{rl}ldq.
llvm-svn: 51667
2008-05-29 08:22:04 +00:00
Nate Begeman f1e18c7c44 Don't attempt to create VZEXT_LOAD out of an extload. This an issue where the
code generator would do something like this:

f64 = load f32 <anyext>, f32mem
v2f64 = insertelt undef, %0, 0
v2f64 = insertelt %1, 0.0, 1

into 

v2f64 = vzext_load f32mem

which on x86 is movsd, when you really wanted a cvtss2sd/movsd pair.

llvm-svn: 51624
2008-05-28 00:24:25 +00:00
Dan Gohman 3388d022ac Use PMULDQ for v2i64 multiplies when SSE4.1 is available. And add
load-folding table entries for PMULDQ and PMULLD.

llvm-svn: 51489
2008-05-23 17:49:40 +00:00
Evan Cheng 29e59ad6c9 Fix typos and comments.
llvm-svn: 51165
2008-05-15 22:13:02 +00:00
Evan Cheng ef377adca0 Make use of vector load and store operations to implement memcpy, memmove, and memset. Currently only X86 target is taking advantage of these.
llvm-svn: 51140
2008-05-15 08:39:06 +00:00
Dan Gohman eabd647cd5 Change target-specific classes to use more precise static types.
This eliminates the need for several awkward casts, including
the last dynamic_cast under lib/Target.

llvm-svn: 51091
2008-05-14 01:58:56 +00:00
Evan Cheng 1120279ae6 Instead of a vector load, shuffle and then extract an element. Load the element from address with an offset.
pshufd $1, (%rdi), %xmm0
        movd %xmm0, %eax
=>
        movl 4(%rdi), %eax

llvm-svn: 51026
2008-05-13 08:35:03 +00:00
Evan Cheng b980f6fb3d Xform bitconvert(build_pair(load a, load b)) to a single load if the load locations are at the right offset from each other.
llvm-svn: 51008
2008-05-12 23:04:07 +00:00
Nate Begeman d875c3e2fd Initial X86 codegen support for VSETCC.
llvm-svn: 51000
2008-05-12 20:34:32 +00:00
Evan Cheng 2609d5e779 Refactor isConsecutiveLoad from X86 to TargetLowering so DAG combiner can make use of it.
llvm-svn: 50991
2008-05-12 19:56:52 +00:00
Dan Gohman 906716c40f Fix a compile error on compilers that still want a return value
in a non-void function that calls abort.

llvm-svn: 50969
2008-05-12 16:17:19 +00:00
Evan Cheng 71b9afb053 When transforming a vector_shuffle to a load, the base address must not be an undef.
llvm-svn: 50940
2008-05-10 06:46:49 +00:00
Dan Gohman 3c0e11af64 For now, abort when an ISD::VAARG is encountered on x86-64, rather
than silently generate invalid code.

llvm-gcc does not currently use VAArgInst; it lowers va_arg in the
front-end.

llvm-svn: 50930
2008-05-10 01:26:14 +00:00
Evan Cheng bb48d55a88 If movl top bits are undef, let it be selected to movlps, etc.
llvm-svn: 50928
2008-05-10 00:58:41 +00:00
Evan Cheng 961339bbdb Handle a few more cases of folding load i64 into xmm and zero top bits.
Note, some of the code will be moved into target independent part of DAG combiner in a subsequent patch.

llvm-svn: 50918
2008-05-09 21:53:03 +00:00
Evan Cheng 78af38c392 Handle vector move / load which zero the destination register top bits (i.e. movd, movq, movss (addr), movsd (addr)) with X86 specific dag combine.
llvm-svn: 50838
2008-05-08 00:57:18 +00:00
Mon P Wang 310a38d51e Improved generated code for atomic operators
llvm-svn: 50677
2008-05-05 22:56:23 +00:00
Evan Cheng dbfcce37fe Code clean up. No functionality change.
llvm-svn: 50675
2008-05-05 22:12:23 +00:00
Mon P Wang 3e58393c3d Added addition atomic instrinsics and, or, xor, min, and max.
llvm-svn: 50663
2008-05-05 19:05:59 +00:00
Anton Korobeynikov 9205c8562c Add General Dynamic TLS model for X86-64. Some parts looks really ugly (look for tlsaddr pattern),
but should work. Work is in progress, more models will follow

llvm-svn: 50630
2008-05-04 21:36:32 +00:00
Evan Cheng d9481366e3 Select vector shift with non-immediate i32 shift amount operand by first moving the operand into the right register.
llvm-svn: 50619
2008-05-04 09:15:50 +00:00
Arnold Schwaighofer be0de34ede Tail call optimization improvements:
Move platform independent code (lowering of possibly overwritten
arguments, check for tail call optimization eligibility) from
target X86ISelectionLowering.cpp to TargetLowering.h and
SelectionDAGISel.cpp.

Initial PowerPC tail call implementation:

Support ppc32 implemented and tested (passes my tests and
test-suite llvm-test).  
Support ppc64 implemented and half tested (passes my tests).
On ppc tail call optimization is performed if 
  caller and callee are fastcc
  call is a tail call (in tail call position, call followed by ret)
  no variable argument lists or byval arguments
  option -tailcallopt is enabled
Supported:
 * non pic tail calls on linux/darwin
 * module-local tail calls on linux(PIC/GOT)/darwin(PIC)
 * inter-module tail calls on darwin(PIC)
If constraints are not met a normal call will be emitted.

A test checking the argument lowering behaviour on x86-64 was added.

llvm-svn: 50477
2008-04-30 09:16:33 +00:00
Dan Gohman da44054867 Fix the SVOffset values for loads and stores produced by
memcpy/memset expansion. It was a bug for the SVOffset value
to be used in the actual address calculations.

llvm-svn: 50359
2008-04-28 17:15:20 +00:00
Anton Korobeynikov e183b3cd76 Properly lower vararg's FORMAL_ARGUMENTS node on win64
llvm-svn: 50325
2008-04-27 23:15:03 +00:00
Chris Lattner 724539c001 A few inline asm cleanups:
- Make targetlowering.h fit in 80 cols.
  - Make LowerAsmOperandForConstraint const.
  - Make lowerXConstraint -> LowerXConstraint
  - Make LowerXConstraint return a const char* instead of taking a string byref.

llvm-svn: 50312
2008-04-26 23:02:14 +00:00
Evan Cheng 1e78184a99 Extract the lower 64-bit if a MMX value is passed in a XMM register.
llvm-svn: 50292
2008-04-25 20:13:28 +00:00