Commit Graph

662 Commits

Author SHA1 Message Date
Dan Gohman 16d4bc3dc0 Follow Chris' suggestion; change the PseudoSourceValue accessors
to return pointers instead of references, since this is always what
is needed.

llvm-svn: 46857
2008-02-07 18:41:25 +00:00
Dan Gohman 63a8452e9c Add SourceValue information for outgoing argument stores on x86.
llvm-svn: 46854
2008-02-07 16:28:05 +00:00
Dan Gohman 2d489b5081 Re-apply the memory operand changes, with a fix for the static
initializer problem, a minor tweak to the way the
DAGISelEmitter finds load/store nodes, and a renaming of the
new PseudoSourceValue objects.

llvm-svn: 46827
2008-02-06 22:27:42 +00:00
Dale Johannesen d88f1d060e Implement sseregparm.
llvm-svn: 46764
2008-02-05 20:46:33 +00:00
Nick Lewycky f5b9938ef6 Don't use uninitialized values. Fixes vec_align.ll on X86 Linux.
llvm-svn: 46666
2008-02-02 08:29:58 +00:00
Evan Cheng efd142a920 SDIsel processes llvm.dbg.declare by recording the variable debug information descriptor and its corresponding stack frame index in MachineModuleInfo. This only works if the local variable is "homed" in the stack frame. It does not work for byval parameter, etc.
Added ISD::DECLARE node type to represent llvm.dbg.declare intrinsic. Now the intrinsic calls are lowered into a SDNode and lives on through out the codegen passes.
For now, since all the debugging information recording is done at isel time, when a ISD::DECLARE node is selected, it has the side effect of also recording the variable. This is a short term solution that should be fixed in time.

llvm-svn: 46659
2008-02-02 04:07:54 +00:00
Evan Cheng 27b32b87ed Revert 46556 and 46585. Dan please fix the PseudoSourceValue problem and re-commit.
llvm-svn: 46623
2008-01-31 21:00:00 +00:00
Dan Gohman ed346f2ed5 Avoid unnecessarily casting away const.
llvm-svn: 46590
2008-01-31 01:01:48 +00:00
Dan Gohman 9ba4d76816 Rename ISD::FLT_ROUNDS to ISD::FLT_ROUNDS_ to avoid conflicting
with the real FLT_ROUNDS (defined in <float.h>).

llvm-svn: 46587
2008-01-31 00:41:03 +00:00
Dan Gohman 3646fdda67 Create a new class, MemOperand, for describing memory references
in the backend. Introduce a new SDNode type, MemOperandSDNode, for
holding a MemOperand in the SelectionDAG IR, and add a MemOperand
list to MachineInstr, and code to manage them. Remove the offset
field from SrcValueSDNode; uses of SrcValueSDNode that were using
it are all all using MemOperandSDNode now.

Also, begin updating some getLoad and getStore calls to use the
PseudoSourceValue objects.

Most of this was written by Florian Brander, some
reorganization and updating to TOT by me.

llvm-svn: 46585
2008-01-31 00:25:39 +00:00
Evan Cheng 29cfb67e28 Even though InsertAtEndOfBasicBlock is an ugly hack it still deserves a proper name. Rename it to EmitInstrWithCustomInserter since it does not necessarily insert
instruction at the end.

llvm-svn: 46562
2008-01-30 18:18:23 +00:00
Evan Cheng 084a1cdcdd Work in progress. This patch *fixes* x86-64 calls which are modelled as StructRet but really should be return in registers, e.g. _Complex long double, some 128-bit aggregates. This is a short term solution that is necessary only because llvm, for now, cannot model i128 nor call's with multiple results.
Status: This only works for direct calls, and only the caller side is done. Disabled for now.

llvm-svn: 46527
2008-01-29 19:34:22 +00:00
Dale Johannesen 2b3bc30420 Handle 'X' constraint in asm's better.
llvm-svn: 46485
2008-01-29 02:21:21 +00:00
Chris Lattner d05d2011d0 Use fldz and fld1 for long double constants instead of a constant pool load.
llvm-svn: 46411
2008-01-27 06:19:31 +00:00
Chris Lattner 250789f1bd Remove some code for inferring alignment info from the x86 backend
now that the dag combiner does it.

llvm-svn: 46404
2008-01-26 20:07:42 +00:00
Chris Lattner f4523c35cb optimize fxor like for
llvm-svn: 46345
2008-01-25 06:14:17 +00:00
Chris Lattner 84ab724e06 Add target-specific dag combines for FAND(x,0) and FOR(x,0). This allows
us to compile:

double test(double X) {
  return copysign(0.0, X);
}

into:

_test:
	andpd	LCPI1_0(%rip), %xmm0
	ret

instead of:
_test:
	pxor	%xmm1, %xmm1
	andpd	LCPI1_0(%rip), %xmm1
	movapd	%xmm0, %xmm2
	andpd	LCPI1_1(%rip), %xmm2
	movapd	%xmm1, %xmm0
	orpd	%xmm2, %xmm0
	ret

llvm-svn: 46344
2008-01-25 05:46:26 +00:00
Chris Lattner a91f77eaac Significantly simplify and improve handling of FP function results on x86-32.
This case returns the value in ST(0) and then has to convert it to an SSE
register.  This causes significant codegen ugliness in some cases.  For 
example in the trivial fp-stack-direct-ret.ll testcase we used to generate:

_bar:
	subl	$28, %esp
	call	L_foo$stub
	fstpl	16(%esp)
	movsd	16(%esp), %xmm0
	movsd	%xmm0, 8(%esp)
	fldl	8(%esp)
	addl	$28, %esp
	ret

because we move the result of foo() into an XMM register, then have to
move it back for the return of bar.

Instead of hacking ever-more special cases into the call result lowering code
we take a much simpler approach: on x86-32, fp return is modeled as always 
returning into an f80 register which is then truncated to f32 or f64 as needed.
Similarly for a result, we model it as an extension to f80 + return.

This exposes the truncate and extensions to the dag combiner, allowing target
independent code to hack on them, eliminating them in this case.  This gives 
us this code for the example above:

_bar:
	subl	$12, %esp
	call	L_foo$stub
	addl	$12, %esp
	ret

The nasty aspect of this is that these conversions are not legal, but we want
the second pass of dag combiner (post-legalize) to be able to hack on them.
To handle this, we lie to legalize and say they are legal, then custom expand
them on entry to the isel pass (PreprocessForFPConvert).  This is gross, but
less gross than the code it is replacing :)

This also allows us to generate better code in several other cases.  For 
example on fp-stack-ret-conv.ll, we now generate:

_test:
	subl	$12, %esp
	call	L_foo$stub
	fstps	8(%esp)
	movl	16(%esp), %eax
	cvtss2sd	8(%esp), %xmm0
	movsd	%xmm0, (%eax)
	addl	$12, %esp
	ret

where before we produced (incidentally, the old bad code is identical to what
gcc produces):

_test:
	subl	$12, %esp
	call	L_foo$stub
	fstpl	(%esp)
	cvtsd2ss	(%esp), %xmm0
	cvtss2sd	%xmm0, %xmm0
	movl	16(%esp), %eax
	movsd	%xmm0, (%eax)
	addl	$12, %esp
	ret

Note that we generate slightly worse code on pr1505b.ll due to a scheduling 
deficiency that is unrelated to this patch.

llvm-svn: 46307
2008-01-24 08:07:48 +00:00
Evan Cheng 35abd840a6 Let each target decide byval alignment. For X86, it's 4-byte unless the aggregare contains SSE vector(s). For x86-64, it's max of 8 or alignment of the type.
llvm-svn: 46286
2008-01-23 23:17:41 +00:00
Duncan Sands 95d46ef887 The last pieces needed for loading arbitrary
precision integers.  This won't actually work
(and most of the code is dead) unless the new
legalization machinery is turned on.  While
there, I rationalized the handling of i1, and
removed some bogus (and unused) sextload patterns.
For i1, this could result in microscopically
better code for some architectures (not X86).
It might also result in worse code if annotating
with AssertZExt nodes turns out to be more harmful
than helpful.

llvm-svn: 46280
2008-01-23 20:39:46 +00:00
Chris Lattner 1ea55cf816 This commit changes:
1. Legalize now always promotes truncstore of i1 to i8. 
2. Remove patterns and gunk related to truncstore i1 from targets.
3. Rename the StoreXAction stuff to TruncStoreAction in TLI.
4. Make the TLI TruncStoreAction table a 2d table to handle from/to conversions.
5. Mark a wide variety of invalid truncstores as such in various targets, e.g.
   X86 currently doesn't support truncstore of any of its integer types.
6. Add legalize support for truncstores with invalid value input types.
7. Add a dag combine transform to turn store(truncate) into truncstore when
   safe.

The later allows us to compile CodeGen/X86/storetrunc-fp.ll to:

_foo:
	fldt	20(%esp)
	fldt	4(%esp)
	faddp	%st(1)
	movl	36(%esp), %eax
	fstps	(%eax)
	ret

instead of:

_foo:
	subl	$4, %esp
	fldt	24(%esp)
	fldt	8(%esp)
	faddp	%st(1)
	fstps	(%esp)
	movl	40(%esp), %eax
	movss	(%esp), %xmm0
	movss	%xmm0, (%eax)
	addl	$4, %esp
	ret

llvm-svn: 46140
2008-01-17 19:59:44 +00:00
Chris Lattner 72733e573b * Introduce a new SelectionDAG::getIntPtrConstant method
and switch various codegen pieces and the X86 backend over
  to using it.

* Add some comments to SelectionDAGNodes.h

* Introduce a second argument to FP_ROUND, which indicates
  whether the FP_ROUND changes the value of its input. If
  not it is safe to xform things like fp_extend(fp_round(x)) -> x.

llvm-svn: 46125
2008-01-17 07:00:52 +00:00
Duncan Sands 32b0ff6814 Trampoline support for x86-64. This looks like
it should work, but I have no machine to test
it on.  Committed because it will at least
cause no harm, and maybe someone can test it
for me!

llvm-svn: 46098
2008-01-16 22:55:25 +00:00
Chris Lattner e8bb9f2190 make it more clear that this predicate only applies to scalar FP types.
llvm-svn: 46058
2008-01-16 06:24:21 +00:00
Chris Lattner 14e616ef0b introduce a isTypeInSSEReg predicate, which allows us to simplify
some code.  No functionality change.

llvm-svn: 46055
2008-01-16 06:19:45 +00:00
Chris Lattner 8f7cec859e My previous commit had an incomplete message, it should have been:
make the 'fp return in ST(0)' optimization smart enough to
look through token factor nodes.  THis allows us to compile 
testcases like CodeGen/X86/fp-stack-retcopy.ll into:

_carg:
	subl	$12, %esp
	call	L_foo$stub
	fstpl	(%esp)
	fldl	(%esp)
	addl	$12, %esp
	ret

instead of:

_carg:
	subl	$28, %esp
	call	L_foo$stub
	fstpl	16(%esp)
	movsd	16(%esp), %xmm0
	movsd	%xmm0, 8(%esp)
	fldl	8(%esp)
	addl	$28, %esp
	ret

Still not optimal, but much better and this is a trivial patch.  Fixing 
the rest requires invasive surgery that is is not llvm 2.2 material.

llvm-svn: 46054
2008-01-16 05:56:59 +00:00
Chris Lattner ea001f1db7 make the 'fp return in ST(0)' optimization smart enough to
look through token factor

llvm-svn: 46053
2008-01-16 05:53:06 +00:00
Chris Lattner de5c74f18e various whitespace cleanups, no functionality change.
llvm-svn: 46052
2008-01-16 05:52:18 +00:00
Chris Lattner 3c3fefde06 no need to expand ISD::TRAP to X86ISD::TRAP, just match ISD::TRAP.
llvm-svn: 46015
2008-01-15 21:58:22 +00:00
Anton Korobeynikov 6bbbc4cbfa For PR1839: add initial support for __builtin_trap. llvm-gcc part is missed
as well as PPC codegen

llvm-svn: 46001
2008-01-15 07:02:33 +00:00
Duncan Sands 51fe7bbcf5 Whitespace tweak.
llvm-svn: 45940
2008-01-13 21:20:29 +00:00
Evan Cheng 7411b510b2 Code clean up.
llvm-svn: 45898
2008-01-12 01:08:07 +00:00
Arnold Schwaighofer 06da9e2d43 hrm - correct spelling.
Actually were not riding any arguments. Sadly there is no semantic spell checker that is going to safe you from such a mistake.

llvm-svn: 45868
2008-01-11 17:10:15 +00:00
Arnold Schwaighofer 6cf72fbbaf Improve tail call optimized call's argument lowering. Before this
commit all arguments where moved to the stack slot where they would
reside on a normal function call before the lowering to the tail call
stack slot. This was done to prevent arguments overwriting each other.
Now only arguments sourcing from a FORMAL_ARGUMENTS node or a
CopyFromReg node with virtual register (could also be a caller's
argument) are lowered indirectly.

 --This line, and those below, will be ignored--

M    X86/X86ISelLowering.cpp
M    X86/README.txt

llvm-svn: 45867
2008-01-11 16:49:42 +00:00
Arnold Schwaighofer bf1816ea7b Correct a copy and paste error.
llvm-svn: 45865
2008-01-11 14:34:56 +00:00
Evan Cheng a26552493b Mark byval parameter stack objects mutable for now.
llvm-svn: 45813
2008-01-10 02:24:25 +00:00
Evan Cheng fead113fe0 Do not use the stack pointer directly, issue a copyfromreg instead. Otherwise we can end up with something like ADD32ri %esp, x which two-address pass won't like.
llvm-svn: 45798
2008-01-10 00:37:26 +00:00
Evan Cheng 73d1017871 Remove comments that do not correspond to anything after recent refactoring.
llvm-svn: 45792
2008-01-10 00:09:10 +00:00
Evan Cheng 8242168ef4 Unbreak x86-64.
llvm-svn: 45725
2008-01-07 23:08:23 +00:00
Nate Begeman 22950d26f5 Remove an incorrect optimization that is performed correctly by
the target independent legalizer.

llvm-svn: 45631
2008-01-05 20:51:30 +00:00
Gordon Henriksen 9231958391 Refactoring the x86 and x86-64 calling convention implementations,
unifying the copied algorithms and saving over 500 LOC. There should
be no functionality change, but please test on your favorite x86
target.

llvm-svn: 45627
2008-01-05 16:56:59 +00:00
Gordon Henriksen f066fc477c First steps in in X86 calling convention cleanup.
llvm-svn: 45536
2008-01-03 16:47:34 +00:00
Chris Lattner a10fff51d9 Rename SSARegMap -> MachineRegisterInfo in keeping with the idea
that "machine" classes are used to represent the current state of
the code being compiled.  Given this expanded name, we can start 
moving other stuff into it.  For now, move the UsedPhysRegs and
LiveIn/LoveOuts vectors from MachineFunction into it.

Update all the clients to match.

This also reduces some needless #includes, such as MachineModuleInfo
from MachineFunction.

llvm-svn: 45467
2007-12-31 04:13:23 +00:00
Chris Lattner a5bb370aa4 Add new shorter predicates for testing machine operands for various types:
e.g. MO.isMBB() instead of MO.isMachineBasicBlock().  I don't plan on 
switching everything over, so new clients should just start using the 
shorter names.

Remove old long accessors, switching everything over to use the short
accessor: getMachineBasicBlock() -> getMBB(), 
getConstantPoolIndex() -> getIndex(), setMachineBasicBlock -> setMBB(), etc.

llvm-svn: 45464
2007-12-30 23:10:15 +00:00
Chris Lattner f3ebc3f3d2 Remove attribution from file headers, per discussion on llvmdev.
llvm-svn: 45418
2007-12-29 20:36:04 +00:00
Chris Lattner 07ccbfa64a Codegen:
as:

_bar:
	pushl	%esi
	subl	$8, %esp
	movl	16(%esp), %esi
	call	L_foo$stub
	fstps	(%esi)
	addl	$8, %esp
	popl	%esi
	#FP_REG_KILL
	ret

instead of:

_bar:
	pushl	%esi
	subl	$8, %esp
	movl	16(%esp), %esi
	call	L_foo$stub
	fstpl	(%esi)
	cvtsd2ss	(%esi), %xmm0
	movss	%xmm0, (%esi)
	addl	$8, %esp
	popl	%esi
	#FP_REG_KILL
	ret

llvm-svn: 45401
2007-12-29 06:57:38 +00:00
Chris Lattner 8013bd339b avoid going through a stack slot to convert from fpstack to xmm reg
if we are just going to store it back anyway.  This improves things 
like:
double foo();
void bar(double *P) { *P = foo(); }

llvm-svn: 45399
2007-12-29 06:41:28 +00:00
Chris Lattner e3b05fe31b fix a questionable cast, thanks to Mike Stump for pointing this out.
llvm-svn: 45075
2007-12-16 20:26:54 +00:00
Evan Cheng 23d2d4dc6c Make better use of instructions that clear high bits; fix various 2-wide shuffle bugs.
llvm-svn: 45058
2007-12-15 03:00:47 +00:00
Evan Cheng 0e6408124e Fix ctlz and cttz. llvm definition requires them to return number of bits in of the src type when value is zero.
llvm-svn: 45029
2007-12-14 08:30:15 +00:00
Evan Cheng e9fbc3f014 Implement ctlz and cttz with bsr and bsf.
llvm-svn: 45024
2007-12-14 02:13:44 +00:00
Dan Gohman 7a7742c2fe Allow vector integer constants to be created with
SelectionDAG::getConstant, in the same way as vector floating-point
constants. This allows the legalize expansion code for @llvm.ctpop and
friends to be usable with vector types.

llvm-svn: 44954
2007-12-12 22:21:26 +00:00
Evan Cheng 0f42730722 Use shuffles to implement insert_vector_elt for i32, i64, f32, and f64.
llvm-svn: 44929
2007-12-12 07:55:34 +00:00
Evan Cheng 2a98956796 Lower a build_vector with all constants into a constpool load unless it can be done with a move to low part.
llvm-svn: 44921
2007-12-12 06:45:40 +00:00
Evan Cheng 4fbf459549 - Improved v8i16 shuffle lowering. It now uses pshuflw and pshufhw as much as
possible before resorting to pextrw and pinsrw.
- Better codegen for v4i32 shuffles masquerading as v8i16 or v16i8 shuffles.
- Improves (i16 extract_vector_element 0) codegen by recognizing
  (i32 extract_vector_element 0) does not require a pextrw.

llvm-svn: 44836
2007-12-11 01:46:18 +00:00
Nate Begeman a55a67ae91 x86 doesn't actually want to custom lower v3i32
llvm-svn: 44835
2007-12-11 01:41:33 +00:00
Evan Cheng b41d838d28 Add comment.
llvm-svn: 44686
2007-12-07 21:30:01 +00:00
Evan Cheng bfd373a53e Much improved v8i16 shuffles. (Step 1).
llvm-svn: 44676
2007-12-07 08:07:39 +00:00
Evan Cheng c829e5cdf0 Remove a bogus optimization. It's not possible to do a move to low element to a <8 x i16> or <16 x i8> vector.
llvm-svn: 44669
2007-12-06 22:14:22 +00:00
Duncan Sands ad0ea2d430 Fix PR1146: parameter attributes are longer part of
the function type, instead they belong to functions
and function calls.  This is an updated and slightly
corrected version of Reid Spencer's original patch.
The only known problem is that auto-upgrading of
bitcode files doesn't seem to work properly (see
test/Bitcode/AutoUpgradeIntrinsics.ll).  Hopefully
a bitcode guru (who might that be? :) ) will fix it.

llvm-svn: 44359
2007-11-27 13:23:08 +00:00
Chris Lattner 5728bdd4db Fix a long standing deficiency in the X86 backend: we would
sometimes emit "zero" and "all one" vectors multiple times,
for example:

_test2:
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M1
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M2
	ret

instead of:

_test2:
	pcmpeqd	%mm0, %mm0
	movq	%mm0, _M1
	movq	%mm0, _M2
	ret

This patch fixes this by always arranging for zero/one vectors
to be defined as v4i32 or v2i32 (SSE/MMX) instead of letting them be
any random type.  This ensures they get trivially CSE'd on the dag.
This fix is also important for LegalizeDAGTypes, as it gets unhappy
when the x86 backend wants BUILD_VECTOR(i64 0) to be legal even when
'i64' isn't legal.

This patch makes the following changes:

1) X86TargetLowering::LowerBUILD_VECTOR now lowers 0/1 vectors into
   their canonical types.
2) The now-dead patterns are removed from the SSE/MMX .td files.
3) All the patterns in the .td file that referred to immAllOnesV or
   immAllZerosV in the wrong form now use *_bc to match them with a
   bitcast wrapped around them.
4) X86DAGToDAGISel::SelectScalarSSELoad is generalized to handle 
   bitcast'd zero vectors, which simplifies the code actually.
5) getShuffleVectorZeroOrUndef is updated to generate a shuffle that
   is legal, instead of generating one that is illegal and expecting
   a later legalize pass to clean it up.
6) isZeroShuffle is generalized to handle bitcast of zeros.
7) several other minor tweaks.

This patch is definite goodness, but has the potential to cause random
code quality regressions.  Please be on the lookout for these and let 
me know if they happen.

llvm-svn: 44310
2007-11-25 00:24:49 +00:00
Chris Lattner f72ad16263 remove bogus assertion that broke CodeGen/Generic/cast-fp.ll on x86
among others.

llvm-svn: 44302
2007-11-24 18:37:20 +00:00
Chris Lattner f81d5886c6 Several changes:
1) Change the interface to TargetLowering::ExpandOperationResult to 
   take and return entire NODES that need a result expanded, not just
   the value.  This allows us to handle things like READCYCLECOUNTER,
   which returns two values.
2) Implement (extremely limited) support in LegalizeDAG::ExpandOp for MERGE_VALUES.
3) Reimplement custom lowering in LegalizeDAGTypes in terms of the new
   ExpandOperationResult.  This makes the result simpler and fully 
   general.
4) Implement (fully general) expand support for MERGE_VALUES in LegalizeDAGTypes.
5) Implement ExpandOperationResult support for ARM f64->i64 bitconvert and ARM
   i64 shifts, allowing them to work with LegalizeDAGTypes.
6) Implement ExpandOperationResult support for X86 READCYCLECOUNTER and FP_TO_SINT,
   allowing them to work with LegalizeDAGTypes.

LegalizeDAGTypes now passes several more X86 codegen tests when enabled and when
type legalization in LegalizeDAG is ifdef'd out.

llvm-svn: 44300
2007-11-24 07:07:01 +00:00
Anton Korobeynikov 91460e43f1 Implement codegen for flt_rounds on x86
llvm-svn: 44183
2007-11-16 01:31:51 +00:00
Bill Wendling f359fed9f9 Unify CALLSEQ_{START,END}. They take 4 parameters: the chain, two stack
adjustment fields, and an optional flag. If there is a "dynamic_stackalloc" in
the code, make sure that it's bracketed by CALLSEQ_START and CALLSEQ_END. If
not, then there is the potential for the stack to be changed while the stack's
being used by another instruction (like a call).

This can only result in tears...

llvm-svn: 44037
2007-11-13 00:44:25 +00:00
Arnold Schwaighofer d2c16ff905 Update tailcall code to include inline attribute operand for memcpy.
llvm-svn: 43978
2007-11-10 10:48:01 +00:00
Evan Cheng 797d56ff17 Much improved pic jumptable codegen:
Then:
        call    "L1$pb"
"L1$pb":
        popl    %eax
		...
LBB1_1: # entry
        imull   $4, %ecx, %ecx
        leal    LJTI1_0-"L1$pb"(%eax), %edx
        addl    LJTI1_0-"L1$pb"(%ecx,%eax), %edx
        jmpl    *%edx

        .align  2
        .set L1_0_set_3,LBB1_3-LJTI1_0
        .set L1_0_set_2,LBB1_2-LJTI1_0
        .set L1_0_set_5,LBB1_5-LJTI1_0
        .set L1_0_set_4,LBB1_4-LJTI1_0
LJTI1_0:
        .long    L1_0_set_3
        .long    L1_0_set_2

Now:
        call    "L1$pb"
"L1$pb":
        popl    %eax
		...
LBB1_1: # entry
        addl    LJTI1_0-"L1$pb"(%eax,%ecx,4), %eax
        jmpl    *%eax

		.align  2
		.set L1_0_set_3,LBB1_3-"L1$pb"
		.set L1_0_set_2,LBB1_2-"L1$pb"
		.set L1_0_set_5,LBB1_5-"L1$pb"
		.set L1_0_set_4,LBB1_4-"L1$pb"
LJTI1_0:
        .long    L1_0_set_3
        .long    L1_0_set_2

llvm-svn: 43924
2007-11-09 01:32:10 +00:00
Rafael Espindola fa0df55bdd Move the LowerMEMCPY and LowerMEMCPYCall to a common place.
Thanks for the suggestions Bill :-)

llvm-svn: 43742
2007-11-05 23:12:20 +00:00
Chris Lattner 296160d443 Fix PR1763 by allowing the 'q' constraint to work with 64-bit
regs on x86-64.

llvm-svn: 43669
2007-11-04 06:51:12 +00:00
Evan Cheng 2b93a20b09 Unbreak tailcall opt.
llvm-svn: 43646
2007-11-02 17:45:40 +00:00
Evan Cheng e453ff4913 Missing a getNumOperands check.
llvm-svn: 43630
2007-11-02 01:26:22 +00:00
Rafael Espindola 419b6d7ce4 Make ARM and X86 LowerMEMCPY identical by moving the isThumb check into getMaxInlineSizeThreshold
and by restructuring the X86 version.

New I just have to move this to a common place :-)

llvm-svn: 43554
2007-10-31 14:39:58 +00:00
Rafael Espindola 063f177300 Make ARM an X86 memcpy expansion more similar to each other.
Now both subtarget define getMaxInlineSizeThreshold and the expansion uses it.

This should not change generated code.

llvm-svn: 43552
2007-10-31 11:52:06 +00:00
Dale Johannesen b066c1f216 Make i64=expand_vector_elt(v2i64) work in 32-bit mode.
llvm-svn: 43535
2007-10-31 00:32:36 +00:00
Dale Johannesen 6aa304e529 Add missing MMX PSUBQ.
llvm-svn: 43488
2007-10-30 01:18:38 +00:00
Evan Cheng e106e2f142 Enable more fold (sext (load x)) -> (sext (truncate (sextload x)))
transformation. Previously, it's restricted by ensuring the number of load uses
is one. Now the restriction is loosened up by allowing setcc uses to be
"extended" (e.g. setcc x, c, eq -> setcc sext(x), sext(c), eq).

llvm-svn: 43465
2007-10-29 19:58:20 +00:00
Evan Cheng 7b3f7feaea Avoid doing something dumb like rewriting using a 64-bit iv in 32-bit mode.
llvm-svn: 43446
2007-10-29 07:57:50 +00:00
Evan Cheng 7f3d02471d Loosen up iv reuse to allow reuse of the same stride but a larger type when truncating from the larger type to smaller type is free.
e.g.
Turns this loop:
LBB1_1: # entry.bb_crit_edge
        xorl    %ecx, %ecx
        xorw    %dx, %dx
        movw    %dx, %si
LBB1_2: # bb
        movl    L_X$non_lazy_ptr, %edi
        movw    %si, (%edi)
        movl    L_Y$non_lazy_ptr, %edi
        movw    %dx, (%edi)
		addw    $4, %dx
		incw    %si
		incl    %ecx
		cmpl    %eax, %ecx
		jne     LBB1_2  # bb
	
into

LBB1_1: # entry.bb_crit_edge
        xorl    %ecx, %ecx
        xorw    %dx, %dx
LBB1_2: # bb
        movl    L_X$non_lazy_ptr, %esi
        movw    %cx, (%esi)
        movl    L_Y$non_lazy_ptr, %esi
        movw    %dx, (%esi)
        addw    $4, %dx
		incl    %ecx
        cmpl    %eax, %ecx
        jne     LBB1_2  # bb

llvm-svn: 43375
2007-10-26 01:56:11 +00:00
Dale Johannesen 8ee70112ea Allow for copysign having f80 second argument.
Fixes 5550319.

llvm-svn: 43205
2007-10-21 01:07:44 +00:00
Rafael Espindola 846c19dd70 Add support for byval function whose argument is not 32 bit aligned.
To do this it is necessary to add a "always inline" argument to the
memcpy node. For completeness I have also added this node to memmove
and memset.  I have also added getMem* functions, because the extra
argument makes it cumbersome to use getNode and because I get confused
by it :-)

llvm-svn: 43172
2007-10-19 10:41:11 +00:00
Chris Lattner 12d5da49d3 Change fp to sint legalization on x86-32 to do 2 x i32
loads instead of 1 x i64 loads.  This doesn't change any functionality yet.

llvm-svn: 43068
2007-10-17 06:17:29 +00:00
Chris Lattner 693cbeadff fix some funny indentation, add comments.
llvm-svn: 43066
2007-10-17 06:02:13 +00:00
Dale Johannesen e5530a35d4 Check for invalid cc's in f80 select.
llvm-svn: 43033
2007-10-16 18:09:08 +00:00
Arnold Schwaighofer b3d58b98d0 Correction to tail call optimization code. The new return address
was stored to the acutal stack slot before the parameters were
lowered to their stack slot. This could cause arguments to be
overwritten by the return address if the called function had less
parameters than the caller function. The update should remove the
last failing test case of llc-beta: SPASS.

llvm-svn: 43027
2007-10-16 09:05:00 +00:00
Evan Cheng 7bcfd8f880 LowerFP_TO_SINT must not create a stack object if it's not needed.
llvm-svn: 43004
2007-10-15 20:11:21 +00:00
Evan Cheng 4099f4f91a Unbreak x86-64.
llvm-svn: 42962
2007-10-14 10:09:39 +00:00
Arnold Schwaighofer e8d0bf2669 Correcting the corrections. Bad bad baaad emacs!
llvm-svn: 42935
2007-10-12 21:53:12 +00:00
Arnold Schwaighofer 1f0da1fefb Corrected many typing errors. And removed 'nest' parameter handling
for fastcc from X86CallingConv.td.  This means that nested functions
are not supported for calling convention 'fastcc'.

llvm-svn: 42934
2007-10-12 21:30:57 +00:00
Duncan Sands a6286bd502 Due to the new tail call optimization, trampolines can no
longer be created for fastcc functions.

llvm-svn: 42925
2007-10-12 19:37:31 +00:00
Dan Gohman 8d978da3b0 Mark vector ctpop, cttz, and ctlz as Expand on x86.
llvm-svn: 42905
2007-10-12 14:09:42 +00:00
Dan Gohman 482732af9d Set ISD::FPOW to Expand.
llvm-svn: 42881
2007-10-11 23:21:31 +00:00
Arnold Schwaighofer 9ccea99165 Added tail call optimization to the x86 back end. It can be
enabled by passing -tailcallopt to llc.  The optimization is
performed if the following conditions are satisfied:
* caller/callee are fastcc
* elf/pic is disabled OR
  elf/pic enabled + callee is in module + callee has
  visibility protected or hidden

llvm-svn: 42870
2007-10-11 19:40:01 +00:00
Evan Cheng f5ec10b64c Bug fix. X86 was emitting redundant setcc and test instructions before a conditional move.
llvm-svn: 42774
2007-10-08 22:16:29 +00:00
Dan Gohman a160361c85 Migrate X86 and ARM from using X86ISD::{,I}DIV and ARMISD::MULHILO{U,S} to
use ISD::{S,U}DIVREM and ISD::{S,U}MUL_HIO. Move the lowering code
associated with these operators into target-independent in LegalizeDAG.cpp
and TargetLowering.cpp.

llvm-svn: 42762
2007-10-08 18:33:35 +00:00
Evan Cheng 6912b50958 Not needed any more.
llvm-svn: 42623
2007-10-05 01:34:14 +00:00
Evan Cheng 5fb5a1f389 Enabling new condition code modeling scheme.
llvm-svn: 42459
2007-09-29 00:00:36 +00:00
Rafael Espindola 6c04ac1db0 Refactor the memcpy lowering for the x86 target.
The only generated code difference is that now we call memcpy when
the size of the array is unknown. This matches GCC behavior and is
better since the run time value can be arbitrarily large.

llvm-svn: 42433
2007-09-28 12:53:01 +00:00
Dale Johannesen b6d56401aa Enable codegen for long double abs, sin, cos
llvm-svn: 42368
2007-09-26 21:10:55 +00:00
Evan Cheng 9b7f0e6eb4 translateX86CC updates the last two operands.
llvm-svn: 42333
2007-09-26 00:45:55 +00:00
Dan Gohman 31599685c7 When both x/y and x%y are needed (x and y both scalar integer), compute
both results with a single div or idiv instruction. This uses new X86ISD
nodes for DIV and IDIV which are introduced during the legalize phase
so that the SelectionDAG's CSE can automatically eliminate redundant
computations.

llvm-svn: 42308
2007-09-25 18:23:27 +00:00
Dan Gohman 5e1a428344 Move the setOperationAction(ISD::DEBUG_LOC, MVT::Other, Expand) and
the check to see if the assembler supports .loc from X86TargetLowering
into the superclass TargetLowering.

llvm-svn: 42297
2007-09-25 15:10:49 +00:00
Evan Cheng e95f391ef1 Added support for new condition code modeling scheme (i.e. physical register dependency). These are a bunch of instructions that are duplicated so the x86 backend can support both the old and new schemes at the same time. They will be deleted after
all the kinks are worked out.

llvm-svn: 42285
2007-09-25 01:57:46 +00:00
Dan Gohman 1b2156fcae Add support on x86 for having Legalize lower ISD::LOCATION to ISD::DEBUG_LOC
instead of ISD::LABEL with a manual .debug_line entry when the assembler
supports .file and .loc directives.

llvm-svn: 42278
2007-09-24 21:54:14 +00:00
Chris Lattner 5b5484db63 claim that "st" is from the 80-bit register file. This causes x87-using inline
asm to die with:

ScheduleDAG.cpp:269: failed assertion `false && "Couldn't find the register class"'

instead of:
failed assertion `RegMap->getRegClass(VReg) == RC && "Register class of operand and regclass of use don't agree!"'

yay.

llvm-svn: 42259
2007-09-24 05:27:37 +00:00
Dale Johannesen e36c400255 Fix PR 1681. When X86 target uses +sse -sse2,
keep f32 in SSE registers and f64 in x87.  This
is effectively a new codegen mode.
Change addLegalFPImmediate to permit float and
double variants to do different things.
Adjust callers.

llvm-svn: 42246
2007-09-23 14:52:20 +00:00
Rafael Espindola 4730c04904 Don't add a default STACK_ALIGN (use the generic ABI alignment)
Implement calls to functions with byval arguments on X86

llvm-svn: 42192
2007-09-21 15:50:22 +00:00
Rafael Espindola f065f0e2a1 small cleanup: use LowerMemArgument in LowerFastCCArguments also
llvm-svn: 42189
2007-09-21 14:55:38 +00:00
Dale Johannesen 7d67e547b5 More long double fixes. x86_64 should build now.
llvm-svn: 42155
2007-09-19 23:55:34 +00:00
Dan Gohman 863bdc332d Emit integer x<1 as x<=0, as comparisons with zero (now includeing
64-bit) can use test instead of cmp with an immediate.

llvm-svn: 42026
2007-09-17 14:49:27 +00:00
Dale Johannesen 98d3a08d8f Remove the assumption that FP's are either float or
double from some of the many places in the optimizers
it appears, and do something reasonable with x86
long double.
Make APInt::dump() public, remove newline, use it to
dump ConstantSDNode's.
Allow APFloats in FoldingSet.
Expand X86 backend handling of long doubles (conversions
to/from int, mostly).

llvm-svn: 41967
2007-09-14 22:26:36 +00:00
Rafael Espindola 272f7304f0 Add support for functions with byval arguments on x86
llvm-svn: 41953
2007-09-14 15:48:13 +00:00
Dale Johannesen 245dceb06d Add APInt interfaces to APFloat (allows directly
access to bits).  Use them in place of float and
double interfaces where appropriate.
First bits of x86 long double constants handling 
(untested, probably does not work).

llvm-svn: 41858
2007-09-11 18:32:33 +00:00
Duncan Sands 86e0119822 Fold the adjust_trampoline intrinsic into
init_trampoline.  There is now only one
trampoline intrinsic.

llvm-svn: 41841
2007-09-11 14:10:23 +00:00
Dale Johannesen bed9dc423c Next round of APFloat changes.
Use APFloat in UpgradeParser and AsmParser.
Change all references to ConstantFP to use the
APFloat interface rather than double.  Remove
the ConstantFP double interfaces.
Use APFloat functions for constant folding arithmetic
and comparisons.
(There are still way too many places APFloat is
just a wrapper around host float/double, but we're
getting there.)

llvm-svn: 41747
2007-09-06 18:13:44 +00:00
Anton Korobeynikov 50ab26e835 Reapply r41578 with proper fix
llvm-svn: 41680
2007-09-03 00:36:06 +00:00
Rafael Espindola e636fc05d6 Initial support for calling functions with byval arguments on x86-64
llvm-svn: 41643
2007-08-31 15:06:30 +00:00
Dale Johannesen 3cf889f75e Enhance APFloat to retain bits of NaNs (fixes oggenc).
Use APFloat interfaces for more references, mostly
of ConstantFPSDNode.

llvm-svn: 41632
2007-08-31 04:03:46 +00:00
Dale Johannesen d246b2ca5c Change LegalFPImmediates to use APFloat.
Add APFloat interfaces to ConstantFP, SelectionDAG.
Fix integer bit in double->APFloat conversion.
Convert LegalizeDAG to use APFloat interface in
ConstantFPSDNode uses.

llvm-svn: 41587
2007-08-30 00:23:21 +00:00
Duncan Sands 7741427a09 Move getX86RegNum into X86RegisterInfo and use it
in the trampoline lowering.  Lookup the jump and
mov opcodes for the trampoline rather than hard
coding them.

llvm-svn: 41577
2007-08-29 19:01:20 +00:00
Rafael Espindola b602461f48 Add a comment about using libc memset/memcpy or generating inline code.
llvm-svn: 41502
2007-08-27 17:48:26 +00:00
Rafael Espindola ff33241e16 call libc memcpy/memset if array size is bigger then threshold.
Coping 100MB array (after a warmup) shows that glibc 2.6.1 implementation on
x86-64 (core 2) is 30% faster (from 0.270917s to 0.188079s)

llvm-svn: 41479
2007-08-27 10:18:20 +00:00
Chris Lattner d8c9cb9182 rename isOperandValidForConstraint to LowerAsmOperandForConstraint,
changing the interface to allow for future changes.

llvm-svn: 41384
2007-08-25 00:47:38 +00:00
Rafael Espindola 9c3d20d823 Partial implementation of calling functions with byval arguments:
*) The needed information is propagated to the DAG
 *) The X86-64 backend detects it and aborts

llvm-svn: 41179
2007-08-20 15:18:24 +00:00
Anton Korobeynikov 597c8b77e4 Move ReturnAddrIndex variable to X86MachineFunctionInfo structure. This fixed
hard to catch bugs with retaddr lowering

llvm-svn: 41104
2007-08-15 17:12:32 +00:00
Evan Cheng b2823dac69 Fix a typo pointd out by Maarten ter Huurne.
llvm-svn: 41059
2007-08-13 23:27:11 +00:00
Christopher Lamb b372abab14 Increase efficiency of sign_extend_inreg by using subregisters for truncation. As the README suggests sign_extend_subreg is selected to (sext(trunc)).
llvm-svn: 41010
2007-08-10 21:48:46 +00:00
Rafael Espindola 66011c17d5 propagate struct size and alignment of byval arguments to the DAG
llvm-svn: 40986
2007-08-10 14:44:42 +00:00
Dale Johannesen ba1a98a4e0 long double 9 of N. This finishes up the X86-32 bits
(constants are still not handled).  Adds ConvertActions
to control fp-to-fp conversions (these are currently
defaulted for all other targets, so no changes there).

llvm-svn: 40958
2007-08-09 01:04:01 +00:00
Dale Johannesen 57c6ac5fe5 Long double patch 7 of N, unless I lost count:).
Last x87 bits for full functionality (not
thoroughly tested, and long doubles do not work
in SSE modes at all - use -mcpu=i486 for now)

llvm-svn: 40886
2007-08-07 01:17:37 +00:00
Dale Johannesen b1888e73ad Long double patch 4 of N: initial x87 implementation.
Lots of problems yet but some simple things work.

llvm-svn: 40847
2007-08-05 18:49:15 +00:00
Dan Gohman 8932bff7fe Fix the alignment requirements of several unpck and shuf instructions.
Generalize isPSHUFDMask and add a unary SHUFPD pattern so that SHUFPD's
memory operand alignment can be tested as well, with a fix to avoid
breaking MMX's use of isPSHUFDMask.

llvm-svn: 40756
2007-08-02 21:17:01 +00:00
Evan Cheng d3d92890fc Can't handle offset and scale if rip-relative addressing is to be used.
llvm-svn: 40703
2007-08-01 23:46:47 +00:00
Evan Cheng 242a87734a This isn't safe when there are uses of load's chain result.
llvm-svn: 40617
2007-07-31 06:21:44 +00:00
Duncan Sands ce38853cc6 Trampoline codegen support for X86-32.
llvm-svn: 40566
2007-07-27 20:02:49 +00:00
Dan Gohman 4788552deb Re-apply 40504, but with a fix for the segfault it caused in oggenc:
Make the alignedload and alignedstore patterns always require 16-byte
alignment. This way when they are used in the "Fs" instructions, in which
a vector instruction is used for a scalar purpose, they can still require
the full vector alignment. And add a regression test for this.

llvm-svn: 40555
2007-07-27 17:16:43 +00:00
Evan Cheng 931de40afa Reverting 40504 for now. It's breaking oggenc.
llvm-svn: 40547
2007-07-27 01:37:47 +00:00
Dan Gohman 8455bd3fae Remove X86ISD::LOAD_PACK and X86ISD::LOAD_UA and associated code from the
x86 target, replacing them with the new alignment attributes on memory
references.

llvm-svn: 40504
2007-07-26 00:31:09 +00:00
Dan Gohman f906c7286f Use movaps to load a v4f32 build_vector of all-constant values into a
register instead of loading each element individually.

llvm-svn: 40478
2007-07-24 22:55:08 +00:00
Dan Gohman b6a8ae20c7 Fix some uses of dyn_cast to be uses of cast.
llvm-svn: 40443
2007-07-23 20:24:29 +00:00
Evan Cheng 64738536b3 Fix custom lowering of SSE FXOR.
llvm-svn: 40071
2007-07-19 23:36:01 +00:00
Anton Korobeynikov 383a324735 Long live the exception handling!
This patch fills the last necessary bits to enable exceptions
handling in LLVM. Currently only on x86-32/linux.

In fact, this patch adds necessary intrinsics (and their lowering) which
represent really weird target-specific gcc builtins used inside unwinder.

After corresponding llvm-gcc patch will land (easy) exceptions should be
more or less workable. However, exceptions handling support should not be 
thought as 'finished': I expect many small and not so small glitches
everywhere.

llvm-svn: 39855
2007-07-14 14:06:15 +00:00
Dan Gohman 57111e7a60 Define non-intrinsic instructions for vector min, max, sqrt, rsqrt, and rcp,
in addition to the intrinsic forms. Add spill-folding entries for these new
instructions, and for the scalar min and max instrinsic instructions which
were missing. And add some preliminary ISelLowering code for using the new
non-intrinsic vector sqrt instruction, and fneg and fabs.

llvm-svn: 38478
2007-07-10 00:05:58 +00:00
Anton Korobeynikov de9c825859 Proper flag __alloca call
llvm-svn: 37923
2007-07-05 20:36:08 +00:00
Dale Johannesen 3d7008cd49 Refactor X87 instructions. As a side effect, all
their names are changed.

llvm-svn: 37876
2007-07-04 21:07:47 +00:00
Dale Johannesen a2b3c175db Fix for PR 1505 (and 1489). Rewrite X87 register
model to include f32 variants.  Some factoring
improvments forthcoming.

llvm-svn: 37847
2007-07-03 00:53:03 +00:00
Evan Cheng 444d3ca53d No vector fneg.
llvm-svn: 37786
2007-06-29 00:18:15 +00:00
Evan Cheng 3bd318e298 Type of vector extract / insert index operand should be iPTR.
llvm-svn: 37784
2007-06-29 00:01:20 +00:00
Dan Gohman a866514528 Generalize MVT::ValueType and associated functions to be able to represent
extended vector types. Remove the special SDNode opcodes used for pre-legalize
vector operations, and the special MVT::Vector type used with them. Adjust
lowering and legalize to work with the normal SDNode kinds instead, and to
use the normal MVT functions to work with vector types instead of using the
two special operands that the pre-legalize nodes held.

This allows pre-legalize and post-legalize DAGs, and the code that operates
on them, to be more consistent. Pre-legalize vector operators can be handled
more consistently with scalar operators. And, -view-dag-combine1-dags and
-view-legalize-dags now look prettier for vector code.

llvm-svn: 37719
2007-06-25 16:23:39 +00:00
Dan Gohman 309d3d51b3 Move ComputeMaskedBits, MaskedValueIsZero, and ComputeNumSignBits from
TargetLowering to SelectionDAG so that they have more convenient
access to the current DAG, in preparation for the ValueType routines
being changed from standalone functions to members of SelectionDAG for
the pre-legalize vector type changes.

llvm-svn: 37704
2007-06-22 14:59:07 +00:00
Chris Lattner 944200be45 If a function is vararg, never pass inreg arguments in registers. Thanks to
Anton for half of this patch.

llvm-svn: 37641
2007-06-19 00:13:10 +00:00