llvm-project

Commit Graph

Author	SHA1	Message	Date
Chris Lattner	ca9d784bf1	change isGlobalStubReference to take target flags instead of a MachineOperand. llvm-svn: 75236	2009-07-10 06:29:59 +00:00
Chris Lattner	b9af63a4d2	GVRequiresExtraLoad is now never used for calls, simplify it based on this. llvm-svn: 75232	2009-07-10 05:52:02 +00:00
Chris Lattner	ace6ec26d9	actually, just eliminate PCRelGVRequiresExtraLoad. It makes the code more complex and slow than just directly testing what we care about. llvm-svn: 75231	2009-07-10 05:48:03 +00:00
Chris Lattner	7277a807f0	There is only one case where GVRequiresExtraLoad returns true for calls: split its handling out to PCRelGVRequiresExtraLoad, and simplify code based on this. llvm-svn: 75230	2009-07-10 05:45:15 +00:00
Chris Lattner	1cc7ae7c3b	the "isDirectCall" operand of GVRequiresRegister is always false, eliminate it. llvm-svn: 75229	2009-07-10 05:37:11 +00:00
Owen Anderson	0504e0a222	Thread LLVMContext through MVT and related parts of SDISel. llvm-svn: 75153	2009-07-09 17:57:24 +00:00
Chris Lattner	212c0506cd	simplify this logic a bit more. llvm-svn: 75118	2009-07-09 07:02:30 +00:00
Chris Lattner	72e3deca47	move reasoning about darwin $non_lazy_ptr stubs from asmprinter into isel. llvm-svn: 75117	2009-07-09 06:59:17 +00:00
Chris Lattner	e0dae0e716	make isel use MO_PIC_BASE_OFFSET when lowering globalvalues on darwin in pic mode, instead of having asmprinter just "know" to print them. llvm-svn: 75109	2009-07-09 05:47:33 +00:00
Chris Lattner	d047d06358	make isel decide whether to emit $stub's on darwin instead of asmprinter. llvm-svn: 75107	2009-07-09 05:27:35 +00:00
Chris Lattner	d5d80ab6b6	Make isel determine where to emit PLT-relative calls instead of having asmprinter do it. llvm-svn: 75104	2009-07-09 05:02:21 +00:00
Chris Lattner	fef11d6e77	simplify some code based on the fact that picstyles != none are only valid in pic or dynamic-no-pic mode. Also, x86-64 never used picstylegot. llvm-svn: 75101	2009-07-09 04:39:06 +00:00
Chris Lattner	62f568c037	all this logic always returns true because GOT mode is never active in x86-64 mode. Simplify it away, someone should evaluate this. llvm-svn: 75100	2009-07-09 04:27:47 +00:00
Chris Lattner	3945dd0c44	isPICStyleRIPRel() and friends are never true in -static mode. Simplify code based on this. llvm-svn: 75099	2009-07-09 04:24:46 +00:00
Chris Lattner	1c5bf9d26d	When in -static mode, force the PIC style to none. Doing this requires fixing code which conflated RIPRel PIC with x86-64. Fix these to just check for X86-64 directly. llvm-svn: 75092	2009-07-09 03:15:51 +00:00
Chris Lattner	384a738b4a	merge two identical functions and simplify things that are GOT specific llvm-svn: 75091	2009-07-09 02:55:47 +00:00
Chris Lattner	4d1e9bf717	hoist check for IsTailCall to callers. Eliminate redundant check for x86-64: GOT-style PIC is never used on x86-64. llvm-svn: 75090	2009-07-09 02:46:53 +00:00
Chris Lattner	88765d4831	change a few methods to be static functions. llvm-svn: 75089	2009-07-09 02:44:11 +00:00
Chris Lattner	47f64ea174	move handling of dllimport linkage in isel, not in asmprinter. llvm-svn: 75086	2009-07-09 00:58:53 +00:00
Torok Edwin	fb8d6d5b58	Implement changes from Chris's feedback. Finish converting lib/Target. llvm-svn: 75043	2009-07-08 20:53:28 +00:00
Torok Edwin	fa04002254	Convert more abort() calls to llvm_report_error(). Also remove trailing semicolon. llvm-svn: 75027	2009-07-08 19:04:27 +00:00
Torok Edwin	6dd2730024	Start converting to new error handling API. cerr+abort -> llvm_report_error assert(0)+abort -> LLVM_UNREACHABLE (assert(0)+llvm_unreachable-> abort() included) llvm-svn: 75018	2009-07-08 18:01:40 +00:00
Dale Johannesen	56a53d02e1	Don't accept globals as matching 'i' constraint in PIC modes (in accordance with existing comment). gcc.apple/asm-block-25.c llvm-svn: 74886	2009-07-07 00:18:49 +00:00
Tilmann Scheller	aea6059ed4	Add NumFixedArgs attribute to CallSDNode which indicates the number of fixed arguments in a vararg call. With the SVR4 ABI on PowerPC, vector arguments for vararg calls are passed differently depending on whether they are a fixed or a variable argument. Variable vector arguments always go into memory, fixed vector arguments are put into vector registers. If there are no free vector registers available, fixed vector arguments are put on the stack. The NumFixedArgs attribute allows to decide for an argument in a vararg call whether it belongs to the fixed or variable portion of the parameter list. llvm-svn: 74764	2009-07-03 06:44:53 +00:00
Bill Wendling	512ff7353e	Update comments to make it clear that the function alignment is the Log2 of the bytes and not bytes. llvm-svn: 74624	2009-07-01 18:50:55 +00:00
Bill Wendling	31ceb1bcba	Add an "alignment" field to the MachineFunction object. It makes more sense to have the alignment be calculated up front, and have the back-ends obey whatever alignment is decided upon. This allows for future work that would allow for precise no-op placement and the like. llvm-svn: 74564	2009-06-30 22:38:32 +00:00
David Greene	8adf1fdc80	Add a 256-bit register class and YMM registers. llvm-svn: 74469	2009-06-29 22:50:51 +00:00
Owen Anderson	45c299ef65	Add a target-specific DAG combine on X86 to fold the common pattern of fence-atomic-fence down to just the atomic op. This is possible thanks to X86's relatively strong memory model, which guarantees that locked instructions (which are used to implement atomics) are implicit fences. llvm-svn: 74435	2009-06-29 18:04:45 +00:00
David Greene	f92ba97cda	Add more vector ValueTypes for AVX and other extended vector instruction sets. llvm-svn: 74427	2009-06-29 16:47:10 +00:00
Chris Lattner	ae0acfcef0	pull @GOT, @GOTOFF, @GOTPCREL handling into isel from the asmprinter. llvm-svn: 74378	2009-06-27 05:39:56 +00:00
Chris Lattner	fea81da433	Reimplement rip-relative addressing in the X86-64 backend. The new implementation primarily differs from the former in that the asmprinter doesn't make a zillion decisions about whether or not something will be RIP relative or not. Instead, those decisions are made by isel lowering and propagated through to the asm printer. To achieve this, we: 1. Represent RIP relative addresses by setting the base of the X86 addr mode to X86::RIP. 2. When ISel Lowering decides that it is safe to use RIP, it lowers to X86ISD::WrapperRIP. When it is unsafe to use RIP, it lowers to X86ISD::Wrapper as before. 3. This removes isRIPRel from X86ISelAddressMode, representing it with a basereg of RIP instead. 4. The addressing mode matching logic in isel is greatly simplified. 5. The asmprinter is greatly simplified, notably the "NotRIPRel" predicate passed through various printoperand routines is gone now. 6. The various symbol printing routines in asmprinter now no longer infer when to emit (%rip), they just print the symbol. I think this is a big improvement over the previous situation. It does have two small caveats though: 1. I implemented a horrible "no-rip" modifier for the inline asm "P" constraint modifier. This is a short term hack, there is a much better, but more involved, solution. 2. I had to xfail an -aggressive-remat testcase because it isn't handling the use of RIP in the constant-pool reading instruction. This specific test is easy to fix without -aggressive-remat, which I intend to do next. llvm-svn: 74372	2009-06-27 04:16:01 +00:00
Chris Lattner	49ed726e46	Move all the TLS processing logic into isel, don't do it in asmprinter at all. llvm-svn: 74327	2009-06-26 21:20:29 +00:00
Chris Lattner	2ed6a9d7bd	move magic for PIC constantpool references from asmprinter to isel. llvm-svn: 74313	2009-06-26 19:22:52 +00:00
Chris Lattner	2aaad91bbe	start adding logic in isel to determine asm printer semantics, step N of M. llvm-svn: 74246	2009-06-26 00:43:52 +00:00
Chris Lattner	a3da048c82	indentation fix llvm-svn: 73840	2009-06-21 02:22:34 +00:00
Eli Friedman	48021d15d6	Misc accumulated tweaks to legalization logic for various targets. llvm-svn: 73476	2009-06-16 06:40:59 +00:00
Chris Lattner	c68a564cdd	I got J and K backward, many thanks to Eli for spotting this! llvm-svn: 73372	2009-06-15 04:39:05 +00:00
Chris Lattner	ea3621a6b1	implement support for the 'K' asm constraint, PR4347 llvm-svn: 73366	2009-06-15 04:01:39 +00:00
Arnold Schwaighofer	e3a018d707	Fix Bug 4278: X86-64 with -tailcallopt calling convention out of sync with regular cc. The only difference between the tail call cc and the normal cc was that one parameter register - R9 - was reserved for calling functions through a function pointer. After time the tail call cc has gotten out of sync with the regular cc. We can use R11 which is also caller saved but not used as parameter register for potential function pointers and remove the special tail call cc on x86-64. llvm-svn: 73233	2009-06-12 16:26:57 +00:00
Anton Korobeynikov	06039d1190	Silence a warning llvm-svn: 73152	2009-06-09 23:00:39 +00:00
Eli Friedman	0d4234416f	Get rid of some unnecessary code. llvm-svn: 73017	2009-06-07 07:28:45 +00:00
Eli Friedman	3234587213	Slightly generalize the code that handles shuffles of consecutive loads on x86 to handle more cases. Fix a bug in said code that would cause it to read past the end of an object. Rewrite the code in SelectionDAGLegalize::ExpandBUILD_VECTOR to be a bit more general. Remove PerformBuildVectorCombine, which is no longer necessary with these changes. In addition to simplifying the code, with this change, we can now catch a few more cases of consecutive loads. llvm-svn: 73012	2009-06-07 06:52:44 +00:00
Eli Friedman	75c496f920	Avoid crashing on a variable-index insertelement with element type i16. llvm-svn: 72991	2009-06-06 06:32:50 +00:00
Eli Friedman	1b1844ad1f	Get rid of some bogus patterns for X86vzmovl. Don't create VZEXT_MOVL nodes for vectors with an i16 element type. Add an optimization for building a vector which is all zeros/undef except for the bottom element, where the bottom element is an i8 or i16. llvm-svn: 72988	2009-06-06 06:05:10 +00:00
Eli Friedman	b45e8ce69a	PR2598: make sure to expand illegal forms of integer/floating-point conversions for x86, like <2 x i32> -> <2 x float> and <4 x i16> -> <4 x float>. llvm-svn: 72983	2009-06-06 03:57:58 +00:00
Devang Patel	d1c7d34924	Add new function attribute - noimplicitfloat Update code generator to use this attribute and remove NoImplicitFloat target option. Update llc to set this attribute when -no-implicit-float command line option is used. llvm-svn: 72959	2009-06-05 21:57:13 +00:00
Nate Begeman	624690c6b2	Adapt the x86 build_vector dagcombine to the current state of the legalizer. build vectors with i64 elements will only appear on 32b x86 before legalize. Since vector widening occurs during legalize, and produces i64 build_vector elements, the dag combiner is never run on these before legalize splits them into 32b elements. Teach the build_vector dag combine in x86 back end to recognize consecutive loads producing the low part of the vector. Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes since that was required implicitly. Add a testcase for the transform. Old: subl $28, %esp movl 32(%esp), %eax movl 4(%eax), %ecx movl %ecx, 4(%esp) movl (%eax), %eax movl %eax, (%esp) movaps (%esp), %xmm0 pmovzxwd %xmm0, %xmm0 movl 36(%esp), %eax movaps %xmm0, (%eax) addl $28, %esp ret New: movl 4(%esp), %eax pmovzxwd (%eax), %xmm0 movl 8(%esp), %eax movaps %xmm0, (%eax) ret llvm-svn: 72957	2009-06-05 21:37:30 +00:00
Devang Patel	54707b420a	Evan thinks NoImplicitFloat check is not required here. llvm-svn: 72954	2009-06-05 18:48:29 +00:00
Dan Gohman	11231d0c75	Remove unnecessary #includes. llvm-svn: 72782	2009-06-03 16:47:12 +00:00
Dale Johannesen	5234d3795f	Revert 72707 and 72709, for the moment. llvm-svn: 72712	2009-06-02 03:12:52 +00:00
Dale Johannesen	0b8ca79253	Make the implicit inputs and outputs of target-independent ADDC/ADDE use MVT::i1 (later, whatever it gets legalized to) instead of MVT::Flag. Remove CARRY_FALSE in favor of 0; adjust all target-independent code to use this format. Most targets will still produce a Flag-setting target-dependent version when selection is done. X86 is converted to use i32 instead, which means TableGen needs to produce different code in xxxGenDAGISel.inc. This keys off the new supportsHasI1 bit in xxxInstrInfo, currently set only for X86; in principle this is temporary and should go away when all other targets have been converted. All relevant X86 instruction patterns are modified to represent setting and using EFLAGS explicitly. The same can be done on other targets. The immediate behavior change is that an ADC/ADD pair are no longer tightly coupled in the X86 scheduler; they can be separated by instructions that don't clobber the flags (MOV). I will soon add some peephole optimizations based on using other instructions that set the flags to feed into ADC. llvm-svn: 72707	2009-06-01 23:27:20 +00:00
Bill Wendling	09f17a8479	Untabification. llvm-svn: 72604	2009-05-30 01:09:53 +00:00
Evan Cheng	a9cda8abf2	Added optimization that narrow load / op / store and the 'op' is a bit twiddling instruction and its second operand is an immediate. If bits that are touched by 'op' can be done with a narrower instruction, reduce the width of the load and store as well. This happens a lot with bitfield manipulation code. e.g. orl $65536, 8(%rax) => orb $1, 10(%rax) Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization. llvm-svn: 72507	2009-05-28 00:35:15 +00:00
Eli Friedman	a56159b7e9	Ger rid of some dead code. llvm-svn: 72494	2009-05-27 20:39:00 +00:00
Eli Friedman	acb851a8c0	Don't abuse the quirky behavior of LegalizeDAG for XINT_TO_FP and FP_TO_XINT. Necessary for some cleanups I'm working on. Updated from the previous version (r72431) to fix a bug and make some things a bit clearer. llvm-svn: 72445	2009-05-27 00:47:34 +00:00
Daniel Dunbar	d96b117872	Back out r72431, it is causing a number of compilation crashes with clang. llvm-svn: 72436	2009-05-26 21:27:02 +00:00
Eli Friedman	8c7bff96ed	Don't abuse the quirky behavior of LegalizeDAG for XINT_TO_FP and FP_TO_XINT. Necessary for some cleanups I'm working on. llvm-svn: 72431	2009-05-26 19:18:56 +00:00
Eli Friedman	2199ed399f	Make the X86 backend mark EXTRACT_SUBVECTOR as Expand, at least for the moment. llvm-svn: 72350	2009-05-23 22:44:52 +00:00
Eli Friedman	dfe4f25355	Make the x86 backend custom-lower UINT_TO_FP and FP_TO_UINT on 32-bit systems instead of attempting to promote them to a 64-bit SINT_TO_FP or FP_TO_SINT. This is in preparation for removing the type legalization code from LegalizeDAG: once type legalization is gone from LegalizeDAG, it won't be able to handle the i64 operand/result correctly. This isn't quite ideal, but I don't think any other operation for any target ends up in this situation, so treating this case specially seems reasonable. llvm-svn: 72324	2009-05-23 09:59:16 +00:00
Evan Cheng	ab0d23396a	Run code placement optimization for targets that want it (arm and x86 for now). llvm-svn: 71726	2009-05-13 21:42:09 +00:00
Chris Lattner	f1d9b91434	Fix PR4152: asm constraint validation happens before dag combine, so we need to work a bit to combine things like (x+c1+c2) into x+c3. llvm-svn: 71232	2009-05-08 18:23:14 +00:00
Nate Begeman	7e6e352735	Fix infinite recursion in the C++ code which handles movddup by making it unnecessary. llvm-svn: 70425	2009-04-29 22:47:44 +00:00
Nate Begeman	5f829d896d	Implement review feedback for vector shuffle work. llvm-svn: 70372	2009-04-29 05:20:52 +00:00
Nate Begeman	8d6d4b9289	2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan. PR2957 ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes as the shuffle mask. A value of -1 represents UNDEF. In addition to eliminating the creation of illegal BUILD_VECTORS just to represent shuffle masks, we are better about canonicalizing the shuffle mask, resulting in substantially better code for some classes of shuffles. llvm-svn: 70225	2009-04-27 18:41:29 +00:00
Rafael Espindola	c1396a2313	Fix PR 4004 by including the call to __tls_get_addr in X86tlsaddr. This is not very elegant, but neither is the tls specification :-( llvm-svn: 69968	2009-04-24 12:59:40 +00:00
Rafael Espindola	b93db668b3	Revert 69952. Causes testsuite failures on linux x86-64. llvm-svn: 69967	2009-04-24 12:40:33 +00:00
Nate Begeman	bb881d66f4	PR2957 ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes as the shuffle mask. A value of -1 represents UNDEF. In addition to eliminating the creation of illegal BUILD_VECTORS just to represent shuffle masks, we are better about canonicalizing the shuffle mask, resulting in substantially better code for some classes of shuffles. A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next. llvm-svn: 69952	2009-04-24 03:42:54 +00:00
Duncan Sands	7ce5cc6bd1	Get rid of what looks like a copy-and-pasted typo. Spotted by gcc-4.5. llvm-svn: 69673	2009-04-21 09:44:39 +00:00
Bob Wilson	f8b85477ae	Move duplicated AddLiveIn function from X86 and ARM backends to be a method in the MachineFunction class, renaming it to addLiveIn for consistency with the same method in MachineBasicBlock. Thanks for Anton for suggesting this. llvm-svn: 69615	2009-04-20 18:36:57 +00:00
Rafael Espindola	355fe12c82	For general dynamic TLS access we must use leaq foo@TLSGD(%rip), %rdi as part of the instruction sequence. Using a register other than %rdi and then copying it to %rdi is not valid. llvm-svn: 69350	2009-04-17 14:35:58 +00:00
Rafael Espindola	6d6c6043ea	X86-64 TLS support for local exec and initial exec. llvm-svn: 68947	2009-04-13 13:02:49 +00:00
Dan Gohman	de912e2475	Remove the obsolete SelectionDAG::getNodeValueTypes and simplify code that uses it by using SelectionDAG::getVTList instead. llvm-svn: 68744	2009-04-09 23:54:40 +00:00
Dan Gohman	f15454866c	Fix grammaros in comments. llvm-svn: 68666	2009-04-09 02:06:09 +00:00
Rafael Espindola	3b2df10c9e	Re-apply 68552. Tested by bootstrapping llvm-gcc and using that to build llvm. llvm-svn: 68645	2009-04-08 21:14:34 +00:00
Rafael Espindola	d173f4237d	Avoid a hard coded constant. llvm-svn: 68603	2009-04-08 08:09:33 +00:00
Dan Gohman	ad3e549a53	Implement support for using modeling implicit-zero-extension on x86-64 with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG instructions), and teach the DAGCombiner to take advantage of this on targets which support it. This eliminates many redundant zero-extension operations on x86-64. This adds a new TargetLowering hook, isZExtFree. It's similar to isTruncateFree, except it only applies to actual definitions, and not no-op truncates which may not zero the high bits. Also, this adds a new optimization to SimplifyDemandedBits: transform operations like x+y into (zext (add (trunc x), (trunc y))) on targets where all the casts are no-ops. In contexts where the high part of the add is explicitly masked off, this allows the mask operation to be eliminated. Fix the DAGCombiner to avoid undoing these transformations to eliminate casts on targets where the casts are no-ops. Also, this adds a new two-address lowering heuristic. Since two-address lowering runs before coalescing, it helps to be able to look through copies when deciding whether commuting and/or three-address conversion are profitable. Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle the case that a clobber range extended both before and beyond an existing live range. In that case, multiple live ranges need to be added. This was exposed by the new subreg coalescing code. Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the spiller behavior it was looking for no longer occurrs with the new instruction selection. llvm-svn: 68576	2009-04-08 00:15:30 +00:00
Bill Wendling	4aa25b79f9	Temporarily revert r68552. This was causing a failure in the self-hosting LLVM builds. --- Reverse-merging (from foreign repository) r68552 into '.': U test/CodeGen/X86/tls8.ll U test/CodeGen/X86/tls10.ll U test/CodeGen/X86/tls2.ll U test/CodeGen/X86/tls6.ll U lib/Target/X86/X86Instr64bit.td U lib/Target/X86/X86InstrSSE.td U lib/Target/X86/X86InstrInfo.td U lib/Target/X86/X86RegisterInfo.cpp U lib/Target/X86/X86ISelLowering.cpp U lib/Target/X86/X86CodeEmitter.cpp U lib/Target/X86/X86FastISel.cpp U lib/Target/X86/X86InstrInfo.h U lib/Target/X86/X86ISelDAGToDAG.cpp U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h U lib/Target/X86/X86ISelLowering.h U lib/Target/X86/X86InstrInfo.cpp U lib/Target/X86/X86InstrBuilder.h U lib/Target/X86/X86RegisterInfo.td llvm-svn: 68560	2009-04-07 22:35:25 +00:00
Rafael Espindola	1edda06792	Reduce code duplication on the TLS implementation. This introduces a small regression on the generated code quality in the case we are just computing addresses, not loading values. Will work on it and on X86-64 support. llvm-svn: 68552	2009-04-07 21:37:46 +00:00
Mon P Wang	9c186c5d27	Added a x86 dag combine to increase the chances to use a movq for v2i64 on x86-32. llvm-svn: 68368	2009-04-03 02:43:30 +00:00
Chris Lattner	d2eb0a63a1	silence warning in release-asserts build. llvm-svn: 68253	2009-04-01 22:14:45 +00:00
Evan Cheng	d9d6e427d6	i128 shift libcalls are not available on x86. llvm-svn: 68133	2009-03-31 19:38:51 +00:00
Evan Cheng	a84a318873	When optimzing a mul by immediate into two, the resulting mul's should get a x86 specific node to avoid dag combiner from hacking on them further. llvm-svn: 68066	2009-03-30 21:36:47 +00:00
Rafael Espindola	6ff3dabbb4	Have only one definition of X86AddrNumOperands. llvm-svn: 67949	2009-03-28 18:55:31 +00:00
Evan Cheng	fd81c73cde	Optimize some 64-bit multiplication by constants into two lea's or one lea + shl since imulq is slow (latency 5). e.g. x * 40 => shlq $3, %rdi leaq (%rdi,%rdi,4), %rax This has the added benefit of allowing more multiply to be folded into addressing mode. e.g. a * 24 + b => leaq (%rdi,%rdi,2), %rax leaq (%rsi,%rax,8), %rax llvm-svn: 67917	2009-03-28 05:57:29 +00:00
Rafael Espindola	e728019392	I am trying to add a segment to the X86 addresses matching to improve TLS support (see http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090309/075220.html), but that code is VERY brittle. This patch just makes it a bit more resistant. llvm-svn: 67843	2009-03-27 15:26:30 +00:00
Evan Cheng	d88ebc352c	-no-implicit-float means explicit fp operations are legal. llvm-svn: 67784	2009-03-26 23:06:32 +00:00
Bill Wendling	aa28be652c	Pull transform from target-dependent code into target-independent code. llvm-svn: 67742	2009-03-26 06:14:09 +00:00
Bill Wendling	94f299f2c5	Match this pattern so that we can generate simpler code: %a = ... %b = and i32 %a, 2 %c = srl i32 %b, 1 %d = br i32 %c, into %a = ... %b = and %a, 2 %c = X86ISD::CMP %b, 0 %d = X86ISD::BRCOND %c ... This applies only when the AND constant value has one bit set and the SRL constant is equal to the log2 of the AND constant. The back-end is smart enough to convert the result into a TEST/JMP sequence. llvm-svn: 67728	2009-03-26 01:47:50 +00:00
Bill Wendling	798fd56d0f	These instructions have special lowering that may lower them to SSE instructions. Prevent that if we don't want implicit uses of SSE. llvm-svn: 66877	2009-03-13 08:41:47 +00:00
Evan Cheng	1fb8aedd1e	Fix some significant problems with constant pools that resulted in unnecessary paddings between constant pool entries, larger than necessary alignments (e.g. 8 byte alignment for .literal4 sections), and potentially other issues. 1. ConstantPoolSDNode alignment field is log2 value of the alignment requirement. This is not consistent with other SDNode variants. 2. MachineConstantPool alignment field is also a log2 value. 3. However, some places are creating ConstantPoolSDNode with alignment value rather than log2 values. This creates entries with artificially large alignments, e.g. 256 for SSE vector values. 4. Constant pool entry offsets are computed when they are created. However, asm printer group them by sections. That means the offsets are no longer valid. However, asm printer uses them to determine size of padding between entries. 5. Asm printer uses expensive data structure multimap to track constant pool entries by sections. 6. Asm printer iterate over SmallPtrSet when it's emitting constant pool entries. This is non-deterministic. Solutions: 1. ConstantPoolSDNode alignment field is changed to keep non-log2 value. 2. MachineConstantPool alignment field is also changed to keep non-log2 value. 3. Functions that create ConstantPool nodes are passing in non-log2 alignments. 4. MachineConstantPoolEntry no longer keeps an offset field. It's replaced with an alignment field. Offsets are not computed when constant pool entries are created. They are computed on the fly in asm printer and JIT. 5. Asm printer uses cheaper data structure to group constant pool entries. 6. Asm printer compute entry offsets after grouping is done. 7. Change JIT code to compute entry offsets on the fly. llvm-svn: 66875	2009-03-13 07:51:59 +00:00
Chris Lattner	99cc133710	generalize the previous code to use the full generality of LEA for i32/i64 expressions (we could also do i16 on cpus where i16 lea is fast, but I didn't add this). On the example, we now generate: _test: movl 4(%esp), %eax cmpl $42, (%eax) setl %al movzbl %al, %eax leal 4(%eax,%eax,8), %eax ret instead of: _test: movl 4(%esp), %eax cmpl $41, (%eax) movl $4, %ecx movl $13, %eax cmovg %ecx, %eax ret llvm-svn: 66869	2009-03-13 05:53:31 +00:00
Chris Lattner	4be6df5d86	optimize the case of cond ? 42 : 41 and friends. This compiles the example to: _test: movl 4(%esp), %eax cmpl $41, (%eax) setg %al movzbl %al, %eax orl $4294967294, %eax ret instead of: movl 4(%esp), %eax cmpl $41, (%eax) movl $4294967294, %ecx movl $4294967295, %eax cmova %ecx, %eax ret which is smaller in code size and faster. rdar://6668608 llvm-svn: 66868	2009-03-13 05:22:11 +00:00
Chris Lattner	4147f08e44	Move 3 "(add (select cc, 0, c), x) -> (select cc, x, (add, x, c))" related transformations out of target-specific dag combine into the ARM backend. These were added by Evan in r37685 with no testcases and only seems to help ARM (e.g. test/CodeGen/ARM/select_xform.ll). Add some simple X86-specific (for now) DAG combines that turn things like cond ? 8 : 0 -> (zext(cond) << 3). This happens frequently with the recently added cp constant select optimization, but is a very general xform. For example, we now compile the second example in const-select.ll to: _test: movsd LCPI2_0, %xmm0 ucomisd 8(%esp), %xmm0 seta %al movzbl %al, %eax movl 4(%esp), %ecx movsbl (%ecx,%eax,4), %eax ret instead of: _test: movl 4(%esp), %eax leal 4(%eax), %ecx movsd LCPI2_0, %xmm0 ucomisd 8(%esp), %xmm0 cmovbe %eax, %ecx movsbl (%ecx), %eax ret This passes multisource and dejagnu. llvm-svn: 66779	2009-03-12 06:52:53 +00:00
Evan Cheng	ef0b7cc2d5	On x86, if the only use of a i64 load is a i64 store, generate a pair of double load and store instead. llvm-svn: 66776	2009-03-12 05:59:15 +00:00
Bill Wendling	42adc73a2b	Add a -no-implicit-float flag. This acts like -soft-float, but may generate floating point instructions that are explicitly specified by the user. llvm-svn: 66719	2009-03-11 22:30:01 +00:00
Mon P Wang	25c6a46a81	For yonah, fix a vector shuffle case for v16i8 where we didn't properly clear some bits. llvm-svn: 66684	2009-03-11 18:47:57 +00:00
Mon P Wang	ce6a26cb1a	Fixed a v8i16 shuffle case that should generate a pshufb instead of a pshuflw/hw. llvm-svn: 66645	2009-03-11 06:35:11 +00:00
Chris Lattner	248ad00afd	formatting change, reduce indentation. No functionality change. llvm-svn: 66642	2009-03-11 05:48:52 +00:00
Dan Gohman	ff659b5b86	Arithmetic instructions don't set EFLAGS bits OF and CF bits the same say the "test" instruction does in overflow cases, so eliminating the test is only safe when those bits aren't needed, as is the case for COND_E and COND_NE, or if it can be proven that no overflow will occur. For now, just restrict the optimization to COND_E and COND_NE and don't do any overflow analysis. llvm-svn: 66318	2009-03-07 01:58:32 +00:00
Dan Gohman	e014b193c9	When creating X86ISD::INC and X86ISD::DEC nodes, only add one operand. The extra operand didn't appear to cause any trouble, but it was erroneous regardless. llvm-svn: 66206	2009-03-05 21:29:28 +00:00
Dan Gohman	2c2f192c74	Fix the "test" optimization to recognize "dec" as an add of negative one, as subtracts of immediates are canonicalized to adds. llvm-svn: 66180	2009-03-05 19:32:48 +00:00
Dan Gohman	55d7b2ac4f	Re-apply 66008, now that the unfoldMemoryOperand bug is fixed. llvm-svn: 66058	2009-03-04 19:44:21 +00:00
Dan Gohman	6728f892be	Revert r66004 for now; it's causing a variety of test failures. llvm-svn: 66008	2009-03-04 03:54:19 +00:00
Dan Gohman	fe8d71f42a	Teach the x86 backend to eliminate "test" instructions by using the EFLAGS result from add, sub, inc, and dec instructions in simple cases. llvm-svn: 66004	2009-03-04 02:33:24 +00:00
Rafael Espindola	000421eade	Refactor TLS code and add some tests. The tests and expected results are: pic \| declaration \| linkage \| visibility \| !pic \| declaration \| external \| default \| tls1.ll tls2.ll \| local exec pic \| declaration \| external \| default \| tls1-pic.ll tls2-pic.ll \| general dynamic !pic \| !declaration \| external \| default \| tls3.ll tls4.ll \| initial exec pic \| !declaration \| external \| default \| tls3-pic.ll tls4-pic.ll \| general dynamic !pic \| declaration \| external \| hidden \| tls7.ll tls8.ll \| local exec pic \| declaration \| external \| hidden \| X \| local dynamic !pic \| !declaration \| external \| hidden \| tls9.ll tls10.ll \| local exec pic \| !declaration \| external \| hidden \| X \| local dynamic !pic \| declaration \| internal \| default \| tls5.ll tls6.ll \| local exec pic \| declaration \| internal \| default \| X \| local dynamic The ones marked with an X have not been implemented since local dynamic is not implemented. llvm-svn: 65632	2009-02-27 13:37:18 +00:00
Evan Cheng	a49de9de2e	Revert BuildVectorSDNode related patches: 65426, 65427, and 65296. llvm-svn: 65482	2009-02-25 22:49:59 +00:00
Evan Cheng	9f8fddeed8	Only v1i16 (i.e. _m64) is returned via RAX / RDX. llvm-svn: 65313	2009-02-23 09:03:22 +00:00
Nate Begeman	e684da3e5d	Generate better code for v8i16 shuffles on SSE2 Generate better code for v16i8 shuffles on SSE2 (avoids stack) Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it is fewer uops. Document the shuffle matching logic and add some FIXMEs for later further cleanups. New tests that test the above. Examples: New: _shuf2: pextrw $7, %xmm0, %eax punpcklqdq %xmm1, %xmm0 pshuflw $128, %xmm0, %xmm0 pinsrw $2, %eax, %xmm0 Old: _shuf2: pextrw $2, %xmm0, %eax pextrw $7, %xmm0, %ecx pinsrw $2, %ecx, %xmm0 pinsrw $3, %eax, %xmm0 movd %xmm1, %eax pinsrw $4, %eax, %xmm0 ret ========= New: _shuf4: punpcklqdq %xmm1, %xmm0 pshufb LCPI1_0, %xmm0 Old: _shuf4: pextrw $3, %xmm0, %eax movsd %xmm1, %xmm0 pextrw $3, %xmm1, %ecx pinsrw $4, %ecx, %xmm0 pinsrw $5, %eax, %xmm0 ======== New: _shuf1: pushl %ebx pushl %edi pushl %esi pextrw $1, %xmm0, %eax rolw $8, %ax movd %xmm0, %ecx rolw $8, %cx pextrw $5, %xmm0, %edx pextrw $4, %xmm0, %esi pextrw $3, %xmm0, %edi pextrw $2, %xmm0, %ebx movaps %xmm0, %xmm1 pinsrw $0, %ecx, %xmm1 pinsrw $1, %eax, %xmm1 rolw $8, %bx pinsrw $2, %ebx, %xmm1 rolw $8, %di pinsrw $3, %edi, %xmm1 rolw $8, %si pinsrw $4, %esi, %xmm1 rolw $8, %dx pinsrw $5, %edx, %xmm1 pextrw $7, %xmm0, %eax rolw $8, %ax movaps %xmm1, %xmm0 pinsrw $7, %eax, %xmm0 popl %esi popl %edi popl %ebx ret Old: _shuf1: subl $252, %esp movaps %xmm0, (%esp) movaps %xmm0, 16(%esp) movaps %xmm0, 32(%esp) movaps %xmm0, 48(%esp) movaps %xmm0, 64(%esp) movaps %xmm0, 80(%esp) movaps %xmm0, 96(%esp) movaps %xmm0, 224(%esp) movaps %xmm0, 208(%esp) movaps %xmm0, 192(%esp) movaps %xmm0, 176(%esp) movaps %xmm0, 160(%esp) movaps %xmm0, 144(%esp) movaps %xmm0, 128(%esp) movaps %xmm0, 112(%esp) movzbl 14(%esp), %eax movd %eax, %xmm1 movzbl 22(%esp), %eax movd %eax, %xmm2 punpcklbw %xmm1, %xmm2 movzbl 42(%esp), %eax movd %eax, %xmm1 movzbl 50(%esp), %eax movd %eax, %xmm3 punpcklbw %xmm1, %xmm3 punpcklbw %xmm2, %xmm3 movzbl 77(%esp), %eax movd %eax, %xmm1 movzbl 84(%esp), %eax movd %eax, %xmm2 punpcklbw %xmm1, %xmm2 movzbl 104(%esp), %eax movd %eax, %xmm1 punpcklbw %xmm1, %xmm0 punpcklbw %xmm2, %xmm0 movaps %xmm0, %xmm1 punpcklbw %xmm3, %xmm1 movzbl 127(%esp), %eax movd %eax, %xmm0 movzbl 135(%esp), %eax movd %eax, %xmm2 punpcklbw %xmm0, %xmm2 movzbl 155(%esp), %eax movd %eax, %xmm0 movzbl 163(%esp), %eax movd %eax, %xmm3 punpcklbw %xmm0, %xmm3 punpcklbw %xmm2, %xmm3 movzbl 188(%esp), %eax movd %eax, %xmm0 movzbl 197(%esp), %eax movd %eax, %xmm2 punpcklbw %xmm0, %xmm2 movzbl 217(%esp), %eax movd %eax, %xmm4 movzbl 225(%esp), %eax movd %eax, %xmm0 punpcklbw %xmm4, %xmm0 punpcklbw %xmm2, %xmm0 punpcklbw %xmm3, %xmm0 punpcklbw %xmm1, %xmm0 addl $252, %esp ret llvm-svn: 65311	2009-02-23 08:49:38 +00:00
Scott Michel	9d31aca679	Introduce the BuildVectorSDNode class that encapsulates the ISD::BUILD_VECTOR instruction. The class also consolidates the code for detecting constant splats that's shared across PowerPC and the CellSPU backends (and might be useful for other backends.) Also introduces SelectionDAG::getBUID_VECTOR() for generating new BUILD_VECTOR nodes. llvm-svn: 65296	2009-02-22 23:36:09 +00:00
Evan Cheng	e4ffc030e2	Be bug compatible with gcc by returning MMX values in RAX. llvm-svn: 65274	2009-02-22 08:05:12 +00:00
Evan Cheng	2a9bad5ac1	Support return of MMX values in 64-bit mode. llvm-svn: 65152	2009-02-20 20:43:02 +00:00
Scott Michel	cf0da6c597	Remove trailing whitespace to reduce later commit patch noise. (Note: Eventually, commits like this will be handled via a pre-commit hook that does this automagically, as well as expand tabs to spaces and look for 80-col violations.) llvm-svn: 64827	2009-02-17 22:15:04 +00:00
Evan Cheng	c2fde91703	Teach x86 target -soft-float. llvm-svn: 64496	2009-02-13 22:36:38 +00:00
Dale Johannesen	655775293f	Arrange to print constants that match "n" and "i" constraints in inline asm as signed (what gcc does). Add partial support for x86-specific "e" and "Z" constraints, with appropriate signedness for printing. llvm-svn: 64400	2009-02-12 20:58:09 +00:00
Dale Johannesen	9c310711bb	Use getDebugLoc forwarder instead of getNode()->getDebugLoc. No functional change. llvm-svn: 64026	2009-02-07 19:59:05 +00:00
Dan Gohman	747e55bc9a	Constify TargetInstrInfo::EmitInstrWithCustomInserter, allowing ScheduleDAG's TLI member to use const. llvm-svn: 64018	2009-02-07 16:15:20 +00:00
Dale Johannesen	62fd95d6ec	Get rid of the last non-DebugLoc versions of getNode! Many targets build placeholder nodes for special operands, e.g. GlobalBaseReg on X86 and PPC for the PIC base. There's no sensible way to associate debug info with these. I've left them built with getNode calls with explicit DebugLoc::getUnknownLoc operands. I'm not too happy about this but don't see a good improvement; I considered adding a getPseudoOperand or something, but it seems to me that'll just make it harder to read. llvm-svn: 63992	2009-02-07 00:55:49 +00:00
Dale Johannesen	84935759d5	Remove more non-DebugLoc getNode variants. Use getCALLSEQ_{END,START} to permit passing no DebugLoc there. UNDEF doesn't logically have DebugLoc; add getUNDEF to encapsulate this. llvm-svn: 63978	2009-02-06 23:05:02 +00:00
Dale Johannesen	400dc2e2e4	Remove more non-DebugLoc versions of getNode. llvm-svn: 63969	2009-02-06 21:50:26 +00:00
Dale Johannesen	9f3f72f144	Get rid of one more non-DebugLoc getNode and its corresponding getTargetNode. Lots of caller changes. llvm-svn: 63904	2009-02-06 01:31:28 +00:00
Dale Johannesen	021052a705	Remove non-DebugLoc versions of getLoad and getStore. Adjust the many callers of those versions. llvm-svn: 63767	2009-02-04 20:06:27 +00:00
Dan Gohman	556d14d483	Minor code cleanups; no functionality change. llvm-svn: 63740	2009-02-04 17:28:58 +00:00
Mon P Wang	4379a795fe	Fixes a case where we generate an incorrect mask for pshfhw in the presence of undefs and incorrectly determining if we have punpckldq. llvm-svn: 63702	2009-02-04 01:16:59 +00:00
Dale Johannesen	bbf13f54e0	Patch up omissions in DebugLoc propagation. llvm-svn: 63693	2009-02-04 00:33:20 +00:00
Dale Johannesen	abf66b8343	Add some DL propagation to places that didn't have it yet. More coming. llvm-svn: 63673	2009-02-03 22:26:09 +00:00
Dale Johannesen	1eb1ef2cfd	DebugLoc propagation. done with file. llvm-svn: 63656	2009-02-03 20:21:25 +00:00
Dale Johannesen	66e03e6f7b	DebugLoc propagation. 2/3 through file. llvm-svn: 63650	2009-02-03 19:33:06 +00:00
Evan Cheng	dc636c4080	ADD / SUB / SMUL / UMUL with overflow second result top bits must be zero. llvm-svn: 63509	2009-02-02 09:15:04 +00:00
Evan Cheng	4988c597b3	Add comment. llvm-svn: 63506	2009-02-02 08:19:07 +00:00
Evan Cheng	50e15bdf81	Teach LowerBRCOND to recognize (xor (setcc x), 1). The xor inverts the condition. It's normally transformed by the dag combiner, unless the condition is set by a arithmetic op with overflow. llvm-svn: 63505	2009-02-02 08:07:36 +00:00
Torok Edwin	a2d1f35e9a	Implement -mno-sse: if SSE is disabled on x86-64, don't store XMM on stack for var-args, and don't allow FP return values llvm-svn: 63495	2009-02-01 18:15:56 +00:00
Duncan Sands	3ed768868d	Fix PR3453 and probably a bunch of other potential crashes or wrong code with codegen of large integers: eliminate the legacy getIntegerVTBitMask and getIntegerVTSignBit methods, which returned their value as a uint64_t, so couldn't handle huge types. llvm-svn: 63494	2009-02-01 18:06:53 +00:00
Dale Johannesen	555a375bb6	Make LowerCallTo and LowerArguments take a DebugLoc argument. Adjust all callers and overloaded versions. llvm-svn: 63444	2009-01-30 23:10:59 +00:00
Bill Wendling	8fb81f1b3d	Get rid of the non-DebugLoc-ified getNOT() method. llvm-svn: 63442	2009-01-30 23:03:19 +00:00
Mon P Wang	cbb20a6ee1	When PerformBuildVectorCombine, avoid creating a X86ISD::VZEXT_LOAD of an illegal type. llvm-svn: 63380	2009-01-30 07:07:40 +00:00
Dan Gohman	e58ab79f33	Make x86's BT instruction matching more thorough, and add some dagcombines that help it match in several more cases. Add several more cases to test/CodeGen/X86/bt.ll. This doesn't yet include matching for BT with an immediate operand, it just covers more register+register cases. llvm-svn: 63266	2009-01-29 01:59:02 +00:00
Mon P Wang	9150f735fa	Fixed lowering of v816 shuffles. llvm-svn: 63252	2009-01-28 23:11:14 +00:00
Mon P Wang	5a685a52c1	Add shuffle splat pattern for x86 sse shifts. llvm-svn: 63193	2009-01-28 08:12:05 +00:00
Dan Gohman	8e4ac9b71a	Take the next steps in making SDUse more consistent with LLVM Use, and tidy up SDUse and related code. - Replace the operator= member functions with a set method, like LLVM Use has, and variants setInitial and setNode, which take care up updating use lists, like LLVM Use's does. This simplifies code that calls these functions. - getSDValue() is renamed to get(), as in LLVM Use, though most places can either use the implicit conversion to SDValue or the convenience functions instead. - Fix some more node vs. value terminology issues. Also, eliminate the one remaining use of SDOperandPtr, and SDOperandPtr itself. llvm-svn: 62995	2009-01-26 04:35:06 +00:00
Nate Begeman	a2550a8e96	De-identifying per sabre review llvm-svn: 62988	2009-01-26 03:15:31 +00:00
Nate Begeman	8a51d8c8f7	Support pattern matching various x86 sse shifts. llvm-svn: 62979	2009-01-26 00:52:55 +00:00
Bob Wilson	c58900504b	Add SelectionDAG::getNOT method to construct bitwise NOT operations, corresponding to the "not" and "vnot" PatFrags. Use the new method in some places where it seems appropriate. llvm-svn: 62768	2009-01-22 17:39:32 +00:00
Evan Cheng	8f367e53c7	Minor tweak to LowerUINT_TO_FP_i32. Bias (after scalar_to_vector) has two uses so we should make it the second source operand of ISD::OR so 2-address pass won't have to be smart about commuting. %reg1024<def> = MOVSDrm %reg0, 1, %reg0, <cp#0>, Mem:LD(8,8) [ConstantPool + 0] %reg1025<def> = MOVSD2PDrr %reg1024 %reg1026<def> = MOVDI2PDIrm <fi#-1>, 1, %reg0, 0, Mem:LD(4,16) [FixedStack-1 + 0] %reg1027<def> = ORPSrr %reg1025<kill>, %reg1026<kill> %reg1028<def> = MOVPD2SDrr %reg1027<kill> %reg1029<def> = SUBSDrr %reg1028<kill>, %reg1024<kill> %reg1030<def> = CVTSD2SSrr %reg1029<kill> MOVSSmr <fi#0>, 1, %reg0, 0, %reg1030<kill>, Mem:ST(4,4) [FixedStack0 + 0] %reg1031<def> = LD_Fp32m80 <fi#0>, 1, %reg0, 0, Mem:LD(4,16) [FixedStack0 + 0] RET %reg1031<kill>, %ST0<imp-use,kill> The reason 2-addr pass isn't smart enough to commute the ORPSrr is because it can't look pass the MOVSD2PDrr instruction. llvm-svn: 62505	2009-01-19 08:19:57 +00:00
Evan Cheng	7e9ef4d776	Now not UINT_TO_FP is legal (it's marked custom), dag combiner won't optimize it to a SINT_TO_FP when the sign bit is known zero. X86 isel should perform the optimization itself. llvm-svn: 62504	2009-01-19 08:08:22 +00:00
Bill Wendling	f9291cf43c	Extend thi llvm-svn: 62415	2009-01-17 07:40:19 +00:00
Bill Wendling	dd40f26877	Temporarily revert my last change. It is causing a bootstrap failure. llvm-svn: 62405	2009-01-17 04:23:51 +00:00
Bill Wendling	4d5275905e	Implement a special algorithm for converting uint_to_fp for i32 values on X86. This code: void f() { uint32_t x; float y = (float)x; } used to be: movl %eax, -8(%ebp) movl [2^52 double], -4(%ebp) movsd -8(%ebp), %xmm0 subsd [2^52 double], %xmm0 cvtsd2ss %xmm0, %xmm0 Is now: movsd [2^52 double], %xmm0 movsd %xmm0, %xmm1 movd %ecx, %xmm2 orps %xmm2, %xmm1 subsd %xmm0, %xmm1 cvtsd2ss %xmm1, %xmm0 This is faster on X86. Note that there's an extra load of %xmm0 into %xmm1. That will be fixed in a later coalescer fix. llvm-svn: 62404	2009-01-17 03:56:04 +00:00
Bill Wendling	e04334730e	Add support for non-zero __builtin_return_address values on X86. llvm-svn: 62338	2009-01-16 19:25:27 +00:00
Mon P Wang	ebfafee903	Expand insert/extract of a <4 x i32> with a variable index. llvm-svn: 62281	2009-01-15 21:10:20 +00:00
Dan Gohman	0ad43ca6e5	Make getWidenVectorType const. llvm-svn: 62265	2009-01-15 17:34:08 +00:00
Dan Gohman	a63bede3c6	BT appears to be available on all >= i386 chips. llvm-svn: 62196	2009-01-13 23:27:15 +00:00
Dan Gohman	d3942af5cb	Don't use a BT instruction if the AND has multiple uses. llvm-svn: 62195	2009-01-13 23:25:30 +00:00
Devang Patel	5c6e1e3b7d	Use DebugInfo interface to lower dbg_* intrinsics. llvm-svn: 62127	2009-01-13 00:35:13 +00:00
Dan Gohman	33e6fcd56f	X86_COND_C and X86_COND_NC are alternate mnemonics for X86_COND_B and X86_COND_AE, respectively. llvm-svn: 61835	2009-01-07 00:15:08 +00:00
Devang Patel	56a8bb670f	squash warnings. llvm-svn: 61707	2009-01-05 17:31:22 +00:00
Evan Cheng	1671a309fd	Use movaps / movd to extract vector element 0 even with sse4.1. It's still cheaper than pextrw especially if the value is in memory. llvm-svn: 61555	2009-01-02 05:29:08 +00:00
Duncan Sands	8feb694e8f	Fix PR3274: when promoting the condition of a BRCOND node, promote from i1 all the way up to the canonical SetCC type. In order to discover an appropriate type to use, pass MVT::Other to getSetCCResultType. In order to be able to do this, change getSetCCResultType to take a type as an argument, not a value (this is also more logical). llvm-svn: 61542	2009-01-01 15:52:00 +00:00
Chris Lattner	2a7c988627	Add a simple pattern for matching 'bt'. llvm-svn: 61426	2008-12-25 05:34:37 +00:00
Chris Lattner	8175f27d3f	translateX86CC can never fail. Simplify it based on this. llvm-svn: 61423	2008-12-24 23:53:05 +00:00
Chris Lattner	4b46b74ece	indentation llvm-svn: 61407	2008-12-24 00:11:37 +00:00
Chris Lattner	e9988b661d	simplify some control flow and reduce indentation, no functionality change. llvm-svn: 61404	2008-12-23 23:42:27 +00:00
Dan Gohman	25a767d7f4	Add instruction patterns and encodings for the x86 bt instructions. llvm-svn: 61400	2008-12-23 22:45:23 +00:00
Dan Gohman	12f2490489	Clean up the atomic opcodes in SelectionDAG. This removes all the _8, _16, _32, and _64 opcodes and replaces each group with an unsuffixed opcode. The MemoryVT field of the AtomicSDNode is now used to carry the size information. In tablegen, the size-specific opcodes are replaced by size-independent opcodes that utilize the ability to compose them with predicates. This shrinks the per-opcode tables and makes the code that handles atomics much more concise. llvm-svn: 61389	2008-12-23 21:37:04 +00:00
Mon P Wang	ec95070ca3	Fixed code generation for v8i16 and v16i8 splats on X86. Fixed lowering of v8i16 shuffles for v8i16 when we fall back to extract/insert. llvm-svn: 61365	2008-12-23 04:03:27 +00:00
Mon P Wang	998fd29ce1	Fixed x86 code generation of multiple for v2i64. It was incorrect for SSE4.1. llvm-svn: 61211	2008-12-18 21:42:19 +00:00
Bill Wendling	c4499feb1a	- Use patterns instead of creating completely new instruction matching patterns, which are identical to the original patterns. - Change the multiply with overflow so that we distinguish between signed and unsigned multiplication. Currently, unsigned multiplication with overflow isn't working! llvm-svn: 60963	2008-12-12 21:15:41 +00:00
Mon P Wang	9c2d26d208	Added support for SELECT v8i8 v4i16 for X86 (MMX) Added support for TRUNC v8i16 to v8i8 for X86 (MMX) llvm-svn: 60916	2008-12-12 01:25:51 +00:00
Bill Wendling	1a317678bc	Redo the arithmetic with overflow architecture. I was changing the semantics of ISD::ADD to emit an implicit EFLAGS. This was horribly broken. Instead, replace the intrinsic with an ISD::SADDO node. Then custom lower that into an X86ISD::ADD node with a associated SETCC that checks the correct condition code (overflow or carry). Then that gets lowered into the correct X86::ADDOvf instruction. Similar for SUB and MUL instructions. llvm-svn: 60915	2008-12-12 00:56:36 +00:00
Bill Wendling	f482f379ef	Whitespace changes. llvm-svn: 60826	2008-12-10 02:01:32 +00:00
Bill Wendling	db8ec2d75a	Add sub/mul overflow intrinsics. This currently doesn't have a target-independent way of determining overflow on multiplication. It's very tricky. Patch by Zoltan Varga! llvm-svn: 60800	2008-12-09 22:08:41 +00:00
Dale Johannesen	9efd2ce55b	Make LoopStrengthReduce smarter about hoisting things out of loops when they can be subsumed into addressing modes. Change X86 addressing mode check to realize that some PIC references need an extra register. (I believe this is correct for Linux, if not, I'm sure someone will tell me.) llvm-svn: 60608	2008-12-05 21:47:27 +00:00
Evan Cheng	501089f6f4	Refactor code. No functionality change. llvm-svn: 60478	2008-12-03 08:38:43 +00:00
Bill Wendling	f8d1ef9842	CC should only be a ConstantSDNode at this point. Just use 'cast' instead of 'dyn_cast'. llvm-svn: 60477	2008-12-03 08:32:02 +00:00
Bill Wendling	30e9dc81c8	Second stab at target-dependent lowering of everyone's favorite nodes: [SU]ADDO - LowerXADDO lowers [SU]ADDO into an ADD with an implicit EFLAGS define. The EFLAGS are fed into a SETCC node which has the conditional COND_O or COND_C, depending on the type of ADDO requested. - LowerBRCOND now recognizes if it's coming from a SETCC node with COND_O or COND_C set. llvm-svn: 60388	2008-12-02 01:06:39 +00:00
Duncan Sands	3d960941b1	There are no longer any places that require a MERGE_VALUES node with only one operand, so get rid of special code that only existed to handle that possibility. llvm-svn: 60349	2008-12-01 11:41:29 +00:00
Duncan Sands	6ed40141f7	Change the interface to the type legalization method ReplaceNodeResults: rather than returning a node which must have the same number of results as the original node (which means mucking around with MERGE_VALUES, and which is also easy to get wrong since SelectionDAG folding may mean you don't get the node you expect), return the results in a vector. llvm-svn: 60348	2008-12-01 11:39:25 +00:00
Bill Wendling	128f032cc8	Comment out code that isn't entirely correct. llvm-svn: 60156	2008-11-27 07:18:35 +00:00
Bill Wendling	751a694ad3	Generate something sensible for an [SU]ADDO op when the overflow/carry flag is the conditional for the BRCOND statement. For instance, it will generate: addl %eax, %ecx jo LOF instead of addl %eax, %ecx ; About 10 instructions to compare the signs of LHS, RHS, and sum. jl LOF llvm-svn: 60123	2008-11-26 22:37:40 +00:00
Bill Wendling	66835479d7	- Make lowering of "add with overflow" customizable by back-ends. - Mark "add with overflow" as having a custom lowering for X86. Give it a null lowering representation for now. llvm-svn: 59971	2008-11-24 19:21:46 +00:00
Mon P Wang	35a70ec131	Added missing description for -disable-mmx option. llvm-svn: 59929	2008-11-24 02:10:43 +00:00
Duncan Sands	8d6e2e13d5	Rename SetCCResultContents to BooleanContents. In practice these booleans are mostly produced by SetCC, however the concept is more general. llvm-svn: 59911	2008-11-23 15:47:28 +00:00
Mon P Wang	0aa8f0a549	Added -disable-mmx using a patch from Preston Gurd. llvm-svn: 59901	2008-11-23 04:37:22 +00:00
Dale Johannesen	bee1ad9707	Extend InlineAsm::C_Register to allow multiple specific registers (actually, code already all worked, only the comment changed). Use this to implement 'A' constraint on x86. Fixes PR 1779. llvm-svn: 59266	2008-11-13 21:52:36 +00:00
Mon P Wang	9a8d60a7c0	Widening cleanup llvm-svn: 58796	2008-11-06 05:31:54 +00:00
Evan Cheng	3cd5e8c97b	Indentation. llvm-svn: 58750	2008-11-05 06:03:38 +00:00
Dan Gohman	99cdf8893e	Use MOVSSmr instead of EXTRACTPSmr in the case of extracting vector element 0 for a store, as it's smaller and faster. llvm-svn: 58483	2008-10-31 00:57:24 +00:00
Mon P Wang	58c3794c27	Add initial support for vector widening. Logic is set to widen for X86. One will only see an effect if legalizetype is not active. Will move support to LegalizeType soon. llvm-svn: 58426	2008-10-30 08:01:45 +00:00
Chris Lattner	38461f6b2f	Fix a nasty miscompilation of 176.gcc on linux/x86 where we synthesized a memset using 16-byte XMM stores, but where the stack realignment code didn't work. Until it does (PR2962) disable use of xmm regs in memcpy and memset formation for linux and other targets with insufficiently aligned stacks. This is part of PR2888 llvm-svn: 58317	2008-10-28 05:49:35 +00:00
Duncan Sands	014f5bbaad	Fix translateX86CC: if SetCCOpcode is SETULE and LHS is a foldable load, then LHS and RHS are swapped and SetCCOpcode is changed to SETUGT. But the later code is expecting operands to be the wrong way round for SETUGT, but they are not in this case, resulting in an inverted compare. The solution is to move the load normalization before the correction for SETUGT. This bug was tickled by LegalizeTypes which happened to legalize the testcase slightly differently to LegalizeDAG. llvm-svn: 58092	2008-10-24 13:03:10 +00:00
Dale Johannesen	f6655a9e79	Remove allocation of unused stack slot. llvm-svn: 57987	2008-10-22 17:26:06 +00:00
Duncan Sands	5ee1dde8fa	Get this working with LegalizeTypes: (1) don't assume that i64 has been turned into a BUILD_PAIR node (when called from LegalizeTypes this hasn't happened yet) and don't use a vector shuffle mask with an illegal element type. llvm-svn: 57972	2008-10-22 11:24:12 +00:00
Dale Johannesen	cf4607fcce	Adjust comments for pedantic satisfaction. llvm-svn: 57940	2008-10-22 00:02:32 +00:00
Dale Johannesen	3d7ece1acb	Add comments to explain uint64->f64 algorithm, well, sort of. (Algorithm by Ian Ollmann.) llvm-svn: 57932	2008-10-21 23:07:49 +00:00
Dale Johannesen	28929589e7	Add an SSE2 algorithm for uint64->f64 conversion. The same one Apple gcc uses, faster. Also gets the extreme case in gcc.c-torture/execute/ieee/rbug.c correct which we weren't before; this is not sufficient to get the test to pass though, there is another bug. llvm-svn: 57926	2008-10-21 20:50:01 +00:00
Dan Gohman	269246b034	Don't create TargetGlobalAddress nodes with offsets that don't fit in the 32-bit signed offset field of addresses. Even though this may be intended, some linkers refuse to relocate code where the relocated address computation overflows. Also, fix the sign-extension of constant offsets to use the actual pointer size, rather than the size of the GlobalAddress node, which may be different, for example on x86-64 where MVT::i32 is used when the address is being fit into the 32-bit displacement field. llvm-svn: 57885	2008-10-21 03:38:42 +00:00
Dan Gohman	97d95d6d85	Optimized FCMP_OEQ and FCMP_UNE for x86. Where previously LLVM might emit code like this: ucomisd %xmm1, %xmm0 setne %al setp %cl orb %al, %cl jne .LBB4_2 it now emits this: ucomisd %xmm1, %xmm0 jne .LBB4_2 jp .LBB4_2 It has fewer instructions and uses fewer registers, but it does have more branches. And in the case that this code is followed by a non-fallthrough edge, it may be followed by a jmp instruction, resulting in three branch instructions in sequence. Some effort is made to avoid this situation. To achieve this, X86ISelLowering.cpp now recognizes FCMP_OEQ and FCMP_UNE in lowered form, and replace them with code that emits two branches, except in the case where it would require converting a fall-through edge to an explicit branch. Also, X86InstrInfo.cpp's branch analysis and transform code now knows now to handle blocks with multiple conditional branches. It uses loops instead of having fixed checks for up to two instructions. It can now analyze and transform code generated from FCMP_OEQ and FCMP_UNE. llvm-svn: 57873	2008-10-21 03:29:32 +00:00
Duncan Sands	1d20ab5784	Have X86 custom lowering for LegalizeTypes use LowerOperation if it doesn't know what else to do. This methods should probably be factorized some, but this is good enough for the moment. Have LowerATOMIC_BINARY_64 use EXTRACT_ELEMENT rather than assuming the operand is a BUILD_PAIR (if it is then getNode will automagically simplify the EXTRACT_ELEMENT). This way LowerATOMIC_BINARY_64 usable from LegalizeTypes. llvm-svn: 57831	2008-10-20 15:56:33 +00:00
Dan Gohman	2fe6bee5b6	Teach DAGCombine to fold constant offsets into GlobalAddress nodes, and add a TargetLowering hook for it to use to determine when this is legal (i.e. not in PIC mode, etc.) This allows instruction selection to emit folded constant offsets in more cases, such as the included testcase, eliminating the need for explicit arithmetic instructions. This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp that attempted to achieve the same effect, but wasn't as effective. Also, fix handling of offsets in GlobalAddressSDNodes in several places, including changing GlobalAddressSDNode's offset from int to int64_t. The Mips, Alpha, Sparc, and CellSPU targets appear to be unaware of GlobalAddress offsets currently, so set the hook to false on those targets. llvm-svn: 57748	2008-10-18 02:06:02 +00:00
Chris Lattner	8e2ef196ae	add support for 128 bit inputs on both x86-64 and x86-32. llvm-svn: 57709	2008-10-17 18:15:05 +00:00
Chris Lattner	c7e65f4377	Fix a bug where the x86 backend would reject 64-bit r constraints when in 32-bit mode instead of assigning a register pair. This has nothing to do with PR2356, but I happened to notice it while working on it. llvm-svn: 57704	2008-10-17 17:59:52 +00:00
Dan Gohman	4a87660127	Remove an unused variable. llvm-svn: 57621	2008-10-16 01:47:47 +00:00
Evan Cheng	3b0f5e4d61	- Add target lowering hooks that specify which setcc conditions are illegal, i.e. conditions that cannot be checked with a single instruction. For example, SETONE and SETUEQ on x86. - Teach legalizer to implement illegal setcc as a and / or of a number of legal setcc nodes. For now, only implement FP conditions. e.g. SETONE is implemented as SETO & SETNE, SETUEQ is SETUO \| SETEQ. - Move x86 target over. llvm-svn: 57542	2008-10-15 02:05:31 +00:00
Dan Gohman	e7ced74558	FastISel support for exception-handling constructs. - Move the EH landing-pad code and adjust it so that it works with FastISel as well as with SDISel. - Add FastISel support for @llvm.eh.exception and @llvm.eh.selector. llvm-svn: 57539	2008-10-14 23:54:11 +00:00
Evan Cheng	07d53b1d33	Rename LoadX to LoadExt. llvm-svn: 57526	2008-10-14 21:26:46 +00:00
Chris Lattner	2753955fc0	Change CALLSEQ_BEGIN and CALLSEQ_END to take TargetConstant's as parameters instead of raw Constants. This prevents the constants from being selected by the isel pass, fixing PR2735. llvm-svn: 57385	2008-10-11 22:08:30 +00:00
Dale Johannesen	4f0bd68cfe	Add a "loses information" return value to APFloat::convert and APFloat::convertToInteger. Restore return value to IEEE754. Adjust all users accordingly. llvm-svn: 57329	2008-10-09 23:00:39 +00:00
Evan Cheng	94d14f2d45	Fix PR2850 and PR2863. Only generate movddup for 128-bit SSE vector shuffles. llvm-svn: 57210	2008-10-06 21:13:08 +00:00
Dale Johannesen	8c36a1c09c	Make atomic Swap work, 64-bit on x86-32. Make it all work in non-pic mode. llvm-svn: 57034	2008-10-03 22:25:52 +00:00
Dale Johannesen	5d60c1ebb1	Pass MemOperand through for 64-bit atomics on 32-bit, incidentally making the case where the memop is a pointer deref work. Fix cmp-and-swap regression. llvm-svn: 57027	2008-10-03 19:41:08 +00:00
Dan Gohman	0d1e9a8e04	Switch the MachineOperand accessors back to the short names like isReg, etc., from isRegister, etc. llvm-svn: 57006	2008-10-03 15:45:36 +00:00
Dale Johannesen	867d549fce	Handle some 64-bit atomics on x86-32, some of the time. llvm-svn: 56963	2008-10-02 18:53:47 +00:00
Bill Wendling	68f12ee567	Implement the -fno-builtin option in the front-end, not in the back-end. llvm-svn: 56900	2008-10-01 00:59:58 +00:00
Bill Wendling	1782584f56	Just don't transform this memset into "bzero" if no-builtin is specified. llvm-svn: 56888	2008-09-30 22:05:33 +00:00
Bill Wendling	bd09262e97	Add the new `-no-builtin' flag. This flag is meant to mimic the GCC `-fno-builtin' flag. Currently, it's used to replace "memset" with "_bzero" instead of "__bzero" on Darwin10+. This arguably violates the meaning of this flag, but is currently sufficient. The meaning of this flag should become more specific over time. llvm-svn: 56885	2008-09-30 21:22:07 +00:00
Dale Johannesen	f61a84ec43	Remove misuse of ReplaceNodeResults for atomics with valid types. No functional change. llvm-svn: 56808	2008-09-29 22:25:26 +00:00
Evan Cheng	3774b2f292	Re-apply 56683 with fixes. llvm-svn: 56748	2008-09-27 01:56:22 +00:00
Bill Wendling	c966a737c5	Temporarily reverting r56683. This is causing a failure during the build of llvm-gcc: /Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/xgcc -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/bin/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/lib/ -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/include -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/sys-include -mmacosx-version-min=10.4 -O2 -O2 -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I. -I../../llvm-gcc.src/gcc -I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include -I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include -I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber -I/Volumes/Gir/devel/llvm/clean/llvm.obj/include -I/Volumes/Gir/devel/llvm/clean/llvm.src/include -fexceptions -fvisibility=hidden -DHIDE_EXPORTS -c ../../llvm-gcc.src/gcc/unwind-dw2-fde-darwin.c -o libgcc/./unwind-dw2-fde-darwin.o Assertion failed: (TargetRegisterInfo::isVirtualRegister(regA) && TargetRegisterInfo::isVirtualRegister(regB) && "cannot update physical register live information"), function runOnMachineFunction, file /Volumes/Gir/devel/llvm/clean/llvm.src/lib/CodeGen/TwoAddressInstructionPass.cpp, line 311. ../../llvm-gcc.src/gcc/unwind-dw2.c:1527: internal compiler error: Abort trap Please submit a full bug report, with preprocessed source if appropriate. See <URL:http://developer.apple.com/bugreporter> for instructions. {standard input}:3521:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb" {standard input}:3521:symbol: "_dwarf_reg_size_table" can't be undefined in a subtraction expression {standard input}:3520:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb" ... llvm-svn: 56703	2008-09-26 22:10:44 +00:00
Dan Gohman	6e0548336a	Rename ConstantSDNode's getSignExtended to getSExtValue, for consistancy with ConstantInt, and re-implement it in terms of ConstantInt's getSExtValue. llvm-svn: 56700	2008-09-26 21:54:37 +00:00
Evan Cheng	d77cbe8947	Fix @llvm.frameaddress codegen. FP elimination optimization should be disabled when frame address is desired. Also add support for depth > 0. llvm-svn: 56683	2008-09-26 19:48:35 +00:00
Dale Johannesen	0e32a2c935	Add "inreg" field to CallSDNode (doesn't increase its size). Adjust various lowering functions to pass this info through from CallInst. Use it to implement sseregparm returns on X86. Remove X86_ssecall calling convention. llvm-svn: 56677	2008-09-26 19:31:26 +00:00
Evan Cheng	9dbe45c000	Prefer movlhps over punpcklqdq, etc. in more cases. llvm-svn: 56627	2008-09-25 23:35:16 +00:00
Devang Patel	4c758ea3e0	Large mechanical patch. s/ParamAttr/Attribute/g s/PAList/AttrList/g s/FnAttributeWithIndex/AttributeWithIndex/g s/FnAttr/Attribute/g This sets the stage - to implement function notes as function attributes and - to distinguish between function attributes and return value attributes. This requires corresponding changes in llvm-gcc and clang. llvm-svn: 56622	2008-09-25 21:00:45 +00:00
Evan Cheng	74c9ed91b0	With sse3 and when the source is a load or has multiple uses, favors movddup over shuffp*, pshufd, etc. Without sse3 or when the source is from a register, make use of movlhps llvm-svn: 56620	2008-09-25 20:50:48 +00:00
Evan Cheng	4751549f9b	X86ISD::VZEXT_LOAD should produce and fold a chain. llvm-svn: 56593	2008-09-24 23:26:36 +00:00
Evan Cheng	e0add20c1b	Properly handle 'm' inline asm constraints. If a GV is being selected for the addressing mode, it requires the same logic for PIC relative addressing, etc. llvm-svn: 56526	2008-09-24 00:05:32 +00:00
Dan Gohman	918fe08a56	Arrange for FastISel code to have access to the MachineModuleInfo object. This will be needed to support debug info. llvm-svn: 56508	2008-09-23 21:53:34 +00:00
Evan Cheng	9e9426cb82	Support x86 specific inline asm modifier 'J'. llvm-svn: 56483	2008-09-22 23:57:37 +00:00
Dale Johannesen	7a74e71489	Make log, log2, log10, exp, exp2 use Expand by default. llvm-svn: 56471	2008-09-22 21:57:32 +00:00
Arnold Schwaighofer	796a271c5f	Change the calling convention used when tail call optimization is enabled from CC_X86_32_TailCall to CC_X86_32_FastCC. llvm-svn: 56436	2008-09-22 14:50:07 +00:00
Bill Wendling	24c79f28b1	Reverting r56249. On further investigation, this functionality isn't needed. Apologies for the thrashing. llvm-svn: 56251	2008-09-16 21:48:12 +00:00
Bill Wendling	8bc392fb1d	- Change "ExternalSymbolSDNode" to "SymbolSDNode". - Add linkage to SymbolSDNode (default to external). - Change ISD::ExternalSymbol to ISD::Symbol. - Change ISD::TargetExternalSymbol to ISD::TargetSymbol These changes pave the way to allowing SymbolSDNodes with non-external linkage. llvm-svn: 56249	2008-09-16 21:12:30 +00:00
Dan Gohman	38453eebdc	Remove isImm(), isReg(), and friends, in favor of isImmediate(), isRegister(), and friends, to avoid confusion about having two different names with the same meaning. I'm not attached to the longer names, and would be ok with changing to the shorter names if others prefer it. llvm-svn: 56189	2008-09-13 17:58:21 +00:00
Dan Gohman	d3fe174c53	Define CallSDNode, an SDNode subclass for use with ISD::CALL. Currently it just holds the calling convention and flags for isVarArgs and isTailCall. And it has several utility methods, which eliminate magic 5+2*i and similar index computations in several places. CallSDNodes are not CSE'd. Teach UpdateNodeOperands to handle nodes that are not CSE'd gracefully. llvm-svn: 56183	2008-09-13 01:54:27 +00:00
Dan Gohman	effb894453	Rename ConstantSDNode::getValue to getZExtValue, for consistency with ConstantInt. This led to fixing a bug in TargetLowering.cpp using getValue instead of getAPIntValue. llvm-svn: 56159	2008-09-12 16:56:44 +00:00
Arnold Schwaighofer	dd45bc25ac	When tailcallopt is enabled all fastcc calls must have an aligned argument stack size. Add a test case. llvm-svn: 56119	2008-09-11 20:28:43 +00:00
Dale Johannesen	58d084c05b	The version of AtomicSDNode::AtomicSDNode used (only) for cmp-and-swap reversed the Cmp and Swap arguments; comments make it clear this is unintentional. Unfortunately, the x86 BE had a compensating reversal, which is removed here. PPC is OK. From inspection of the Alpha code I think it is OK, but if somebody has that platform please check it out. I cannot test on that platform. llvm-svn: 56091	2008-09-11 03:12:59 +00:00
Dan Gohman	39d82f902a	Add X86FastISel support for static allocas, and refences to static allocas. As part of this change, refactor the address mode code for laods and stores. llvm-svn: 56066	2008-09-10 20:11:02 +00:00
Evan Cheng	710c3cf36a	Fix a fastcc + sret bug. If fastcc and sret, callee doesn't need to pop the hidden struct ptr; Re-enable fastcc. llvm-svn: 56061	2008-09-10 18:25:29 +00:00
Dale Johannesen	4cc893bab6	Handle new intrinsics with vector arguments. Patch by Paul Redmond. llvm-svn: 56059	2008-09-10 17:31:40 +00:00
Duncan Sands	6d6a65310b	Fix name. llvm-svn: 56055	2008-09-10 13:22:10 +00:00
Duncan Sands	83e45acc25	Add trampoline support for the new FastCC calling convention (not related to recent Ada testsuite failures). llvm-svn: 56054	2008-09-10 13:11:09 +00:00
Duncan Sands	536c399579	Turn off the new FastCC for the moment. It causes a slew of Ada testsuite failures on x86-32 linux. Seems to be related to the use of float. llvm-svn: 56053	2008-09-10 13:09:24 +00:00
Anton Korobeynikov	6acb2219b6	Replace explicit pointer-size constants to TargetData query. No functionality change. llvm-svn: 55996	2008-09-09 18:22:57 +00:00
Anton Korobeynikov	2fd24e7713	Reapply 55899: First draft of EH support on x86/64-linux Now with fix, which prevents subtle codegen bug to trigger on darwin. No fix for bug though, it's still there. llvm-svn: 55955	2008-09-08 21:12:47 +00:00
Anton Korobeynikov	4112634ca6	Reapply blindly reverted 55898: Implement FRAME_TO_ARGS_OFFSET for x86-64 llvm-svn: 55954	2008-09-08 21:12:11 +00:00
Bill Wendling	3871441861	Reverting r55898 as well. This wasn't reverted in the original revert... llvm-svn: 55938	2008-09-08 19:42:32 +00:00
Bill Wendling	99b83712f3	Reverting r55898 to r55909. One of these patches was causing an ICE during the full bootstrap on Darwin: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/ -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/bin/ -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/lib/ -isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/include -isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/sys-include -O2 -O2 -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I. -I../../llvm-gcc.src/gcc -I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include -I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include -I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber -I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.obj/include -I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/include -DSHARED -m64 -DL_negdi2 -c ../../llvm-gcc.src/gcc/libgcc2.c -o libgcc/x86_64/_negdi2_s.o Assertion failed: (TargetRegisterInfo::isVirtualRegister(regA) && TargetRegisterInfo::isVirtualRegister(regB) && "cannot update physical register live information"), function runOnMachineFunction, file /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/lib/CodeGen/TwoAddressInstructionPass.cpp, line 311. /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/ -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/bin/ -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/lib/ -isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/include -isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.4.0/sys-include -O2 -O2 -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I. -I../../llvm-gcc.src/gcc -I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include -I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include -I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber -I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.obj/include -I/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/include -DSHARED -m64 -DL_lshrdi3 -c ../../llvm-gcc.src/gcc/libgcc2.c -o libgcc/x86_64/_lshrdi3_s.o ../../llvm-gcc.src/gcc/unwind-dw2.c:1527: internal compiler error: Abort trap Please submit a full bug report, with preprocessed source if appropriate. See <URL:http://developer.apple.com/bugreporter> for instructions. {standard input}:unknown:Undefined local symbol LBB21_11 {standard input}:unknown:Undefined local symbol LBB21_12 {standard input}:unknown:Undefined local symbol LBB21_13 {standard input}:unknown:Undefined local symbol LBB21_8 llvm-svn: 55928	2008-09-08 17:59:12 +00:00
Anton Korobeynikov	82b9540046	First draft of EH support on x86/64-linux llvm-svn: 55899	2008-09-08 14:21:53 +00:00
Anton Korobeynikov	cb0655d626	Implement FRAME_TO_ARGS_OFFSET for x86-64 llvm-svn: 55898	2008-09-08 14:21:10 +00:00
Evan Cheng	6f343bd543	Some code clean up. llvm-svn: 55881	2008-09-07 09:07:23 +00:00
Evan Cheng	6c94b99c62	For whatever the reason, x86 CallingConv::Fast (i.e. fastcc) was not passing scalar arguments in registers. This patch defines a new fastcc CC which is slightly different from the FastCall CC. In addition to passing integer arguments in ECX and EDX, it also specify doubles are passed in 8-byte slots which are 8-byte aligned (instead of 4-byte aligned). This avoids a potential performance hazard where doubles span cacheline boundaries. llvm-svn: 55807	2008-09-04 22:59:58 +00:00
Evan Cheng	3152edf474	Remove code that pad number of bytes to pop for X86_FastCall CC. The code doesn't do the "aligning" for Cygwin, Mingw, and Windows. But aligning it on Darwin and Linux breaks gcc compatibility. That ruled out all the platforms we support! llvm-svn: 55756	2008-09-04 01:04:15 +00:00
Dale Johannesen	da2d80688b	Add intrinsics for log, log2, log10, exp, exp2. No functional change (and no FE change to generate them). llvm-svn: 55753	2008-09-04 00:47:13 +00:00
Dan Gohman	7bda51f5a4	Create HandlePHINodesInSuccessorBlocksFast, a version of HandlePHINodesInSuccessorBlocks that works FastISel-style. This allows PHI nodes to be updated correctly while using FastISel. This also involves some code reorganization; ValueMap and MBBMap are now members of the FastISel class, so they needn't be passed around explicitly anymore. Also, SelectInstructions is changed to SelectInstruction, and only does one instruction at a time. llvm-svn: 55746	2008-09-03 23:12:08 +00:00
Evan Cheng	24422d4928	Let tblgen only generate fastisel routines, not the class definition. This makes it easier for targets to define its own fastisel class. llvm-svn: 55679	2008-09-03 00:03:49 +00:00
Evan Cheng	3fddc7e906	Swap fp comparison operands and change predicate to allow load folding (safely this time). llvm-svn: 55553	2008-08-29 23:22:12 +00:00
Evan Cheng	b3ed09703c	Backing out 55521. Not safe. llvm-svn: 55548	2008-08-29 22:13:21 +00:00
Evan Cheng	960b17a3c2	Swap fp comparison operands and change predicate to allow load folding. llvm-svn: 55521	2008-08-28 23:48:31 +00:00
Gabor Greif	95d77f5466	remove tabs, fix > 80 cols llvm-svn: 55511	2008-08-28 23:19:51 +00:00
Gabor Greif	f304a7aa4d	erect abstraction boundaries for accessing SDValue members, rename Val -> Node to reflect semantics llvm-svn: 55504	2008-08-28 21:40:38 +00:00
Rafael Espindola	26d54b3ef3	Use resize instead of reserve. Reserve doesn't change size(). llvm-svn: 55486	2008-08-28 18:32:53 +00:00
Dale Johannesen	41be0d4445	Split the ATOMIC NodeType's to include the size, e.g. ATOMIC_LOAD_ADD_{8,16,32,64} instead of ATOMIC_LOAD_ADD. Increased the Hardcoded Constant OpActionsCapacity to match. Large but boring; no functional change. This is to support partial-word atomics on ppc; i8 is not a valid type there, so by the time we get to lowering, the ATOMIC_LOAD nodes looks the same whether the type was i8 or i32. The information can be added to the AtomicSDNode, but that is the largest SDNode; I don't fully understand the SDNode allocation, but it is sensitive to the largest node size, so increasing that must be bad. This is the alternative. llvm-svn: 55457	2008-08-28 02:44:49 +00:00
Gabor Greif	abfdf928d8	disallow direct access to SDValue::ResNo, provide a getter instead llvm-svn: 55394	2008-08-26 22:36:50 +00:00
Chris Lattner	09f8cef571	If an xmm register is referenced explicitly in an inline asm, make sure to assign it to a version of the xmm register with the regclass that matches its type. This fixes PR2715, a bug handling some crazy xpcom case in mozilla. llvm-svn: 55358	2008-08-26 06:19:02 +00:00
Evan Cheng	f00f1e50b5	Try approach to moving call address load inside of callseq_start. Now it's done during the preprocess of x86 isel. callseq_start's chain is changed to load's chain node; while load's chain is the last of callseq_start or the loads or copytoreg nodes inserted to move arguments to the right spot. llvm-svn: 55338	2008-08-25 21:27:18 +00:00
Bill Wendling	5b836c5f77	Temporarily reverting r55292. It's causing a bootstraping failure: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc ... src/libiberty/make-temp-file.c -o make-temp-file.o Assertion failed: (Node2Index[SU->NodeNum] > Node2Index[I->Dep->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm.src/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp, line 508. ../../../../llvm-gcc.src/libiberty/hashtab.c:955: internal compiler error: Abort trap Please submit a full bug report, with preprocessed source if appropriate. See <URL:http://developer.apple.com/bugreporter> for instructions. make[4]: * [hashtab.o] Error 1 make[4]: * Waiting for unfinished jobs.... make[3]: * [multi-do] Error 1 make[2]: * [all] Error 2 make[1]: * [all-target-libiberty] Error 2 make: * [all] Error 2 llvm-svn: 55295	2008-08-24 21:45:30 +00:00
Evan Cheng	8fa17424f7	Move callseq_start above the call address load to allow load to be folded into the call node. llvm-svn: 55292	2008-08-24 19:19:55 +00:00
Bill Wendling	2fd7dbaf1d	If part of the mask is "undef", then ignore it as we don't care what goes into it. llvm-svn: 55147	2008-08-21 22:36:36 +00:00
Bill Wendling	765d3e0013	Fix whitespace. No functionality change. llvm-svn: 55146	2008-08-21 22:35:37 +00:00
Evan Cheng	9534ea03e8	Fix a number of byval / memcpy / memset related codegen issues. 1. x86-64 byval alignment should be max of 8 and alignment of type. Previously the code was not doing what the commit message was saying. 2. Do not use byte repeat move and store operations. These are slow. llvm-svn: 55139	2008-08-21 21:00:15 +00:00
Mon P Wang	5c2ac4a5e0	Treat floating point ST1 the same as ST0 when lowering for a call result llvm-svn: 55135	2008-08-21 19:54:16 +00:00
Dan Gohman	02c84b8910	Simplify FastISel's constructor argument list, make the FastISel class hold a MachineRegisterInfo member, and make the MachineBasicBlock be passed in to SelectInstructions rather than the FastISel constructor. llvm-svn: 55076	2008-08-20 21:05:57 +00:00
Dale Johannesen	6f765f392c	Add remaining 64-bit atomic patterns for x86-64. llvm-svn: 55029	2008-08-20 00:48:50 +00:00
Bill Wendling	f00f3055d8	Revert r55018 and apply the correct "fix" for the 64-bit sub_and_fetch atomic. Just expand it like the other X-bit sub_and_fetches. llvm-svn: 55023	2008-08-20 00:28:16 +00:00
Dan Gohman	daef7f43af	Instantiate FastISel for X86. llvm-svn: 55011	2008-08-19 21:45:35 +00:00
Dan Gohman	4619e93bd3	The X86 target will soon have an implementation of createFastISel. llvm-svn: 55010	2008-08-19 21:32:53 +00:00
Dale Johannesen	5afbf510aa	Add support for 8 and 16 bit forms of __sync builtins on X86. Change "lock" instructions to be on a separate line. This is needed to work around a bug in the Darwin assembler. llvm-svn: 54999	2008-08-19 18:47:28 +00:00
Evan Cheng	ab35bfdf18	Fix a (u)comiss intrinsic lowering bug. It was using anyext which can return junk in higher bits. Patch by Nate Begeman. llvm-svn: 54903	2008-08-17 19:22:34 +00:00
Anton Korobeynikov	93584cd5a0	Use correct name for TLS address resolution routine on x86-64 llvm-svn: 54845	2008-08-16 12:58:29 +00:00
Dan Gohman	7c2bf62b14	Also avoid pinsrw and pinsrb with a variable insertelement index. llvm-svn: 54803	2008-08-14 22:53:18 +00:00
Dan Gohman	65d83ccf26	Don't try to use the insertps instruction for vector element inserts with non-constant indices. This fixes CodeGen/X86/vector-variable-idx.ll on machines that have SSE4.1. llvm-svn: 54801	2008-08-14 22:43:26 +00:00
Evan Cheng	7823a411d5	Fix PR2620: Fix X86cmppd selection code so it expects operands to be v2f64. llvm-svn: 54376	2008-08-05 22:19:15 +00:00
Dan Gohman	8ef79ebd5f	Add an assert to catch invalid VECTOR_SHUFFLE mask indices. llvm-svn: 54329	2008-08-04 23:09:15 +00:00
Andrew Lenharth	77e3e86e70	Add atomic sub for other sizes llvm-svn: 54314	2008-08-03 20:17:34 +00:00
Dan Gohman	2ce6f2ad5e	Rename SDOperand to SDValue. llvm-svn: 54128	2008-07-27 21:46:04 +00:00
Dan Gohman	91e5dcb680	Tidy SDNode::use_iterator, and complete the transition to have it parallel its analogue, Value::value_use_iterator. The operator* method now returns the user, rather than the use. llvm-svn: 54127	2008-07-27 20:43:25 +00:00
Nate Begeman	283b2da27a	Disable mov{L, LP, HP, HLP, *DUP} shuffles for mmx mmx needs its own fancy shuffle logic based on unpack; for now we get correct but awful code. Also commit Mon Ping's VSETCC patch llvm-svn: 54039	2008-07-25 19:05:58 +00:00
Evan Cheng	a2b4b4ad99	Fix PR2485: do all 4-element SSE shuffles in max. of 2 shuffle instructions. Based on patch by Nicolas Capens. llvm-svn: 53939	2008-07-23 00:22:17 +00:00
Evan Cheng	0c23ed6364	Factor out SSE 4 wide shuffle lowering code into its own function. No functionality changes. llvm-svn: 53933	2008-07-22 21:13:36 +00:00
Evan Cheng	0384670141	Fix PR2574: implement v2f32 scalar_to_vector. llvm-svn: 53927	2008-07-22 18:39:19 +00:00
Duncan Sands	b0e3938651	Add VerifyNode, a place to put sanity checks on generic SDNode's (nodes with their own constructors should do sanity checking in the constructor). Add sanity checks for BUILD_VECTOR and fix all the places that were producing bogus BUILD_VECTORs, as found by "make check". My favorite is the BUILD_VECTOR with only two operands that was being used to build a vector with four elements! llvm-svn: 53850	2008-07-21 10:20:31 +00:00
Bill Wendling	75840e6435	Fix for first part of PR2562. Generate the "pinsrw" instruction for inserts into v4i16 vectors. llvm-svn: 53807	2008-07-20 02:32:23 +00:00
Nate Begeman	55b7becb29	SSE codegen for vsetcc nodes llvm-svn: 53719	2008-07-17 16:51:19 +00:00
Mon P Wang	1e2c6bfa41	When lowering certain atomics, we need to copy the memoperand from the old atomic operation to the new one. llvm-svn: 53714	2008-07-17 04:54:06 +00:00
Evan Cheng	cf06fe476f	x86-64 PIC JIT fixes: do not generate the extra load for external GV's. llvm-svn: 53661	2008-07-16 01:34:02 +00:00
Dan Gohman	02c7c6cb33	Include a frame index in the "fixed stack" pseudo source value instead of using the frame index for the SVOffset, which was inconsistent. llvm-svn: 53486	2008-07-11 22:44:52 +00:00
Bill Wendling	5774466a33	The frame address on an x86-64 box needs to be offset by -8, not -4. llvm-svn: 53450	2008-07-11 07:18:52 +00:00
Dan Gohman	3b46030375	Pool-allocation for MachineInstrs, MachineBasicBlocks, and MachineMemOperands. The pools are owned by MachineFunctions. This drastically reduces the number of calls to malloc/free made during the "Emit" phase of scheduling, as well as later phases in CodeGen. Combined with other changes, this speeds up the "instruction selection" phase of CodeGen by 10% in some cases. llvm-svn: 53212	2008-07-07 23:14:23 +00:00
Duncan Sands	93e180342a	Rather than having a different custom legalization hook for each way in which a result type can be legalized (promotion, expansion, softening etc), just use one: ReplaceNodeResults, which returns a node with exactly the same result types as the node passed to it, but presumably with a bunch of custom code behind the scenes. No change if the new LegalizeTypes infrastructure is not turned on. llvm-svn: 53137	2008-07-04 11:47:58 +00:00
Duncan Sands	739a0548c4	Add a new getMergeValues method that does not need to be passed the list of value types, and use this where appropriate. Inappropriate places are where the value type list is already known and may be long, in which case the existing method is more efficient. llvm-svn: 53035	2008-07-02 17:40:58 +00:00
Duncan Sands	b55e5ece96	Highlight that getMergeValues optimization is being suppressed here. llvm-svn: 52952	2008-07-01 08:00:49 +00:00
Dan Gohman	fb19f9402b	Split ISD::LABEL into ISD::DBG_LABEL and ISD::EH_LABEL, eliminating the need for a flavor operand, and add a new SDNode subclass, LabelSDNode, for use with them to eliminate the need for a label id operand. Change instruction selection to let these label nodes through unmodified instead of creating copies of them. Teach the MachineInstr emitter how to emit a MachineInstr directly from an ISD label node. This avoids the need for allocating SDNodes for the label id and flavor value, as well as SDNodes for each of the post-isel label, label id, and label flavor. llvm-svn: 52943	2008-07-01 00:05:16 +00:00
Dan Gohman	4246cf8eea	Update comments to new-style syntax. llvm-svn: 52925	2008-06-30 21:00:56 +00:00
Dan Gohman	5c73a886b4	Rename ISD::LOCATION to ISD::DBG_STOPPOINT to better reflect its purpose, and give it a custom SDNode subclass so that it doesn't need to have line number, column number, filename string, and directory string, all existing as individual SDNodes to be the operands. This was the only user of ISD::STRING, StringSDNode, etc., so remove those and some associated code. This makes stop-points considerably easier to read in -view-legalize-dags output, and reduces overhead (creating new nodes and copying std::strings into them) on code containing debugging information. llvm-svn: 52924	2008-06-30 20:59:49 +00:00
Duncan Sands	1ae6ef83ee	Revert the SelectionDAG optimization that makes it impossible to create a MERGE_VALUES node with only one result: sometimes it is useful to be able to create a node with only one result out of one of the results of a node with more than one result, for example because the new node will eventually be used to replace a one-result node using ReplaceAllUsesWith, cf X86TargetLowering::ExpandFP_TO_SINT. On the other hand, most users of MERGE_VALUES don't need this and for them the optimization was valuable. So add a new utility method getMergeValues for creating MERGE_VALUES nodes which by default performs the optimization. Change almost everywhere to use getMergeValues (and tidy some stuff up at the same time). llvm-svn: 52893	2008-06-30 10:19:09 +00:00
Evan Cheng	3fc2372d3a	- Fix a x86 vector isel bug: illegal transformation of a vector_shuffle into a shift. - Add a readme entry for a missing vector_shuffle optimization that results in awful codegen. llvm-svn: 52740	2008-06-25 20:52:59 +00:00
Dan Gohman	aa01afd47c	Remove the OrigVT member from AtomicSDNode, as it is redundant with the base SDNode's VTList. llvm-svn: 52722	2008-06-25 16:07:49 +00:00
Mon P Wang	6a490371c9	Added MemOperands to Atomic operations since Atomics touches memory. Added abstract class MemSDNode for any Node that have an associated MemOperand Changed atomic.lcs => atomic.cmp.swap, atomic.las => atomic.load.add, and atomic.lss => atomic.load.sub llvm-svn: 52706	2008-06-25 08:15:39 +00:00
Dale Johannesen	e5f4ffbdf1	Add v2f32 (MMX) type to X86. Support is primitive: load,store,call,return,bitcast. This is enough to make call and return work. llvm-svn: 52691	2008-06-24 22:01:44 +00:00
Dan Gohman	1f2b2a4abe	Remove unnecessary #includes. llvm-svn: 52613	2008-06-22 19:21:26 +00:00
Eli Friedman	8d66e98c92	Fix a bug with <8 x i16> shuffle lowering on X86 where parts of the shuffle could be skipped. The check is invalid because the loop index i doesn't correspond to the element actually inserted. The correct check is already done a few lines earlier, for whether the element is already in the right spot, so this shouldn't have any effect on the codegen for code that was already correct. llvm-svn: 52486	2008-06-19 06:09:51 +00:00
Evan Cheng	e47ca0940f	Rather than avoiding to wrap ISD::DECLARE GV operand in X86ISD::Wrapper, simply handle it at dagisel time with x86 specific isel code. llvm-svn: 52377	2008-06-17 02:01:22 +00:00
Andrew Lenharth	f88d50bfcc	add missing atomic intrinsic from gcc llvm-svn: 52270	2008-06-14 05:48:15 +00:00
Anton Korobeynikov	729c4e95e2	Properly lower DYNAMIC_STACKALLOC - bracket all black magic with CALLSEQ_BEGIN & CALLSEQ_END. llvm-svn: 52225	2008-06-11 20:16:42 +00:00
Duncan Sands	11dd424539	Remove comparison methods for MVT. The main cause of apint codegen failure is the DAG combiner doing the wrong thing because it was comparing MVT's using < rather than comparing the number of bits. Removing the < method makes this mistake impossible to commit. Instead, add helper methods for comparing bits and use them. llvm-svn: 52098	2008-06-08 20:54:56 +00:00
Duncan Sands	13237ac3b9	Wrap MVT::ValueType in a struct to get type safety and better control the abstraction. Rename the type to MVT. To update out-of-tree patches, the main thing to do is to rename MVT::ValueType to MVT, and rewrite expressions like MVT::getSizeInBits(VT) in the form VT.getSizeInBits(). Use VT.getSimpleVT() to extract a MVT::SimpleValueType for use in switch statements (you will get an assert failure if VT is an extended value type - these shouldn't exist after type legalization). This results in a small speedup of codegen and no new testsuite failures (x86-64 linux). llvm-svn: 52044	2008-06-06 12:08:01 +00:00
Dan Gohman	714663ab94	Expand small memmovs using inline code. Set the X86 threshold for expanding memmove to a more plausible value, now that it's actually being used. llvm-svn: 51696	2008-05-29 19:42:22 +00:00
Evan Cheng	5e28227dbd	Implement vector shift up / down and insert zero with ps{rl}lq / ps{rl}ldq. llvm-svn: 51667	2008-05-29 08:22:04 +00:00
Nate Begeman	f1e18c7c44	Don't attempt to create VZEXT_LOAD out of an extload. This an issue where the code generator would do something like this: f64 = load f32 <anyext>, f32mem v2f64 = insertelt undef, %0, 0 v2f64 = insertelt %1, 0.0, 1 into v2f64 = vzext_load f32mem which on x86 is movsd, when you really wanted a cvtss2sd/movsd pair. llvm-svn: 51624	2008-05-28 00:24:25 +00:00
Dan Gohman	3388d022ac	Use PMULDQ for v2i64 multiplies when SSE4.1 is available. And add load-folding table entries for PMULDQ and PMULLD. llvm-svn: 51489	2008-05-23 17:49:40 +00:00
Evan Cheng	29e59ad6c9	Fix typos and comments. llvm-svn: 51165	2008-05-15 22:13:02 +00:00
Evan Cheng	ef377adca0	Make use of vector load and store operations to implement memcpy, memmove, and memset. Currently only X86 target is taking advantage of these. llvm-svn: 51140	2008-05-15 08:39:06 +00:00
Dan Gohman	eabd647cd5	Change target-specific classes to use more precise static types. This eliminates the need for several awkward casts, including the last dynamic_cast under lib/Target. llvm-svn: 51091	2008-05-14 01:58:56 +00:00
Evan Cheng	1120279ae6	Instead of a vector load, shuffle and then extract an element. Load the element from address with an offset. pshufd $1, (%rdi), %xmm0 movd %xmm0, %eax => movl 4(%rdi), %eax llvm-svn: 51026	2008-05-13 08:35:03 +00:00
Evan Cheng	b980f6fb3d	Xform bitconvert(build_pair(load a, load b)) to a single load if the load locations are at the right offset from each other. llvm-svn: 51008	2008-05-12 23:04:07 +00:00
Nate Begeman	d875c3e2fd	Initial X86 codegen support for VSETCC. llvm-svn: 51000	2008-05-12 20:34:32 +00:00
Evan Cheng	2609d5e779	Refactor isConsecutiveLoad from X86 to TargetLowering so DAG combiner can make use of it. llvm-svn: 50991	2008-05-12 19:56:52 +00:00
Dan Gohman	906716c40f	Fix a compile error on compilers that still want a return value in a non-void function that calls abort. llvm-svn: 50969	2008-05-12 16:17:19 +00:00
Evan Cheng	71b9afb053	When transforming a vector_shuffle to a load, the base address must not be an undef. llvm-svn: 50940	2008-05-10 06:46:49 +00:00
Dan Gohman	3c0e11af64	For now, abort when an ISD::VAARG is encountered on x86-64, rather than silently generate invalid code. llvm-gcc does not currently use VAArgInst; it lowers va_arg in the front-end. llvm-svn: 50930	2008-05-10 01:26:14 +00:00
Evan Cheng	bb48d55a88	If movl top bits are undef, let it be selected to movlps, etc. llvm-svn: 50928	2008-05-10 00:58:41 +00:00
Evan Cheng	961339bbdb	Handle a few more cases of folding load i64 into xmm and zero top bits. Note, some of the code will be moved into target independent part of DAG combiner in a subsequent patch. llvm-svn: 50918	2008-05-09 21:53:03 +00:00
Evan Cheng	78af38c392	Handle vector move / load which zero the destination register top bits (i.e. movd, movq, movss (addr), movsd (addr)) with X86 specific dag combine. llvm-svn: 50838	2008-05-08 00:57:18 +00:00
Mon P Wang	310a38d51e	Improved generated code for atomic operators llvm-svn: 50677	2008-05-05 22:56:23 +00:00
Evan Cheng	dbfcce37fe	Code clean up. No functionality change. llvm-svn: 50675	2008-05-05 22:12:23 +00:00
Mon P Wang	3e58393c3d	Added addition atomic instrinsics and, or, xor, min, and max. llvm-svn: 50663	2008-05-05 19:05:59 +00:00
Anton Korobeynikov	9205c8562c	Add General Dynamic TLS model for X86-64. Some parts looks really ugly (look for tlsaddr pattern), but should work. Work is in progress, more models will follow llvm-svn: 50630	2008-05-04 21:36:32 +00:00
Evan Cheng	d9481366e3	Select vector shift with non-immediate i32 shift amount operand by first moving the operand into the right register. llvm-svn: 50619	2008-05-04 09:15:50 +00:00
Arnold Schwaighofer	be0de34ede	Tail call optimization improvements: Move platform independent code (lowering of possibly overwritten arguments, check for tail call optimization eligibility) from target X86ISelectionLowering.cpp to TargetLowering.h and SelectionDAGISel.cpp. Initial PowerPC tail call implementation: Support ppc32 implemented and tested (passes my tests and test-suite llvm-test). Support ppc64 implemented and half tested (passes my tests). On ppc tail call optimization is performed if caller and callee are fastcc call is a tail call (in tail call position, call followed by ret) no variable argument lists or byval arguments option -tailcallopt is enabled Supported: * non pic tail calls on linux/darwin * module-local tail calls on linux(PIC/GOT)/darwin(PIC) * inter-module tail calls on darwin(PIC) If constraints are not met a normal call will be emitted. A test checking the argument lowering behaviour on x86-64 was added. llvm-svn: 50477	2008-04-30 09:16:33 +00:00
Dan Gohman	da44054867	Fix the SVOffset values for loads and stores produced by memcpy/memset expansion. It was a bug for the SVOffset value to be used in the actual address calculations. llvm-svn: 50359	2008-04-28 17:15:20 +00:00
Anton Korobeynikov	e183b3cd76	Properly lower vararg's FORMAL_ARGUMENTS node on win64 llvm-svn: 50325	2008-04-27 23:15:03 +00:00
Chris Lattner	724539c001	A few inline asm cleanups: - Make targetlowering.h fit in 80 cols. - Make LowerAsmOperandForConstraint const. - Make lowerXConstraint -> LowerXConstraint - Make LowerXConstraint return a const char* instead of taking a string byref. llvm-svn: 50312	2008-04-26 23:02:14 +00:00
Evan Cheng	1e78184a99	Extract the lower 64-bit if a MMX value is passed in a XMM register. llvm-svn: 50292	2008-04-25 20:13:28 +00:00
Evan Cheng	ccde6dd016	Special handling for MMX values being passed in either GPR64 or lower 64-bits of XMM registers. llvm-svn: 50289	2008-04-25 19:11:04 +00:00
Evan Cheng	df38b35a1e	MMX argument passing fixes: On Darwin / Linux x86-32, v8i8, v4i16, v2i32 values are passed in MM[0-2]. On Darwin / Linux x86-32, v1i64 values are passed in memory. On Darwin x86-64, v8i8, v4i16, v2i32 values are passed in XMM[0-7]. On Darwin x86-64, v1i64 values are passed in 64-bit GPRs. llvm-svn: 50257	2008-04-25 07:56:45 +00:00
Evan Cheng	9165e165dc	Fix bug in x86 memcpy / memset lowering. If there are trailing bytes not handled by rep instructions, a new memcpy / memset is introduced for them. However, since source / destination addresses are already adjusted, their offsets should be zero. llvm-svn: 50239	2008-04-25 00:26:43 +00:00
Dan Gohman	f166d2d0d6	Implement an x86-64 ABI detail of passing structs by hidden first argument. The x86-64 ABI requires the incoming value of %rdi to be copied to %rax on exit from a function that is returning a large C struct. Also, add a README-X86-64 entry detailing the missed optimization opportunity and proposing an alternative approach. llvm-svn: 50075	2008-04-21 23:59:07 +00:00
Chris Lattner	3b18762f40	Switch to using Simplified ConstantFP::get API. llvm-svn: 49977	2008-04-20 00:41:09 +00:00
Dan Gohman	ad4071a9e1	Fix the handling of va_copy on x86-64. As of llvm-gcc r49920 llvm-gcc is now lowering va_copy on x86-64, so this completes the fix for PR2230. llvm-svn: 49922	2008-04-18 20:55:41 +00:00
Roman Levenstein	a3ee1a38a3	Ongoing work on improving the instruction selection infrastructure: Rename SDOperandImpl back to SDOperand. Introduce the SDUse class that represents a use of the SDNode referred by an SDOperand. Now it is more similar to Use/Value classes. Patch is approved by Dan Gohman. llvm-svn: 49795	2008-04-16 16:15:27 +00:00
Dan Gohman	d43d3beeb0	Add support for the form of the SSE41 extractps instruction that puts its result in a 32-bit GPR. llvm-svn: 49762	2008-04-16 02:32:24 +00:00
Dan Gohman	8c99ccaf96	Recreate the size SDNode instead of reusing the old one in the x86 memcpy lowering code; this ensures that the size node has the desired result type. This fixes a regression from r49572 with @llvm.memcpy.i64 on x86-32. llvm-svn: 49761	2008-04-16 01:32:32 +00:00
Dan Gohman	2505d86783	Fix const-correctness issues with the SrcValue handling in the memory intrinsic expansion code. llvm-svn: 49666	2008-04-14 17:55:48 +00:00
Arnold Schwaighofer	634fc9a33a	This patch corrects the handling of byval arguments for tailcall optimized x86-64 (and x86) calls so that they work (... at least for my test cases). Should fix the following problems: Problem 1: When i introduced the optimized handling of arguments for tail called functions (using a sequence of copyto/copyfrom virtual registers instead of always lowering to top of the stack) i did not handle byval arguments correctly e.g they did not work at all :). Problem 2: On x86-64 after the arguments of the tail called function are moved to their registers (which include ESI/RSI etc), tail call optimization performs byval lowering which causes xSI,xDI, xCX registers to be overwritten. This is handled in this patch by moving the arguments to virtual registers first and after the byval lowering the arguments are moved from those virtual registers back to RSI/RDI/RCX. llvm-svn: 49584	2008-04-12 18:11:06 +00:00
Dan Gohman	544ab2c50b	Drop ISD::MEMSET, ISD::MEMMOVE, and ISD::MEMCPY, which are not Legal on any current target and aren't optimized in DAGCombiner. Instead of using intermediate nodes, expand the operations, choosing between simple loads/stores, target-specific code, and library calls, immediately. Previously, the code to emit optimized code for these operations was only used at initial SelectionDAG construction time; now it is used at all times. This fixes some cases where rep;movs was being used for small copies where simple loads/stores would be better. This also cleans up code that checks for alignments less than 4; let the targets make that decision instead of doing it in target-independent code. This allows x86 to use rep;movs in low-alignment cases. Also, this fixes a bug that resulted in the use of rep;stos for memsets of 0 with non-constant memory size when the alignment was at least 4. It's better to use the library in this case, which can be significantly faster when the size is large. This also preserves more SourceValue information when memory intrinsics are lowered into simple loads/stores. llvm-svn: 49572	2008-04-12 04:36:06 +00:00
Dan Gohman	8c7cf88f7e	Fix a bug that prevented x86-64 from using rep.movsq for 8-byte-aligned data. llvm-svn: 49571	2008-04-12 02:35:39 +00:00
Dan Gohman	33b3300178	Make isVectorClearMaskLegal's operand list const. llvm-svn: 49446	2008-04-09 20:09:42 +00:00
Roman Levenstein	51f532f92d	Re-commit of the r48822, where the infinite looping problem discovered by Dan Gohman is fixed. llvm-svn: 49330	2008-04-07 10:06:32 +00:00
Evan Cheng	f77b5ef3d0	Favors pshufd over shufps when shuffling elements from one vector. pshufd is faster than shufps. llvm-svn: 49244	2008-04-05 00:30:36 +00:00
Evan Cheng	025cea1126	Backing out 48222 temporarily. llvm-svn: 49124	2008-04-03 03:13:16 +00:00
Dan Gohman	cb9f8f6e4e	Don't use __bzero for memset if the second argument isn't zero. llvm-svn: 49050	2008-04-01 20:56:18 +00:00
Dan Gohman	980d7200c1	Speculatively micro-optimize memory-zeroing calls on Darwin 10. llvm-svn: 49048	2008-04-01 20:38:36 +00:00
Dale Johannesen	efa81a6979	Accept 'y' constraint (MMX) in inline asm. llvm-svn: 49011	2008-04-01 00:57:48 +00:00
Dan Gohman	fd2eb00cc2	Fix a tokenfactor node to use the load chain rather than the load value. This fixes PR2177. llvm-svn: 48932	2008-03-28 23:45:16 +00:00
Roman Levenstein	358e04a185	Use a linked data structure for the uses lists of an SDNode, just like LLVM Value/Use does and MachineRegisterInfo/MachineOperand does. This allows constant time for all uses list maintenance operations. The idea was suggested by Chris. Reviewed by Evan and Dan. Patch is tested and approved by Dan. On normal use-cases compilation speed is not affected. On very big basic blocks there are compilation speedups in the range of 15-20% or even better. llvm-svn: 48822	2008-03-26 12:39:26 +00:00
Evan Cheng	615488ab45	- SSE4.1 extractfps extracts a f32 into a gr32 register. Very useful! Not. Fix the instruction specification and teaches lowering code to use it only when the only use is a store instruction. llvm-svn: 48746	2008-03-24 21:52:23 +00:00
Anton Korobeynikov	7f125b2ba5	Add convenient helper for win64 check. Simplify things slightly. llvm-svn: 48691	2008-03-22 20:57:27 +00:00
Anton Korobeynikov	7b4f4e1a86	Initial support for Win64 calling conventions. Still in early state. llvm-svn: 48690	2008-03-22 20:37:30 +00:00
Duncan Sands	d97eea372a	Introduce a new node for holding call argument flags. This is needed by the new legalize types infrastructure which wants to expand the 64 bit constants previously used to hold the flags on 32 bit machines. There are two functional changes: (1) in LowerArguments, if a parameter has the zext attribute set then that is marked in the flags; before it was being ignored; (2) PPC had some bogus code for handling two word arguments when using the ELF 32 ABI, which was hard to convert because of the bogusness. As suggested by the original author (Nicolas Geoffray), I've disabled it for the moment. Tested with "make check" and the Ada ACATS testsuite. llvm-svn: 48640	2008-03-21 09:14:45 +00:00
Chris Lattner	68b11e14bc	remove Evan's "ugly hack" that sorta attempted to get x86-64 return conventions correct, but was never enabled. We can now do the "right thing" with multiple return values. llvm-svn: 48635	2008-03-21 06:50:21 +00:00
Evan Cheng	7a3e750fd2	Fix this xform: (sra (shl X, m), result_size) -> (sign_extend (trunc (shl X, result_size - n - m))) llvm-svn: 48578	2008-03-20 02:18:41 +00:00
Christopher Lamb	8fe9109469	Fix X86's isTruncateFree to not claim that truncate to i1 is free. This fixes Bill's testcase that failed for r48491. llvm-svn: 48542	2008-03-19 08:30:06 +00:00
Evan Cheng	484064370a	Fix a x86-64 isel lowering bug that's been around forever. A x86-64 varargs function implicitly reads X86::AL, don't clobber it! llvm-svn: 48515	2008-03-18 23:36:35 +00:00
Chris Lattner	8a923e7c28	Reimplement the parameter attributes support, phase #1 . hilights: 1. There is now a "PAListPtr" class, which is a smart pointer around the underlying uniqued parameter attribute list object, and manages its refcount. It is now impossible to mess up the refcount. 2. PAListPtr is now the main interface to the underlying object, and the underlying object is now completely opaque. 3. Implementation details like SmallVector and FoldingSet are now no longer part of the interface. 4. You can create a PAListPtr with an arbitrary sequence of ParamAttrsWithIndex's, no need to make a SmallVector of a specific size (you can just use an array or scalar or vector if you wish). 5. All the client code that had to check for a null pointer before dereferencing the pointer is simplified to just access the PAListPtr directly. 6. The interfaces for adding attrs to a list and removing them is a bit simpler. Phase #2 will rename some stuff (e.g. PAListPtr) and do other less invasive changes. llvm-svn: 48289	2008-03-12 17:45:29 +00:00
Chris Lattner	120ad01fcb	start handling the 'f' x87 constraint. llvm-svn: 48239	2008-03-11 19:06:29 +00:00
Chris Lattner	1bd44363f2	Change the model for FP Stack return to use fp operands on the RET instruction instead of using FpSET_ST0_32. This also generalizes the code to handling returning of multiple FP results. llvm-svn: 48209	2008-03-11 03:23:40 +00:00
Chris Lattner	4b3a7fa823	Eliminate the FP_GET_ST0/FP_SET_ST0 target-specific dag nodes, just lower to copyfromreg/copytoreg instead. llvm-svn: 48174	2008-03-10 21:08:41 +00:00
Evan Cheng	ae2c56d93e	Default ISD::PREFETCH to expand. llvm-svn: 48169	2008-03-10 19:38:10 +00:00
Scott Michel	a6729e8666	Give TargetLowering::getSetCCResultType() a parameter so that ISD::SETCC's return ValueType can depend its operands' ValueType. This is a cosmetic change, no functionality impacted. llvm-svn: 48145	2008-03-10 15:42:14 +00:00
Dale Johannesen	4e622ec86d	Increase ISD::ParamFlags to 64 bits. Increase the ByValSize field to 32 bits, thus enabling correct handling of ByVal structs bigger than 0x1ffff. Abstract interface a bit. Fixes gcc.c-torture/execute/pr23135.c and gcc.c-torture/execute/pr28982b.c in gcc testsuite (were ICE'ing on ppc32, quietly producing wrong code on x86-32.) llvm-svn: 48122	2008-03-10 02:17:22 +00:00
Chris Lattner	4c869594bc	rename FP_SETRESULT -> FP_SET_ST0 llvm-svn: 48094	2008-03-09 07:08:44 +00:00
Chris Lattner	d587e580a6	rename FpGETRESULT32 -> FpGET_ST0_32 etc. Add support for isel'ing value preserving FP roundings from one fp stack reg to another into a noop, instead of stack traffic. llvm-svn: 48093	2008-03-09 07:05:32 +00:00
Chris Lattner	b6387c8a74	Finish implementing a readme entry: when inserting an i64 variable into a vector of zeros or undef, and when the top part is obviously zero, we can just use movd + shuffle. This allows us to compile vec_set-B.ll into: _test3: movl $1234567, %eax andl 4(%esp), %eax movd %eax, %xmm0 ret instead of: _test3: subl $28, %esp movl $1234567, %eax andl 32(%esp), %eax movl %eax, (%esp) movl $0, 4(%esp) movq (%esp), %xmm0 addl $28, %esp ret llvm-svn: 48090	2008-03-09 05:42:06 +00:00
Chris Lattner	eef374c197	Implement a readme entry, compiling #include <xmmintrin.h> __m128i doload64(short x) {return _mm_set_epi16(0,0,0,0,0,0,0,1);} into: movl $1, %eax movd %eax, %xmm0 ret instead of a constant pool load. llvm-svn: 48063	2008-03-09 01:05:04 +00:00
Chris Lattner	ad58828354	1) Improve comments. 2) Don't try to insert an i64 value into the low part of a vector with movq on an x86-32 target. This allows us to compile: __m128i doload64(short x) {return _mm_set_epi16(0,0,0,0,0,0,0,1);} into: _doload64: movaps LCPI1_0, %xmm0 ret instead of: _doload64: subl $28, %esp movl $0, 4(%esp) movl $1, (%esp) movq (%esp), %xmm0 addl $28, %esp ret llvm-svn: 48057	2008-03-08 22:59:52 +00:00
Chris Lattner	8a6ebd23a8	minor simplifications to this code, don't create a dead SCALAR_TO_VECTOR on paths that end up not using it. llvm-svn: 48056	2008-03-08 22:48:29 +00:00
Evan Cheng	95cf661534	Implement x86 support for @llvm.prefetch. It corresponds to prefetcht{0\|1\|2} and prefetchnta instructions. llvm-svn: 48042	2008-03-08 00:58:38 +00:00
Chris Lattner	d4defb00df	mark frem as expand for all legal fp types on x86, regardless of whether we're using SSE or not. This fixes PR2122. llvm-svn: 48006	2008-03-07 06:36:32 +00:00
Andrew Lenharth	357061a74d	64bit CAS on 32bit x86. llvm-svn: 47929	2008-03-05 01:15:49 +00:00
Andrew Lenharth	4fee9f35b5	x86-64 atomics llvm-svn: 47903	2008-03-04 21:13:33 +00:00
Dan Gohman	a986eea82f	Add support for lowering i64 SRA_PARTS and friends on x86-64. llvm-svn: 47865	2008-03-03 22:22:09 +00:00
Andrew Lenharth	f5c90ec12c	make CAS work llvm-svn: 47799	2008-03-01 22:27:48 +00:00
Andrew Lenharth	d032c33300	all but CAS working on x86 llvm-svn: 47798	2008-03-01 21:52:34 +00:00
Evan Cheng	c799065cc3	Add a quick and dirty "loop aligner pass". x86 uses it to align its loops to 16-byte boundaries. llvm-svn: 47703	2008-02-28 00:43:03 +00:00
Chris Lattner	83263b8cfb	Make X86TargetLowering::LowerSINT_TO_FP return without creating a dead stack slot and store if the SINT_TO_FP is actually legal. This allows us to compile: double a(double b) {return (unsigned)b;} to: _a: cvttsd2siq %xmm0, %rax movl %eax, %eax cvtsi2sdq %rax, %xmm0 ret instead of: _a: subq $8, %rsp cvttsd2siq %xmm0, %rax movl %eax, %eax cvtsi2sdq %rax, %xmm0 addq $8, %rsp ret crazy. llvm-svn: 47660	2008-02-27 05:57:41 +00:00
Arnold Schwaighofer	3bfca3e942	Refactor according to Evan's and Anton's suggestions. llvm-svn: 47635	2008-02-26 22:21:54 +00:00
Arnold Schwaighofer	1f17bf6171	Correct function comments. llvm-svn: 47606	2008-02-26 17:50:59 +00:00
Arnold Schwaighofer	69a10f4112	Add support for intermodule tail calls on x86/32bit with GOT-style position independent code. Before only tail calls to protected/hidden functions within the same module were optimized. Now all function calls are tail call optimized. llvm-svn: 47594	2008-02-26 10:21:54 +00:00
Arnold Schwaighofer	b01b99ec78	Change the lowering of arguments for tail call optimized calls. Before arguments that could overwrite each other were explicitly lowered to a stack slot, not giving the register allocator a chance to optimize. Now a sequence of copyto/copyfrom virtual registers ensures that arguments are loaded in (virtual) registers before they are lowered to the stack slot (and might overwrite each other). Also parameter stack slots are marked mutable for (potentially) tail calling functions. llvm-svn: 47593	2008-02-26 09:19:59 +00:00
Dale Johannesen	65b404d61c	Revise previous patch per review. llvm-svn: 47573	2008-02-25 22:29:22 +00:00

... 6 7 8 9 10 ...

1332 Commits