llvm-project

Commit Graph

Author	SHA1	Message	Date
Rafael Espindola	c21742112b	Emit segmented-stack specific code into function prologues for X86. Modify the pass added in the previous patch to call this new code. This new prologues generated will call a libgcc routine (__morestack) to allocate more stack space from the heap when required Patch by Sanjoy Das. llvm-svn: 138812	2011-08-30 19:39:58 +00:00
Eli Friedman	850b9a9a84	Explicitly zero out parts of a vector which are required to be zero by the algorithm in LowerUINT_TO_FP_i32. This only has a substantial effect on the generated code when the input is extracted from a vector register; other ways of loading an i32 do the appropriate zeroing implicitly. Fixes PR10802. llvm-svn: 138768	2011-08-29 21:15:46 +00:00
Benjamin Kramer	61a1ff543c	Silence GCC warnings and make an array const. llvm-svn: 138706	2011-08-27 17:36:14 +00:00
Eli Friedman	5e5704277f	Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction. llvm-svn: 138660	2011-08-26 21:21:21 +00:00
Bruno Cardoso Lopes	8347b86293	Add support for AVX 256-bit version of MOVDDUP! llvm-svn: 138588	2011-08-25 21:40:37 +00:00
Bruno Cardoso Lopes	388eacee2c	Make isMOVDDUP mask check more strict and update comments! llvm-svn: 138587	2011-08-25 21:40:34 +00:00
Bruno Cardoso Lopes	296256fb32	Add support for 256-bit versions of VSHUFPD and VSHUFPS. llvm-svn: 138546	2011-08-25 02:58:26 +00:00
Bruno Cardoso Lopes	2953d7b320	Move all SHUFP* patterns close to the SHUFP* definitions. Also be explicit about which subtarget they refer to, and add AVX versions of the ones we currently don't. Make the mask check more strict, to be clear it won't be used to match to 256-bit versions! llvm-svn: 138514	2011-08-24 23:17:55 +00:00
Eli Friedman	9c73a57b20	Hook up 64-bit atomic load/store on x86-32. I plan to write more efficient implementations eventually. llvm-svn: 138505	2011-08-24 22:33:28 +00:00
Eli Friedman	38cd821dc4	Fix whitespace. llvm-svn: 138487	2011-08-24 21:17:30 +00:00
Eli Friedman	342e8df0e0	Basic x86 code generation for atomic load and store instructions. llvm-svn: 138478	2011-08-24 20:50:09 +00:00
Craig Topper	de92622aa5	Break 256-bit vector int add/sub/mul into two 128-bit operations to avoid costly scalarization. Fixes PR10711. llvm-svn: 138427	2011-08-24 06:14:18 +00:00
Bruno Cardoso Lopes	9e9f2ce32d	Fix a nasty bug where a v4i64 was being wrong emitted with 32-bit permutations. Also tidy up some patterns and make them close to their instruction definition! llvm-svn: 138392	2011-08-23 22:06:37 +00:00
Nick Lewycky	4c8ff77f1b	PerformSubCombine to work on integers larger than i128. Fixes a crasher. llvm-svn: 138354	2011-08-23 19:01:24 +00:00
Craig Topper	6612e35b0d	Add support for breaking 256-bit v16i16 and v32i8 VSETCC into two 128-bit ones, avoiding sclarization. Add vex form of pcmpeqq and pcmpgtq. Fixes more cases for PR10712. llvm-svn: 138321	2011-08-23 04:36:33 +00:00
Bruno Cardoso Lopes	74f090d44c	Add support for breaking 256-bit int VETCC into two 128-bit ones, avoding scalarization of the compare. Reduces code from 59 to 6 instructions. Fix PR10712. llvm-svn: 138271	2011-08-22 20:31:04 +00:00
Bruno Cardoso Lopes	1a87fcb9ba	Fix PR10688. Add support for spliting 256-bit vector shifts when the shift amount is variable llvm-svn: 137885	2011-08-17 22:12:20 +00:00
Bruno Cardoso Lopes	be5e987379	Introduce matching patterns for vbroadcast AVX instruction. The idea is to match splats in the form (splat (scalar_to_vector (load ...))) whenever the load can be folded. All the logic and instruction emission is working but because of PR8156, there are no ways to match loads, cause they can never be folded for splats. Thus, the tests are XFAILed, but I've tested and exercised all the logic using a relaxed version for checking the foldable loads, as if the bug was already fixed. This should work out of the box once PR8156 gets fixed since MayFoldLoad will work as expected. llvm-svn: 137810	2011-08-17 02:29:19 +00:00
Bruno Cardoso Lopes	6d33c7f303	Update comments about vector splat handling in x86 llvm-svn: 137808	2011-08-17 02:29:13 +00:00
Bruno Cardoso Lopes	ed786a346e	Now that we have a canonical way to handle 256-bit splats: vinsertf128 $1 + vpermilps $0, remove the old code that used to first do the splat in a 128-bit vector and then insert it into a larger one. This is better because the handling code gets simpler and also makes a better room for the upcoming vbroadcast! llvm-svn: 137807	2011-08-17 02:29:10 +00:00
Bruno Cardoso Lopes	2e99f1b3aa	Instead of always leaving the work to the generic legalizer when there is no support for native 256-bit shuffles, be more smart in some cases, for example, when you can extract specific 128-bit parts and use regular 128-bit shuffles for them. Example: For this shuffle: shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32> <i32 1, i32 0, i32 7, i32 6> This was expanded to: vextractf128 $1, %ymm1, %xmm2 vpextrq $0, %xmm2, %rax vmovd %rax, %xmm1 vpextrq $1, %xmm2, %rax vmovd %rax, %xmm2 vpunpcklqdq %xmm1, %xmm2, %xmm1 vpextrq $0, %xmm0, %rax vmovd %rax, %xmm2 vpextrq $1, %xmm0, %rax vmovd %rax, %xmm0 vpunpcklqdq %xmm2, %xmm0, %xmm0 vinsertf128 $1, %xmm1, %ymm0, %ymm0 ret Now we get: vshufpd $1, %xmm0, %xmm0, %xmm0 vextractf128 $1, %ymm1, %xmm1 vshufpd $1, %xmm1, %xmm1, %xmm1 vinsertf128 $1, %xmm1, %ymm0, %ymm0 llvm-svn: 137733	2011-08-16 18:21:54 +00:00
Bruno Cardoso Lopes	cbe7feeab9	Fix PR10656. It's only profitable to use 128-bit inserts and extracts when AVX mode is one. Otherwise is just more work for the type legalizer. llvm-svn: 137661	2011-08-15 21:45:54 +00:00
Bruno Cardoso Lopes	c53dd2ac01	Fix comment! llvm-svn: 137521	2011-08-12 21:54:42 +00:00
Bruno Cardoso Lopes	f15dfe5818	The VPERM2F128 is a AVX instruction which permutes between two 256-bit vectors. It operates on 128-bit elements instead of regular scalar types. Recognize shuffles that are suitable for VPERM2F128 and teach the x86 legalizer how to handle them. llvm-svn: 137519	2011-08-12 21:48:26 +00:00
Bruno Cardoso Lopes	8fbf023c9b	Add a dag combine to xform 256-bit shuffles into simple vector inserts and extracts. This simple combine makes us generate only 1 instruction instead of 11 in the v8 case. llvm-svn: 137362	2011-08-11 21:50:44 +00:00
Bruno Cardoso Lopes	043c820800	Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict. llvm-svn: 137324	2011-08-11 18:59:13 +00:00
Nadav Rotem	efdd183f52	Add a comment, per Bruno's CR. llvm-svn: 137313	2011-08-11 17:05:47 +00:00
Nadav Rotem	1542d5a00a	[AVX] If the data which is going to be saved is already in two XMM registers (for example, after integer operation), do not pack the registers into a YMM before saving. Its better to save as two XMM registers. Before: vinsertf128 $1, %xmm3, %ymm0, %ymm3 vinsertf128 $0, %xmm1, %ymm3, %ymm1 vmovaps %ymm1, 416(%rsp) After: vmovaps %xmm3, 416+16(%rsp) vmovaps %xmm1, 416(%rsp) llvm-svn: 137308	2011-08-11 16:41:21 +00:00
Bruno Cardoso Lopes	a2d8bb97b9	Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing infinite recursive calls in legalize. Fix PR10562 llvm-svn: 137296	2011-08-11 02:49:44 +00:00
Bruno Cardoso Lopes	572c9aaf53	Use the splat index to generate the desired shuffle. Otherwise we could only get undefs and the vector shuffle becomes an undef, generating wrong code. llvm-svn: 137295	2011-08-11 02:49:41 +00:00
Eli Friedman	3ae39f8ad1	Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2). Fixes PR9693. llvm-svn: 137292	2011-08-11 01:48:05 +00:00
Nadav Rotem	410a11fe82	When performing a truncating store, it is sometimes possible to rearrange the data in-register prior to saving to memory. When we reorder the data in memory we prevent the need to save multiple scalars to memory, making a single regular store. llvm-svn: 137238	2011-08-10 19:30:14 +00:00
Bruno Cardoso Lopes	278ffd7d8e	Fix a bug in vpermilps mask checking. Fix PR10560 llvm-svn: 137194	2011-08-10 01:54:17 +00:00
Bruno Cardoso Lopes	72323966c8	Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556 llvm-svn: 137179	2011-08-09 23:27:13 +00:00
Bruno Cardoso Lopes	6963062a99	Use fp unpack instructions to unpack int types. Until we have AVX2, this is the best we can do for these patterns. This fix PR10554. llvm-svn: 137161	2011-08-09 22:18:37 +00:00
Bruno Cardoso Lopes	24dd1d4a27	Revert r137114 llvm-svn: 137127	2011-08-09 17:39:01 +00:00
Bruno Cardoso Lopes	ad3453cf2d	Handle sitofp between v4f64 <- v4i32. Fix PR10559 llvm-svn: 137114	2011-08-09 05:48:01 +00:00
Bruno Cardoso Lopes	af6a85484c	Make LowerVSETCC aware of AVX types and add patterns to match them. llvm-svn: 137090	2011-08-09 00:46:57 +00:00
Bruno Cardoso Lopes	c96953c12a	Add support for several vector shifts operations while in AVX mode. Fix PR10581 llvm-svn: 137067	2011-08-08 21:31:08 +00:00
Evan Cheng	19e3f80579	Fix an obvious type. Patch by Ivan Krasin. llvm-svn: 136899	2011-08-04 18:38:15 +00:00
Bill Wendling	e234f6ae0c	Only access both operands of an INSERT_SUBVECTOR if it is an INSERT_SUBVECTOR. Fixes PR10527. llvm-svn: 136853	2011-08-04 00:32:58 +00:00
Benjamin Kramer	103e2ec2df	Remove unused variables. llvm-svn: 136803	2011-08-03 19:53:48 +00:00
Eli Friedman	04c5025cd5	Don't create a ridiculous EXTRACT_ELEMENT. PR10563. The testcase looks extremely fragile, so I'm adding an assertion which should catch any cases like this. llvm-svn: 136711	2011-08-02 18:38:35 +00:00
Bruno Cardoso Lopes	5ada908140	Make this kind of lowering to be supported by 256-bit instructions: shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0> To: shuffle (vload ptr)), undef, <1, 1, 1, 1> Fix PR10494 llvm-svn: 136691	2011-08-02 16:06:18 +00:00
Bruno Cardoso Lopes	a8e3673816	Add v4f64 -> v2f32 fp_round support. Also add a testcase to exercise the legalizer. This commit together with the two previous ones fixes PR10495. llvm-svn: 136654	2011-08-01 21:54:09 +00:00
Bruno Cardoso Lopes	616fe60548	Teach PreprocessISelDAG to be aware of vector types and to not process them. llvm-svn: 136653	2011-08-01 21:54:05 +00:00
Bruno Cardoso Lopes	bd30a4b584	Lower CONCAT_VECTORS to use two VINSERTF128 instructions instead of using a stack store. llvm-svn: 136652	2011-08-01 21:54:02 +00:00
Bruno Cardoso Lopes	7513939ddd	Since vectors with all ones can't be created with a 256-bit instruction, avoid returning early for v8i32 types, which would only be valid for vector with all zeros. Also split the handling of zeros and ones into separate checking logic since they are handled differently. This fixes PR10547 llvm-svn: 136642	2011-08-01 19:51:53 +00:00
Eli Friedman	adec587d5c	Misc optimizer+codegen work for 'cmpxchg' and 'atomicrmw'. They appear to be working on x86 (at least for trivial testcases); other architectures will need more work so that they actually emit the appropriate instructions for orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC, Mips, and Alpha backends need such changes.) llvm-svn: 136457	2011-07-29 03:05:32 +00:00
Bruno Cardoso Lopes	65ce5ea3ba	Fix two tests that I crashed in the previous commits. The mask elts on the second half must be reindexed. llvm-svn: 136454	2011-07-29 02:05:28 +00:00
Bruno Cardoso Lopes	81eb193f2e	Match VPERMIL masks more strictly and update the target specific mask generation to always catch the weird cases. llvm-svn: 136453	2011-07-29 01:31:15 +00:00
Bruno Cardoso Lopes	795f558532	Add DecodeShuffle shuffle support for VPERMIPD variantes llvm-svn: 136452	2011-07-29 01:31:11 +00:00
Bruno Cardoso Lopes	c00f6728bc	Fix a bug while generating target specific VPERMIL masks: skip undef mask elements. This fixes PR10529. llvm-svn: 136450	2011-07-29 01:31:04 +00:00
Bruno Cardoso Lopes	b9ba465de8	Enable usage of SSE4 extracts and inserts in their 128-bit AVX forms. Also tidy up code a bit. llvm-svn: 136449	2011-07-29 01:31:02 +00:00
Bruno Cardoso Lopes	6aee388423	Cleanup PALIGNR handling and remove the old palign pattern fragment. Also make PALIGNR masks to don't match 256-bits, which isn't supported It's also a step to solve PR10489 llvm-svn: 136448	2011-07-29 01:30:59 +00:00
Bruno Cardoso Lopes	8c19a8b5d5	Invert the subvector insertion to be more likely to be taken as a COPY llvm-svn: 136324	2011-07-28 01:26:53 +00:00
Bruno Cardoso Lopes	9e2a301216	Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move a convert pattern close to the instruction definition. llvm-svn: 136320	2011-07-28 01:26:39 +00:00
Eli Friedman	26a484852e	Code generation for 'fence' instruction. llvm-svn: 136283	2011-07-27 22:21:52 +00:00
Jeffrey Yasskin	6381c0100b	Explicitly cast narrowing conversions inside {}s that will become errors in C++0x. llvm-svn: 136211	2011-07-27 06:22:51 +00:00
Bruno Cardoso Lopes	f9324f4f6b	Move some code around to open opportunity for more shuffle matching llvm-svn: 136201	2011-07-27 00:56:37 +00:00
Bruno Cardoso Lopes	27a30a7792	The vpermilps and vpermilpd have different behaviour regarding the usage of the shuffle bitmask. Both work in 128-bit lanes without crossing, but in the former the mask of the high part is the same used by the low part while in the later both lanes have independent masks. Handle this properly and and add support for vpermilpd. llvm-svn: 136200	2011-07-27 00:56:34 +00:00
Benjamin Kramer	124ac2b997	Add a neat little two's complement hack for x86. On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from a XOR with a constant we can fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add, which can be commuted and encoded efficiently. This code is generated for __builtin_clz and friends. llvm-svn: 136167	2011-07-26 22:42:13 +00:00
Bruno Cardoso Lopes	f8fe47bd2b	Recognize unpckh* masks and match 256-bit versions. The new versions are different from the previous 128-bit because they work in lanes. Update a few comments and add testcases llvm-svn: 136157	2011-07-26 22:03:40 +00:00
Eli Friedman	93dc04d5ca	Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected). Fixes a minor isel issue that was breaking the testcase from r136130. llvm-svn: 136148	2011-07-26 21:02:58 +00:00
Bruno Cardoso Lopes	d77b383199	More movsldup/movshdup cleanup. Rewrite the mask matching function and add support for 256-bit versions (but no instruction selection yet, coming next). llvm-svn: 136050	2011-07-26 02:39:28 +00:00
Bruno Cardoso Lopes	5b268a4b82	More cleanup, subtarget info isn't used here. llvm-svn: 136049	2011-07-26 02:39:25 +00:00
Bruno Cardoso Lopes	9212bf275d	Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128 This also fixes PR10452 llvm-svn: 136004	2011-07-25 23:05:32 +00:00
Bruno Cardoso Lopes	123dff0f58	- Handle special scalar_to_vector case: splats. Using a native 128-bit shuffle before inserting on a 256-bit vector. - Add AVX versions of movd/movq instructions - Introduce a few COPY patterns to match insert_subvector instructions. This turns a trivial insert_subvector instruction into a register copy, coalescing the xmm into a ymm and avoid emiting on more instruction. llvm-svn: 136002	2011-07-25 23:05:25 +00:00
Bruno Cardoso Lopes	276eb8debf	Reintroduce r135730, this is indeed the right approach, there is no native 256-bit vector instruction to do scalar_to_vector. llvm-svn: 136001	2011-07-25 23:05:16 +00:00
Eli Friedman	ea8c66fea5	Get rid of an incorrect optimization for shuffles with PALIGNR and simplify isPALIGNRMask. Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle. llvm-svn: 135980	2011-07-25 21:36:45 +00:00
Rafael Espindola	77242dd537	Turn shuffles into unpacks for VT == MVT::v2i64 and MVT::v2f64 too. Patch by Jeff Muizelaar. llvm-svn: 135789	2011-07-22 18:56:05 +00:00
Dan Gohman	c535278cf1	Fix x86's XALUO lowering to return its replacement values instead of doing the RAUW calls for the overflow value itself. This makes it more consistent with how the rest of LegalizeDAG works. llvm-svn: 135788	2011-07-22 18:45:15 +00:00
Benjamin Kramer	959b7e9df7	GCC complains about the angle of this line. Remove the escaped newline. llvm-svn: 135739	2011-07-22 01:02:57 +00:00
Bruno Cardoso Lopes	1872173841	Remove the 128-bit special handling from SCALAR_TO_VECTOR. This isn't the way to go. Doing this here will prevent several node matches later, and would have to force looking all the way through several VINSERTF128/VEXTRACTF128 chains to optimize simple things. llvm-svn: 135730	2011-07-22 00:15:10 +00:00
Bruno Cardoso Lopes	612e56174b	-Inspected a AVX code block added by someone in early Feb. This was never used and was actually very wrong, fix it and make it simpler. Also remove the ConcatVectors function, which is unused now. - Fix a introduction of useless nodes in r126664 and r126264. The VUNPCKL* should never be introduced cause we don't want duplicate nodes for 128 AVX and non-AVX modes, the actual instruction difference only exists during isel, but not for target specific DAG nodes. We only introduce V* target nodes when there is no 128-bit version already there. - Fix a fragile test and make it more useful. llvm-svn: 135729	2011-07-22 00:15:07 +00:00
Bruno Cardoso Lopes	91eff5140f	Add a DAGCombine for transforming 128->256 casts into a simple vxorps + vinsertf128 pair of instructions llvm-svn: 135727	2011-07-22 00:15:00 +00:00
Bruno Cardoso Lopes	dbebd01269	Introduce a new function to lower 256-bit vectors which are not direclty supported and should be promoted and handled by smaller shuffles llvm-svn: 135726	2011-07-22 00:14:56 +00:00
Bruno Cardoso Lopes	95d037721b	Rename function to be more specific and be more strict about its usage llvm-svn: 135725	2011-07-22 00:14:53 +00:00
Bruno Cardoso Lopes	178fb40612	- Register v16i16 as valid VR256 register class - Add more bitcasts for v16i16 - Since 135661 and 135662 already added the splat logic, just add one more splat test for v16i16 llvm-svn: 135663	2011-07-21 02:24:08 +00:00
Bruno Cardoso Lopes	b878caa5e2	Add support for 256-bit versions of VPERMIL instruction. This is a new instruction introduced in AVX, which can operate on 128 and 256-bit vectors. It considers a 256-bit vector as two independent 128-bit lanes. It can permute any 32 or 64 elements inside a lane, and restricts the second lane to have the same permutation of the first one. With the improved splat support introduced early today, adding codegen for this instruction enable more efficient 256-bit code: Instead of: vextractf128 $0, %ymm0, %xmm0 punpcklbw %xmm0, %xmm0 punpckhbw %xmm0, %xmm0 vinsertf128 $0, %xmm0, %ymm0, %ymm1 vinsertf128 $1, %xmm0, %ymm1, %ymm0 vextractf128 $1, %ymm0, %xmm1 shufps $1, %xmm1, %xmm1 movss %xmm1, 28(%rsp) movss %xmm1, 24(%rsp) movss %xmm1, 20(%rsp) movss %xmm1, 16(%rsp) vextractf128 $0, %ymm0, %xmm0 shufps $1, %xmm0, %xmm0 movss %xmm0, 12(%rsp) movss %xmm0, 8(%rsp) movss %xmm0, 4(%rsp) movss %xmm0, (%rsp) vmovaps (%rsp), %ymm0 We get: vextractf128 $0, %ymm0, %xmm0 punpcklbw %xmm0, %xmm0 punpckhbw %xmm0, %xmm0 vinsertf128 $0, %xmm0, %ymm0, %ymm1 vinsertf128 $1, %xmm0, %ymm1, %ymm0 vpermilps $85, %ymm0, %ymm0 llvm-svn: 135662	2011-07-21 01:55:47 +00:00
Bruno Cardoso Lopes	fb4920eb25	Improve splat promotion to handle AVX types: v32i8 and v16i16. Also refactor the code and add a bunch of comments. The final shuffle emitted by handling 256-bit types is suitable for the VPERM shuffle instruction which is going to be introduced in a next commit (with a testcase which cover this commit) llvm-svn: 135661	2011-07-21 01:55:42 +00:00
Bruno Cardoso Lopes	0bdeacf03b	Tidy up code llvm-svn: 135656	2011-07-21 01:55:27 +00:00
Evan Cheng	bbf3b0de8b	Goodbye TargetAsmInfo. This eliminate last bit of CodeGen and Target in llvm-mc. There is still a bit more refactoring left to do in Targets. But we are now very close to fixing all the layering issues in MC. llvm-svn: 135611	2011-07-20 19:50:42 +00:00
Evan Cheng	d60fa58ba1	Sink getDwarfRegNum, getLLVMRegNum, getSEHRegNum from TargetRegisterInfo down to MCRegisterInfo. Also initialize the mapping at construction time. This patch eliminate TargetRegisterInfo from TargetAsmInfo. It's another step towards fixing the layering violation. llvm-svn: 135424	2011-07-18 20:57:22 +00:00
Chris Lattner	229907cd11	land David Blaikie's patch to de-constify Type, with a few tweaks. llvm-svn: 135375	2011-07-18 04:54:35 +00:00
Bruno Cardoso Lopes	8df9cfc279	Fix a couple of things: 1) Make non-legal 256-bit loads to be promoted to v4i64. This lets us canonize the loads and handle things the same way we use to handle for 128-bit registers. Despite of what one of the removed comments explained, the load promotion would not mess with VPERM, it's only a matter of doing the appropriate bitcasts when this instructions comes to be introduced. Also make LOAD v8i32 legal. 2) Doing 1) exposed two bugs: - v4i64 was being promoted to itself for several opcodes (introduced in r124447 by David Greene) causing endless recursion and the stack to explode. - there was no support for allOnes BUILD_VECTORs and ANDNP would fail to match because it was generating early target constant pools during lowering. 3) The testcases are already checked-in, doing 1) exposed the bugs in the current testcases. 4) Tidy up code to be more clear and explicit about AVX. llvm-svn: 135313	2011-07-15 22:24:33 +00:00
Eric Christopher	92464be28c	Check register class matching instead of width of type matching when determining validity of matching constraint. Allow i1 types access to the GR8 reg class for x86. Fixes PR10352 and rdar://9777108 llvm-svn: 135180	2011-07-14 20:13:52 +00:00
Nadav Rotem	771f29677f	[VECTOR-SELECT] During type legalization we often use the SIGN_EXTEND_INREG SDNode. When this SDNode is legalized during the LegalizeVector phase, it is scalarized because non-simple types are automatically marked to be expanded. In this patch we add support for lowering SIGN_EXTEND_INREG manually. This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements' flag. llvm-svn: 135144	2011-07-14 11:11:14 +00:00
Bruno Cardoso Lopes	9613b64916	Make X86ISD::ANDNP more general and Codegen 256-bit VANDNP. A more general version of X86ISD::ANDNP also opened the room for a little bit of refactoring. llvm-svn: 135088	2011-07-13 21:36:51 +00:00
Bruno Cardoso Lopes	7ba479d22f	The target specific node PANDN name is misleading. That happens because it's later selected to a ANDNPD/ANDNPS instruction instead of the PANDN instruction. Rename it. llvm-svn: 135087	2011-07-13 21:36:47 +00:00
Julien Lerouge	112fcc164a	Add _allrem, _aullrem and _allmul to the runtime for MSVC. http://llvm.org/bugs/show_bug.cgi?id=10305 llvm-svn: 134744	2011-07-08 21:40:25 +00:00
Cameron Zwarich	f03fa189ca	Add an intrinsic and codegen support for fused multiply-accumulate. The intent is to use this for architectures that have a native FMA instruction. llvm-svn: 134742	2011-07-08 21:39:21 +00:00
Nick Lewycky	9badf60203	Let the inline asm 'q' constraint match float, and on 64-bit double too. Fixes PR9602! llvm-svn: 134665	2011-07-08 00:19:27 +00:00
Eric Christopher	7a2a0f80de	Go ahead and emit the barrier on x86-64 even without sse2. The processor supports it just fine. Fixes PR9675 and rdar://9740801 llvm-svn: 134664	2011-07-08 00:04:56 +00:00
Eric Christopher	9721396dab	Add support for the X86 'l' constraint. Fixes PR10149 and rdar://9738585 llvm-svn: 134648	2011-07-07 22:29:07 +00:00
Eric Christopher	7e5f2350d3	Use getRegForInlineAsmConstraint instead of custom defining regclasses via vectors. Part of rdar://9643582 llvm-svn: 134079	2011-06-29 17:23:50 +00:00
Jakob Stoklund Olesen	7297e7e223	Clean up the handling of the x87 fp stack to make it more robust. Drop the FpMov instructions, use plain COPY instead. Drop the FpSET/GET instruction for accessing fixed stack positions. Instead use normal COPY to/from ST registers around inline assembly, and provide a single new FpPOP_RETVAL instruction that can access the return value(s) from a call. This is still necessary since you cannot tell from the CALL instruction alone if it returns anything on the FP stack. Teach fast isel to use this. This provides a much more robust way of handling fixed stack registers - we can tolerate arbitrary FP stack instructions inserted around calls and inline assembly. Live range splitting could sometimes break x87 code by inserting spill code in unfortunate places. As a bonus we handle floating point inline assembly correctly now. llvm-svn: 134018	2011-06-28 18:32:28 +00:00
Chad Rosier	15db390f8f	Replace dyn_cast<> with cast<> since the cast is already guarded by the necessary check. llvm-svn: 133874	2011-06-25 18:51:28 +00:00
Chad Rosier	bde13d3f76	Enable tail call optimization in the presence of a byval (x86-32 and x86-64). <rdar://problem/9483883> llvm-svn: 133858	2011-06-25 02:04:56 +00:00
Chad Rosier	e553e75b15	Hoist simple check above more complex checking to avoid unnecessary overheads. No functional change intended. llvm-svn: 133824	2011-06-24 21:15:36 +00:00
Evan Cheng	3a0c5e52ff	Remove TargetOptions.h dependency from X86Subtarget. llvm-svn: 133726	2011-06-23 17:54:54 +00:00
Benjamin Kramer	25e17b0f89	Remove unused but set variables. llvm-svn: 133347	2011-06-18 11:09:41 +00:00
John McCall	4b7a8d68ae	Add a new function attribute, nonlazybind, which inhibits lazy-loading optimizations when emitting calls to the function; instead those calls may use faster relocations which require the function to be immediately resolved upon loading the dynamic object featuring the call. This is useful when it is known that the function will be called frequently and pervasively and therefore there is no merit in delaying binding of the function. Currently only implemented for x86-64, where it turns into a call through the global offset table. Patch by Dan Gohman, who assures me that he's going to add LangRef documentation for this once it's committed. llvm-svn: 133080	2011-06-15 20:36:13 +00:00
Eric Christopher	0713a9d8fc	Add a parameter to CCState so that it can access the MachineFunction. No functional change. Part of PR6965 llvm-svn: 132763	2011-06-08 23:55:35 +00:00
Stuart Hastings	e0d3426e1a	Followup to 132458, omit unnecessary stack copy when x87 input is a load. rdar://problem/6373334 llvm-svn: 132696	2011-06-06 23:15:58 +00:00
Stuart Hastings	be605494ac	Reapply 132424 with fixes. This fixes PR10068. rdar://problem/5993888 llvm-svn: 132606	2011-06-03 23:53:54 +00:00
Eric Christopher	de9399bf76	Have LowerOperandForConstraint handle multiple character constraints. Part of rdar://9119939 llvm-svn: 132510	2011-06-02 23:16:42 +00:00
Rafael Espindola	aa318ae495	Revert 132424 to fix PR10068. llvm-svn: 132479	2011-06-02 19:57:47 +00:00
Stuart Hastings	8d530ad22a	Omit unnecessary stack copy when x87 input is a load. rdar://problem/6373334 llvm-svn: 132458	2011-06-02 15:57:11 +00:00
Stuart Hastings	7adc95f69e	Recommit 132404 with fixes. rdar://problem/5993888 llvm-svn: 132424	2011-06-01 21:33:14 +00:00
Stuart Hastings	aab130d995	Revert 132404 to appease a buildbot. rdar://problem/5993888 llvm-svn: 132419	2011-06-01 19:52:20 +00:00
Stuart Hastings	7b7c102f2c	Add support for x86 CMPEQSS and friends. These instructions do a floating-point comparison, generate a mask of 0s or 1s, and generally DTRT with NaNs. Only profitable when the user wants a materialized 0 or 1 at runtime. rdar://problem/5993888 llvm-svn: 132404	2011-06-01 17:17:45 +00:00
Stuart Hastings	9f20804216	FGETSIGN support for x86, using movmskps/pd. Will be enabled with a patch to TargetLowering.cpp. rdar://problem/5660695 llvm-svn: 132388	2011-06-01 04:39:42 +00:00
Stuart Hastings	493a12bf5e	Reverting 132105: it broke some LLVM-GCC DejaGNU tests. llvm-svn: 132108	2011-05-26 04:09:49 +00:00
Stuart Hastings	276f231c2f	Correctly handle a one-word struct passed byval on x86_64. rdar://problem/6920088 llvm-svn: 132105	2011-05-26 02:44:56 +00:00
Evan Cheng	88f9137fd7	- Teach SelectionDAG::isKnownNeverZero to return true (op x, c) when c is non-zero. - Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero. rdar://9490949 llvm-svn: 131948	2011-05-24 01:48:22 +00:00
Chad Rosier	552f8c4819	Don't attempt to tail call optimize for Win64. llvm-svn: 131709	2011-05-20 00:59:28 +00:00
Evan Cheng	e8d2e9eb35	Revert r131664 and fix it in instcombine instead. rdar://9467055 llvm-svn: 131708	2011-05-20 00:54:37 +00:00
Eric Christopher	4014e5e208	Oddly people want to use the 'r' constraint for fp constants on x86. Fixes rdar://9218925 Fixes PR9601 llvm-svn: 131682	2011-05-19 21:33:47 +00:00
Evan Cheng	2b9bd38678	crc32 with 64-bit output zeros upper 32-bits. rdar://9467055 llvm-svn: 131664	2011-05-19 18:57:12 +00:00
Chad Rosier	f4e832b14e	Enables vararg functions that pass all arguments via registers to be optimized into tail-calls when possible. llvm-svn: 131560	2011-05-18 19:59:50 +00:00
Eli Friedman	d000a2c26e	Clean up the mess created by r131467+r131469. llvm-svn: 131471	2011-05-17 18:02:22 +00:00
Stuart Hastings	c65d8eda7b	Revert 131467 due to buildbot complaint. llvm-svn: 131469	2011-05-17 16:59:46 +00:00
Stuart Hastings	3cf5308890	Fix an obscure issue in X86_64 parameter passing: if a tiny byval is passed as the fifth parameter, insure it's passed correctly (in R9). rdar://problem/6920088 llvm-svn: 131467	2011-05-17 16:45:55 +00:00
Nadav Rotem	d8edb1d5cc	Fix a bug in PerformEXTRACT_VECTOR_ELTCombine. The code created an ADD SDNode with two different types, in cases where the index and the ptr had different types. llvm-svn: 131461	2011-05-17 08:31:57 +00:00
Eli Friedman	d4a3609d30	Remove dead code. Fix associated test to use FileCheck. llvm-svn: 131424	2011-05-16 21:28:22 +00:00
Nadav Rotem	8f971c27fb	Add custom lowering of X86 vector SRA/SRL/SHL when the shift amount is a splat vector. llvm-svn: 131179	2011-05-11 08:12:09 +00:00
Eli Friedman	2518f8376d	Make the logic for determining function alignment more explicit. No functionality change. llvm-svn: 131012	2011-05-06 20:34:06 +00:00
Daniel Dunbar	cd01ed5bd6	ADT/Triple: Renambe isOSX... methods to isMacOSX for consistency with the OS triple component. llvm-svn: 129838	2011-04-20 00:14:25 +00:00
Daniel Dunbar	100455a3c8	Target/X86: Eliminate uses of getDarwinVers(). llvm-svn: 129813	2011-04-19 21:04:12 +00:00
Chris Lattner	0ab5e2cded	Fix a ton of comment typos found by codespell. Patch by Luis Felipe Strano Moraes! llvm-svn: 129558	2011-04-15 05:18:47 +00:00
Evan Cheng	ee9d45dd55	Don't try to create zero-sized stack objects. llvm-svn: 128586	2011-03-30 23:44:13 +00:00
Benjamin Kramer	8d2227373d	Make helper static. llvm-svn: 128338	2011-03-26 12:38:19 +00:00
NAKAMURA Takumi	521eb7c11e	Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets. FIXME: Some cleanups would be needed. llvm-svn: 128206	2011-03-24 07:07:00 +00:00
Andrew Trick	4ab9a16569	Revert r128175. I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix. llvm-svn: 128181	2011-03-23 23:11:02 +00:00
Andrew Trick	4046a0de91	Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS. (target-specific branchless method for double-width relational comparisons on x86) llvm-svn: 128175	2011-03-23 22:16:02 +00:00
Evan Cheng	0663f23bd8	Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified. llvm-svn: 127981	2011-03-21 01:19:09 +00:00
Daniel Dunbar	327cd36f74	Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR", it broke a lot of things. llvm-svn: 127954	2011-03-19 21:47:14 +00:00
Evan Cheng	824a711305	SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR to have single return block (at least getting there) for optimizations. This is general goodness but it would prevent some tailcall optimizations. One specific case is code like this: int f1(void); int f2(void); int f3(void); int f4(void); int f5(void); int f6(void); int foo(int x) { switch(x) { case 1: return f1(); case 2: return f2(); case 3: return f3(); case 4: return f4(); case 5: return f5(); case 6: return f6(); } } => LBB0_2: ## %sw.bb callq _f1 popq %rbp ret LBB0_3: ## %sw.bb1 callq _f2 popq %rbp ret LBB0_4: ## %sw.bb3 callq _f3 popq %rbp ret This patch teaches codegenprep to duplicate returns when the return value is a phi and where the phi operands are produced by tail calls followed by an unconditional branch: sw.bb7: ; preds = %entry %call8 = tail call i32 @f5() nounwind br label %return sw.bb9: ; preds = %entry %call10 = tail call i32 @f6() nounwind br label %return return: %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ] ret i32 %retval.0 This allows codegen to generate better code like this: LBB0_2: ## %sw.bb jmp _f1 ## TAILCALL LBB0_3: ## %sw.bb1 jmp _f2 ## TAILCALL LBB0_4: ## %sw.bb3 jmp _f3 ## TAILCALL rdar://9147433 llvm-svn: 127953	2011-03-19 17:17:39 +00:00
Nadav Rotem	e7a101ccab	Add support for legalizing UINT_TO_FP of vectors on platforms which do not have native support for this operation (such as X86). The legalized code uses two vector INT_TO_FP operations and is faster than scalarizing. llvm-svn: 127951	2011-03-19 13:09:10 +00:00
Eli Friedman	59721e3238	Revert r127852; it's apparently causing an ICE on mingw. llvm-svn: 127909	2011-03-18 21:12:29 +00:00
Eli Friedman	1a916a3c0c	Add a target-specific branchless method for double-width relational comparisons on x86. Essentially, the way this works is that SUB+SBB sets the relevant flags the same way a double-width CMP would. This is a substantial improvement over the generic lowering in LLVM. The output is also shorter than the gcc-generated output; I haven't done any detailed benchmarking, though. llvm-svn: 127852	2011-03-18 02:34:11 +00:00
Cameron Zwarich	2ef0c69df1	Move more logic into getTypeForExtArgOrReturn. llvm-svn: 127809	2011-03-17 14:53:37 +00:00
Cameron Zwarich	34e7b3f77e	Rename getTypeForExtendedInteger() to getTypeForExtArgOrReturn(). llvm-svn: 127807	2011-03-17 14:21:56 +00:00
Cameron Zwarich	ac106273d4	The x86-64 ABI says that a bool is only guaranteed to be sign-extended to a byte rather than an int. Thankfully, this only causes LLVM to miss optimizations, not generate incorrect code. This just fixes the zext at the return. We still insert an i32 ZextAssert when reading a function's arguments, but it is followed by a truncate and another i8 ZextAssert so it is not optimized. llvm-svn: 127766	2011-03-16 22:20:18 +00:00
Eric Christopher	cf56a5034f	Change the x86 32-bit scheduler to register pressure and fix up the corresponding testcases back to the previous versions. Fixes some performance regressions only seen on 32-bit. llvm-svn: 127441	2011-03-11 01:05:58 +00:00
Stuart Hastings	d17ae4e939	Revert 127359; it broke lencod. llvm-svn: 127382	2011-03-10 00:25:53 +00:00
Stuart Hastings	9955e2f912	X86 byval copies no longer always_inline. <rdar://problem/8706628> llvm-svn: 127359	2011-03-09 21:10:30 +00:00
NAKAMURA Takumi	58d1f93b03	Target/X86: Tweak va_arg for Win64 not to miss taking va_start when number of fixed args > 4. llvm-svn: 127328	2011-03-09 11:33:15 +00:00
Benjamin Kramer	679cfb54ec	X86: Fix the (saddo/ssub x, 1) -> incl/decl selection to check the right operand for 1. Found by inspection. llvm-svn: 127247	2011-03-08 15:20:20 +00:00
Eric Christopher	eb19e9e9fc	Turn on list-ilp scheduling by default on x86 and x86-64, fix up testcases accordingly. Some are currently xfailed and will be filed as bugs to be fixed or understood. Performance results: roughly neutral on SPEC some micro benchmarks in the llvm suite are up between 100 and 150%, only a pair of regressions that are due to be investigated john-the-ripper saw: 10% improvement in traditional DES 8% improvement in BSDI DES 59% improvement in FreeBSD MD5 67% improvement in OpenBSD Blowfish 14% improvement in LM DES Small compile time impact. llvm-svn: 127208	2011-03-08 02:42:25 +00:00
Cameron Zwarich	df61694417	Move getRegPressureLimit() from TargetLoweringInfo to TargetRegisterInfo. llvm-svn: 127175	2011-03-07 21:56:36 +00:00
Andrew Trick	641e2d4f8c	Increased the register pressure limit on x86_64 from 8 to 12 regs. This is the only change in this checkin that may affects the default scheduler. With better register tracking and heuristics, it doesn't make sense to artificially lower the register limit so much. Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to give the scheduler a way to account for div and sqrt on targets that don't have an itinerary. It is currently defaults to 10 (the actual number doesn't matter much), but only takes effect on non-default schedulers: list-hybrid and list-ilp. Added several heuristics that can be individually disabled for the non-default sched=list-ilp mode. This helps us determine how much better we can do on a given benchmark than the default scheduler. Certain compute intensive loops run much faster in this mode with the right set of heuristics, and it doesn't seem to have much negative impact elsewhere. Not all of the heuristics are needed, but we still need to experiment to decide which should be disabled by default for sched=list-ilp. llvm-svn: 127067	2011-03-05 08:00:22 +00:00
David Greene	dd567b214b	[AVX] Fix mask predicates for 256-bit UNPCKLPS/D and implement missing patterns for them. Add a SIMD test subdirectory to hold tests for SIMD instruction selection correctness and quality. ' llvm-svn: 126845	2011-03-02 17:23:43 +00:00
David Greene	20a1cbefad	[AVX] Add decode support for VUNPCKLPS/D instructions, both 128-bit and 256-bit forms. Because the number of elements in a vector does not determine the vector type (4 elements could be v4f32 or v4f64), pass the full type of the vector to decode routines. llvm-svn: 126664	2011-02-28 19:06:56 +00:00
Owen Anderson	b2c80da4ae	Allow targets to specify a the type of the RHS of a shift parameterized on the type of the LHS. llvm-svn: 126518	2011-02-25 21:41:48 +00:00
Chris Lattner	0152b7bc7c	remove command line option debugging hook. llvm-svn: 126441	2011-02-24 21:53:03 +00:00
David Greene	9a6040dc86	[AVX] General VUNPCKL codegen support. llvm-svn: 126264	2011-02-22 23:31:46 +00:00
Devang Patel	f3292b2196	Revert r124611 - "Keep track of incoming argument's location while emitting LiveIns." In other words, do not keep track of argument's location. The debugger (gdb) is not prepared to see line table entries for arguments. For the debugger, "second" line table entry marks beginning of function body. This requires some coordination with debugger to get this working. - The debugger needs to be aware of prolog_end attribute attached with line table entries. - The compiler needs to accurately mark prolog_end in line table entries (at -O0 and at -O1+) llvm-svn: 126155	2011-02-21 23:21:26 +00:00
Eric Christopher	ac6b001f56	If both operands are loads from stores in memory we can't use movlpd/movlps since one needs to be a register operand. Just use movss instead of forcing an operand into a register. Fixes PR9239 llvm-svn: 126072	2011-02-20 05:04:42 +00:00
Eric Christopher	c509ff6944	Fix typos. llvm-svn: 126018	2011-02-19 03:19:09 +00:00
David Greene	3a2b508e8f	[AVX] Recorganize X86ShuffleDecode into its own library (LLVMX86Utils.a) to break cyclic library dependencies between LLVMX86CodeGen.a and LLVMX86AsmParser.a. Previously this code was in a header file and marked static but AVX requires some additional functionality here that won't be used by all clients. Since including unused static functions causes a gcc compiler warning, keeping it as a header would break builds that use -Werror. Putting this in its own library solves both problems at once. llvm-svn: 125765	2011-02-17 19:18:59 +00:00
Stuart Hastings	81c4306005	Swap VT and DebugLoc operands of getExtLoad() for consistency with other getNode() methods. Radar 9002173. llvm-svn: 125665	2011-02-16 16:23:55 +00:00
Chris Lattner	46c01a30f4	Enhance ComputeMaskedBits to know that aligned frameindexes have their low bits set to zero. This allows us to optimize out explicit stack alignment code like in stack-align.ll:test4 when it is redundant. Doing this causes the code generator to start turning FI+cst into FI\|cst all over the place, which is general goodness (that is the canonical form) except that various pieces of the code generator don't handle OR aggressively. Fix this by introducing a new SelectionDAG::isBaseWithConstantOffset predicate, and using it in places that are looking for ADD(X,CST). The ARM backend in particular was missing a lot of addressing mode folding opportunities around OR. llvm-svn: 125470	2011-02-13 22:25:43 +00:00
David Greene	79827a5a26	[AVX] Implement 256-bit vector lowering for SCALAR_TO_VECTOR. This largely completes support for 128-bit fallback lowering for code that is not 256-bit ready. llvm-svn: 125315	2011-02-10 23:11:29 +00:00
David Greene	ce318e4958	[AVX] Implement 256-bit vector lowering for EXTRACT_VECTOR_ELT. llvm-svn: 125284	2011-02-10 16:57:36 +00:00
David Greene	b36195ab26	[AVX] Implement 256-bit vector lowering for INSERT_VECTOR_ELT. llvm-svn: 125187	2011-02-09 15:32:06 +00:00
David Greene	10b0db1d5f	[AVX] Implement BUILD_VECTOR lowering for 256-bit vectors. For anything but the simplest of cases, lower a 256-bit BUILD_VECTOR by splitting it into 128-bit parts and recombining. llvm-svn: 125105	2011-02-08 19:04:41 +00:00
David Greene	79651c527b	[AVX] Insert/extract subvector lowering support. This includes a couple of utility functions that will be used in other places for more AVX lowering. llvm-svn: 125029	2011-02-07 19:36:54 +00:00
NAKAMURA Takumi	1850c80afb	Target/X86: Tweak allocating shadow area (aka home) on Win64. It must be enough for caller to allocate one. llvm-svn: 124949	2011-02-05 15:11:32 +00:00
NAKAMURA Takumi	b21c3db920	lib/Target/X86/X86ISelLowering.cpp: Introduce a new variable "IsWin64". No functional changes. llvm-svn: 124948	2011-02-05 15:11:13 +00:00
NAKAMURA Takumi	f7f319d4d3	Target/X86: Fix whitespace. llvm-svn: 124946	2011-02-05 15:10:54 +00:00
David Greene	96d07a82b2	[AVX] Revert 124910 until clients are ready. llvm-svn: 124912	2011-02-05 00:24:41 +00:00
David Greene	bdd481507a	[AVX] Add some utilities to insert and extract 128-bit subvectors. This allows us to easily support 256-bit operations that don't have native 256-bit support. This applies to integer operations, certain types of shuffles and various othher things. llvm-svn: 124910	2011-02-04 23:29:33 +00:00
David Greene	653f1eed2d	[AVX] Support VSINSERTF128 with more patterns and appropriate infrastructure. This makes lowering 256-bit vectors to 128-bit vectors simple when 256-bit vector support is not available. llvm-svn: 124868	2011-02-04 16:08:29 +00:00
David Greene	c4da110fd2	[AVX] VEXTRACTF128 support. This commit includes patterns for matching EXTRACT_SUBVECTOR to VEXTRACTF128 along with support routines to examine and translate index values. VINSERTF128 comes next. With these two in place we can begin supporting more AVX operations as INSERT/EXTRACT can be used as a fallback when 256-bit support is not available. llvm-svn: 124797	2011-02-03 15:50:00 +00:00
Rafael Espindola	d11311f291	Fix PR9127 by reversing the operands even if they have more then one use. Reversing the operands allows us to fold, but doesn't force us to. Also, at this point the DAG is still being optimized, so the check for hasOneUse is not very precise. llvm-svn: 124773	2011-02-03 03:58:05 +00:00
Evan Cheng	d22a4a1fd6	Patches to build EFI with Clang/LLVM. By Carl Norum. llvm-svn: 124639	2011-02-01 01:14:13 +00:00
Devang Patel	56cc5fdf09	Keep track of incoming argument's location while emitting LiveIns. llvm-svn: 124611	2011-01-31 21:38:14 +00:00
David Greene	34f7c0d8aa	[AVX] Clean up the code to configure target lowering for AVX. Specify how to lower more/new operations. This is a prerequisite for adding additional AVX lowering. llvm-svn: 124447	2011-01-27 22:38:56 +00:00
David Greene	bab5e6ed0e	[AVX] Add INSERT_SUBVECTOR and support it on x86. This provides a default implementation for x86, going through the stack in a similr fashion to how the codegen implements BUILD_VECTOR. Eventually this will get matched to VINSERTF128 if AVX is available. llvm-svn: 124307	2011-01-26 19:13:22 +00:00
David Greene	b6f1611928	[AVX] Support EXTRACT_SUBVECTOR on x86. This provides a default implementation of EXTRACT_SUBVECTOR for x86, going through the stack in a similr fashion to how the codegen implements BUILD_VECTOR. Eventually this will get matched to VEXTRACTF128 if AVX is available. llvm-svn: 124292	2011-01-26 15:38:49 +00:00
NAKAMURA Takumi	0cfdac078e	Target/X86: Tweak win64's tailcall. llvm-svn: 124272	2011-01-26 02:04:09 +00:00
NAKAMURA Takumi	9d29eff198	Fix whitespace. llvm-svn: 124270	2011-01-26 02:03:37 +00:00
Chris Lattner	218092e68e	fix PR8981, a crash trying to form a conditional inc with a floating point compare. llvm-svn: 123560	2011-01-16 02:56:53 +00:00
Anton Korobeynikov	2f93128109	Rename TargetFrameInfo into TargetFrameLowering. Also, put couple of FIXMEs and fixes here and there. llvm-svn: 123170	2011-01-10 12:39:04 +00:00
Jakob Stoklund Olesen	2fb5b31578	Simplify a bunch of isVirtualRegister() and isPhysicalRegister() logic. These functions not longer assert when passed 0, but simply return false instead. No functional change intended. llvm-svn: 123155	2011-01-10 02:58:51 +00:00
Evan Cheng	078b0b095e	Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call. llvm-svn: 123048	2011-01-08 01:24:27 +00:00
Evan Cheng	a048c83fe4	Revert r122955. It seems using movups to lower memcpy can cause massive regression (even on Nehalem) in edge cases. I also didn't see any real performance benefit. llvm-svn: 123015	2011-01-07 19:35:30 +00:00
Evan Cheng	7998b1d6fe	Use movups to lower memcpy and memset even if it's not fast (like corei7). The theory is it's still faster than a pair of movq / a quad of movl. This will probably hurt older chips like P4 but should run faster on current and future Intel processors. rdar://8817010 llvm-svn: 122955	2011-01-06 07:58:36 +00:00
Evan Cheng	3ae2b79aa3	Re-implement r122936 with proper target hooks. Now getMaxStoresPerMemcpy etc. takes an option OptSize. If OptSize is true, it would return the inline limit for functions with attribute OptSize. llvm-svn: 122952	2011-01-06 06:52:41 +00:00
Benjamin Kramer	6020ed9d99	X86: Lower a select directly to a setcc_carry if possible. int test(unsigned long a, unsigned long b) { return -(a < b); } compiles to _test: ## @test cmpq %rsi, %rdi ## encoding: [0x48,0x39,0xf7] sbbl %eax, %eax ## encoding: [0x19,0xc0] ret ## encoding: [0xc3] instead of _test: ## @test xorl %ecx, %ecx ## encoding: [0x31,0xc9] cmpq %rsi, %rdi ## encoding: [0x48,0x39,0xf7] movl $-1, %eax ## encoding: [0xb8,0xff,0xff,0xff,0xff] cmovael %ecx, %eax ## encoding: [0x0f,0x43,0xc1] ret ## encoding: [0xc3] llvm-svn: 122451	2010-12-22 23:09:28 +00:00
Benjamin Kramer	f6ddc4a1de	Add some x86 specific dagcombines for conditional increments. (add Y, (sete X, 0)) -> cmp X, 1; adc 0, Y (add Y, (setne X, 0)) -> cmp X, 1; sbb -1, Y (sub (sete X, 0), Y) -> cmp X, 1; sbb 0, Y (sub (setne X, 0), Y) -> cmp X, 1; adc -1, Y for unsigned foo(unsigned a, unsigned b) { if (a == 0) b++; return b; } we now get: foo: cmpl $1, %edi movl %esi, %eax adcl $0, %eax ret instead of: foo: testl %edi, %edi sete %al movzbl %al, %eax addl %esi, %eax ret llvm-svn: 122364	2010-12-21 21:41:44 +00:00
Chris Lattner	3e5fbd74ed	rename MVT::Flag to MVT::Glue. "Flag" is a terrible name for something that just glues two nodes together, even if it is sometimes used for flags. llvm-svn: 122310	2010-12-21 02:38:05 +00:00
Nate Begeman	4b9db07b02	Implement feedback from Bruno on making pblendvb an x86-specific ISD node in addition to being an intrinsic, and convert lowering to use it. Hopefully the pattern fragment is doing the right thing with XMM0, looks correct in testing. llvm-svn: 122277	2010-12-20 22:04:24 +00:00
Chris Lattner	5c00d41688	now that addc/adde are gone, "ADDC" in the X86 backend uses EFLAGS results, the same as setcc. Optimize ADDC(0,0,FLAGS) -> SET_CARRY(FLAGS). This is a step towards finishing off PR5443. In the testcase in that bug we now get: movq %rdi, %rax addq %rsi, %rax sbbq %rcx, %rcx testb $1, %cl setne %dl ret instead of: movq %rdi, %rax addq %rsi, %rax movl $0, %ecx adcq $0, %rcx testq %rcx, %rcx setne %dl ret llvm-svn: 122219	2010-12-20 01:37:09 +00:00
Chris Lattner	9c26d2711b	use for loop over types. llvm-svn: 122214	2010-12-20 01:03:27 +00:00
Chris Lattner	846c20d4e6	Change the X86 backend to stop using the evil ADDC/ADDE/SUBC/SUBE nodes (which their carry depenedencies with MVT::Flag operands) and use clean and beautiful EFLAGS dependences instead. We do this by changing the modelling of SBB/ADC to have EFLAGS input and outputs (which is what requires the previous scheduler change) and change X86 ISelLowering to custom lower ADDC and friends down to X86ISD::ADD/ADC/SUB/SBB nodes. With the previous series of changes, this causes no changes in the testsuite, woo. llvm-svn: 122213	2010-12-20 00:59:46 +00:00
Mon P Wang	1064992c84	Prevents PerformShuffleCombine from creating a node with an illegal type after legalize types has run, e.g., prevent creating an i64 node from a v2i64 when i64 is not a legal type. llvm-svn: 122206	2010-12-19 23:55:53 +00:00
Chris Lattner	9edf3f50bf	improve the setcc -> setcc_carry optimization to happen more consistently by moving it out of lowering into dag combine. Add some missing patterns for matching away extended versions of setcc_c. llvm-svn: 122201	2010-12-19 22:08:31 +00:00
Chris Lattner	6dddab2ffe	simplify some code to just reuse a setcc if we can instead of going through the CSE maps to get it. llvm-svn: 122196	2010-12-19 21:23:48 +00:00
Chris Lattner	c37bb023b1	now that generic vector types aren't selected onto MMX operations, we don't need -disable-mmx anymore. llvm-svn: 122189	2010-12-19 20:19:20 +00:00
Chris Lattner	ae756e1980	reduce copy/paste programming with the power of for loops. llvm-svn: 122187	2010-12-19 20:07:10 +00:00
Chris Lattner	1e8c032a6e	X86 supports i8/i16 overflow ops (except i8 multiplies), we should generate them. Now we compile: define zeroext i8 @X(i8 signext %a, i8 signext %b) nounwind ssp { entry: %0 = tail call %0 @llvm.sadd.with.overflow.i8(i8 %a, i8 %b) %cmp = extractvalue %0 %0, 1 br i1 %cmp, label %if.then, label %if.end into: _X: ## @X ## BB#0: ## %entry subl $12, %esp movb 16(%esp), %al addb 20(%esp), %al jo LBB0_2 Before we were generating: _X: ## @X ## BB#0: ## %entry pushl %ebp movl %esp, %ebp subl $8, %esp movb 12(%ebp), %al testb %al, %al setge %cl movb 8(%ebp), %dl testb %dl, %dl setge %ah cmpb %cl, %ah sete %cl addb %al, %dl testb %dl, %dl setge %al cmpb %al, %ah setne %al andb %cl, %al testb %al, %al jne LBB0_2 llvm-svn: 122186	2010-12-19 20:03:11 +00:00
Nate Begeman	97b72c99d2	Add support for matching psign & plendvb to the x86 target Remove unnecessary pandn patterns, 'vnot' patfrag looks through bitcasts llvm-svn: 122098	2010-12-17 22:55:37 +00:00
Nate Begeman	8b08f5232b	Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view. Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX". Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable. llvm-svn: 121439	2010-12-10 00:26:57 +00:00
Eric Christopher	a8aaaee379	Rewrite the darwin tlv support to use a chain and return to copying the output to the correct register. Fixes a hidden problem uncovered by the last patch where we'd try to DAG combine our MVT::Other node oddly. llvm-svn: 121358	2010-12-09 06:25:53 +00:00
Eric Christopher	8783074091	Stop confusing people, it's not really a chain, or a tumor. llvm-svn: 121340	2010-12-09 00:57:19 +00:00
Eric Christopher	d84970ae8b	Remove extraneous copy from DAG conversion for darwin tls. This was popping up at O0 when it wasn't folded and the fast allocator would complain. llvm-svn: 121330	2010-12-09 00:27:58 +00:00
Chris Lattner	6886171792	Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags result. This allows us to compile: void *test12(long count) { return new int[count]; } into: test12: movl $4, %ecx movq %rdi, %rax mulq %rcx movq $-1, %rdi cmovnoq %rax, %rdi jmp __Znam ## TAILCALL instead of: test12: movl $4, %ecx movq %rdi, %rax mulq %rcx seto %cl testb %cl, %cl movq $-1, %rdi cmoveq %rax, %rdi jmp __Znam Of course it would be even better if the regalloc inverted the cmov to 'cmovoq', which would eliminate the need for the 'movq %rdi, %rax'. llvm-svn: 120936	2010-12-05 07:49:54 +00:00
Chris Lattner	364bb0a081	it turns out that when ".with.overflow" intrinsics were added to the X86 backend that they were all implemented except umul. This one fell back to the default implementation that did a hi/lo multiply and compared the top. Fix this to check the overflow flag that the 'mul' instruction sets, so we can avoid an explicit test. Now we compile: void *func(long count) { return new int[count]; } into: __Z4funcl: ## @_Z4funcl movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00] movq %rdi, %rax ## encoding: [0x48,0x89,0xf8] mulq %rcx ## encoding: [0x48,0xf7,0xe1] seto %cl ## encoding: [0x0f,0x90,0xc1] testb %cl, %cl ## encoding: [0x84,0xc9] movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff] cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8] jmp __Znam ## TAILCALL instead of: __Z4funcl: ## @_Z4funcl movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00] movq %rdi, %rax ## encoding: [0x48,0x89,0xf8] mulq %rcx ## encoding: [0x48,0xf7,0xe1] testq %rdx, %rdx ## encoding: [0x48,0x85,0xd2] movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff] cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8] jmp __Znam ## TAILCALL Other than the silly seto+test, this is using the o bit directly, so it's going in the right direction. llvm-svn: 120935	2010-12-05 07:30:36 +00:00
Chris Lattner	116580a11c	generalize the previous check to handle -1 on either side of the select, inserting a not to compensate. Add a missing isZero check that I lost somehow. This improves codegen of: void *func(long count) { return new int[count]; } from: __Z4funcl: ## @_Z4funcl movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00] movq %rdi, %rax ## encoding: [0x48,0x89,0xf8] mulq %rcx ## encoding: [0x48,0xf7,0xe1] testq %rdx, %rdx ## encoding: [0x48,0x85,0xd2] movq $-1, %rdi ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff] cmoveq %rax, %rdi ## encoding: [0x48,0x0f,0x44,0xf8] jmp __Znam ## TAILCALL ## encoding: [0xeb,A] to: __Z4funcl: ## @_Z4funcl movl $4, %ecx ## encoding: [0xb9,0x04,0x00,0x00,0x00] movq %rdi, %rax ## encoding: [0x48,0x89,0xf8] mulq %rcx ## encoding: [0x48,0xf7,0xe1] cmpq $1, %rdx ## encoding: [0x48,0x83,0xfa,0x01] sbbq %rdi, %rdi ## encoding: [0x48,0x19,0xff] notq %rdi ## encoding: [0x48,0xf7,0xd7] orq %rax, %rdi ## encoding: [0x48,0x09,0xc7] jmp __Znam ## TAILCALL ## encoding: [0xeb,A] llvm-svn: 120932	2010-12-05 02:00:51 +00:00
Chris Lattner	342e6ea5f9	Improve an integer select optimization in two ways: 1. generalize (select (x == 0), -1, 0) -> (sign_bit (x - 1)) to: (select (x == 0), -1, y) -> (sign_bit (x - 1)) \| y 2. Handle the identical pattern that happens with !=: (select (x != 0), y, -1) -> (sign_bit (x - 1)) \| y cmov is often high latency and can't fold immediates or memory operands. For example for (x == 0) ? -1 : 1, before we got: < testb %sil, %sil < movl $-1, %ecx < movl $1, %eax < cmovel %ecx, %eax now we get: > cmpb $1, %sil > sbbl %eax, %eax > orl $1, %eax llvm-svn: 120929	2010-12-05 01:23:24 +00:00
Benjamin Kramer	2f489236ab	Add patterns for the x86 popcnt instruction. - Also adds a new POPCNT subtarget feature that is currently enabled if the target supports SSE4.2 (nehalem) or SSE4A (barcelona). llvm-svn: 120917	2010-12-04 20:32:23 +00:00
Benjamin Kramer	8ceebfaa04	Simplify code. No functionality change. llvm-svn: 120907	2010-12-04 14:22:24 +00:00
Evan Cheng	419ea286ee	Fix and re-enable tail call optimization of expanded libcalls. llvm-svn: 120622	2010-12-01 22:59:46 +00:00
Duncan Sands	c4fb38b821	I don't think it makes any sense to assert that the target supports SSE3 here. The user (i.e. whoever generated a call to the intrinsic in the first place) is essentially asking for a particular instruction to be placed in the assembler. If that instruction won't execute on the target machine, that's their problem not ours. Two buildbots with processors that don't support SSE3 were barfing on the apm.ll test in CodeGen/X86 because of this assertion. llvm-svn: 120574	2010-12-01 12:58:13 +00:00
Evan Cheng	a695abde49	Speculatively disable x86 portion of r120501 to appease the x86_64 buildbot. llvm-svn: 120549	2010-12-01 03:27:20 +00:00
Evan Cheng	d4b0873c06	Enable sibling call optimization of libcalls which are expanded during legalization time. Since at legalization time there is no mapping from SDNode back to the corresponding LLVM instruction and the return SDNode is target specific, this requires a target hook to check for eligibility. Only x86 and ARM support this form of sibcall optimization right now. rdar://8707777 llvm-svn: 120501	2010-11-30 23:55:39 +00:00
Eric Christopher	2d1bcf4aea	Fix insertion point in pcmp expander. While I'm there, clean up too many \n even for me. llvm-svn: 120411	2010-11-30 08:20:21 +00:00
Eric Christopher	1a86e8461a	Fix some cleanups from my last patch. llvm-svn: 120410	2010-11-30 08:10:28 +00:00
Eric Christopher	fa6657cec0	Rewrite mwait and monitor support and custom lower arguments. Fixes PR8573. llvm-svn: 120404	2010-11-30 07:20:12 +00:00
Rafael Espindola	c4774795ce	Move lowering of TLS_addr32 and TLS_addr64 to X86MCInstLower. llvm-svn: 120263	2010-11-28 21:16:39 +00:00
Rafael Espindola	5d882894d8	Lower TLS_addr32 and TLS_addr64. llvm-svn: 120225	2010-11-27 20:43:02 +00:00
Wesley Peck	527da1b6e2	Renaming ISD::BIT_CONVERT to ISD::BITCAST to better reflect the LLVM IR concept. llvm-svn: 119990	2010-11-23 03:31:01 +00:00
Anton Korobeynikov	0eecf5d201	Move hasFP() and few related hooks to TargetFrameInfo. llvm-svn: 119740	2010-11-18 21:19:35 +00:00
Chris Lattner	edb9d84dcc	add targetoperand flags for jump tables, constant pool and block address nodes to indicate when ha16/lo16 modifiers should be used. This lets us pass PowerPC/indirectbr.ll. The one annoying thing about this patch is that the MCSymbolExpr isn't expressive enough to represent ha16(label1-label2) which we need on PowerPC. I have a terrible hack in the meantime, but this will have to be revisited at some point. Last major conversion item left is global variable references. llvm-svn: 119105	2010-11-15 02:46:57 +00:00
Chris Lattner	7077efe894	move the pic base symbol stuff up to MachineFunction since it is trivial and will be shared between ppc and x86. This substantially simplifies the X86 backend also. llvm-svn: 119089	2010-11-14 22:48:15 +00:00
Chris Lattner	239f9a35ed	simplify getPICBaseSymbol a bit. llvm-svn: 119088	2010-11-14 22:37:11 +00:00
Peter Collingbourne	feea10bcdf	Recognise 32-bit ror-based bswap implementation used by uclibc llvm-svn: 119007	2010-11-13 19:54:30 +00:00
Peter Collingbourne	1c6437a62a	Support ; as asm separator llvm-svn: 119006	2010-11-13 19:54:23 +00:00
Dale Johannesen	6d95ed1760	Remove possibly useful info from comment, per Chris. llvm-svn: 118865	2010-11-12 00:43:18 +00:00
Duncan Sands	1462777017	Simplify uses of MVT and EVT. An MVT can be compared directly with a SimpleValueType, while an EVT supports equality and inequality comparisons with SimpleValueType. llvm-svn: 118169	2010-11-03 12:17:33 +00:00
Duncan Sands	fb0a48ef96	Factorize the duplicated logic for choosing the right argument calling convention out of the fast and normal ISel files, and into the calling convention TD file. llvm-svn: 117856	2010-10-31 13:21:44 +00:00
John Thompson	e8360b7182	Inline asm multiple alternative constraints development phase 2 - improved basic logic, added initial platform support. llvm-svn: 117667	2010-10-29 17:29:13 +00:00
Michael J. Spencer	7db918f1e9	x86-Win32: Switch ftol2 calling convention from stdcall to C. llvm-svn: 117474	2010-10-27 18:52:38 +00:00
Dale Johannesen	ec57ac1c3c	An stdcall function calling a non-stdcall function cannot use tailcall. PR 8461. llvm-svn: 117322	2010-10-25 22:17:05 +00:00
Duncan Sands	1f0d37e892	Add parentheses to pacify gcc, which warns otherwise. llvm-svn: 117020	2010-10-21 16:02:12 +00:00
Michael J. Spencer	f509c6ca27	X86: Add alloca probing to dynamic alloca on Windows. Fixes PR8424. llvm-svn: 116984	2010-10-21 01:41:01 +00:00
Dale Johannesen	320a553319	Remove Synthesizable from the Type system; as MMX vector types are no longer Legal on X86, we don't need it. No functional change. 8499854. llvm-svn: 116947	2010-10-20 21:32:10 +00:00
Michael J. Spencer	3e64de9504	X86: Add MS-CRT libcalls. llvm-svn: 116801	2010-10-19 07:32:52 +00:00
Michael J. Spencer	8b382e7e10	Fix Whitespace. llvm-svn: 116800	2010-10-19 07:32:42 +00:00
Eric Christopher	604e142844	Combine these together - should probably have some text associated that says what why what we just asserted is wrong. llvm-svn: 116333	2010-10-12 19:44:17 +00:00
Nick Lewycky	eb7b91d417	Mark variable 'NoImplicitFloatOps' used only in an assert as used. llvm-svn: 116323	2010-10-12 18:18:03 +00:00
Dan Gohman	395a898b2b	Initial va_arg support for x86-64. Patch by David Meyer! llvm-svn: 116319	2010-10-12 18:00:49 +00:00
Andrew Trick	e01c9001c9	Fixes bug 8297: i386 cmpxchg8b, missing MachineMemOperand llvm-svn: 116214	2010-10-11 19:02:04 +00:00
Michael J. Spencer	8dedb62019	X86: Call ulldiv and ftol2 on Windows instead of their libgcc eqivilents. llvm-svn: 116188	2010-10-11 05:29:15 +00:00
Michael J. Spencer	00765e5be0	X86: MinGW should always use libgcc on Windows. llvm-svn: 116177	2010-10-10 23:11:06 +00:00
Michael J. Spencer	7a573a5e1f	X86: Call _alldiv instead of __divdi3 on Windows (excluding cygwin). llvm-svn: 116174	2010-10-10 22:04:34 +00:00
Michael J. Spencer	bee1f7f5ba	Fix Whitespace. llvm-svn: 116173	2010-10-10 22:04:20 +00:00
Cameron Esfahani	d57f9ecd4a	Recommit 116056, now with the missing file... llvm-svn: 116083	2010-10-08 19:24:18 +00:00
Andrew Trick	cf97db2402	reverting 116056: win64_params.ll may need to be conditionalized? llvm-svn: 116063	2010-10-08 17:22:42 +00:00
Cameron Esfahani	a07b5c291d	Small patch to restore home register stack space allocation for the Win64 case. Add test case. This code eventually needs to be tighter, since it's always allocating it, even in leaf routines. llvm-svn: 116056	2010-10-08 10:31:30 +00:00
Evan Cheng	5c31bf0619	Canonicalize X86ISD::MOVDDUP nodes to v2f64 to make sure all cases match. Also eliminate unneeded isel patterns. rdar://8520311 llvm-svn: 115977	2010-10-07 20:50:20 +00:00
Anton Korobeynikov	d77a443631	va_args support for Win64. Patch by Cameron! llvm-svn: 115480	2010-10-03 22:52:07 +00:00
Dale Johannesen	dd224d2333	Massive rewrite of MMX: The x86_mmx type is used for MMX intrinsics, parameters and return values where these use MMX registers, and is also supported in load, store, and bitcast. Only the above operations generate MMX instructions, and optimizations do not operate on or produce MMX intrinsics. MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into smaller pieces. Optimizations may occur on these forms and the result casted back to x86_mmx, provided the result feeds into a previous existing x86_mmx operation. The point of all this is prevent optimizations from introducing MMX operations, which is unsafe due to the EMMS problem. llvm-svn: 115243	2010-09-30 23:57:10 +00:00
Chris Lattner	b5b71e07af	improve indentation llvm-svn: 114815	2010-09-27 06:34:01 +00:00
Eric Christopher	422e463be7	This code should never fire on non-darwin subtargets. llvm-svn: 114811	2010-09-27 06:01:51 +00:00
Dale Johannesen	6a4cd59b08	We can't return SSE/MMX vectors if SSE is disabled. llvm-svn: 114745	2010-09-24 19:05:48 +00:00
Bob Wilson	e1223fb583	Attempt to fix llvm-gcc build. It was crashing when building gcov.o for an ARM cross-compiler on x86, because the MMO size did not match the type size. This fixes the MMO size and also the size of the stack object to match the type size. llvm-svn: 114554	2010-09-22 17:35:14 +00:00
Chris Lattner	8a236b63d8	reimplement elf TLS support in terms of addressing modes, eliminating SegmentBaseAddress. llvm-svn: 114529	2010-09-22 04:39:11 +00:00
Chris Lattner	a5156c30ed	convert the last 4 X86ISD nodes that should have memoperands to have them. llvm-svn: 114523	2010-09-22 01:28:21 +00:00
Chris Lattner	ed85da5600	give X86ISD::FNSTCW16m a memoperand, since it touches memory. It only can access the stack due to how it is generated though. llvm-svn: 114522	2010-09-22 01:11:26 +00:00
Chris Lattner	78f518b79b	give FP_TO_INT16_IN_MEM and friends a memoperand. They are only used with stack slots, but hey, lets be safe. llvm-svn: 114521	2010-09-22 01:05:16 +00:00
Chris Lattner	54e5329545	give VZEXT_LOAD a memory operand, it now works with segment registers. llvm-svn: 114515	2010-09-22 00:34:38 +00:00
Chris Lattner	e479e9643b	give LCMPXCHG_DAG[8] a memory operand, allowing it to work with addrspace 256/257 llvm-svn: 114508	2010-09-21 23:59:42 +00:00
Owen Anderson	5e65dfbb97	Reimplement r114460 in target-independent DAGCombine rather than target-dependent, by using the predicate to discover the number of sign bits. Enhance X86's target lowering to provide a useful response to this query. llvm-svn: 114473	2010-09-21 20:42:50 +00:00
Chris Lattner	886250c8f0	convert a couple more places to use the new getStore() llvm-svn: 114463	2010-09-21 18:51:21 +00:00
Owen Anderson	f4b1a5bdc4	When adding the carry bit to another value on X86, exploit the fact that the carry-materialization (sbbl x, x) sets the registers to 0 or ~0. Combined with two's complement arithmetic, we can fold the intermediate AND and the ADD into a single SUB. This fixes <rdar://problem/8449754>. llvm-svn: 114460	2010-09-21 18:41:19 +00:00
Chris Lattner	802527adad	eliminate some uses of the getStore overload. llvm-svn: 114453	2010-09-21 17:50:43 +00:00
Chris Lattner	7727d05dbb	convert the targets off the non-MachinePointerInfo of getLoad. llvm-svn: 114410	2010-09-21 06:44:06 +00:00
Chris Lattner	82fd06d3ce	it's more elegant to put the "getConstantPool" and "getFixedStack" on the MachinePointerInfo class. While this isn't the problem I'm setting out to solve, it is the right way to eliminate PseudoSourceValue, so lets go with it. llvm-svn: 114406	2010-09-21 06:22:23 +00:00
Chris Lattner	c3e05d6e50	update the X86 backend to use the MachinePointerInfo version of one of the getLoad methods. This fixes at least one bug where an incorrect svoffset is passed in (a potential combiner-aa miscompile). llvm-svn: 114404	2010-09-21 06:02:19 +00:00
Chris Lattner	2510de2bea	reimplement memcpy/memmove/memset lowering to use MachinePointerInfo instead of srcvalue/offset pairs. This corrects SV info for mem operations whose size is > 32-bits. llvm-svn: 114401	2010-09-21 05:40:29 +00:00
Chris Lattner	e3d864b857	convert targets to the new MF.getMachineMemOperand interface. llvm-svn: 114391	2010-09-21 04:39:43 +00:00
John Thompson	1094c80281	Added skeleton for inline asm multiple alternative constraint support. llvm-svn: 113766	2010-09-13 18:15:37 +00:00
Bruno Cardoso Lopes	99a9f4661a	Minor change. Fix comments and remove unused and redundant code llvm-svn: 113378	2010-09-08 18:12:31 +00:00
Bruno Cardoso Lopes	f7fee1c185	x86 vector shuffle lowering now relies only on target specific nodes to emit shuffles and don't do isel mask matching anymore. - Add the selection of the remaining shuffle opcode (movddup) - Introduce two new functions to "recognize" where we may get potential folds and add several comments to them explaining why they are not yet in the desidered shape. - Add more patterns to fallback the case where we select a specific shuffle opcode as if it could fold a load, but it can't, so remap to a valid instruction. - Add a couple of FIXMEs to address in the following days once there's a good solution to the current folding problem. llvm-svn: 113369	2010-09-08 17:43:25 +00:00
Bruno Cardoso Lopes	6b1d62c529	Factor out some x86 vector shuffle rewriting and add comments about the direction the shuffle lowering is heading to llvm-svn: 113286	2010-09-07 21:03:14 +00:00
Bruno Cardoso Lopes	7c483028fb	Move code around to prepare for moving some of the logic together to another function llvm-svn: 113267	2010-09-07 20:20:27 +00:00
Bill Wendling	353802114f	Add an MVT::x86mmx type. It will take the place of all current MMX vector types. llvm-svn: 113261	2010-09-07 20:03:56 +00:00
Bruno Cardoso Lopes	5a45db3e6c	decouple MMX check from regular splat checks. Some refactoring is coming, and MMX should be left alone to be easily removed after moving to intrinsics llvm-svn: 113247	2010-09-07 18:41:45 +00:00
Bruno Cardoso Lopes	4f5d4b4a6e	Remove now useless check, because the code can be matched below, no need to leave it for isel llvm-svn: 113242	2010-09-07 18:29:03 +00:00
Bruno Cardoso Lopes	c9b3316fea	Minor change. Since the checks are equivalent, use isMMX llvm-svn: 113239	2010-09-07 18:24:00 +00:00
Bruno Cardoso Lopes	c6accda78e	Remove the last bit of isShuffleMaskLegal checks and improve the comment regarding mmx shuffles llvm-svn: 113059	2010-09-04 02:58:56 +00:00
Bruno Cardoso Lopes	731bcc1abf	make explicit that we not handle several mmx shuffles llvm-svn: 113058	2010-09-04 02:50:13 +00:00
Bruno Cardoso Lopes	20779ee157	Emit target specific nodes to handle palignr. Do not touch it for MMX versions yet. llvm-svn: 113056	2010-09-04 02:36:07 +00:00
Bruno Cardoso Lopes	cff7cd18ab	Emit target specific nodes to handle splats starting at zero indicies llvm-svn: 113055	2010-09-04 02:02:14 +00:00
Bruno Cardoso Lopes	95759917eb	Emit target specific nodes for isPSHUFHWMask and isPSHUFLWMask llvm-svn: 113050	2010-09-04 01:36:45 +00:00
Bruno Cardoso Lopes	2b57008c72	Emit target specific nodes for isSHUFPMask llvm-svn: 113048	2010-09-04 01:22:57 +00:00
Bruno Cardoso Lopes	2f7af36134	Previous isMOVLMask matching already emits targets nodes, remove check llvm-svn: 113047	2010-09-04 00:50:08 +00:00
Bruno Cardoso Lopes	9f8e704151	One more check from the original isShuffleMaskLegal goes away llvm-svn: 113045	2010-09-04 00:46:16 +00:00
Bruno Cardoso Lopes	16959372bb	Remove a duplicated but useless check that i've inserted in the previous commit. llvm-svn: 113044	2010-09-04 00:43:12 +00:00
Bruno Cardoso Lopes	44578d38d3	Refactor some code and remove the extra checks for unpckl_undef and unpckh_undef llvm-svn: 113043	2010-09-04 00:39:43 +00:00
Bruno Cardoso Lopes	7829d0e74b	Remove check for unpckh mask llvm-svn: 113035	2010-09-03 23:32:47 +00:00
Bruno Cardoso Lopes	d1dacc57aa	Remove check for unpckl mask llvm-svn: 113034	2010-09-03 23:31:50 +00:00
Bruno Cardoso Lopes	207b9d6218	Inline isShuffleMaskLegal into LowerVECTOR_SHUFFLE, so we can start checking each standalone condition and decide whether emit target specific nodes or remove the condition if it's already matched before. llvm-svn: 113031	2010-09-03 23:24:06 +00:00
Bruno Cardoso Lopes	2bef20eda7	Reapply considered harmfull part of rr112934 and r112942. "Use target specific nodes instead of relying in unpckl and unpckh pattern fragments during isel time. Also place a depth limit in getShuffleScalarElt. llvm-svn: 113020	2010-09-03 22:09:41 +00:00
Bruno Cardoso Lopes	fe8717c573	Reintroduce a simple function refactoring done in r112934, also without any functionality changes llvm-svn: 113008	2010-09-03 20:20:02 +00:00
Bruno Cardoso Lopes	48e589b122	Reapply piecies of r112942 and r112934 which don't do functional changes llvm-svn: 113007	2010-09-03 20:10:35 +00:00
Bruno Cardoso Lopes	6979cf0808	Reapply Fix comment llvm-svn: 113006	2010-09-03 19:55:05 +00:00
Daniel Dunbar	6f3da24d70	Revert r112934, "- Use specific nodes to match unpckl masks.", which introduced some infinite loop and select failures. - Apologies for eager reverting, but its branch day. llvm-svn: 113000	2010-09-03 19:38:11 +00:00
Daniel Dunbar	f1aacd55c0	Revert r112938 "Fix comment", which depends on r112934, which introduced some infinite loop and select failures. llvm-svn: 112999	2010-09-03 19:38:08 +00:00
Daniel Dunbar	0ffe4db45c	Revert r112942, "Use punpckh and unpckh family of nodes instead of using unpckh mask pattern fragment", which depends on r112934, which introduced some infinite loop and select failures. llvm-svn: 112998	2010-09-03 19:38:05 +00:00
Bruno Cardoso Lopes	a85ec10483	Use punpckh and unpckh family of nodes instead of using unpckh mask pattern fragment llvm-svn: 112942	2010-09-03 01:39:08 +00:00
Bruno Cardoso Lopes	adc6bca2dd	Fix comment llvm-svn: 112938	2010-09-03 01:28:51 +00:00
Bruno Cardoso Lopes	cce44678b4	- Use specific nodes to match unpckl masks. - Teach getShuffleScalarElt how to handle more target specific nodes, so the DAGCombine can make use of it. - Add another hack to avoid the node update problem during legalization. More description on the comments llvm-svn: 112934	2010-09-03 01:24:00 +00:00
Anton Korobeynikov	a689c5b2c0	Revert win64 changes. They seem to be incomplete llvm-svn: 112885	2010-09-02 22:31:32 +00:00
Anton Korobeynikov	56291f7e53	Properly allocate win64 shadow reg area. Patch by Jan Sjodin! llvm-svn: 112875	2010-09-02 22:16:28 +00:00
Bruno Cardoso Lopes	489613f1e5	Replace unpckl_undef and unpckh_undef matching with target specific opcodes llvm-svn: 112806	2010-09-02 05:23:12 +00:00
Bruno Cardoso Lopes	e4e4be3885	Move condition out to prepare for more matching llvm-svn: 112805	2010-09-02 04:20:26 +00:00
Bruno Cardoso Lopes	bf7fd146c7	Remove checking for isUNPCKL_v_undef_Mask, the specific node is already emitted for it llvm-svn: 112804	2010-09-02 03:57:58 +00:00
Bruno Cardoso Lopes	6a7f634487	become more strict about when it's safe to use X86ISD::MOVLPS llvm-svn: 112799	2010-09-02 02:35:51 +00:00
Bruno Cardoso Lopes	04c25c15c7	Revert r112689, avoid those kind of checks cause they mess up with mmx llvm-svn: 112760	2010-09-01 22:59:03 +00:00
Bruno Cardoso Lopes	b3825216ce	Use movlps, movlpd, movss and movsd specific nodes instead of pattern matching with movlp pattern fragment llvm-svn: 112694	2010-09-01 05:08:25 +00:00
Bruno Cardoso Lopes	6aaebe877b	minor change, simplify some logic llvm-svn: 112689	2010-09-01 00:57:08 +00:00
Bruno Cardoso Lopes	2b025707a2	Move some functions around so they can be used for some other to come function llvm-svn: 112687	2010-09-01 00:51:36 +00:00
Bruno Cardoso Lopes	4b56d87290	Use x86 specific MOVSLDUP node, add more patterns to match it and remove useless load nodes llvm-svn: 112661	2010-08-31 22:35:05 +00:00
Bruno Cardoso Lopes	61996ef835	Use x86 specific MOVSHDUP node and add more patterns to match it llvm-svn: 112657	2010-08-31 22:22:11 +00:00
Bruno Cardoso Lopes	5de15ce468	Use MOVHLPS node instead of matching using movhlps and movhlps_undef pattern fragments llvm-svn: 112644	2010-08-31 21:38:49 +00:00
Bruno Cardoso Lopes	03e4c35302	Use MOVLHPS and MOVHLPS x86 nodes whenever possible. Also remove some useless nodes llvm-svn: 112642	2010-08-31 21:15:21 +00:00
Bruno Cardoso Lopes	dfd9dd5d75	Use X86ISD::MOVSS and MOVSD to represent the movl mask pattern, also fix the handling of those nodes when seeking for scalars inside vector shuffles llvm-svn: 112570	2010-08-31 02:26:40 +00:00
Chris Lattner	94656b1c8c	fix the buildvector->insertp[sd] logic to not always create a redundant insertp[sd] $0, which is a noop. Before: _f32: ## @f32 pshufd $1, %xmm1, %xmm2 pshufd $1, %xmm0, %xmm3 addss %xmm2, %xmm3 addss %xmm1, %xmm0 ## kill: XMM0<def> XMM0<kill> XMM0<def> insertps $0, %xmm0, %xmm0 insertps $16, %xmm3, %xmm0 ret after: _f32: ## @f32 movdqa %xmm0, %xmm2 addss %xmm1, %xmm2 pshufd $1, %xmm1, %xmm1 pshufd $1, %xmm0, %xmm3 addss %xmm1, %xmm3 movdqa %xmm2, %xmm0 insertps $16, %xmm3, %xmm0 ret The extra movs are due to a random (poor) scheduling decision. llvm-svn: 112379	2010-08-28 17:59:08 +00:00
Chris Lattner	bcb6090ad0	fix the BuildVector -> unpcklps logic to not do pointless shuffles when the top elements of a vector are undefined. This happens all the time for X86-64 ABI stuff because only the low 2 elements of a 4 element vector are defined. For example, on: _Complex float f32(_Complex float A, _Complex float B) { return A+B; } We used to produce (with SSE2, SSE4.1+ uses insertps): _f32: ## @f32 movdqa %xmm0, %xmm2 addss %xmm1, %xmm2 pshufd $16, %xmm2, %xmm2 pshufd $1, %xmm1, %xmm1 pshufd $1, %xmm0, %xmm0 addss %xmm1, %xmm0 pshufd $16, %xmm0, %xmm1 movdqa %xmm2, %xmm0 unpcklps %xmm1, %xmm0 ret We now produce: _f32: ## @f32 movdqa %xmm0, %xmm2 addss %xmm1, %xmm2 pshufd $1, %xmm1, %xmm1 pshufd $1, %xmm0, %xmm3 addss %xmm1, %xmm3 movaps %xmm2, %xmm0 unpcklps %xmm3, %xmm0 ret This implements rdar://8368414 llvm-svn: 112378	2010-08-28 17:28:30 +00:00
Chris Lattner	96db6e66f4	improve comments in the unpcklps generating logic, introduce a new EltStride variable instead of reusing NumElems variable for a non-obvious purpose. No functionality change. llvm-svn: 112377	2010-08-28 17:15:43 +00:00
Bruno Cardoso Lopes	a982aa24ef	Clean up the logic of vector shuffles -> vector shifts. Also teach this logic how to handle target specific shuffles if needed, this is necessary while searching recursively for zeroed scalar elements in vector shuffle operands. llvm-svn: 112348	2010-08-28 02:46:39 +00:00
Anton Korobeynikov	c0b36921c2	Properly handle passing of FP stuff to varargs function on Win64: value should be copied to the corresponding shadow reg as well. Patch by Cameron Esfahani! llvm-svn: 112262	2010-08-27 14:43:06 +00:00
Bruno Cardoso Lopes	e25ba0c7c2	zap the now unused MVT::getIntVectorWithNumElements llvm-svn: 112218	2010-08-26 20:53:12 +00:00
Chris Lattner	eb2cc0ce0e	implement SplitVecOp_CONCAT_VECTORS, fixing the included testcase with SSE1. llvm-svn: 112171	2010-08-26 05:51:22 +00:00
Chris Lattner	cc60609cb4	fix sse1 only codegen in x86-64 mode, which is something we apparently try to support. llvm-svn: 112168	2010-08-26 05:24:29 +00:00
Bruno Cardoso Lopes	d4085f6e91	Revert this for now, PUNPCKLDQ dont operate on v4f32 llvm-svn: 112090	2010-08-25 21:26:37 +00:00
Anton Korobeynikov	b3b53ecac0	Fix nasty mingw32 bug, which e.g. prevented llvm-gcc bootstrap there. Mark _alloca call as clobberring EFLAGS, otherwise some DCE might remove other flags-clobberring stuff (e.g. cmp instructions) occuring after _alloca call. llvm-svn: 112034	2010-08-25 07:50:11 +00:00
Bruno Cardoso Lopes	0770d25758	PUNPCKLDQ should also be used for v4f32 llvm-svn: 112020	2010-08-25 02:55:40 +00:00
Bruno Cardoso Lopes	2e45d522c1	teach lowering to get target specific nodes for pshufd, emulating the same isel behavior for now, so we can pass all vector shuffle tests llvm-svn: 112017	2010-08-25 02:35:37 +00:00
Dan Gohman	c88fda477a	Fix X86's isLegalAddressingMode to recognize that static addresses need not be RIP-relative in small mode. llvm-svn: 111917	2010-08-24 15:55:12 +00:00
Bruno Cardoso Lopes	758d7b1f5c	Use pshufhw and pshuflw in more cases and fix getTargetShuffleNode number of arguments llvm-svn: 111890	2010-08-24 01:16:15 +00:00
Bruno Cardoso Lopes	264d90fff7	Start using target speficic nodes for shuffles: pshufhw and pshuflw llvm-svn: 111837	2010-08-23 20:41:02 +00:00
Anton Korobeynikov	cbbe4501df	Revert invalid r111792. Jump tables are not broken on x86-64 / coff, it's COFF emitter which does not support differences of two symbols (and needs to be fixed). GAS is pretty fine with code produced. llvm-svn: 111801	2010-08-23 07:38:51 +00:00
Michael J. Spencer	e87231232a	Workaround broken jump tables on x86-64 COFF. llvm-svn: 111792	2010-08-23 04:45:37 +00:00
Bruno Cardoso Lopes	9f20e7a1bf	Prepare LowerVECTOR_SHUFFLEv8i16 to use x86 target specific nodes directly llvm-svn: 111704	2010-08-21 01:32:18 +00:00
Bruno Cardoso Lopes	6f3b38a851	This is the first step towards refactoring the x86 vector shuffle code. The general idea here is to have a group of x86 target specific nodes which are going to be selected during lowering and then directly matched in isel. The commit includes the addition of those specific nodes and a bunch of patterns, and incrementally we're going to switch between them and what we have right now. Both the patterns and target specific nodes can change as we move forward with this work. llvm-svn: 111691	2010-08-20 22:55:05 +00:00
Anton Korobeynikov	231ab847ca	More fixes for win64: - Do not clobber al during variadic calls, this is AMD64 ABI-only feature - Emit wincall64, where necessary Patch by Cameron Esfahani! llvm-svn: 111289	2010-08-17 21:06:07 +00:00
Eric Christopher	54194bd127	Rework how the non-sse2 memory barrier is lowered so that the encoding is correct for the built-in assembler. Based on a patch from Chris. llvm-svn: 111083	2010-08-14 21:51:50 +00:00
Chris Lattner	2f6c3434ac	improve indentation llvm-svn: 111073	2010-08-14 17:26:09 +00:00
Bruno Cardoso Lopes	081861b6b7	Fix comment to reflect code, and remove an unused argument llvm-svn: 111022	2010-08-13 17:50:47 +00:00
Bruno Cardoso Lopes	7306c86886	Begin to support some vector operations for AVX 256-bit intructions. The long term goal here is to be able to match enough of vector_shuffle and build_vector so all avx intrinsics which aren't mapped to their own built-ins but to shufflevector calls can be codegen'd. This is the first (baby) step, support building zeroed vectors. llvm-svn: 110897	2010-08-12 02:06:36 +00:00
Dan Gohman	5531aa4de1	Use ISD::ADD instead of ISD::SUB with a negated constant. This avoids trouble if the return type of TD->getPointerSize() is changed to something which doesn't promote to a signed type, and is simpler anyway. Also, use getCopyFromReg instead of getRegister to read a physical register's value. llvm-svn: 110835	2010-08-11 18:14:00 +00:00
Bruno Cardoso Lopes	91d61df3eb	Add AVX matching patterns to Packed Bit Test intrinsics. Apply the same approach of SSE4.1 ptest intrinsics but create a new x86 node "testp" since AVX introduces vtest{ps}{pd} instructions which set ZF and CF depending on sign bit AND and ANDN of packed floating-point sources. This is slightly different from what the "ptest" does. Tests comming with the other 256 intrinsics tests. llvm-svn: 110744	2010-08-10 23:25:42 +00:00
Bruno Cardoso Lopes	85da72a88f	Support AVX 256-bit load and store intrinsics llvm-svn: 110645	2010-08-10 01:43:16 +00:00
Bruno Cardoso Lopes	77954bdf7a	Support very basic (doesn't include ABI support in the front-end, varags, ...) 256-bit argument passing and return for AVX llvm-svn: 110394	2010-08-05 23:35:51 +00:00
Eric Christopher	2db8464282	Make x86-64 membarriers work without sse and clean up some of the uses. llvm-svn: 110274	2010-08-04 23:03:04 +00:00
Bruno Cardoso Lopes	349165b48f	Support all 128-bit AVX vector intrinsics. Most part of them I already declared during the addition of the assembler support, the additional changes are: - Add missing intrinsics - Move all SSE conversion instructions in X86InstInfo64.td to the SSE.td file. - Duplicate some patterns to AVX mode. - Step into PCMPEST/PCMPIST custom inserter and add AVX versions. llvm-svn: 109878	2010-07-30 19:54:33 +00:00
Jakob Stoklund Olesen	ba0e124aaf	Revert r109652, and remove the offending assert in loadRegFromStackSlot instead. We do sometimes load from a too small stack slot when dealing with x86 arguments (varargs and smaller-than-32-bit args). It looks like we know what we are doing in those cases, so I am going to remove the assert instead of artifically enlarging stack slot sizes. The assert in storeRegToStackSlot stays in. We don't want to write beyond the bounds of a stack slot. llvm-svn: 109764	2010-07-29 17:42:27 +00:00
Jakob Stoklund Olesen	f2234fbe70	Create a fixed stack object for varargs that is as large as any register. The size of this object isn't used for anything - technically it is of variable size. This avoids a false positive from the assert in X86InstrInfo::loadRegFromStackSlot, and fixes PR7735. llvm-svn: 109652	2010-07-28 20:55:38 +00:00
Nate Begeman	53afc8f06a	Implement a vectorized algorithm for <16 x i8> << <16 x i8> This is about 4x faster and smaller than the existing scalarization. llvm-svn: 109566	2010-07-28 00:21:48 +00:00
Nate Begeman	269a6da023	~40% faster vector shl <4 x i32> on SSE 4.1 Larger improvements for smaller types coming in future patches. For: define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp { entry: %shl = shl <4 x i32> %r, %a ; <<4 x i32>> [#uses=1] %tmp2 = bitcast <4 x i32> %shl to <2 x i64> ; <<2 x i64>> [#uses=1] ret <2 x i64> %tmp2 } We get: _shl: ## @shl pslld $23, %xmm1 paddd LCPI0_0, %xmm1 cvttps2dq %xmm1, %xmm1 pmulld %xmm1, %xmm0 ret Instead of: _shl: ## @shl pshufd $3, %xmm0, %xmm2 movd %xmm2, %eax pshufd $3, %xmm1, %xmm2 movd %xmm2, %ecx shll %cl, %eax movd %eax, %xmm2 pshufd $1, %xmm0, %xmm3 movd %xmm3, %eax pshufd $1, %xmm1, %xmm3 movd %xmm3, %ecx shll %cl, %eax movd %eax, %xmm3 punpckldq %xmm2, %xmm3 movd %xmm0, %eax movd %xmm1, %ecx shll %cl, %eax movd %eax, %xmm2 movhlps %xmm0, %xmm0 movd %xmm0, %eax movhlps %xmm1, %xmm1 movd %xmm1, %ecx shll %cl, %eax movd %eax, %xmm0 punpckldq %xmm0, %xmm2 movdqa %xmm2, %xmm0 punpckldq %xmm3, %xmm0 ret llvm-svn: 109549	2010-07-27 22:37:06 +00:00
Evan Cheng	d4218b8793	On x86, f32 / f64 nodes share the same registers as 128-bit vector values. llvm-svn: 109450	2010-07-26 21:50:05 +00:00
Evan Cheng	37b740c4bf	Add an ILP scheduler. This is a register pressure aware scheduler that's appropriate for targets without detailed instruction iterineries. The scheduler schedules for increased instruction level parallelism in low register pressure situation; it schedules to reduce register pressure when the register pressure becomes high. On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2 by 16%. llvm-svn: 109300	2010-07-24 00:39:05 +00:00
Dale Johannesen	f2d75670b7	The only supported calling convention for X86-64 uses SSE, so we can't return floating point values if this is disabled. Detect this error for clang. With SSE1 only, f64 is a problem; it can be done, but neither llvm-gcc nor clang has ever generated correct code for it. Since nobody noticed this I think it's OK to treat it as an error for now. This also handles SSE-sized vectors of floating point. 8207686, 8204109. llvm-svn: 109201	2010-07-23 00:30:35 +00:00
Eric Christopher	9a77382685	Custom lower the memory barrier instructions and add support for lowering without sse2. Add a couple of new testcases. Fixes a few libgomp tests and latent bugs. Remove a few todos. llvm-svn: 109078	2010-07-22 02:48:34 +00:00
Eric Christopher	a4c435f1fa	80-columns. llvm-svn: 109070	2010-07-22 00:26:08 +00:00
Nate Begeman	784e062b2a	Fix a couple issues with Win64 ABI 1) all registers were spilled as xmm, regardless of actual size 2) win64 abi doesn't do the varargs-size-in-%al thing Still to look into: xmm6-15 are marked as clobbered by call instructions on win64 even though they aren't. llvm-svn: 109035	2010-07-21 20:49:52 +00:00
Eric Christopher	d27913e516	Pulling out previous patch, must've run the tests in the wrong directory. llvm-svn: 109005	2010-07-21 09:23:56 +00:00
Eric Christopher	b2d1067024	Lower MEMBARRIER on x86 and support processors without SSE2. Fixes a pile of libgomp failures in the llvm-gcc testsuite due to the libcall not existing. llvm-svn: 109004	2010-07-21 09:05:23 +00:00
Evan Cheng	55f0c6b9fc	Split -enable-finite-only-fp-math to two options: -enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen fp math optimizations only care whether the fp arithmetics arguments and results can never be NaN. llvm-svn: 108465	2010-07-15 22:07:12 +00:00
Jakob Stoklund Olesen	9b449d5a92	Use TargetOpcode::COPY instead of X86-native register copy instructions when lowering atomics. This will allow those copies to still be coalesced after TII::isMoveInstr is removed. llvm-svn: 108385	2010-07-14 23:50:27 +00:00
Evan Cheng	a8e8874552	Fix for PR7193 was overly conservative. The only case where sibcall callee address cannot be allocated a register is in 32-bit mode where the first three arguments are marked inreg. In that case EAX, EDX, and ECX will be used for argument passing. This fixes PR7610. llvm-svn: 108327	2010-07-14 06:44:01 +00:00
Dan Gohman	d7b5ce3312	Reapply bottom-up fast-isel, with several fixes for x86-32: - Check getBytesToPopOnReturn(). - Eschew ST0 and ST1 for return values. - Fix the PIC base register initialization so that it doesn't ever fail to end up the top of the entry block. llvm-svn: 108039	2010-07-10 09:00:22 +00:00
Jakob Stoklund Olesen	be8d9b0bb8	An x86 function returns a floating point value in st(0), and we must make sure it is popped, even if it is ununsed. A CopyFromReg node is too weak to represent the required sideeffect, so insert an FpGET_ST0 instruction directly instead. This will matter when CopyFromReg gets lowered to a generic COPY instruction. llvm-svn: 108037	2010-07-10 04:04:25 +00:00
Bob Wilson	6586e9b203	--- Reverse-merging r107947 into '.': U utils/TableGen/FastISelEmitter.cpp --- Reverse-merging r107943 into '.': U test/CodeGen/X86/fast-isel.ll U test/CodeGen/X86/fast-isel-loads.ll U include/llvm/Target/TargetLowering.h U include/llvm/Support/PassNameParser.h U include/llvm/CodeGen/FunctionLoweringInfo.h U include/llvm/CodeGen/CallingConvLower.h U include/llvm/CodeGen/FastISel.h U include/llvm/CodeGen/SelectionDAGISel.h U lib/CodeGen/LLVMTargetMachine.cpp U lib/CodeGen/CallingConvLower.cpp U lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp U lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp U lib/CodeGen/SelectionDAG/FastISel.cpp U lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp U lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp U lib/CodeGen/SelectionDAG/InstrEmitter.cpp U lib/CodeGen/SelectionDAG/TargetLowering.cpp U lib/Target/XCore/XCoreISelLowering.cpp U lib/Target/XCore/XCoreISelLowering.h U lib/Target/X86/X86ISelLowering.cpp U lib/Target/X86/X86FastISel.cpp U lib/Target/X86/X86ISelLowering.h llvm-svn: 107987	2010-07-09 16:37:18 +00:00
Dan Gohman	0a7d155d67	Fix the memoperand offsets in code generated for va_start. llvm-svn: 107948	2010-07-09 01:06:48 +00:00
Dan Gohman	0b5aa1cdd3	Re-apply bottom-up fast-isel, with fixes. Be very careful to avoid emitting a DBG_VALUE after a terminator, or emitting any instructions before an EH_LABEL. llvm-svn: 107943	2010-07-09 00:39:23 +00:00
Chris Lattner	f469307c77	Change LEA to have 5 operands for its memory operand, just like all other instructions, even though a segment is not allowed. This resolves a bunch of gross hacks in the encoder and makes LEA more consistent with the rest of the instruction set. No functionality change. llvm-svn: 107934	2010-07-08 23:46:44 +00:00
Chris Lattner	ec536276f0	add some long-overdue enums to refer to the parts of the 5-operand X86 memory operand. llvm-svn: 107925	2010-07-08 22:41:28 +00:00
Dan Gohman	e75704369d	Revert 107840 107839 107813 107804 107800 107797 107791. Debug info intrinsics win for now. llvm-svn: 107850	2010-07-08 01:00:56 +00:00
Evan Cheng	1c349f18f8	Move getExtLoad() and (some) getLoad() DebugLoc argument after EVT argument for consistency sake. llvm-svn: 107820	2010-07-07 22:15:37 +00:00
Dan Gohman	2d4d01d0de	Add X86FastISel support for return statements. This entails refactoring a bunch of stuff, to allow the target-independent calling convention logic to be employed. llvm-svn: 107800	2010-07-07 18:32:53 +00:00
Dan Gohman	87fb4e8fcd	Simplify FastISel's constructor by giving it a FunctionLoweringInfo instance, rather than pointers to all of FunctionLoweringInfo's members. This eliminates an NDEBUG ABI sensitivity. llvm-svn: 107789	2010-07-07 16:29:44 +00:00
Dan Gohman	fe7532a308	Split the SDValue out of OutputArg so that SelectionDAG-independent code can do calling-convention queries. This obviates OutputArgReg. llvm-svn: 107786	2010-07-07 15:54:55 +00:00
Dale Johannesen	ce65663330	Accept RIP-relative symbols with 'i' constraint, and print the (%rip) only if the 'a' modifier is present. PR 7528. llvm-svn: 107727	2010-07-06 23:27:00 +00:00
Dan Gohman	ee0cb70381	CanLowerReturn doesn't need a SelectionDAG; it just needs an LLVMContext. SelectBasicBlock doesn't needs its BasicBlock argument. llvm-svn: 107712	2010-07-06 22:19:37 +00:00
Devang Patel	a3ca21b228	Propagate debug loc. llvm-svn: 107710	2010-07-06 22:08:15 +00:00
Dan Gohman	3439629239	Reapply r107655 with fixes; insert the pseudo instruction into the block before calling the expansion hook. And don't put EFLAGS in a mbb's live-in list twice. llvm-svn: 107691	2010-07-06 20:24:04 +00:00
Dan Gohman	f4f04107ef	Revert r107655. llvm-svn: 107668	2010-07-06 15:49:48 +00:00
Dan Gohman	12205645a6	Fix a bunch of custom-inserter functions to handle the case where the pseudo instruction is not at the end of the block. llvm-svn: 107655	2010-07-06 15:18:19 +00:00
Eric Christopher	2ad0c779c3	Fix up -fstack-protector on linux to use the segment registers. Split out testcases per architecture and os now. Patch from Nelson Elhage. llvm-svn: 107640	2010-07-06 05:18:56 +00:00
Eric Christopher	d429846eca	Have the X86 backend use Triple instead of a string and some enums. llvm-svn: 107625	2010-07-05 19:26:33 +00:00
Chris Lattner	c4a7073db3	more tidying. llvm-svn: 107615	2010-07-05 05:53:14 +00:00
Chris Lattner	45cc4d74a3	Just rip v2f32 support completely out of the X86 backend. In the example in the testcase, we now generate: _test1: ## @test1 movss 4(%esp), %xmm0 addss 8(%esp), %xmm0 movl 12(%esp), %eax movss %xmm0, (%eax) ret instead of: _test1: ## @test1 subl $20, %esp movl 24(%esp), %eax movq %mm0, (%esp) movq %mm0, 8(%esp) movss (%esp), %xmm0 addss 12(%esp), %xmm0 movss %xmm0, (%eax) addl $20, %esp ret v2f32 support did not work reliably because most of the X86 backend didn't know it was legal. It was apparently only added to support returning source-level v2f32 values in MMX registers in x86-32 mode. If ABI compatibility is important on this GCC-extended-vector type for some reason, then the frontend should generate IR that returns v2i32 instead of v2f32. However, we generally don't try very hard to be abi compatible on gcc extended vectors. llvm-svn: 107601	2010-07-04 23:07:25 +00:00
Chris Lattner	681b926d54	fix PR7518 - terrible codegen of <2 x float>, by only marking v2f32 as legal in 32-bit mode. It is just as terrible there, but I just care about x86-64 and noone claims it is valuable in 64-bit mode. llvm-svn: 107600	2010-07-04 22:57:10 +00:00
Evan Cheng	0664a67fe1	Remove isSS argument from CreateFixedObject. Fixed objects cannot be spill slots so it's always false. llvm-svn: 107550	2010-07-03 00:40:23 +00:00
Gabor Greif	12ca3d9fac	use ArgOperand API llvm-svn: 107280	2010-06-30 13:03:37 +00:00
Duncan Sands	67bfa9d109	Remove pointless and unused variables. llvm-svn: 107130	2010-06-29 12:48:49 +00:00
Bill Wendling	0a5bb081cc	Reduce indentation via early exit. NFC. llvm-svn: 107067	2010-06-28 21:08:32 +00:00
Gabor Greif	83205af3fa	use ArgOperand API llvm-svn: 106944	2010-06-26 11:51:52 +00:00
Dale Johannesen	ce97d55ad9	The hasMemory argument is irrelevant to how the argument for an "i" constraint should get lowered; PR 6309. While this argument was passed around a lot, this is the only place it was used, so it goes away from a lot of other places. llvm-svn: 106893	2010-06-25 21:55:36 +00:00
Bill Wendling	e41e40f689	- Reapply r106066 now that the bzip2 build regression has been fixed. - 2010-06-25-CoalescerSubRegDefDead.ll is the testcase for r106878. llvm-svn: 106880	2010-06-25 20:48:10 +00:00
Dale Johannesen	5ad5226c58	Disallow matching "i" constraint to symbol addresses when address requires a register or secondary load to compute (most PIC modes). This improves "g" constraint handling. 8015842. The test from 2007 is attempting to test the fix for PR1761, but since -relocation-model=static doesn't work on Darwin x86-64, it was not testing what it was supposed to be testing and was passing erroneously. Fixed to use Linux x86-64. llvm-svn: 106779	2010-06-24 20:14:51 +00:00
Dan Gohman	600f62b3ba	Reapply r106634, now that the bug it exposed is fixed. llvm-svn: 106746	2010-06-24 14:30:44 +00:00
Dan Gohman	c3e291c560	Fix a bug in the code which determines when it's safe to use the bt instruction, which was exposed by r106263. llvm-svn: 106718	2010-06-24 02:07:59 +00:00
Daniel Dunbar	4df321b7ad	Revert r106263, "Fold the ShrinkDemandedOps pass into the regular DAGCombiner pass,"... it was causing both 'file' (with clang) and 176.gcc (with llvm-gcc) to be miscompiled. llvm-svn: 106634	2010-06-23 17:09:26 +00:00
Jim Grosbach	6f71039fa4	The generic DAG combiner can now fold atomic fences when needed, so switch to using that. llvm-svn: 106633	2010-06-23 16:25:07 +00:00
Daniel Dunbar	ef5a4383ad	Revert r106066, "Create a more targeted fix for not sinking instructions into a range where it"... it causes bzip2 to be miscompiled by Clang. Conflicts: lib/CodeGen/MachineSink.cpp llvm-svn: 106614	2010-06-23 00:48:25 +00:00
Jim Grosbach	6c275bc5a2	fix typo llvm-svn: 106574	2010-06-22 20:52:02 +00:00
Nick Lewycky	dcc7b6dcb6	Fix warning in no-asserts build. llvm-svn: 106405	2010-06-20 20:27:42 +00:00
Dan Gohman	92c11acdb8	Change UpdateNodeOperands' operand and return value from SDValue to SDNode *, since it doesn't care about the ResNo value. llvm-svn: 106282	2010-06-18 15:30:29 +00:00
Dan Gohman	c3479f5342	Delete unused variables. llvm-svn: 106280	2010-06-18 14:32:32 +00:00
Dan Gohman	f1d8304fe3	Eliminate unnecessary uses of getZExtValue(). llvm-svn: 106279	2010-06-18 14:22:04 +00:00
Dan Gohman	35b6f9a929	isValueValidForType can be a static member function. llvm-svn: 106278	2010-06-18 14:01:07 +00:00
Dan Gohman	b92156d5e4	Fold the ShrinkDemandedOps pass into the regular DAGCombiner pass, which is faster, simpler, and less surprising. llvm-svn: 106263	2010-06-18 01:05:21 +00:00
Bill Wendling	8c0cf0994d	Create a more targeted fix for not sinking instructions into a range where it will conflict with another live range. The place which creates this scenerio is the code in X86 that lowers a select instruction by splitting the MBBs. This eliminates the need to check from the bottom up in an MBB for live pregs. llvm-svn: 106066	2010-06-15 23:46:31 +00:00
Eric Christopher	6c4d63e1a5	For 32-bit non-pic tlv mach-o addressing we don't need a pic base or a relative address. llvm-svn: 106064	2010-06-15 23:08:42 +00:00
Eric Christopher	89d103a8ce	Ensure that mov and not lea are used to stick the address into the register. While we're at it, make sure it's in the right one. llvm-svn: 105645	2010-06-08 22:04:25 +00:00
Dale Johannesen	df1a7f83bf	Fix some liveout handling related to tail calls, see comments. I don't think this ever resulted in problems on x86, but it would on ARM. llvm-svn: 105509	2010-06-05 00:30:45 +00:00
Eric Christopher	b0e1a458ce	Add first pass at darwin tls compiler support. llvm-svn: 105381	2010-06-03 04:07:48 +00:00
Eli Friedman	6e3d5af945	Fix comment so it doesn't include comments which are irrelevant to the x86 backend. Add a FIXME noting what can be fixed here. llvm-svn: 105342	2010-06-02 19:35:46 +00:00
Dan Gohman	a690618c58	Use comments to document non-obvious code rather than mailing list archives. llvm-svn: 105341	2010-06-02 19:13:40 +00:00
Eli Friedman	526e6d045f	Don't try to custom-lower 64-bit add-with-overflow and friends on x86-32; the x86 backend currently doesn't know how to handle them. This doesn't really fix anything because LegalizeTypes doesn't know how to handle them either. We do get a better error message, though. llvm-svn: 105305	2010-06-02 00:27:18 +00:00
Evan Cheng	27c4933e02	Fix PR7193: if sibling call address can take a register, make sure there are enough registers available by counting inreg arguments. llvm-svn: 105092	2010-05-29 01:35:22 +00:00
Dale Johannesen	e8be73f3e7	Fix comment typos. llvm-svn: 105059	2010-05-28 23:24:28 +00:00
Dale Johannesen	9e43c07bc5	Mark some math lib intrinsic nodes Legal on SSE4.1. No functional effect as these nodes are not generated yet. llvm-svn: 104879	2010-05-27 20:12:41 +00:00
Dan Gohman	dc53f1cb5c	FastISel doesn't yet handle callee-pop functions. To support this, move IsCalleePop from X86ISelLowering to X86Subtarget. llvm-svn: 104866	2010-05-27 18:43:40 +00:00
Zhongxing Xu	730a977e02	SRetReturnReg was set in LowerFormalArguments(). So only assert it here. llvm-svn: 104691	2010-05-26 08:10:02 +00:00
Evan Cheng	168ced94d8	Implement @llvm.returnaddress. rdar://8015977. llvm-svn: 104421	2010-05-22 01:47:14 +00:00
Dale Johannesen	2b78565842	Previous commit message should refer to 104308. llvm-svn: 104337	2010-05-21 18:44:47 +00:00
Dale Johannesen	6361e3e8a2	Fix two bugs in 104348: Case where MMX is disabled wasn't handled right. MMX->MMX bitconverts are Legal. llvm-svn: 104336	2010-05-21 18:40:15 +00:00
Dale Johannesen	b3b9c8ac48	Fix i64->f64 conversion, x86-64, -no-sse. A bit tricky since there's a 3rd 64-bit type, MMX vectors. PR 7135. llvm-svn: 104308	2010-05-21 00:52:33 +00:00
Evan Cheng	738e920edf	Code refactoring: pull SchedPreference enum from TargetLowering.h to TargetMachine.h and put it in its own namespace. llvm-svn: 104147	2010-05-19 20:19:50 +00:00
Dale Johannesen	2ef974ee0e	Revert 103911; it broke a test that expects bitconvert <1xi64> -> i64 to work in MMX registers on hosts where -no-sse is the default (not mine). The right thing is to accept this and make i64->f64 conversions go through memory, but I don't have time right now. llvm-svn: 103914	2010-05-16 20:19:04 +00:00
Dale Johannesen	fc1492d71b	Make x86-64 64-bit bitconvert work when SSE is not available. (This worked as of about 6 months ago and I didn't track down exactly what broke it; I think this fix is appropriate.) llvm-svn: 103911	2010-05-16 18:22:38 +00:00
Anton Korobeynikov	8f35fabbc1	Add support for thiscall calling convention. Patch by Charles Davis and Steven Watanabe! llvm-svn: 103902	2010-05-16 09:08:45 +00:00
Dale Johannesen	3a366a88f2	Fix uint64->{float, double} conversion to do rounding correctly in 32-bit. The implementation in LegalizeIntegerTypes to handle this as sint64->float + appropriate power of 2 is subject to double rounding, considered incorrect by numerics people. Use this implementation only when it is safe. This leads to using library calls in some cases that produced inline code before, but it's correct now. (EVTToAPFloatSemantics belongs somewhere else, any suggestions?) Add a correctly rounding (though not particularly fast) conversion that uses X87 80-bit computations for x86-32. 7885399, 5901940. This shows up in gcc.c-torture/execute/ieee/rbug.c in the gcc testsuite on some platforms. llvm-svn: 103883	2010-05-15 18:51:12 +00:00
Bill Wendling	95f6ebcb37	Rename "HasCalls" in MachineFrameInfo to "AdjustsStack" to better describe what the variable actually tracks. N.B., several back-ends are using "HasCalls" as being synonymous for something that adjusts the stack. This isn't 100% correct and should be looked into. llvm-svn: 103802	2010-05-14 21:14:32 +00:00
Dan Gohman	35dd005d22	Lowering of atomic instructions can result in operands being used more than once. If ISel had put a kill flag on one of them, it's not valid to transfer the kill flag to each new instance. llvm-svn: 103799	2010-05-14 21:01:44 +00:00
Dan Gohman	bb919dfb6b	Implement a bunch more TargetSelectionDAGInfo infrastructure. Move EmitTargetCodeForMemcpy, EmitTargetCodeForMemset, and EmitTargetCodeForMemmove out of TargetLowering and into SelectionDAGInfo to exercise this. llvm-svn: 103481	2010-05-11 17:31:57 +00:00
Dan Gohman	25c1653700	Get rid of the EdgeMapping map. Instead, just check for BasicBlock changes before doing phi lowering for switches. llvm-svn: 102809	2010-05-01 00:01:06 +00:00
Dan Gohman	2e2cc87081	Make this code less confusing. Instead of reassigning BB, just operate on the original variables, so it's easier to see what is being done to which blocks. llvm-svn: 102759	2010-04-30 20:14:26 +00:00
Dan Gohman	57bb73c80b	Remove the -disable-16bit command-line option, which is now obsolete. llvm-svn: 102730	2010-04-30 18:30:26 +00:00
Evan Cheng	5117a555e0	Another sibcall bug. If caller and callee calling conventions differ, then it's only safe to do a tail call if the results are returned in the same way. llvm-svn: 102683	2010-04-30 01:12:32 +00:00
Evan Cheng	050df1b8de	Enable i16 to i32 promotion by default. llvm-svn: 102493	2010-04-28 08:30:49 +00:00
Evan Cheng	d21f564543	Unbreak the build. Only form shld / shrd after legalization. llvm-svn: 102488	2010-04-28 02:25:18 +00:00
Evan Cheng	347e3b8f15	Rather than having a ton of patterns for double shift instructions, e.g. SHLD16rrCL, just perform custom dag combine to form x86 specific dag so they match to the same pattern. This also makes sure later dag combine do not cause isel to miss them (e.g. promoting i16 to i32). llvm-svn: 102485	2010-04-28 01:18:01 +00:00
Stuart Hastings	c0458f1a40	Tweak x86 INC/DEC generation to look for CopyToReg or SETCC. Radar 7866163. llvm-svn: 102477	2010-04-28 00:35:10 +00:00
Evan Cheng	3b928af28f	SRA promotion is also not free. llvm-svn: 102456	2010-04-27 19:48:31 +00:00
Evan Cheng	6e45f1d1ff	Promoting 16-bit cmp / test aren't free. Don't do it. llvm-svn: 102366	2010-04-26 19:06:11 +00:00
Evan Cheng	ed69b382ea	- Move TargetLowering::EmitTargetCodeForFrameDebugValue to TargetInstrInfo and rename it to emitFrameIndexDebugValue. - Teach spiller to modify DBG_VALUE instructions to reference spill slots. llvm-svn: 102323	2010-04-26 07:38:55 +00:00
Dale Johannesen	582565e991	Stop abusing EmitInstrWithCustomInserter for target-dependent form of DEBUG_VALUE, as it doesn't have reasonable default behavior for unsupported targets. Add a new hook instead. No functional change. llvm-svn: 102320	2010-04-25 21:33:54 +00:00
Evan Cheng	a02d0e7d6b	Avoid promoting a i16 node if it would eliminate a (store (op (load))) opportunity. llvm-svn: 102237	2010-04-24 04:44:57 +00:00
Evan Cheng	0367559786	Fix X86ISD::CMP i16 to i32 promotion. llvm-svn: 102192	2010-04-23 18:21:16 +00:00
Dan Gohman	c594eab10f	Move HandlePHINodesInSuccessorBlocks functions out of SelectionDAGISel and into SelectionDAGBuilder and FastISel. llvm-svn: 102123	2010-04-22 20:46:50 +00:00

... 7 8 9 10 11 ...

2074 Commits