llvm-project

Commit Graph

Author	SHA1	Message	Date
Evan Cheng	104dbb0fd1	For x86, canonicalize max (x > y) ? x : y => (x >= y) ? x : y So for something like (x - y) > 0 : (x - y) ? 0 It will be (x - y) >= 0 : (x - y) ? 0 This makes it possible to test sign-bit and eliminate a comparison against zero. e.g. subl %esi, %edi testl %edi, %edi movl $0, %eax cmovgl %edi, %eax => xorl %eax, %eax subl %esi, %edi cmovsl %eax, %edi rdar://10633221 llvm-svn: 147512	2012-01-04 01:41:39 +00:00
Chad Rosier	6ca97df951	Fix 80-column violations. llvm-svn: 147495	2012-01-03 23:19:12 +00:00
Nadav Rotem	6d31bac85e	Revert 147426 because it caused pr11696. llvm-svn: 147485	2012-01-03 22:19:42 +00:00
Chad Rosier	493c1b3152	Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather than a vxorps + vinsertf128 pair if the original vector came from a load. rdar://10594409 llvm-svn: 147481	2012-01-03 21:05:52 +00:00
Craig Topper	5bacb7e9e5	Miscellaneous shuffle lowering cleanup. No functional changes. Primarily converting the indexing loops to unsigned to be consistent across functions. llvm-svn: 147430	2012-01-02 09:17:37 +00:00
Craig Topper	53d559641f	Make CanXFormVExtractWithShuffleIntoLoad reject loads with multiple uses. Also make it return false if there's not even a load at all. This makes the code better match the code in DAGCombiner that it tries to match. These two changes prevent some cases where vector_shuffles were making it to instruction selection and causing the older shuffle selection code to be triggered. Also needed to fix a bad pattern that this change exposed. This is the first step towards getting rid of the old shuffle selection support. No test cases yet because there's no way to tell whether a shuffle was handled in the legalize stage or at instruction selection. llvm-svn: 147428	2012-01-02 08:46:48 +00:00
Nadav Rotem	6c7a0e6c8b	Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit. llvm-svn: 147426	2012-01-02 08:05:46 +00:00
Craig Topper	6e54ba7eee	Merge X86 SHUFPS and SHUFPD node types. llvm-svn: 147394	2011-12-31 23:50:21 +00:00
Craig Topper	0fdf720ded	Make LowerBUILD_VECTOR keep node vector types consistent when creating MOVL for v16i16 and v32i8. llvm-svn: 147337	2011-12-29 03:34:54 +00:00
Craig Topper	862c9b65be	Remove some elses after returns. llvm-svn: 147336	2011-12-29 03:20:51 +00:00
Craig Topper	274e20a499	Remove trailing spaces. Fix an assert to use && instead of || before string. Add same assert on similar code path. llvm-svn: 147335	2011-12-29 03:09:33 +00:00
Eli Friedman	3a01ddb7e9	Fix type-checking for load transformation which is not legal on floating-point types. PR11674. llvm-svn: 147323	2011-12-28 21:24:44 +00:00
Elena Demikhovsky	b3515a8d4b	Fixed a bug in LowerVECTOR_SHUFFLE and LowerBUILD_VECTOR. Matching MOVLP mask for AVX (256-bit vectors) was wrong. The failure was detected by conformance tests. llvm-svn: 147308	2011-12-28 08:14:01 +00:00
Craig Topper	df34d152bd	Add handling of x86_avx2_pmovmskb to computeMaskedBitsForTargetNode for consistency. Add comments and an assert for BMI instructions to PerformXorCombine since the enabling of the combine is conditional on it, but the function itself isn't. llvm-svn: 147287	2011-12-27 06:27:23 +00:00
Chandler Carruth	a3d54fe0ae	Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the LZCNT instructions are available. Force promotion to i32 to get a smaller encoding since the fix-ups necessary are just as complex for either promoted type We can't do standard promotion for CTLZ when lowering through BSR because it results in poor code surrounding the 'xor' at the end of this instruction. Essentially, if we promote the entire CTLZ node to i32, we end up doing the xor on a 32-bit CTLZ implementation, and then subtracting appropriately to get back to an i8 value. Instead, our custom logic just uses the knowledge of the incoming size to compute a perfect xor. I'd love to know of a way to fix this, but so far I'm drawing a blank. I suspect the legalizer could be more clever and/or it could collude with the DAG combiner, but how... ;] llvm-svn: 147251	2011-12-24 12:12:34 +00:00
Chandler Carruth	38ce24455d	Add systematic testing for cttz as well, and fix the bug I spotted by inspection earlier. llvm-svn: 147250	2011-12-24 11:46:10 +00:00
Chandler Carruth	c9fcde2347	Expand more when we have a nice 'tzcnt' instruction, to avoid generating 'bsf' instructions here. This one is actually debatable to my eyes. It's not clear that any chip implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding. Still, this restores the old behavior with 'tzcnt' enabled for now. llvm-svn: 147246	2011-12-24 11:11:38 +00:00
Chandler Carruth	7e9453e916	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)*8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a *lot* to be improved on in this type of code. llvm-svn: 147244	2011-12-24 10:55:54 +00:00
Chad Rosier	00bbedff03	Fix 80-column violations. llvm-svn: 147192	2011-12-22 22:35:21 +00:00
Chad Rosier	3ede414127	No case stmt for BUILD_VECTOR in PerformDAGCombine(), so I assume this isn't necessary. Please chime in if I'm mistaken. llvm-svn: 147065	2011-12-21 19:14:52 +00:00
Chandler Carruth	24680c24d8	Begin teaching the X86 target how to efficiently codegen patterns that use the zero-undefined variants of CTTZ and CTLZ. These are just simple patterns for now, there is more to be done to make real world code using these constructs be optimized and codegen'ed properly on X86. The existing tests are spiffed up to check that we no longer generate unnecessary cmov instructions, and that we generate the very important 'xor' to transform bsr which counts the index of the most significant one bit to the number of leading (most significant) zero bits. Also they now check that when the variant with defined zero result is used, the cmov is still produced. llvm-svn: 146974	2011-12-20 11:19:37 +00:00
Benjamin Kramer	1b54835a10	Another variadics tweak. llvm-svn: 146852	2011-12-18 20:51:31 +00:00
Benjamin Kramer	530b820500	Use the fancy new VariadicFunction template instead of a plain variadic function. Some compilers were complaining about passing StringRef to it. llvm-svn: 146850	2011-12-18 19:59:20 +00:00
Craig Topper	a913dde0ef	Remove an unused X86ISD node type. llvm-svn: 146833	2011-12-17 19:16:44 +00:00
Benjamin Kramer	792edd3c75	X86: Factor the bswap asm matching to be slightly less horrible to read. llvm-svn: 146831	2011-12-17 14:36:05 +00:00
Lang Hames	da07b3ad42	Make sure that the lower bits on the VSELECT condition are properly set. llvm-svn: 146800	2011-12-17 01:08:46 +00:00
Craig Topper	a4d411cb1b	Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes. llvm-svn: 146726	2011-12-16 08:06:31 +00:00
Chad Rosier	75ed9dcbc6	Fix assert in LowerBUILD_VECTOR for v16i16 type on AVX. Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>! llvm-svn: 146684	2011-12-15 21:34:44 +00:00
Lang Hames	c44b5e469b	Fix VSELECT operand order. Was previously backwards, causing bogus vector shift results - <rdar://problem/10559581>. llvm-svn: 146671	2011-12-15 18:57:27 +00:00
Chad Rosier	b7a0b89ff0	Use SmallVector/assign(), rather than std::vector/push_back(). llvm-svn: 146627	2011-12-15 01:16:09 +00:00
Chad Rosier	1940baa76b	Add support for lowering fneg when AVX is enabled. rdar://10566486 llvm-svn: 146625	2011-12-15 01:02:25 +00:00
Chandler Carruth	637cc6a8aa	Initial CodeGen support for CTTZ/CTLZ where a zero input produces an undefined result. This adds new ISD nodes for the new semantics, selecting them when the LLVM intrinsic indicates that the undef behavior is desired. The new nodes expand trivially to the old nodes, so targets don't actually need to do anything to support these new nodes besides indicating that they should be expanded. I've done this for all the operand types that I could figure out for all the targets. Owners of various targets, please review and let me know if any of these are incorrect. Note that the expand behavior is conservatively correct, and exactly matches LLVM's current behavior with these operations. Ideally this patch will not change behavior in any way. For example the regtest suite finds the exact same instruction sequences coming out of the code generator. That's why there are no new tests here -- all of this is being exercised by the existing test suite. Thanks to Duncan Sands for reviewing the various bits of this patch and helping me get the wrinkles ironed out with expanding for each target. Also thanks to Chris for clarifying through all the discussions that this is indeed the approach he was looking for. That said, there are likely still rough spots. Further review much appreciated. llvm-svn: 146466	2011-12-13 01:56:10 +00:00
Craig Topper	1fdfec63a4	Remove some remnants of the old palign pattern fragment that were still hanging around. Also remove a cast from inside getShuffleVPERM2X128Immediate and getShuffleVPERMILPImmediate since the only caller already had done the cast. llvm-svn: 146344	2011-12-11 19:12:35 +00:00
Benjamin Kramer	16bbfbec66	X86: Add patterns for the various rounding ops for SSE4.1 and AVX. llvm-svn: 146257	2011-12-09 15:44:03 +00:00
Owen Anderson	57a7f41d5d	Don't explicitly mark libm rounding ops as legal on SSE4.1/AVX. There don't seem to be patterns for these, so I don't know why they were marked legal in the first place. Fixes failures caused by r146171. llvm-svn: 146180	2011-12-08 20:51:38 +00:00
Owen Anderson	0b9b9da6c8	Teach SelectionDAG to match more calls to libm functions onto existing SDNodes. Mark these nodes as illegal by default, unless the target declares otherwise. llvm-svn: 146171	2011-12-08 19:32:14 +00:00
Craig Topper	83320e03e6	Add X86ISD::HADD/HSUB to getTargetNodeName llvm-svn: 145929	2011-12-06 09:31:36 +00:00
Craig Topper	8d4ba198d6	Merge floating point and integer UNPCK X86ISD node types. llvm-svn: 145926	2011-12-06 08:21:25 +00:00
Craig Topper	3cb802c775	Clean up some of the shuffle decoding code for UNPCK instructions. Add instruction commenting for AVX/AVX2 forms for integer UNPCKs. llvm-svn: 145924	2011-12-06 05:31:16 +00:00
Craig Topper	bf41eb3a98	Merge isSHUFPMask and isCommutedSHUFPMask into single function that can do both. Do the same for the 256-bit version. Use loops to reduce size of isVSHUFPYMask. Fix test cases that were incorrectly passing due to isCommutedSHUFPMask not checking for the vector being 128-bit. This caused some 256-bit shuffles to be incorrectly commuted. llvm-svn: 145921	2011-12-06 04:59:07 +00:00
Jakob Stoklund Olesen	10e1252269	Use logarithmic units for basic block alignment. This was actually a bit of a mess. TLI.setPrefLoopAlignment was clearly documented as taking log2(bytes) units, but the x86 target would still set a preferred loop alignment of '16'. CodePlacementOpt passed this number on to the basic block, and AsmPrinter interpreted it as bytes. Now both MachineFunction and MachineBasicBlock use logarithmic alignments. Obviously, MachineConstantPool still measures alignments in bytes, so we can emulate the thrill of using as. llvm-svn: 145889	2011-12-06 01:26:19 +00:00
Craig Topper	51bec1a37a	Remove some leftover remnants that once tried to create 64-bit MMX PALIGNR instructions. llvm-svn: 145804	2011-12-05 07:27:14 +00:00
Craig Topper	6a55b1dd9f	Clean up and optimizations to the X86 shuffle lowering code. No functional change. llvm-svn: 145803	2011-12-05 06:56:46 +00:00
Nick Lewycky	50f02cb21b	Move global variables in TargetMachine into new TargetOptions class. As an API change, now you need a TargetOptions object to create a TargetMachine. Clang patch to follow. One small functionality change in PTX. PTX had commented out the machine verifier parts in their copy of printAndVerify. That now calls the version in LLVMTargetMachine. Users of PTX who need verification disabled should rely on not passing the command-line flag to enable it. llvm-svn: 145714	2011-12-02 22:16:29 +00:00
Craig Topper	b67440367f	Reduce duplicate code in isHorizontalBinOp and add some asserts to protect assumptions llvm-svn: 145681	2011-12-02 08:18:41 +00:00
Craig Topper	abeb79eee3	Add instruction selection support for horizontal add/sub of 256-bit floating point vectors. Also add the test case for 256-bit integer vectors. llvm-svn: 145680	2011-12-02 07:16:01 +00:00
Nadav Rotem	96923cc2bb	X86: PerformOrCombine introduced a vselect node with a wrong order of operands. This bug was introduced when a dedicated blend sdnode was replaced with the vselect node (in 139479). llvm-svn: 145488	2011-11-30 10:13:37 +00:00
Craig Topper	c4977ba413	Add instruction selection support for AVX2 horizontal add/sub instructions. llvm-svn: 145487	2011-11-30 09:10:50 +00:00
Craig Topper	0a672eaf9e	Merge VPERM2F128/VPERM2I128 ISD node types. llvm-svn: 145485	2011-11-30 07:47:51 +00:00
Craig Topper	bafd224c8b	Merge decoding of VPERMILPD and VPERMILPS shuffle masks. Merge X86ISD node type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128. llvm-svn: 145483	2011-11-30 06:25:25 +00:00
Craig Topper	c16db840be	Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD. llvm-svn: 145390	2011-11-29 07:49:05 +00:00
Craig Topper	818a983e93	Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar. llvm-svn: 145238	2011-11-28 10:14:51 +00:00
Craig Topper	b0456936da	Make isCommutedVSHUFP more like the way isCommutedSHUFP is handled. llvm-svn: 145218	2011-11-28 01:14:24 +00:00
Craig Topper	79ee88a511	Merge detecting and handling for VSHUFPSY and VSHUFPDY since a lot of the code was similar for both. llvm-svn: 145199	2011-11-27 21:41:12 +00:00
Craig Topper	51280d565b	Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD. Simplify some shuffle lowering code since V1 can never be UNDEF due to canonicalizing that occurs when shuffle nodes are created. llvm-svn: 145153	2011-11-26 22:55:48 +00:00
Craig Topper	7704bd7ac3	Collapse X86ISD node types for PUNPCKH, PUNPCKL, UNPCKLP, and UNPCKHP to not be type specific. Now we just have integer high and low and floating point high and low. Pattern matching will choose the correct instruction based on the vector type. llvm-svn: 145148	2011-11-26 20:47:44 +00:00
Craig Topper	d65a444478	Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type distinguish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64. llvm-svn: 145126	2011-11-24 22:57:10 +00:00
Craig Topper	d26466748b	Remove AVX2 specific X86ISD node types for PUNPCKH/L and instead just reuse the 128-bit versions and let the vector type distinguish. llvm-svn: 145125	2011-11-24 22:20:08 +00:00
Benjamin Kramer	ebcb451874	X86: Use btq for bit tests if the immediate can't be encoded in 32 bits. Before: movabsq $4294967296, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00] testq %rax, %rdi ## encoding: [0x48,0x85,0xf8] jne LBB0_2 ## encoding: [0x75,A] After: btq $32, %rdi ## encoding: [0x48,0x0f,0xba,0xe7,0x20] jb LBB0_2 ## encoding: [0x72,A] btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off saving one register and a giant movabsq. llvm-svn: 145103	2011-11-23 13:54:17 +00:00
Elena Demikhovsky	779ba6d7b7	I added several lines in X86 code generator that allow to choose VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask. The patch was reviewed by Bruno. llvm-svn: 145099	2011-11-23 10:23:16 +00:00
Craig Topper	ccb7097509	Fix shuffle decoding logic to handle UNPCKLPS/UNPCKLPD on 256-bit vectors correctly. Add support for decoding UNPCKHPS/UNPCKHPD for AVX 128-bit and 256-bit forms. llvm-svn: 145055	2011-11-22 01:57:35 +00:00
Craig Topper	f563977795	Add methods for querying minimum SSE version along with AVX. Simplifies all the places that had to check a version of SSE and AVX. llvm-svn: 145053	2011-11-22 00:44:41 +00:00
Craig Topper	6270d072c5	Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled. llvm-svn: 145028	2011-11-21 08:26:50 +00:00
Craig Topper	669199ca94	Add support for lowering 256-bit shuffles to VPUNPCKL/H for i16, i32, i64 if AVX2 is enabled. llvm-svn: 145026	2011-11-21 06:57:39 +00:00
Craig Topper	a065238c6e	Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled. llvm-svn: 145022	2011-11-21 01:12:36 +00:00
Craig Topper	e79761df73	Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine. llvm-svn: 145005	2011-11-20 00:12:05 +00:00
Craig Topper	a3a6583694	Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled. llvm-svn: 145004	2011-11-19 22:34:59 +00:00
Craig Topper	3af6ae089f	Custom lower AVX2 variable shift intrinsics to shl/srl/sra nodes and remove the intrinsic patterns. llvm-svn: 144999	2011-11-19 17:46:46 +00:00
Craig Topper	f984efbfce	Synthesize SSSE3/AVX 128-bit horizontal integer add/sub instructions from add/sub of appropriate shuffle vectors. llvm-svn: 144989	2011-11-19 09:02:40 +00:00
Craig Topper	81390be00f	Collapse X86 PSIGNB/PSIGNW/PSIGND node types. llvm-svn: 144988	2011-11-19 07:33:10 +00:00
Craig Topper	de6b73bb4d	Extend VPBLENDVB and VPSIGN lowering to work for AVX2. llvm-svn: 144987	2011-11-19 07:07:26 +00:00
Nadav Rotem	1ec141d0f9	Add AVX2 vpbroadcast support llvm-svn: 144967	2011-11-18 02:49:55 +00:00
Nadav Rotem	37010002f2	AVX: Add support for vbroadcast from BUILD_VECTOR and refactor some of the vbroadcast code. llvm-svn: 144720	2011-11-15 22:50:37 +00:00
Pete Cooper	7c7ba1baa1	Added custom lowering for load->dec->store sequence in x86 when the EFLAGS registers is used by later instructions. Only done for DEC64m right now. Fixes <rdar://problem/6172640> llvm-svn: 144705	2011-11-15 21:57:53 +00:00
Jay Foad	0745e645e0	Remove some unnecessary includes of PseudoSourceValue.h. llvm-svn: 144631	2011-11-15 07:24:32 +00:00
Pete Cooper	890e02e854	Changed SSE4/AVX <2 x i64> extract and insert ops to be Custom lowered Constant idx case is still done in tablegen but other cases are then expanded Fixes <rdar://problem/10435460> llvm-svn: 144557	2011-11-14 19:38:42 +00:00
Craig Topper	a331515c82	Add neverHasSideEffects, mayLoad, and mayStore to many patternless SSE/AVX instructions. Remove MMX check from LowerVECTOR_SHUFFLE since MMX vector types won't go through it anyway. llvm-svn: 144522	2011-11-14 06:46:21 +00:00
Craig Topper	b8bcb473e2	Add BLSI, BLSMSK, and BLSR to getTargetNodeName. llvm-svn: 144502	2011-11-13 17:31:07 +00:00
Craig Topper	3dc75f9e3b	Add more AVX2 shift lowering support. Move AVX2 variable shift to use patterns instead of custom lowering code. llvm-svn: 144457	2011-11-12 09:58:49 +00:00
Craig Topper	ea28a34c43	Add lowering for AVX2 shift instructions. llvm-svn: 144380	2011-11-11 07:39:23 +00:00
Nadav Rotem	1938482bfa	AVX2: Add patterns for variable shift operations llvm-svn: 144212	2011-11-09 21:22:13 +00:00
Nadav Rotem	79135d844d	Add AVX2 support for vselect of v32i8 llvm-svn: 144187	2011-11-09 13:21:28 +00:00
Craig Topper	c9eb09d3b8	Add instruction selection for AVX2 integer comparisons. llvm-svn: 144176	2011-11-09 08:06:13 +00:00
Craig Topper	8c8a431057	Add AVX2 instruction lowering for add, sub, and mul. llvm-svn: 144174	2011-11-09 07:28:55 +00:00
Pete Cooper	82cd9e81fc	Added invariant field to the DAG.getLoad method and changed all calls. When this field is true it means that the load is from constant (run-time or compile-time) and so can be hoisted from loops or moved around other memory accesses llvm-svn: 144100	2011-11-08 18:42:53 +00:00
Evan Cheng	91b56e0390	Add x86 isel logic and patterns to match movlps from clang generated IR for _mm_loadl_pi(). rdar://10134392, rdar://10050222 llvm-svn: 144052	2011-11-08 00:31:58 +00:00
Dan Gohman	198b7ffc11	Reapply r143206, with fixes. Disallow physical register lifetimes across calls, and only check for nested dependences on the special call-sequence-resource register. llvm-svn: 143660	2011-11-03 21:49:52 +00:00
Eli Friedman	3f5eccbe7a	Teach the x86 backend a couple tricks for dealing with v16i8 sra by a constant splat value. Fixes PR11289. llvm-svn: 143498	2011-11-01 21:18:39 +00:00
Benjamin Kramer	7402ee6ec2	X86: Emit logical shift by constant splat of <16 x i8> as a <8 x i16> shift and zero out the bits where zeros should've been shifted in. llvm-svn: 143315	2011-10-30 17:31:21 +00:00
Nadav Rotem	c602b2c4de	Fix pr11266. On x86: (shl V, 1) -> add V,V Hardware support for vector-shift is sparse and in many cases we scalarize the result. Additionally, on sandybridge padd is faster than shl. llvm-svn: 143311	2011-10-30 13:24:22 +00:00
Dan Gohman	9b9c970148	Revert r143206, as there are still some failing tests. llvm-svn: 143262	2011-10-29 00:41:52 +00:00
Dan Gohman	73057ad24f	Reapply r143177 and r143179 (reverting r143188), with scheduler fixes: Use a separate register, instead of SP, as the calling-convention resource, to avoid spurious conflicts with actual uses of SP. Also, fix unscheduling of calling sequences, which can be triggered by pseudo-two-address dependencies. llvm-svn: 143206	2011-10-28 17:55:38 +00:00
Duncan Sands	225a7037d6	Speculatively disable Dan's commits 143177 and 143179 to see if it fixes the dragonegg self-host (it looks like gcc is miscompiled). Original commit messages: Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW on every node as it legalizes them. This makes it easier to use hasOneUse() heuristics, since unneeded nodes can be removed from the DAG earlier. Make LegalizeOps visit the DAG in an operands-last order. It previously used operands-first, because LegalizeTypes has to go operands-first, and LegalizeTypes used to be part of LegalizeOps, but they're now split. The operands-last order is more natural for several legalization tasks. For example, it allows lowering code for nodes with floating-point or vector constants to see those constants directly instead of seeing the lowered form (often constant-pool loads). This makes some things somewhat more complicated today, though it ought to allow things to be simpler in the future. It also fixes some bugs exposed by Legalizing using RAUW aggressively. Remove the part of LegalizeOps that attempted to patch up invalid chain operands on libcalls generated by LegalizeTypes, since it doesn't work with the new LegalizeOps traversal order. Instead, define what LegalizeTypes is doing to be correct, and transfer the responsibility of keeping calls from having overlapping calling sequences into the scheduler. Teach the scheduler to model callseq_begin/end pairs as having a physical register definition/use to prevent calls from having overlapping calling sequences. This is also somewhat complicated, though there are ways it might be simplified in the future. This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others. Please direct high-level questions about this patch to management. Delete #if 0 code accidentally left in. llvm-svn: 143188	2011-10-28 09:55:57 +00:00
Dan Gohman	4db3f7dd83	Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW on every node as it legalizes them. This makes it easier to use hasOneUse() heuristics, since unneeded nodes can be removed from the DAG earlier. Make LegalizeOps visit the DAG in an operands-last order. It previously used operands-first, because LegalizeTypes has to go operands-first, and LegalizeTypes used to be part of LegalizeOps, but they're now split. The operands-last order is more natural for several legalization tasks. For example, it allows lowering code for nodes with floating-point or vector constants to see those constants directly instead of seeing the lowered form (often constant-pool loads). This makes some things somewhat more complicated today, though it ought to allow things to be simpler in the future. It also fixes some bugs exposed by Legalizing using RAUW aggressively. Remove the part of LegalizeOps that attempted to patch up invalid chain operands on libcalls generated by LegalizeTypes, since it doesn't work with the new LegalizeOps traversal order. Instead, define what LegalizeTypes is doing to be correct, and transfer the responsibility of keeping calls from having overlapping calling sequences into the scheduler. Teach the scheduler to model callseq_begin/end pairs as having a physical register definition/use to prevent calls from having overlapping calling sequences. This is also somewhat complicated, though there are ways it might be simplified in the future. This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others. Please direct high-level questions about this patch to management. llvm-svn: 143177	2011-10-28 01:29:32 +00:00
Lang Hames	58dba012b6	Rename NonScalarIntSafe to something more appropriate. llvm-svn: 143080	2011-10-26 23:50:43 +00:00
Rafael Espindola	b3285224cd	Fixes an issue reported by -verify-machineinstrs. Patch by Sanjoy Das. llvm-svn: 143064	2011-10-26 21:16:41 +00:00
Nadav Rotem	e649d66552	Fix pr11193. SHL inserts zeros from the right, thus even when the original sign_extend_inreg value was of 1-bit, we need to sra. llvm-svn: 142724	2011-10-22 12:39:25 +00:00
Craig Topper	039a79067a	Remove intrinsics for X86 BLSI, BLSMSK, and BLSR intrinsics and replace with custom isel lowering code. llvm-svn: 142642	2011-10-21 06:55:01 +00:00
Evan Cheng	54d678fff4	Fix TLS lowering bug. The CopyFromReg must be glued to the TLSCALL. rdar://10291355 llvm-svn: 142550	2011-10-19 22:22:54 +00:00
Duncan Sands	d278d35b13	Fix a bunch of unused variable warnings when doing a release build with gcc-4.6. llvm-svn: 142350	2011-10-18 12:44:00 +00:00
Benjamin Kramer	5fb5e3b384	SmallVector -> array llvm-svn: 142073	2011-10-15 13:28:31 +00:00
Craig Topper	965de2c197	Add X86 ANDN instruction. Including instruction selection. llvm-svn: 141947	2011-10-14 07:06:56 +00:00
Craig Topper	3657fe4b17	Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell. llvm-svn: 141939	2011-10-14 03:21:46 +00:00
Bill Wendling	063f55ffdd	Revert r141854 because it was causing failures: http://lab.llvm.org:8011/builders/llvm-x86_64-linux/builds/101 --- Reverse-merging r141854 into '.': U test/MC/Disassembler/X86/x86-32.txt U test/MC/Disassembler/X86/simple-tests.txt D test/CodeGen/X86/bmi.ll U lib/Target/X86/X86InstrInfo.td U lib/Target/X86/X86ISelLowering.cpp U lib/Target/X86/X86.td U lib/Target/X86/X86Subtarget.h llvm-svn: 141857	2011-10-13 07:48:07 +00:00
Craig Topper	8cc9388073	Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell. llvm-svn: 141854	2011-10-13 07:09:14 +00:00
Craig Topper	271064e873	Add X86 LZCNT instruction. Including instruction selection support. llvm-svn: 141651	2011-10-11 06:44:02 +00:00
Eli Friedman	8ec0897db6	Make sure the X86 backend doesn't explode on 128-bit shuffles in AVX mode. Fixes PR11102. llvm-svn: 141585	2011-10-10 22:28:47 +00:00
Nadav Rotem	814598563f	Fix 10892 - When lowering SIGN_EXTEND_INREG do not lower v2i64 because the instruction set has no 64-bit SRA support. llvm-svn: 141570	2011-10-10 19:31:45 +00:00
Evan Cheng	74db300f37	High bits of movmskp{s|d} and pmovmskb are known zero. rdar://10247336 llvm-svn: 141371	2011-10-07 17:21:44 +00:00
Eli Friedman	2fb357a5b0	PR11033: Make sure we don't generate PCMPGTQ and PCMPEQQ if the target CPU does not support them. llvm-svn: 140723	2011-09-28 21:00:25 +00:00
Duncan Sands	a54fd541c2	Implement Chris's suggestion of legalizing the various SSE and AVX hadd/hsub intrinsics into the new fhadd/fhsub X86 node. llvm-svn: 140383	2011-09-23 16:10:22 +00:00
Duncan Sands	0e4fcb8e3b	Synthesize SSE3/AVX 128 bit horizontal add/sub instructions from floating point add/sub of appropriate shuffle vectors. Does not synthesize the 256 bit AVX versions because they work differently. llvm-svn: 140332	2011-09-22 20:15:48 +00:00
Benjamin Kramer	cfd26cd744	The SSE version differences for fmin/fmax are more involved than I thought. - x87: no min or max. - SSE1: min/max for single precision scalars and vectors. - SSE2: min/max for single and double precision scalars and vectors. - AVX: as SSE2, but also supports the wider ymm vectors. (this is covered by the isTypeLegal check) llvm-svn: 140296	2011-09-22 03:27:22 +00:00
Benjamin Kramer	dc397a6402	X86: Don't form min/max nodes if the target is missing SSE. llvm-svn: 140294	2011-09-22 03:01:42 +00:00
Nadav Rotem	50f123d8e5	fix comment llvm-svn: 140258	2011-09-21 17:14:40 +00:00
Nadav Rotem	c1cd8506ce	Insert a sanity check on the combining of x86 truncing-store nodes. This comes to replace the problematic check that was removed in r139995. llvm-svn: 140246	2011-09-21 08:45:10 +00:00
Richard Trieu	a318b8dce6	Change: assert(!"error message"); To: assert(0 && "error message"); which is more consistent across the code base. llvm-svn: 140234	2011-09-21 03:09:09 +00:00
Bruno Cardoso Lopes	f7638e1e51	Simplify max/minp[s|d] dagcombine matching llvm-svn: 140199	2011-09-20 22:34:45 +00:00
Craig Topper	68c92d86da	Extend changes from r139986 to produce 256-bit AVX minps/minpd/maxps/maxpd. llvm-svn: 140140	2011-09-20 07:38:59 +00:00
Nadav Rotem	763c11cc12	Fix typos in my prev commit, found by Tobi. llvm-svn: 140003	2011-09-18 19:00:23 +00:00
Nadav Rotem	261a10a007	setOperationAction should be done on the return value of the type, not the operands. llvm-svn: 140001	2011-09-18 14:57:03 +00:00
Nadav Rotem	7ae11279e9	When promoting integer vectors we often create ext-loads. This patch adds a dag-combine optimization to implement the ext-load efficiently (using shuffles). For example the type <4 x i8> is stored in memory as i32, but it needs to find its way into a <4 x i32> register. Previously we scalarized the memory access, now we use shuffles. llvm-svn: 139995	2011-09-18 10:39:32 +00:00
Craig Topper	d9d01917ee	Fix typo by changing Lower256IntVETCC to Lower256IntVSETCC. llvm-svn: 139993	2011-09-18 08:03:58 +00:00
Duncan Sands	f2b8c854dd	Synthesize x86 max/min instructions also for vectors (i.e. produce maxps and maxpd). This broke the sse41-blend.ll testcase by causing maxpd to be produced rather than a cmp+blend pair, which is the reason I tweaked it. Gives a small speedup on doduc with dragonegg when the GCC vectorizer is used. llvm-svn: 139986	2011-09-17 16:49:39 +00:00
Bruno Cardoso Lopes	fa1ca3070b	Change all checks regarding the presence of any SSE level to always take into consideration the presence of AVX. This change, together with the SSEDomainFix enabled for AVX, makes AVX codegen to always (hopefully) emit the same code as SSE for 128-bit vector ops. I don't have a testcase for this, but AVX now beats SSE in performance for 128-bit ops in the majority of programs in the llvm testsuite llvm-svn: 139817	2011-09-15 18:27:36 +00:00
Eli Friedman	da5f010177	Fix the code creating VZEXT_LOAD so that it creates the right memoperand. Issue spotted in -debug output. I can't think of any practical effects at the moment, but it might matter if we start doing more aggressive alias analysis in CodeGen. llvm-svn: 139758	2011-09-14 23:42:45 +00:00
Bruno Cardoso Lopes	333a59eced	Vector shuffle mask <i32 4, i32 5, i32 2, i32 3> should yield "movsd", not "movss". llvm-svn: 139686	2011-09-14 02:36:14 +00:00
Bruno Cardoso Lopes	56d9b51caf	Revert the remaining part of r139528. According to PR10907 the bug seems to be in the VSELECT operands order, so I'll leave the fix for Nadav. llvm-svn: 139624	2011-09-13 19:33:00 +00:00
Nadav Rotem	52202fbf2d	Add vselect target support for targets that do not support blend but do support xor/and/or (For example SSE2). llvm-svn: 139623	2011-09-13 19:17:42 +00:00
Bruno Cardoso Lopes	973d2921e8	Revert the wrong part of r139528, and fix testcases. llvm-svn: 139541	2011-09-12 21:24:07 +00:00
Bruno Cardoso Lopes	be7a086f58	Not sure how CMPPS and CMPPD had already ever worked, I guess it didn't. However with this fix it does now. Basically the operand order for the x86 target specific node is not the same as the instruction, but since the intrinsic needs that specific order at the instruction definition, just change the order during legalization. Also, there were some wrong inversions of condition codes, such as GE => LE, GT => LT, fix that too. Fix PR10907. llvm-svn: 139528	2011-09-12 19:30:40 +00:00
Nadav Rotem	b873b18721	CR fixes per Bruno's request. Undo the changes from r139285 which added custom lowering to vselect. Add tablegen lowering for vselect. llvm-svn: 139479	2011-09-11 15:02:23 +00:00
Eli Friedman	7f50e00203	r139454 activates an assert in a case where we were doing the right thing anyway. Make that explicit, and un-XFAIL the testcase. llvm-svn: 139458	2011-09-10 02:01:42 +00:00
Richard Trieu	d9917bef6c	Fixed an assert from: assert("not implemented for target shuffle node"); to: assert(0 && "not implemented for target shuffle node"); This causes a test failure in CodeGen/X86/palignr.ll which has been marked as XFAIL for the time being. Test failure filed at PR10901. llvm-svn: 139454	2011-09-10 01:26:21 +00:00
Nadav Rotem	de838daefd	Implement vector-select support for avx256. Refactor the vblend implementation to have tablegen match the instruction by the node type llvm-svn: 139400	2011-09-09 20:29:17 +00:00
Nadav Rotem	b5df62036b	Fix the 80-columns and remove unsupported v8i16 type from the list of legal vselect types. llvm-svn: 139324	2011-09-08 22:17:35 +00:00
Bruno Cardoso Lopes	fb113a0051	Add AVX versions of blend vector operations and fix some issues noticed in Nadav's r139285 and r139287 commits. 1) Rename vsel.ll to a more descriptive name 2) Change the order of BLEND operands to "Op1, Op2, Cond", this is necessary because PBLENDVB is already used in different places with this order, and it was being emitted in the wrong way for vselect 3) Add AVX patterns and tests for the same SSE41 instructions llvm-svn: 139305	2011-09-08 18:05:08 +00:00
Nadav Rotem	2550ba2a27	Add X86-SSE4 codegen support for vector-select. llvm-svn: 139285	2011-09-08 08:11:19 +00:00
Duncan Sands	f2641e1bc1	Add codegen support for vector select (in the IR this means a select with a vector condition); such selects become VSELECT codegen nodes. This patch also removes VSETCC codegen nodes, unifying them with SETCC nodes (codegen was actually often using SETCC for vector SETCC already). This ensures that various DAG combiner optimizations kick in for vector comparisons. Passes dragonegg bootstrap with no testsuite regressions (nightly testsuite as well as "make check-all"). Patch mostly by Nadav Rotem. llvm-svn: 139159	2011-09-06 19:07:46 +00:00
Rafael Espindola	db5823dc77	Fix style issues and typos found by Duncan. llvm-svn: 139154	2011-09-06 18:43:08 +00:00
Duncan Sands	a098436b32	Split the init.trampoline intrinsic, which currently combines GCC's init.trampoline and adjust.trampoline intrinsics, into two intrinsics like in GCC. While having one combined intrinsic is tempting, it is not natural because typically the trampoline initialization needs to be done in one function, and the result of adjust trampoline is needed in a different (nested) function. To get around this llvm-gcc hacks the nested function lowering code to insert an additional parent variable holding the adjust.trampoline result that can be accessed from the child function. Dragonegg doesn't have the luxury of tweaking GCC code, so it stored the result of adjust.trampoline in the memory GCC set aside for the trampoline itself (this is always available in the child function), and set up some new memory (using an alloca) to hold the trampoline. Unfortunately this breaks Go which allocates trampoline memory on the heap and wants to use it even after the parent has exited (!). Rather than doing even more hacks to get Go working, it seemed best to just use two intrinsics like in GCC. Patch mostly by Sanjoy Das. llvm-svn: 139140	2011-09-06 13:37:06 +00:00
Jakob Stoklund Olesen	d0c8a31c8b	Use existing function. llvm-svn: 139055	2011-09-02 23:52:49 +00:00
Jakob Stoklund Olesen	38019e3188	Remove unused variables. llvm-svn: 139047	2011-09-02 22:41:25 +00:00
Bruno Cardoso Lopes	f61d1c072e	Fix vbroadcast matching logic to early unmatch if the node doesn't have only one use. Fix PR10825. llvm-svn: 138951	2011-09-01 18:15:06 +00:00
Eric Christopher	72d1d5e193	Rework this conditional a bit. Patch by Sanjoy Das llvm-svn: 138853	2011-08-31 04:17:21 +00:00
Bruno Cardoso Lopes	9fc6b8be03	- Move all MOVSS and MOVSD patterns close to their definitions - Duplicate some store patterns to their AVX forms! - Caught a bug while restricting the patterns subtarget, fix it and update a testcase to check it properly llvm-svn: 138851	2011-08-31 03:04:20 +00:00
Bruno Cardoso Lopes	db520db514	Teach more places to use VMOVAPS,VMOVUPS instead of MOVAPS,MOVUPS, whenever AVX is enabled. llvm-svn: 138849	2011-08-31 03:04:09 +00:00
Evan Cheng	cb1e5bae4c	Fix (movhps load) lowering / pattern to match more cases. rdar://10050549 llvm-svn: 138848	2011-08-31 02:05:24 +00:00
Rafael Espindola	94d3253626	Adds support for variable sized allocas. For a variable sized alloca, code is inserted to first check if the current stacklet has enough space. If so, space is allocated by simply decrementing the stack pointer. Otherwise a runtime routine (__morestack_allocate_stack_space in libgcc) is called which allocates the required memory from the heap. Patch by Sanjoy Das. llvm-svn: 138818	2011-08-30 19:47:04 +00:00
Rafael Espindola	3353017668	Adds a SelectionDAG node X86SegAlloca which will be custom lowered from DYNAMIC_STACKALLOC. Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which will match X86SegAlloca (based on word size) are also added. They will be custom emitted to inject the actual stack handling code. Patch by Sanjoy Das. llvm-svn: 138814	2011-08-30 19:43:21 +00:00
Rafael Espindola	c21742112b	Emit segmented-stack specific code into function prologues for X86. Modify the pass added in the previous patch to call this new code. The new prologues generated will call a libgcc routine (__morestack) to allocate more stack space from the heap when required. Patch by Sanjoy Das. llvm-svn: 138812	2011-08-30 19:39:58 +00:00
Eli Friedman	850b9a9a84	Explicitly zero out parts of a vector which are required to be zero by the algorithm in LowerUINT_TO_FP_i32. This only has a substantial effect on the generated code when the input is extracted from a vector register; other ways of loading an i32 do the appropriate zeroing implicitly. Fixes PR10802. llvm-svn: 138768	2011-08-29 21:15:46 +00:00
Benjamin Kramer	61a1ff543c	Silence GCC warnings and make an array const. llvm-svn: 138706	2011-08-27 17:36:14 +00:00
Eli Friedman	5e5704277f	Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction. llvm-svn: 138660	2011-08-26 21:21:21 +00:00
Bruno Cardoso Lopes	8347b86293	Add support for AVX 256-bit version of MOVDDUP! llvm-svn: 138588	2011-08-25 21:40:37 +00:00
Bruno Cardoso Lopes	388eacee2c	Make isMOVDDUP mask check more strict and update comments! llvm-svn: 138587	2011-08-25 21:40:34 +00:00
Bruno Cardoso Lopes	296256fb32	Add support for 256-bit versions of VSHUFPD and VSHUFPS. llvm-svn: 138546	2011-08-25 02:58:26 +00:00
Bruno Cardoso Lopes	2953d7b320	Move all SHUFP* patterns close to the SHUFP* definitions. Also be explicit about which subtarget they refer to, and add AVX versions of the ones we currently don't. Make the mask check more strict, to be clear it won't be used to match to 256-bit versions! llvm-svn: 138514	2011-08-24 23:17:55 +00:00
Eli Friedman	9c73a57b20	Hook up 64-bit atomic load/store on x86-32. I plan to write more efficient implementations eventually. llvm-svn: 138505	2011-08-24 22:33:28 +00:00
Eli Friedman	38cd821dc4	Fix whitespace. llvm-svn: 138487	2011-08-24 21:17:30 +00:00
Eli Friedman	342e8df0e0	Basic x86 code generation for atomic load and store instructions. llvm-svn: 138478	2011-08-24 20:50:09 +00:00
Craig Topper	de92622aa5	Break 256-bit vector int add/sub/mul into two 128-bit operations to avoid costly scalarization. Fixes PR10711. llvm-svn: 138427	2011-08-24 06:14:18 +00:00
Bruno Cardoso Lopes	9e9f2ce32d	Fix a nasty bug where a v4i64 was being wrongly emitted with 32-bit permutations. Also tidy up some patterns and make them close to their instruction definition! llvm-svn: 138392	2011-08-23 22:06:37 +00:00
Nick Lewycky	4c8ff77f1b	PerformSubCombine to work on integers larger than i128. Fixes a crasher. llvm-svn: 138354	2011-08-23 19:01:24 +00:00
Craig Topper	6612e35b0d	Add support for breaking 256-bit v16i16 and v32i8 VSETCC into two 128-bit ones, avoiding scalarization. Add vex form of pcmpeqq and pcmpgtq. Fixes more cases for PR10712. llvm-svn: 138321	2011-08-23 04:36:33 +00:00
Bruno Cardoso Lopes	74f090d44c	Add support for breaking 256-bit int VETCC into two 128-bit ones, avoiding scalarization of the compare. Reduces code from 59 to 6 instructions. Fix PR10712. llvm-svn: 138271	2011-08-22 20:31:04 +00:00
Bruno Cardoso Lopes	1a87fcb9ba	Fix PR10688. Add support for splitting 256-bit vector shifts when the shift amount is variable llvm-svn: 137885	2011-08-17 22:12:20 +00:00
Bruno Cardoso Lopes	be5e987379	Introduce matching patterns for vbroadcast AVX instruction. The idea is to match splats in the form (splat (scalar_to_vector (load ...))) whenever the load can be folded. All the logic and instruction emission is working but because of PR8156, there are no ways to match loads, cause they can never be folded for splats. Thus, the tests are XFAILed, but I've tested and exercised all the logic using a relaxed version for checking the foldable loads, as if the bug was already fixed. This should work out of the box once PR8156 gets fixed since MayFoldLoad will work as expected. llvm-svn: 137810	2011-08-17 02:29:19 +00:00
Bruno Cardoso Lopes	6d33c7f303	Update comments about vector splat handling in x86 llvm-svn: 137808	2011-08-17 02:29:13 +00:00
Bruno Cardoso Lopes	ed786a346e	Now that we have a canonical way to handle 256-bit splats: vinsertf128 $1 + vpermilps $0, remove the old code that used to first do the splat in a 128-bit vector and then insert it into a larger one. This is better because the handling code gets simpler and also makes a better room for the upcoming vbroadcast! llvm-svn: 137807	2011-08-17 02:29:10 +00:00
Bruno Cardoso Lopes	2e99f1b3aa	Instead of always leaving the work to the generic legalizer when there is no support for native 256-bit shuffles, be more smart in some cases, for example, when you can extract specific 128-bit parts and use regular 128-bit shuffles for them. Example: For this shuffle: shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32> <i32 1, i32 0, i32 7, i32 6> This was expanded to: vextractf128 $1, %ymm1, %xmm2 vpextrq $0, %xmm2, %rax vmovd %rax, %xmm1 vpextrq $1, %xmm2, %rax vmovd %rax, %xmm2 vpunpcklqdq %xmm1, %xmm2, %xmm1 vpextrq $0, %xmm0, %rax vmovd %rax, %xmm2 vpextrq $1, %xmm0, %rax vmovd %rax, %xmm0 vpunpcklqdq %xmm2, %xmm0, %xmm0 vinsertf128 $1, %xmm1, %ymm0, %ymm0 ret Now we get: vshufpd $1, %xmm0, %xmm0, %xmm0 vextractf128 $1, %ymm1, %xmm1 vshufpd $1, %xmm1, %xmm1, %xmm1 vinsertf128 $1, %xmm1, %ymm0, %ymm0 llvm-svn: 137733	2011-08-16 18:21:54 +00:00
Bruno Cardoso Lopes	cbe7feeab9	Fix PR10656. It's only profitable to use 128-bit inserts and extracts when AVX mode is on. Otherwise it is just more work for the type legalizer. llvm-svn: 137661	2011-08-15 21:45:54 +00:00
Bruno Cardoso Lopes	c53dd2ac01	Fix comment! llvm-svn: 137521	2011-08-12 21:54:42 +00:00
Bruno Cardoso Lopes	f15dfe5818	The VPERM2F128 is an AVX instruction which permutes between two 256-bit vectors. It operates on 128-bit elements instead of regular scalar types. Recognize shuffles that are suitable for VPERM2F128 and teach the x86 legalizer how to handle them. llvm-svn: 137519	2011-08-12 21:48:26 +00:00
Bruno Cardoso Lopes	8fbf023c9b	Add a dag combine to xform 256-bit shuffles into simple vector inserts and extracts. This simple combine makes us generate only 1 instruction instead of 11 in the v8 case. llvm-svn: 137362	2011-08-11 21:50:44 +00:00
Bruno Cardoso Lopes	043c820800	Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict. llvm-svn: 137324	2011-08-11 18:59:13 +00:00
Nadav Rotem	efdd183f52	Add a comment, per Bruno's CR. llvm-svn: 137313	2011-08-11 17:05:47 +00:00
Nadav Rotem	1542d5a00a	[AVX] If the data which is going to be saved is already in two XMM registers (for example, after integer operation), do not pack the registers into a YMM before saving. It's better to save as two XMM registers. Before: vinsertf128 $1, %xmm3, %ymm0, %ymm3 vinsertf128 $0, %xmm1, %ymm3, %ymm1 vmovaps %ymm1, 416(%rsp) After: vmovaps %xmm3, 416+16(%rsp) vmovaps %xmm1, 416(%rsp) llvm-svn: 137308	2011-08-11 16:41:21 +00:00
Bruno Cardoso Lopes	a2d8bb97b9	Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing infinite recursive calls in legalize. Fix PR10562 llvm-svn: 137296	2011-08-11 02:49:44 +00:00
Bruno Cardoso Lopes	572c9aaf53	Use the splat index to generate the desired shuffle. Otherwise we could only get undefs and the vector shuffle becomes an undef, generating wrong code. llvm-svn: 137295	2011-08-11 02:49:41 +00:00
Eli Friedman	3ae39f8ad1	Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2). Fixes PR9693. llvm-svn: 137292	2011-08-11 01:48:05 +00:00
Nadav Rotem	410a11fe82	When performing a truncating store, it is sometimes possible to rearrange the data in-register prior to saving to memory. When we reorder the data in memory we prevent the need to save multiple scalars to memory, making a single regular store. llvm-svn: 137238	2011-08-10 19:30:14 +00:00
Bruno Cardoso Lopes	278ffd7d8e	Fix a bug in vpermilps mask checking. Fix PR10560 llvm-svn: 137194	2011-08-10 01:54:17 +00:00
Bruno Cardoso Lopes	72323966c8	Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556 llvm-svn: 137179	2011-08-09 23:27:13 +00:00
Bruno Cardoso Lopes	6963062a99	Use fp unpack instructions to unpack int types. Until we have AVX2, this is the best we can do for these patterns. This fixes PR10554. llvm-svn: 137161	2011-08-09 22:18:37 +00:00
Bruno Cardoso Lopes	24dd1d4a27	Revert r137114 llvm-svn: 137127	2011-08-09 17:39:01 +00:00
Bruno Cardoso Lopes	ad3453cf2d	Handle sitofp between v4f64 <- v4i32. Fix PR10559 llvm-svn: 137114	2011-08-09 05:48:01 +00:00
Bruno Cardoso Lopes	af6a85484c	Make LowerVSETCC aware of AVX types and add patterns to match them. llvm-svn: 137090	2011-08-09 00:46:57 +00:00
Bruno Cardoso Lopes	c96953c12a	Add support for several vector shifts operations while in AVX mode. Fix PR10581 llvm-svn: 137067	2011-08-08 21:31:08 +00:00
Evan Cheng	19e3f80579	Fix an obvious typo. Patch by Ivan Krasin. llvm-svn: 136899	2011-08-04 18:38:15 +00:00
Bill Wendling	e234f6ae0c	Only access both operands of an INSERT_SUBVECTOR if it is an INSERT_SUBVECTOR. Fixes PR10527. llvm-svn: 136853	2011-08-04 00:32:58 +00:00
Benjamin Kramer	103e2ec2df	Remove unused variables. llvm-svn: 136803	2011-08-03 19:53:48 +00:00
Eli Friedman	04c5025cd5	Don't create a ridiculous EXTRACT_ELEMENT. PR10563. The testcase looks extremely fragile, so I'm adding an assertion which should catch any cases like this. llvm-svn: 136711	2011-08-02 18:38:35 +00:00
Bruno Cardoso Lopes	5ada908140	Make this kind of lowering to be supported by 256-bit instructions: shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0> To: shuffle (vload ptr)), undef, <1, 1, 1, 1> Fix PR10494 llvm-svn: 136691	2011-08-02 16:06:18 +00:00
Bruno Cardoso Lopes	a8e3673816	Add v4f64 -> v2f32 fp_round support. Also add a testcase to exercise the legalizer. This commit together with the two previous ones fixes PR10495. llvm-svn: 136654	2011-08-01 21:54:09 +00:00
Bruno Cardoso Lopes	616fe60548	Teach PreprocessISelDAG to be aware of vector types and to not process them. llvm-svn: 136653	2011-08-01 21:54:05 +00:00
Bruno Cardoso Lopes	bd30a4b584	Lower CONCAT_VECTORS to use two VINSERTF128 instructions instead of using a stack store. llvm-svn: 136652	2011-08-01 21:54:02 +00:00
Bruno Cardoso Lopes	7513939ddd	Since vectors with all ones can't be created with a 256-bit instruction, avoid returning early for v8i32 types, which would only be valid for vector with all zeros. Also split the handling of zeros and ones into separate checking logic since they are handled differently. This fixes PR10547 llvm-svn: 136642	2011-08-01 19:51:53 +00:00
Eli Friedman	adec587d5c	Misc optimizer+codegen work for 'cmpxchg' and 'atomicrmw'. They appear to be working on x86 (at least for trivial testcases); other architectures will need more work so that they actually emit the appropriate instructions for orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC, Mips, and Alpha backends need such changes.) llvm-svn: 136457	2011-07-29 03:05:32 +00:00
Bruno Cardoso Lopes	65ce5ea3ba	Fix two tests that I crashed in the previous commits. The mask elts on the second half must be reindexed. llvm-svn: 136454	2011-07-29 02:05:28 +00:00
Bruno Cardoso Lopes	81eb193f2e	Match VPERMIL masks more strictly and update the target specific mask generation to always catch the weird cases. llvm-svn: 136453	2011-07-29 01:31:15 +00:00
Bruno Cardoso Lopes	795f558532	Add DecodeShuffle shuffle support for VPERMILPD variants llvm-svn: 136452	2011-07-29 01:31:11 +00:00
Bruno Cardoso Lopes	c00f6728bc	Fix a bug while generating target specific VPERMIL masks: skip undef mask elements. This fixes PR10529. llvm-svn: 136450	2011-07-29 01:31:04 +00:00
Bruno Cardoso Lopes	b9ba465de8	Enable usage of SSE4 extracts and inserts in their 128-bit AVX forms. Also tidy up code a bit. llvm-svn: 136449	2011-07-29 01:31:02 +00:00
Bruno Cardoso Lopes	6aee388423	Cleanup PALIGNR handling and remove the old palign pattern fragment. Also make PALIGNR masks not match 256-bits, which isn't supported. It's also a step to solve PR10489 llvm-svn: 136448	2011-07-29 01:30:59 +00:00
Bruno Cardoso Lopes	8c19a8b5d5	Invert the subvector insertion to be more likely to be taken as a COPY llvm-svn: 136324	2011-07-28 01:26:53 +00:00
Bruno Cardoso Lopes	9e2a301216	Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move a convert pattern close to the instruction definition. llvm-svn: 136320	2011-07-28 01:26:39 +00:00
Eli Friedman	26a484852e	Code generation for 'fence' instruction. llvm-svn: 136283	2011-07-27 22:21:52 +00:00
Jeffrey Yasskin	6381c0100b	Explicitly cast narrowing conversions inside {}s that will become errors in C++0x. llvm-svn: 136211	2011-07-27 06:22:51 +00:00
Bruno Cardoso Lopes	f9324f4f6b	Move some code around to open opportunity for more shuffle matching llvm-svn: 136201	2011-07-27 00:56:37 +00:00
Bruno Cardoso Lopes	27a30a7792	The vpermilps and vpermilpd have different behaviour regarding the usage of the shuffle bitmask. Both work in 128-bit lanes without crossing, but in the former the mask of the high part is the same used by the low part while in the latter both lanes have independent masks. Handle this properly and add support for vpermilpd. llvm-svn: 136200	2011-07-27 00:56:34 +00:00
Benjamin Kramer	124ac2b997	Add a neat little two's complement hack for x86. On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from a XOR with a constant we can fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add, which can be commuted and encoded efficiently. This code is generated for __builtin_clz and friends. llvm-svn: 136167	2011-07-26 22:42:13 +00:00
Bruno Cardoso Lopes	f8fe47bd2b	Recognize unpckh* masks and match 256-bit versions. The new versions are different from the previous 128-bit because they work in lanes. Update a few comments and add testcases llvm-svn: 136157	2011-07-26 22:03:40 +00:00
Eli Friedman	93dc04d5ca	Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected). Fixes a minor isel issue that was breaking the testcase from r136130. llvm-svn: 136148	2011-07-26 21:02:58 +00:00
Bruno Cardoso Lopes	d77b383199	More movsldup/movshdup cleanup. Rewrite the mask matching function and add support for 256-bit versions (but no instruction selection yet, coming next). llvm-svn: 136050	2011-07-26 02:39:28 +00:00
Bruno Cardoso Lopes	5b268a4b82	More cleanup, subtarget info isn't used here. llvm-svn: 136049	2011-07-26 02:39:25 +00:00
Bruno Cardoso Lopes	9212bf275d	Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128 This also fixes PR10452 llvm-svn: 136004	2011-07-25 23:05:32 +00:00
Bruno Cardoso Lopes	123dff0f58	- Handle special scalar_to_vector case: splats. Using a native 128-bit shuffle before inserting on a 256-bit vector. - Add AVX versions of movd/movq instructions - Introduce a few COPY patterns to match insert_subvector instructions. This turns a trivial insert_subvector instruction into a register copy, coalescing the xmm into a ymm and avoiding emitting one more instruction. llvm-svn: 136002	2011-07-25 23:05:25 +00:00
Bruno Cardoso Lopes	276eb8debf	Reintroduce r135730, this is indeed the right approach, there is no native 256-bit vector instruction to do scalar_to_vector. llvm-svn: 136001	2011-07-25 23:05:16 +00:00
Eli Friedman	ea8c66fea5	Get rid of an incorrect optimization for shuffles with PALIGNR and simplify isPALIGNRMask. Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle. llvm-svn: 135980	2011-07-25 21:36:45 +00:00
Rafael Espindola	77242dd537	Turn shuffles into unpacks for VT == MVT::v2i64 and MVT::v2f64 too. Patch by Jeff Muizelaar. llvm-svn: 135789	2011-07-22 18:56:05 +00:00
Dan Gohman	c535278cf1	Fix x86's XALUO lowering to return its replacement values instead of doing the RAUW calls for the overflow value itself. This makes it more consistent with how the rest of LegalizeDAG works. llvm-svn: 135788	2011-07-22 18:45:15 +00:00
Benjamin Kramer	959b7e9df7	GCC complains about the angle of this line. Remove the escaped newline. llvm-svn: 135739	2011-07-22 01:02:57 +00:00
Bruno Cardoso Lopes	1872173841	Remove the 128-bit special handling from SCALAR_TO_VECTOR. This isn't the way to go. Doing this here will prevent several node matches later, and would have to force looking all the way through several VINSERTF128/VEXTRACTF128 chains to optimize simple things. llvm-svn: 135730	2011-07-22 00:15:10 +00:00
Bruno Cardoso Lopes	612e56174b	-Inspected a AVX code block added by someone in early Feb. This was never used and was actually very wrong, fix it and make it simpler. Also remove the ConcatVectors function, which is unused now. - Fix a introduction of useless nodes in r126664 and r126264. The VUNPCKL* should never be introduced cause we don't want duplicate nodes for 128 AVX and non-AVX modes, the actual instruction difference only exists during isel, but not for target specific DAG nodes. We only introduce V* target nodes when there is no 128-bit version already there. - Fix a fragile test and make it more useful. llvm-svn: 135729	2011-07-22 00:15:07 +00:00
Bruno Cardoso Lopes	91eff5140f	Add a DAGCombine for transforming 128->256 casts into a simple vxorps + vinsertf128 pair of instructions llvm-svn: 135727	2011-07-22 00:15:00 +00:00
Bruno Cardoso Lopes	dbebd01269	Introduce a new function to lower 256-bit vectors which are not directly supported and should be promoted and handled by smaller shuffles llvm-svn: 135726	2011-07-22 00:14:56 +00:00
Bruno Cardoso Lopes	95d037721b	Rename function to be more specific and be more strict about its usage llvm-svn: 135725	2011-07-22 00:14:53 +00:00
Bruno Cardoso Lopes	178fb40612	- Register v16i16 as valid VR256 register class - Add more bitcasts for v16i16 - Since 135661 and 135662 already added the splat logic, just add one more splat test for v16i16 llvm-svn: 135663	2011-07-21 02:24:08 +00:00
Bruno Cardoso Lopes	b878caa5e2	Add support for 256-bit versions of VPERMIL instruction. This is a new instruction introduced in AVX, which can operate on 128 and 256-bit vectors. It considers a 256-bit vector as two independent 128-bit lanes. It can permute any 32 or 64 elements inside a lane, and restricts the second lane to have the same permutation of the first one. With the improved splat support introduced early today, adding codegen for this instruction enable more efficient 256-bit code: Instead of: vextractf128 $0, %ymm0, %xmm0 punpcklbw %xmm0, %xmm0 punpckhbw %xmm0, %xmm0 vinsertf128 $0, %xmm0, %ymm0, %ymm1 vinsertf128 $1, %xmm0, %ymm1, %ymm0 vextractf128 $1, %ymm0, %xmm1 shufps $1, %xmm1, %xmm1 movss %xmm1, 28(%rsp) movss %xmm1, 24(%rsp) movss %xmm1, 20(%rsp) movss %xmm1, 16(%rsp) vextractf128 $0, %ymm0, %xmm0 shufps $1, %xmm0, %xmm0 movss %xmm0, 12(%rsp) movss %xmm0, 8(%rsp) movss %xmm0, 4(%rsp) movss %xmm0, (%rsp) vmovaps (%rsp), %ymm0 We get: vextractf128 $0, %ymm0, %xmm0 punpcklbw %xmm0, %xmm0 punpckhbw %xmm0, %xmm0 vinsertf128 $0, %xmm0, %ymm0, %ymm1 vinsertf128 $1, %xmm0, %ymm1, %ymm0 vpermilps $85, %ymm0, %ymm0 llvm-svn: 135662	2011-07-21 01:55:47 +00:00
Bruno Cardoso Lopes	fb4920eb25	Improve splat promotion to handle AVX types: v32i8 and v16i16. Also refactor the code and add a bunch of comments. The final shuffle emitted by handling 256-bit types is suitable for the VPERM shuffle instruction which is going to be introduced in a next commit (with a testcase which cover this commit) llvm-svn: 135661	2011-07-21 01:55:42 +00:00
Bruno Cardoso Lopes	0bdeacf03b	Tidy up code llvm-svn: 135656	2011-07-21 01:55:27 +00:00
Evan Cheng	bbf3b0de8b	Goodbye TargetAsmInfo. This eliminates the last bit of CodeGen and Target in llvm-mc. There is still a bit more refactoring left to do in Targets. But we are now very close to fixing all the layering issues in MC. llvm-svn: 135611	2011-07-20 19:50:42 +00:00
Evan Cheng	d60fa58ba1	Sink getDwarfRegNum, getLLVMRegNum, getSEHRegNum from TargetRegisterInfo down to MCRegisterInfo. Also initialize the mapping at construction time. This patch eliminates TargetRegisterInfo from TargetAsmInfo. It's another step towards fixing the layering violation. llvm-svn: 135424	2011-07-18 20:57:22 +00:00
Chris Lattner	229907cd11	land David Blaikie's patch to de-constify Type, with a few tweaks. llvm-svn: 135375	2011-07-18 04:54:35 +00:00
Bruno Cardoso Lopes	8df9cfc279	Fix a couple of things: 1) Make non-legal 256-bit loads to be promoted to v4i64. This lets us canonicalize the loads and handle things the same way we handle them for 128-bit registers. Despite what one of the removed comments explained, the load promotion would not mess with VPERM, it's only a matter of doing the appropriate bitcasts when this instruction comes to be introduced. Also make LOAD v8i32 legal. 2) Doing 1) exposed two bugs: - v4i64 was being promoted to itself for several opcodes (introduced in r124447 by David Greene) causing endless recursion and the stack to explode. - there was no support for allOnes BUILD_VECTORs and ANDNP would fail to match because it was generating early target constant pools during lowering. 3) The testcases are already checked-in, doing 1) exposed the bugs in the current testcases. 4) Tidy up code to be more clear and explicit about AVX. llvm-svn: 135313	2011-07-15 22:24:33 +00:00
Eric Christopher	92464be28c	Check register class matching instead of width of type matching when determining validity of matching constraint. Allow i1 types access to the GR8 reg class for x86. Fixes PR10352 and rdar://9777108 llvm-svn: 135180	2011-07-14 20:13:52 +00:00
Nadav Rotem	771f29677f	[VECTOR-SELECT] During type legalization we often use the SIGN_EXTEND_INREG SDNode. When this SDNode is legalized during the LegalizeVector phase, it is scalarized because non-simple types are automatically marked to be expanded. In this patch we add support for lowering SIGN_EXTEND_INREG manually. This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements' flag. llvm-svn: 135144	2011-07-14 11:11:14 +00:00
Bruno Cardoso Lopes	9613b64916	Make X86ISD::ANDNP more general and Codegen 256-bit VANDNP. A more general version of X86ISD::ANDNP also opened the room for a little bit of refactoring. llvm-svn: 135088	2011-07-13 21:36:51 +00:00
Bruno Cardoso Lopes	7ba479d22f	The target specific node PANDN name is misleading. That happens because it's later selected to a ANDNPD/ANDNPS instruction instead of the PANDN instruction. Rename it. llvm-svn: 135087	2011-07-13 21:36:47 +00:00
Julien Lerouge	112fcc164a	Add _allrem, _aullrem and _allmul to the runtime for MSVC. http://llvm.org/bugs/show_bug.cgi?id=10305 llvm-svn: 134744	2011-07-08 21:40:25 +00:00
Cameron Zwarich	f03fa189ca	Add an intrinsic and codegen support for fused multiply-accumulate. The intent is to use this for architectures that have a native FMA instruction. llvm-svn: 134742	2011-07-08 21:39:21 +00:00
Nick Lewycky	9badf60203	Let the inline asm 'q' constraint match float, and on 64-bit double too. Fixes PR9602! llvm-svn: 134665	2011-07-08 00:19:27 +00:00
Eric Christopher	7a2a0f80de	Go ahead and emit the barrier on x86-64 even without sse2. The processor supports it just fine. Fixes PR9675 and rdar://9740801 llvm-svn: 134664	2011-07-08 00:04:56 +00:00
Eric Christopher	9721396dab	Add support for the X86 'l' constraint. Fixes PR10149 and rdar://9738585 llvm-svn: 134648	2011-07-07 22:29:07 +00:00
Eric Christopher	7e5f2350d3	Use getRegForInlineAsmConstraint instead of custom defining regclasses via vectors. Part of rdar://9643582 llvm-svn: 134079	2011-06-29 17:23:50 +00:00
Jakob Stoklund Olesen	7297e7e223	Clean up the handling of the x87 fp stack to make it more robust. Drop the FpMov instructions, use plain COPY instead. Drop the FpSET/GET instruction for accessing fixed stack positions. Instead use normal COPY to/from ST registers around inline assembly, and provide a single new FpPOP_RETVAL instruction that can access the return value(s) from a call. This is still necessary since you cannot tell from the CALL instruction alone if it returns anything on the FP stack. Teach fast isel to use this. This provides a much more robust way of handling fixed stack registers - we can tolerate arbitrary FP stack instructions inserted around calls and inline assembly. Live range splitting could sometimes break x87 code by inserting spill code in unfortunate places. As a bonus we handle floating point inline assembly correctly now. llvm-svn: 134018	2011-06-28 18:32:28 +00:00
Chad Rosier	15db390f8f	Replace dyn_cast<> with cast<> since the cast is already guarded by the necessary check. llvm-svn: 133874	2011-06-25 18:51:28 +00:00
Chad Rosier	bde13d3f76	Enable tail call optimization in the presence of a byval (x86-32 and x86-64). <rdar://problem/9483883> llvm-svn: 133858	2011-06-25 02:04:56 +00:00
Chad Rosier	e553e75b15	Hoist simple check above more complex checking to avoid unnecessary overheads. No functional change intended. llvm-svn: 133824	2011-06-24 21:15:36 +00:00
Evan Cheng	3a0c5e52ff	Remove TargetOptions.h dependency from X86Subtarget. llvm-svn: 133726	2011-06-23 17:54:54 +00:00
Benjamin Kramer	25e17b0f89	Remove unused but set variables. llvm-svn: 133347	2011-06-18 11:09:41 +00:00
John McCall	4b7a8d68ae	Add a new function attribute, nonlazybind, which inhibits lazy-loading optimizations when emitting calls to the function; instead those calls may use faster relocations which require the function to be immediately resolved upon loading the dynamic object featuring the call. This is useful when it is known that the function will be called frequently and pervasively and therefore there is no merit in delaying binding of the function. Currently only implemented for x86-64, where it turns into a call through the global offset table. Patch by Dan Gohman, who assures me that he's going to add LangRef documentation for this once it's committed. llvm-svn: 133080	2011-06-15 20:36:13 +00:00
Eric Christopher	0713a9d8fc	Add a parameter to CCState so that it can access the MachineFunction. No functional change. Part of PR6965 llvm-svn: 132763	2011-06-08 23:55:35 +00:00
Stuart Hastings	e0d3426e1a	Followup to 132458, omit unnecessary stack copy when x87 input is a load. rdar://problem/6373334 llvm-svn: 132696	2011-06-06 23:15:58 +00:00
Stuart Hastings	be605494ac	Reapply 132424 with fixes. This fixes PR10068. rdar://problem/5993888 llvm-svn: 132606	2011-06-03 23:53:54 +00:00
Eric Christopher	de9399bf76	Have LowerOperandForConstraint handle multiple character constraints. Part of rdar://9119939 llvm-svn: 132510	2011-06-02 23:16:42 +00:00
Rafael Espindola	aa318ae495	Revert 132424 to fix PR10068. llvm-svn: 132479	2011-06-02 19:57:47 +00:00
Stuart Hastings	8d530ad22a	Omit unnecessary stack copy when x87 input is a load. rdar://problem/6373334 llvm-svn: 132458	2011-06-02 15:57:11 +00:00
Stuart Hastings	7adc95f69e	Recommit 132404 with fixes. rdar://problem/5993888 llvm-svn: 132424	2011-06-01 21:33:14 +00:00
Stuart Hastings	aab130d995	Revert 132404 to appease a buildbot. rdar://problem/5993888 llvm-svn: 132419	2011-06-01 19:52:20 +00:00
Stuart Hastings	7b7c102f2c	Add support for x86 CMPEQSS and friends. These instructions do a floating-point comparison, generate a mask of 0s or 1s, and generally DTRT with NaNs. Only profitable when the user wants a materialized 0 or 1 at runtime. rdar://problem/5993888 llvm-svn: 132404	2011-06-01 17:17:45 +00:00
Stuart Hastings	9f20804216	FGETSIGN support for x86, using movmskps/pd. Will be enabled with a patch to TargetLowering.cpp. rdar://problem/5660695 llvm-svn: 132388	2011-06-01 04:39:42 +00:00
Stuart Hastings	493a12bf5e	Reverting 132105: it broke some LLVM-GCC DejaGNU tests. llvm-svn: 132108	2011-05-26 04:09:49 +00:00
Stuart Hastings	276f231c2f	Correctly handle a one-word struct passed byval on x86_64. rdar://problem/6920088 llvm-svn: 132105	2011-05-26 02:44:56 +00:00
Evan Cheng	88f9137fd7	- Teach SelectionDAG::isKnownNeverZero to return true (op x, c) when c is non-zero. - Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero. rdar://9490949 llvm-svn: 131948	2011-05-24 01:48:22 +00:00
Chad Rosier	552f8c4819	Don't attempt to tail call optimize for Win64. llvm-svn: 131709	2011-05-20 00:59:28 +00:00
Evan Cheng	e8d2e9eb35	Revert r131664 and fix it in instcombine instead. rdar://9467055 llvm-svn: 131708	2011-05-20 00:54:37 +00:00
Eric Christopher	4014e5e208	Oddly people want to use the 'r' constraint for fp constants on x86. Fixes rdar://9218925 Fixes PR9601 llvm-svn: 131682	2011-05-19 21:33:47 +00:00
Evan Cheng	2b9bd38678	crc32 with 64-bit output zeros upper 32-bits. rdar://9467055 llvm-svn: 131664	2011-05-19 18:57:12 +00:00
Chad Rosier	f4e832b14e	Enables vararg functions that pass all arguments via registers to be optimized into tail-calls when possible. llvm-svn: 131560	2011-05-18 19:59:50 +00:00
Eli Friedman	d000a2c26e	Clean up the mess created by r131467+r131469. llvm-svn: 131471	2011-05-17 18:02:22 +00:00
Stuart Hastings	c65d8eda7b	Revert 131467 due to buildbot complaint. llvm-svn: 131469	2011-05-17 16:59:46 +00:00
Stuart Hastings	3cf5308890	Fix an obscure issue in X86_64 parameter passing: if a tiny byval is passed as the fifth parameter, ensure it's passed correctly (in R9). rdar://problem/6920088 llvm-svn: 131467	2011-05-17 16:45:55 +00:00
Nadav Rotem	d8edb1d5cc	Fix a bug in PerformEXTRACT_VECTOR_ELTCombine. The code created an ADD SDNode with two different types, in cases where the index and the ptr had different types. llvm-svn: 131461	2011-05-17 08:31:57 +00:00
Eli Friedman	d4a3609d30	Remove dead code. Fix associated test to use FileCheck. llvm-svn: 131424	2011-05-16 21:28:22 +00:00
Nadav Rotem	8f971c27fb	Add custom lowering of X86 vector SRA/SRL/SHL when the shift amount is a splat vector. llvm-svn: 131179	2011-05-11 08:12:09 +00:00
Eli Friedman	2518f8376d	Make the logic for determining function alignment more explicit. No functionality change. llvm-svn: 131012	2011-05-06 20:34:06 +00:00
Daniel Dunbar	cd01ed5bd6	ADT/Triple: Rename isOSX... methods to isMacOSX for consistency with the OS triple component. llvm-svn: 129838	2011-04-20 00:14:25 +00:00
Daniel Dunbar	100455a3c8	Target/X86: Eliminate uses of getDarwinVers(). llvm-svn: 129813	2011-04-19 21:04:12 +00:00
Chris Lattner	0ab5e2cded	Fix a ton of comment typos found by codespell. Patch by Luis Felipe Strano Moraes! llvm-svn: 129558	2011-04-15 05:18:47 +00:00
Evan Cheng	ee9d45dd55	Don't try to create zero-sized stack objects. llvm-svn: 128586	2011-03-30 23:44:13 +00:00
Benjamin Kramer	8d2227373d	Make helper static. llvm-svn: 128338	2011-03-26 12:38:19 +00:00
NAKAMURA Takumi	521eb7c11e	Target/X86: [PR8777][PR8778] Tweak alloca/chkstk for Windows targets. FIXME: Some cleanups would be needed. llvm-svn: 128206	2011-03-24 07:07:00 +00:00
Andrew Trick	4ab9a16569	Revert r128175. I'm backing this out for the second time. It was supposed to be fixed by r128164, but the mingw self-host must be defeating the fix. llvm-svn: 128181	2011-03-23 23:11:02 +00:00
Andrew Trick	4046a0de91	Reapply Eli's r127852 now that the pre-RA scheduler can spill EFLAGS. (target-specific branchless method for double-width relational comparisons on x86) llvm-svn: 128175	2011-03-23 22:16:02 +00:00
Evan Cheng	0663f23bd8	Re-apply r127953 with fixes: eliminate empty return block if it has no predecessors; update dominator tree if cfg is modified. llvm-svn: 127981	2011-03-21 01:19:09 +00:00
Daniel Dunbar	327cd36f74	Revert r127953, "SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR", it broke a lot of things. llvm-svn: 127954	2011-03-19 21:47:14 +00:00
Evan Cheng	824a711305	SimplifyCFG has stopped duplicating returns into predecessors to canonicalize IR to have single return block (at least getting there) for optimizations. This is general goodness but it would prevent some tailcall optimizations. One specific case is code like this: int f1(void); int f2(void); int f3(void); int f4(void); int f5(void); int f6(void); int foo(int x) { switch(x) { case 1: return f1(); case 2: return f2(); case 3: return f3(); case 4: return f4(); case 5: return f5(); case 6: return f6(); } } => LBB0_2: ## %sw.bb callq _f1 popq %rbp ret LBB0_3: ## %sw.bb1 callq _f2 popq %rbp ret LBB0_4: ## %sw.bb3 callq _f3 popq %rbp ret This patch teaches codegenprep to duplicate returns when the return value is a phi and where the phi operands are produced by tail calls followed by an unconditional branch: sw.bb7: ; preds = %entry %call8 = tail call i32 @f5() nounwind br label %return sw.bb9: ; preds = %entry %call10 = tail call i32 @f6() nounwind br label %return return: %retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ] ret i32 %retval.0 This allows codegen to generate better code like this: LBB0_2: ## %sw.bb jmp _f1 ## TAILCALL LBB0_3: ## %sw.bb1 jmp _f2 ## TAILCALL LBB0_4: ## %sw.bb3 jmp _f3 ## TAILCALL rdar://9147433 llvm-svn: 127953	2011-03-19 17:17:39 +00:00
Nadav Rotem	e7a101ccab	Add support for legalizing UINT_TO_FP of vectors on platforms which do not have native support for this operation (such as X86). The legalized code uses two vector INT_TO_FP operations and is faster than scalarizing. llvm-svn: 127951	2011-03-19 13:09:10 +00:00
Eli Friedman	59721e3238	Revert r127852; it's apparently causing an ICE on mingw. llvm-svn: 127909	2011-03-18 21:12:29 +00:00
Eli Friedman	1a916a3c0c	Add a target-specific branchless method for double-width relational comparisons on x86. Essentially, the way this works is that SUB+SBB sets the relevant flags the same way a double-width CMP would. This is a substantial improvement over the generic lowering in LLVM. The output is also shorter than the gcc-generated output; I haven't done any detailed benchmarking, though. llvm-svn: 127852	2011-03-18 02:34:11 +00:00
Cameron Zwarich	2ef0c69df1	Move more logic into getTypeForExtArgOrReturn. llvm-svn: 127809	2011-03-17 14:53:37 +00:00
Cameron Zwarich	34e7b3f77e	Rename getTypeForExtendedInteger() to getTypeForExtArgOrReturn(). llvm-svn: 127807	2011-03-17 14:21:56 +00:00
Cameron Zwarich	ac106273d4	The x86-64 ABI says that a bool is only guaranteed to be zero-extended to a byte rather than an int. Thankfully, this only causes LLVM to miss optimizations, not generate incorrect code. This just fixes the zext at the return. We still insert an i32 ZextAssert when reading a function's arguments, but it is followed by a truncate and another i8 ZextAssert so it is not optimized. llvm-svn: 127766	2011-03-16 22:20:18 +00:00
Eric Christopher	cf56a5034f	Change the x86 32-bit scheduler to register pressure and fix up the corresponding testcases back to the previous versions. Fixes some performance regressions only seen on 32-bit. llvm-svn: 127441	2011-03-11 01:05:58 +00:00
Stuart Hastings	d17ae4e939	Revert 127359; it broke lencod. llvm-svn: 127382	2011-03-10 00:25:53 +00:00
Stuart Hastings	9955e2f912	X86 byval copies no longer always_inline. <rdar://problem/8706628> llvm-svn: 127359	2011-03-09 21:10:30 +00:00
NAKAMURA Takumi	58d1f93b03	Target/X86: Tweak va_arg for Win64 not to miss taking va_start when number of fixed args > 4. llvm-svn: 127328	2011-03-09 11:33:15 +00:00
Benjamin Kramer	679cfb54ec	X86: Fix the (saddo/ssub x, 1) -> incl/decl selection to check the right operand for 1. Found by inspection. llvm-svn: 127247	2011-03-08 15:20:20 +00:00


2074 Commits