llvm-project

Commit Graph

Author	SHA1	Message	Date
Nadav Rotem	b15c69a725	whitespace llvm-svn: 170997	2012-12-23 07:33:44 +00:00
Nadav Rotem	2cade68025	Loop Vectorizer: Update the cost model of scatter/gather operations and make them more expensive. llvm-svn: 170995	2012-12-23 07:23:55 +00:00
Benjamin Kramer	76268ac682	X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available. pmuludq is slow, but it turns out that all the unpacking and packing of the scalarized mul is even slower. 10% speedup on loop-vectorized paq8p. llvm-svn: 170985	2012-12-22 16:07:56 +00:00
Benjamin Kramer	b2f0a2bd4b	X86: Emit vector sext as shuffle + sra if vpmovsx is not available. Also loosen the SSSE3 dependency a bit, expanded pshufb + psra is still better than scalarized loads. Fixes PR14590. llvm-svn: 170984	2012-12-22 11:34:28 +00:00
Benjamin Kramer	82d1c371e2	X86: Match pmin/pmax as a target specific dag combine. This occurs during vectorization. Part of PR14667. llvm-svn: 170908	2012-12-21 17:46:58 +00:00
Benjamin Kramer	4669d18893	X86: Match the SSE/AVX min/max vector ops using a custom node instead of intrinsics This is very mechanical, no functionality change. Preparation for PR14667. llvm-svn: 170898	2012-12-21 14:04:55 +00:00
Nadav Rotem	6d4fdd6d2c	Improve the X86 cost model for loads and stores. llvm-svn: 170830	2012-12-21 01:33:59 +00:00
Patrik Hagglund	e09cac9a67	Change TargetLowering::getTypeForExtArgOrReturn to take and return MVTs, instead of EVTs. llvm-svn: 170537	2012-12-19 12:02:25 +00:00
Patrik Hagglund	f9eb168ef4	Change TargetLowering::findRepresentativeClass to take an MVT, instead of EVT. llvm-svn: 170532	2012-12-19 11:30:36 +00:00
NAKAMURA Takumi	89209462fe	X86ISelLowering.cpp: Fix warnings. [-Wlogical-op-parentheses] llvm-svn: 170523	2012-12-19 10:12:48 +00:00
Elena Demikhovsky	14a4af0e66	Optimized load + SIGN_EXTEND patterns in the X86 backend. llvm-svn: 170506	2012-12-19 07:50:20 +00:00
Bill Wendling	3d7b0b8ac7	Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future. llvm-svn: 170502	2012-12-19 07:18:57 +00:00
Jakub Staszak	338863a546	Reverse order of checking SSE level when calculating compare cost, so we check AVX2 before AVX. llvm-svn: 170464	2012-12-18 22:57:56 +00:00
Craig Topper	f3ff6ae066	Simplify BMI ANDN matching to use patterns instead of a DAG combine. Also add ANDN to isDefConvertible. llvm-svn: 170305	2012-12-17 05:12:30 +00:00
Benjamin Kramer	b16ccde7a4	X86: Add a couple of target-specific dag combines that turn VSELECTS into psubus if possible. We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases if y is a constant. DAGCombiner canonicalizes those so we first have to undo the canonicalization for those cases. The pattern occurs in gzip when the loop vectorizer is enabled. Part of PR14613. llvm-svn: 170273	2012-12-15 16:47:44 +00:00
Nadav Rotem	8487537bdb	TypeLegalizer: Do not generate target specific nodes with illegal types, because we can't type-legalize them. llvm-svn: 170245	2012-12-14 21:20:37 +00:00
Evan Cheng	962711ee71	Sorry about the churn. One more change to getOptimalMemOpType() hook. Did I mention the inline memcpy / memset expansion code is a mess? This patch splits the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset. The first indicates whether it is expanding a memset or a memcpy / memmove. The latter is whether the memset is a memset of zero. It's totally possible (likely even) that targets may want to do different things for memcpy and memset of zero. llvm-svn: 169959	2012-12-12 02:34:41 +00:00
Evan Cheng	c3d1aca657	- Rename isLegalMemOpType to isSafeMemOpType. "Legal" is a very overloaded term. Also added more comments to explain why it is generally ok to return true. - Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to be true for loaded source (memcpy) or zero constants (memset). The poor name choice is probably some kind of legacy issue. llvm-svn: 169954	2012-12-12 01:32:07 +00:00
Evan Cheng	04e5518783	Avoid using lossy load / stores for memcpy / memset expansion. e.g. f64 load / store on non-SSE2 x86 targets. llvm-svn: 169944	2012-12-12 00:42:09 +00:00
Patrik Hagglund	e98b7a0389	Revert EVT->MVT changes, r169836-169851, due to buildbot failures. llvm-svn: 169854	2012-12-11 11:14:33 +00:00
Patrik Hagglund	ad432a8e70	Change TargetLowering::getTypeForExtArgOrReturn to take and return MVTs, instead of EVTs. Accordingly, add bitsLT (and similar) to MVT. llvm-svn: 169850	2012-12-11 10:20:51 +00:00
Patrik Hagglund	8d2e7cf561	Change TargetLowering::findRepresentativeClass to take an MVT, instead of EVT. llvm-svn: 169845	2012-12-11 09:57:18 +00:00
Evan Cheng	79e2ca90bc	Some enhancements for memcpy / memset inline expansion. 1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On x86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do not replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicate they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 llvm-svn: 169791	2012-12-10 23:21:26 +00:00
Shuxin Yang	95de7c37e2	- Re-enable population count loop idiom recognition - fix a bug which caused a segfault. - add two test cases which were causing crashes llvm-svn: 169687	2012-12-09 03:12:46 +00:00
Chandler Carruth	91e47532fe	Revert the patches adding a popcount loop idiom recognition pass. There are still bugs in this pass, as well as other issues that are being worked on, but the bugs are crashers that occur pretty easily in the wild. Test cases have been sent to the original commit's review thread. This reverts the commits: r169671: Fix a logic error. r169604: Move the popcnt tests to an X86 subdirectory. r168931: Initial commit adding the pass. llvm-svn: 169683	2012-12-08 22:18:29 +00:00
Bill Wendling	e94d843e43	s/AttrListPtr/AttributeSet/g to better label what this class is going to be in the near future. llvm-svn: 169651	2012-12-07 23:16:57 +00:00
Nadav Rotem	ad0b5fbe8c	When we use the BLEND instruction that uses the MSB as a mask, we can remove the VSRI instruction before it since it does not affect the MSB. Thanks Craig Topper for suggesting this. llvm-svn: 169638	2012-12-07 21:43:11 +00:00
Nadav Rotem	481e50efe0	X86: Prefer using VPSHUFD over VPERMIL because it has better throughput. llvm-svn: 169624	2012-12-07 19:01:13 +00:00
Evan Cheng	9ec512d768	Replace r169459 with something safer. Rather than having computeMaskedBits to understand target implementation of any_extend / extload, just generate zero_extend in place of any_extend for liveouts when the target knows the zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz). rdar://12771555 llvm-svn: 169536	2012-12-06 19:13:27 +00:00
Jakub Staszak	40ee5674cd	Remove unneeded function, since PR8156 was fixed over a year ago. llvm-svn: 169534	2012-12-06 19:05:46 +00:00
Jakub Staszak	65ca2fb9e6	Simplify code. llvm-svn: 169521	2012-12-06 18:22:59 +00:00
Evan Cheng	5213139f48	Let targets provide hooks that compute known zero and ones for any_extend and extload's. If they are implemented as zero-extend, or implicitly zero-extend, then this can enable more demanded bits optimizations. e.g. define void @foo(i16* %ptr, i32 %a) nounwind { entry: %tmp1 = icmp ult i32 %a, 100 br i1 %tmp1, label %bb1, label %bb2 bb1: %tmp2 = load i16* %ptr, align 2 br label %bb2 bb2: %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ] %cmp = icmp ult i16 %tmp3, 24 br i1 %cmp, label %bb3, label %exit bb3: call void @bar() nounwind br label %exit exit: ret void } This compiles to the followings before: push {lr} mov r2, #0 cmp r1, #99 bhi LBB0_2 @ BB#1: @ %bb1 ldrh r2, [r0] LBB0_2: @ %bb2 uxth r0, r2 cmp r0, #23 bhi LBB0_4 @ BB#3: @ %bb3 bl _bar LBB0_4: @ %exit pop {lr} bx lr The uxth is not needed since ldrh implicitly zero-extend the high bits. With this change it's eliminated. rdar://12771555 llvm-svn: 169459	2012-12-06 01:28:01 +00:00
Elena Demikhovsky	cd3c1c4a16	Simplified BLEND pattern matching for shuffles. Generate VPBLENDD for AVX2 and VPBLENDW for v16i16 type on AVX2. llvm-svn: 169366	2012-12-05 09:24:57 +00:00
Evan Cheng	d31802c1f6	Add x86 isel lowering logic to form bit test with inverted condition. e.g. x ^ -1. Patch by David Majnemer. rdar://12755626 llvm-svn: 169339	2012-12-05 00:10:38 +00:00
Chandler Carruth	ed0881b2a6	Use the new script to sort the includes of every file under lib. Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to a header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131	2012-12-03 16:50:05 +00:00
Shuxin Yang	abcc370423	rdar://12100355 (part 1) This revision attempts to recognize the following population-count pattern: while(a) { c++; ... ; a &= a - 1; ... }, where <c> and <a> could be used multiple times in the loop body. TODO: On X86-64 and ARM, __builtin_ctpop() is not expanded to an efficient instruction sequence, which needs to be improved in the following commits. Reviewed by Nadav, really appreciate! llvm-svn: 168931	2012-11-29 19:38:54 +00:00
Elena Demikhovsky	eace43bff7	I changed hasAVX() to hasFp256() and hasAVX2() to hasInt256() in X86IselLowering.cpp. The logic was not changed, only names. llvm-svn: 168875	2012-11-29 12:44:59 +00:00
Jakub Staszak	a17f3f8c30	Normalize splat 256bit vectors with 8 elements. llvm-svn: 168600	2012-11-26 19:24:31 +00:00
Craig Topper	c8c28d1ff0	Mark ISD::FMA as Legal instead of custom for x86 with FMA3/FMA4. Needed so that llvm.muladd can be converted to ISD::FMA for fp_contract. llvm-svn: 168413	2012-11-21 05:36:24 +00:00
Duncan Sands	d71b4e4568	Add the Erlang/HiPE calling convention, patch by Yiannis Tsiouris. llvm-svn: 168166	2012-11-16 12:36:39 +00:00
Craig Topper	70601ba6f9	Use roundps/pd for llvm.ceil, llvm.trunc, llvm.rint, and llvm.nearbyint of vector types. llvm-svn: 168141	2012-11-16 06:37:56 +00:00
Craig Topper	61d045781a	Add llvm.ceil, llvm.trunc, llvm.rint, llvm.nearbyint intrinsics. llvm-svn: 168025	2012-11-15 06:51:10 +00:00
Benjamin Kramer	6293429b51	X86: Enable SSE memory intrinsics even when stack alignment is less than 16 bytes. The stack realignment code was fixed to work when there is stack realignment and a dynamic alloca is present so this shouldn't cause correctness issues anymore. Note that this also enables generation of AVX instructions for memset under the assumptions: - Unaligned loads/stores are always fast on CPUs supporting AVX - AVX is not slower than SSE We may need some tweaked heuristics if one of those assumptions turns out not to be true. Effectively reverts r58317. Part of PR2962. llvm-svn: 167967	2012-11-14 20:08:40 +00:00
Craig Topper	a7f489d1ab	Factor out an overly replicated typecast. No functional change. llvm-svn: 167916	2012-11-14 06:41:09 +00:00
Manman Ren	0f3240d3a7	X86: when constructing VZEXT_LOAD from other loads, makes sure its output chain is correctly setup. As an example, if the original load must happen before later stores, we need to make sure the constructed VZEXT_LOAD is constrained to be before the stores. rdar://12684358 llvm-svn: 167859	2012-11-13 19:13:05 +00:00
Michael Liao	d39c0fb19f	Fix PR14314 - Fix operand order for atomic sub, where the minuend is the value loaded from memory and the subtrahend is the parameter specified. llvm-svn: 167718	2012-11-12 06:49:17 +00:00
Craig Topper	dd13d3fda1	Move some helper methods to being static functions in the implementation file. llvm-svn: 167696	2012-11-11 22:45:02 +00:00
Craig Topper	a43e2fd3eb	Remove unnecessary subtraction and addition by 1 around a couple for loops. llvm-svn: 167673	2012-11-10 09:25:36 +00:00
Craig Topper	84afbf2b02	Tidy up spacing. No functional change. llvm-svn: 167671	2012-11-10 09:02:47 +00:00
Craig Topper	f5d527401f	Simplify custom emitter code for pcmp(e/i)str(i/m) and make the helper functions static. llvm-svn: 167669	2012-11-10 08:57:41 +00:00
Craig Topper	9268c94b15	Cleanup pcmp(e/i)str(m/i) instruction definitions and load folding support. llvm-svn: 167652	2012-11-10 01:23:36 +00:00
Nadav Rotem	d1e906e1f1	indent llvm-svn: 167607	2012-11-09 07:02:24 +00:00
Michael Liao	73cffddb95	Add support of RTM from TSX extension - Add RTM code generation support through 3 X86 intrinsics: xbegin()/xend() to start/end a transaction region, and xabort() to abort a transaction region llvm-svn: 167573	2012-11-08 07:28:54 +00:00
Jakub Staszak	7d6ee3e1b4	Simplify code. No functionality change. llvm-svn: 167505	2012-11-06 23:52:19 +00:00
Nadav Rotem	1c89744f32	Make the helper functions static. No functional change. llvm-svn: 167501	2012-11-06 23:36:00 +00:00
Nadav Rotem	f036ca466e	CostModel: add another known vector trunc optimization. llvm-svn: 167488	2012-11-06 21:17:17 +00:00
Nadav Rotem	0914f0b262	Cost Model: add tables for some avx type-conversion hacks. llvm-svn: 167480	2012-11-06 19:33:53 +00:00
Nadav Rotem	48c5b8e659	Refactor the getTypeLegalizationCost interface. No functionality change. llvm-svn: 167422	2012-11-05 23:57:45 +00:00
Nadav Rotem	c378a8067d	CostModel: Add tables for the common x86 compares. llvm-svn: 167421	2012-11-05 23:48:20 +00:00
Richard Smith	18d2762048	Suppress signed/unsigned comparison warning. llvm-svn: 167410	2012-11-05 22:01:44 +00:00
Nadav Rotem	856ffa6677	Cost Model: Normalize the insert/extract index when splitting types llvm-svn: 167402	2012-11-05 21:12:13 +00:00
Nadav Rotem	7411623fd8	Implement the cost of abnormal x86 instruction lowering as a table. llvm-svn: 167395	2012-11-05 19:32:46 +00:00
Nadav Rotem	c2345cbe73	X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2. llvm-svn: 167347	2012-11-03 00:39:56 +00:00
Chandler Carruth	5da3f0512e	Revert the majority of the next patch in the address space series: r165941: Resubmit the changes to llvm core to update the functions to support different pointer sizes on a per address space basis. Despite this commit log, this change primarily changed stuff outside of VMCore, and those changes do not carry any tests for correctness (or even plausibility), and we have consistently found questionable or flat out incorrect cases in these changes. Most of them are probably correct, but we need to devise a system that makes it more clear when we have handled the address space concerns correctly, and ideally each pass that gets updated would receive an accompanying test case that exercises that pass specificaly w.r.t. alternate address spaces. However, from this commit, I have retained the new C API entry points. Those were an orthogonal change that probably should have been split apart, but they seem entirely good. In several places the changes were very obvious cleanups with no actual multiple address space code added; these I have not reverted when I spotted them. In a few other places there were merge conflicts due to a cleaner solution being implemented later, often not using address spaces at all. In those cases, I've preserved the new code which isn't address space dependent. This is part of my ongoing effort to clean out the partial address space code which carries high risk and low test coverage, and not likely to be finished before the 3.2 release looms closer. Duncan and I would both like to see the above issues addressed before we return to these changes. llvm-svn: 167222	2012-11-01 09:14:31 +00:00
Shuxin Yang	01efdd6c28	(For X86) Enhancement to add-carry/sub-borrow (adc/sbb) optimization. The adc/sbb optimization is able to convert the following expressions into a single adc/sbb instruction: (ult) ... = x + 1 // where the ult is unsigned-less-than comparison (ult) ... = x - 1 This change is to flip the "x >u y" (i.e. ugt comparison) in order to expose the adc/sbb opportunity. llvm-svn: 167180	2012-10-31 23:11:48 +00:00
Michael Liao	e2d7e4e8e5	Clean up redundant SP register maintained in X86 TLI llvm-svn: 167104	2012-10-31 04:14:09 +00:00
Manman Ren	acb8becc73	X86 MMX: optimize transfer from mmx to i32 We used to generate a store (movq) + a load. Now we use movd. rdar://9946746 llvm-svn: 167056	2012-10-30 22:15:38 +00:00
Jakub Staszak	a3d8e9974a	Re-commit r166971. I reverted it too quickly, when buildbots didn't have a chance to test it with chapni's fix (-mattr=+avx). llvm-svn: 166985	2012-10-30 00:01:57 +00:00
Jakub Staszak	d74cb61d86	Revert r166971. It causes buildbot failure. To be investigated. llvm-svn: 166979	2012-10-29 23:13:50 +00:00
Jakub Staszak	c3a92131dc	Remove unused variable. llvm-svn: 166973	2012-10-29 22:04:32 +00:00
Jakub Staszak	9c361bdfeb	Simplify code. No functionality change. llvm-svn: 166972	2012-10-29 22:02:26 +00:00
Jakub Staszak	c8f4825ba6	Allow to fold vector load if there is more than one bitcast, so in the case: %0 = load <8 x i16>* %dest %1 = shufflevector <8 x i16> %0, <8 x i16> %in, <8 x i32> < i32 0, i32 1, i32 2, i32 3, i32 13, i32 undef, i32 14, i32 14> store <8 x i16> %1, <8 x i16>* %dest We get: vmovlpd (%eax), %xmm0, %xmm0 instead of: vmovaps (%eax), %xmm1 vmovsd %xmm1, %xmm0, %xmm0 No extra test-case is added. I just fixed the existing one (also it uses FileCheck now). llvm-svn: 166971	2012-10-29 21:56:35 +00:00
Duncan Sands	ac8448e0d0	Silence a GCC warning about comparing signed and unsigned types. llvm-svn: 166922	2012-10-29 11:29:53 +00:00
Michael Liao	6d810bd9b8	Clean up where SlotSize should be used instead of pointer size. llvm-svn: 166664	2012-10-25 06:29:14 +00:00
Michael Liao	c5af149e70	Add custom conversion from v2u32 to v2f32 in 32-bit mode - As there's no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to v2f32 is added to improve the efficiency of the code generated. llvm-svn: 166545	2012-10-24 04:09:32 +00:00
Michael Liao	2843625bb5	Fix PR14161 - Check index being extracted to be constant 0 before simplifying. Otherwise, retain the original sequence. llvm-svn: 166504	2012-10-23 21:40:15 +00:00
Matt Beaumont-Gay	bdcebd323a	Silence -Wsign-compare llvm-svn: 166494	2012-10-23 19:46:36 +00:00
Michael Liao	c03c03d56e	Add custom UINT_TO_FP from v4i8/v4i16/v8i8/v8i16 to v4f32/v8f32 - Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce DAG combine overhead. - Extend the support to v4i16/v8i16 as well. llvm-svn: 166487	2012-10-23 17:36:08 +00:00
Michael Liao	1be96bb5ce	Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1 llvm-svn: 166486	2012-10-23 17:34:00 +00:00
Shuxin Yang	cdde059a34	This patch is to fix radar://8426430. It is about llvm support of __builtin_debugtrap() which is supposed to consistently raise SIGTRAP across all systems. In contrast, __builtin_trap() behave differently on different systems. e.g. it raises SIGTRAP on ARM, and SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap" functionality, in the mean time preserve the compatibility with on gcc on __builtin_trap(). The X86 backend is already able to handle debugtrap(). This patch is to: 1) make front-end recognize "__builtin_debugtrap()" (emboddied in the one-line change to Clang). 2) In DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which make the __builtin_debugtrap() "available" to all existing ports without the hassle of changing their code. 3) If trap-function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and __builtin_trap() will be expanded into the function call of the specified trap function. This behavior may need change in the future. The provided testing-case is to make sure 2) and 3) are working for ARM port, and we already have a testing case for x86. llvm-svn: 166300	2012-10-19 20:11:16 +00:00
Michael Liao	4b7ccfcaad	Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86 - If INSERT_VECTOR_ELT is supported (above SSE2, either by custom sequence of legal insn), transform BUILD_VECTOR into SHUFFLE + INSERT_VECTOR_ELT if most of elements could be built from SHUFFLE with few (so far 1) elements being inserted. llvm-svn: 166288	2012-10-19 17:15:18 +00:00
Michael Liao	cef9541dac	Check SSSE3 instead of SSE4.1 - All shuffle insns required, especially PSHUFB, are added in SSSE3. llvm-svn: 166086	2012-10-17 03:59:18 +00:00
Michael Liao	6f7206132f	Fix setjmp on models with non-Small code model nor non-Static relocation model - MBB address is only valid as an immediate value in Small & Static code/relocation models. On other models, LEA is needed to load IP address of the restore MBB. - A minor fix of MBB in MC lowering is added as well to enable target relocation flag being propagated into MC. llvm-svn: 166084	2012-10-17 02:22:27 +00:00
Michael Liao	02ca34541e	Support v8f32 to v8i8/v8i16 conversion through custom lowering - Add custom FP_TO_SINT on v8i16 (and v8i8 which is legalized as v8i16 due to vector element-wise widening) to reduce DAG combiner and its overhead added in X86 backend. llvm-svn: 166036	2012-10-16 18:14:11 +00:00
NAKAMURA Takumi	1705a999fa	Reapply r165661, Patch by Shuxin Yang <shuxin.llvm@gmail.com>. Original message: The attached is the fix to radar://11663049. The optimization can be outlined by following rules: (select (x != c), e, c) -> select (x != c), e, x), (select (x == c), c, e) -> select (x == c), x, e) where the <c> is an integer constant. The reason for this change is that : on x86, conditional-move-from-constant needs two instructions; however, conditional-move-from-register need only one instruction. While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase. The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource". Original message since r165661: My previous change has a bug: I negated the condition code of a CMOV, and go ahead creating a new CMOV using the ORIGINAL condition code. llvm-svn: 166017	2012-10-16 06:28:34 +00:00
Michael Liao	97bf363a9e	Add __builtin_setjmp/_longjmp support in X86 backend - Besides being used in SjLj exception handling, __builtin_setjmp/__longjmp is also used as a light-weight replacement of setjmp/longjmp which are used to implement continuations, user-level threading, etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling as zero-cost DWARF exception handling is used by default in X86. llvm-svn: 165989	2012-10-15 22:39:43 +00:00
Micah Villmow	4bb926d91d	Resubmit the changes to llvm core to update the functions to support different pointer sizes on a per address space basis. llvm-svn: 165941	2012-10-15 16:24:29 +00:00
Benjamin Kramer	ecd15d7f6c	X86: Fix accidentally swapped operands. llvm-svn: 165871	2012-10-13 12:50:19 +00:00
Benjamin Kramer	d6b9362fc2	X86: Promote i8 cmov when both operands are coming from truncates of the same width. X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this level is often not a good idea because it's too late for many optimizations to kick in. This solution doesn't add any extensions (truncs are free) and tries to avoid introducing partial register stalls by filtering direct copyfromregs. I'm seeing a ~10% speedup on reading a random .png file with libpng15 via graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture. llvm-svn: 165868	2012-10-13 10:39:49 +00:00
Micah Villmow	0c61134d8d	Revert 165732 for further review. llvm-svn: 165747	2012-10-11 21:27:41 +00:00
Micah Villmow	083189730e	Add in the first iteration of support for llvm/clang/lldb to allow variable per address space pointer sizes to be optimized correctly. llvm-svn: 165726	2012-10-11 17:21:41 +00:00
NAKAMURA Takumi	da0730c2d7	Revert r165661, "Patch by Shuxin Yang <shuxin.llvm@gmail.com>." It broke stage2 clang and test-suite/MultiSource/Benchmarks/mediabench/g721/g721encode. llvm-svn: 165692	2012-10-11 02:02:05 +00:00
Evan Cheng	60a25a571e	Change MachineInstrBuilder::addDisp to copy over target flags by default. llvm-svn: 165677	2012-10-11 00:15:48 +00:00
Nadav Rotem	17418964f8	Patch by Shuxin Yang <shuxin.llvm@gmail.com>. Original message: The attached is the fix to radar://11663049. The optimization can be outlined by following rules: (select (x != c), e, c) -> select (x != c), e, x), (select (x == c), c, e) -> select (x == c), x, e) where the <c> is an integer constant. The reason for this change is that : on x86, conditional-move-from-constant needs two instructions; however, conditional-move-from-register need only one instruction. While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase. The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource". llvm-svn: 165661	2012-10-10 21:31:55 +00:00
Michael Liao	e999b865dd	Add support for FP_ROUND from v2f64 to v2f32 - Due to the current matching vector elements constraints in ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from v2f32) is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPROUND to work around this constraints. llvm-svn: 165631	2012-10-10 16:53:28 +00:00
Michael Liao	effae0c8e1	Add alternative support for FP_ROUND from v2f32 to v2f64 - Due to the current matching vector elements constraints in ISD::FP_EXTEND, rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPEXT to work around this constraints. This patch also reverts a previous attempt to fix this issue by recovering the scalarized ISD::FP_EXTEND pattern and thus significantly reduces the overhead of supporting non-power-2 vector FP extend. llvm-svn: 165625	2012-10-10 16:32:15 +00:00
Evan Cheng	3903e1be01	When expanding atomic load arith instructions, do not lose target flags. rdar://12453106 llvm-svn: 165568	2012-10-09 23:48:33 +00:00
Bill Wendling	c9b22d735a	Create enums for the different attributes. We use the enums to query whether an Attributes object has that attribute. The opaque layer is responsible for knowing where that specific attribute is stored. llvm-svn: 165488	2012-10-09 07:45:08 +00:00
Micah Villmow	cdfe20b97f	Move TargetData to DataLayout. llvm-svn: 165402	2012-10-08 16:38:25 +00:00
Preston Gurd	0d67f5106c	This patch corrects commit 165126 by using an integer bit width instead of a pointer to a type, in order to remove the uses of getGlobalContext(). Patch by Tyler Nowicki. llvm-svn: 165255	2012-10-04 21:33:40 +00:00
Michael Liao	f54249b55f	Add register encoding support in X86 backend - Add 'HwEncoding' for X86 registers and call getEncodingValue() to retrieve their encoding values. - This is the first step toward adopting the new scheme. Further revision is ongoing. llvm-svn: 165241	2012-10-04 19:50:43 +00:00
Bill Wendling	b0a290ef9e	Use new accessor methods to query for attributes. llvm-svn: 165205	2012-10-04 06:43:21 +00:00
Michael Liao	d60d8143cf	Clean up trailing whitespace llvm-svn: 165182	2012-10-03 23:43:52 +00:00
Craig Topper	4f1c8caf2f	Change getX86SubSuperRegister to take an MVT::SimpleValueType rather than an EVT and add llvm_unreachable to the switches. Helps it compile to dramatically better code. llvm-svn: 164919	2012-09-30 19:49:56 +00:00
Bill Wendling	863bab689a	Remove the `hasFnAttr' method from Function. The hasFnAttr method has been replaced by querying the Attributes explicitly. No intended functionality change. llvm-svn: 164725	2012-09-26 21:48:26 +00:00
Michael Liao	de51caf2a0	Add missing i64 max/min/umax/umin on 32-bit target - Turn on atomic6432.ll and add specific test case as well llvm-svn: 164616	2012-09-25 18:08:13 +00:00
Evan Cheng	446ff28df1	Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511 llvm-svn: 164588	2012-09-25 05:32:34 +00:00
Michael Liao	a880186030	Add missing i8 max/min/umax/umin support - Fix PR5145 and turn on test 8-bit atomic ops llvm-svn: 164358	2012-09-21 03:18:52 +00:00
Michael Liao	c33bebff52	Revise td of X86 atomic instructions - Rewrite most atomic instructions in templates for both better maintenance and future extensions, such as HLE in TSX. llvm-svn: 164357	2012-09-21 03:00:17 +00:00
Michael Liao	3237662b65	Re-work X86 code generation of atomic ops with spin-loop - Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop previously the spin-loop looks like thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB the 'ld' at the beginning of newMBB should be lift out of the loop as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions has all-register form only, there is no immedidate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. llvm-svn: 164281	2012-09-20 03:06:15 +00:00
Benjamin Kramer	ece434252c	X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math. This was only an issue if sse is disabled. llvm-svn: 163967	2012-09-15 12:44:27 +00:00
Michael Liao	8b48bf27b0	Fix comment llvm-svn: 163835	2012-09-13 20:30:16 +00:00
Michael Liao	137f8aedea	Add wider vector/integer support for PR12312 - Enhance the fix to PR12312 to support wider integer, such as 256-bit integer. If more than 1 fully evaluated vectors are found, POR them first followed by the final PTEST. llvm-svn: 163832	2012-09-13 20:24:54 +00:00
Michael Liao	abb87d4857	Fix PR11985 - BlockAddress has no support of BA + offset form and there is no way to propagate that offset into machine operand; - Add BA + offset support and a new interface 'getTargetBlockAddress' to simplify target block address forming; - All targets are modified to use new interface and X86 backend is enhanced to support BA + offset addressing. llvm-svn: 163743	2012-09-12 21:43:09 +00:00
Craig Topper	ad495964f1	Indentation fixes. No functional change. llvm-svn: 163682	2012-09-12 06:20:41 +00:00
Craig Topper	a29ed865d0	Make a bunch of lowering helper functions static instead of member functions. No functional change. llvm-svn: 163596	2012-09-11 06:15:32 +00:00
Dmitri Gribenko	ca1e27be0d	Remove redundant semicolons which are null statements. llvm-svn: 163547	2012-09-10 21:26:47 +00:00
Michael Liao	400f7ef871	Enhance PR11334 fix to support extload from v2f32/v4f32 - Fix a remaining issue of PR11674 as well llvm-svn: 163528	2012-09-10 18:33:51 +00:00
Michael Liao	c3d5b21c39	Add boolean simplification support from CMOV - If a boolean value is generated from CMOV and tested as boolean value, simplify the use of test result by referencing the original condition. RDRAND intrinsic is one of such cases. llvm-svn: 163516	2012-09-10 16:36:16 +00:00
Elena Demikhovsky	264fb0217e	The VPSHUFB 256-bit instruction may be generated when one of the input vectors is undefined or zeroinitializer. I've added the "zeroinitializer" case in this patch. llvm-svn: 163506	2012-09-10 12:13:11 +00:00
Craig Topper	4ed79bd7d7	Add instruction selection for ffloor of vectors when SSE4.1 or AVX is enabled. llvm-svn: 163473	2012-09-08 17:42:27 +00:00
Craig Topper	0955a9f4e1	Use 256-bit alignment for constant pool value for 256-bit vector FNEG lowering. llvm-svn: 163463	2012-09-08 07:46:05 +00:00
Craig Topper	98f2e861a0	Add support for lowering FABS of vector types. llvm-svn: 163461	2012-09-08 07:31:51 +00:00
Craig Topper	3e41a5bb31	Set operation action for FFLOOR to Expand for all vector types for X86. Set FFLOOR of v4f32 to Expand for ARM. v2f64 was already correct. llvm-svn: 163458	2012-09-08 04:58:43 +00:00
Elena Demikhovsky	42777877c2	AVX2 optimization. Added generation of VPSHUFB instruction for <32 x i8> vector shuffle when possible. llvm-svn: 163312	2012-09-06 12:42:01 +00:00
Michael Liao	2d95a2b5c4	Remove duplicated helper function llvm-svn: 163295	2012-09-06 07:11:22 +00:00
Craig Topper	f3e4aa8cdd	Use iPTR instead of i32 for extract_subvector/insert_subvector index in lowering and patterns. This makes it consistent with the incoming DAG nodes from the DAG builder. llvm-svn: 163293	2012-09-06 06:09:01 +00:00
Roman Divacky	ad06cee239	Stop casting away const qualifier needlessly. llvm-svn: 163258	2012-09-05 22:26:57 +00:00
Preston Gurd	cdf540d5d6	Generic Bypass Slow Div - CodeGenPrepare pass for identifying div/rem ops - Backend specifies the type mapping using addBypassSlowDivType - Enabled only for Intel Atom with O2 32-bit -> 8-bit - Replace IDIV with instructions which test its value and use DIVB if the value is positive and less than 256. - In the case when the quotient and remainder of a divide are used a DIV and a REM instruction will be present in the IR. In the non-Atom case they are both lowered to IDIVs and CSE removes the redundant IDIV instruction, using the quotient and remainder from the first IDIV. However, due to this optimization CSE is not able to eliminate redundant IDIV instructions because they are located in different basic blocks. This is overcome by calculating both the quotient (DIV) and remainder (REM) in each basic block that is inserted by the optimization and reusing the result values when a subsequent DIV or REM instruction uses the same operands. - Test cases check for the presence of the optimization when calculating either the quotient, remainder, or both. Patch by Tyler Nowicki! llvm-svn: 163150	2012-09-04 18:22:17 +00:00
Elena Demikhovsky	cbe99bbb36	This patch optimizes shuffle instruction - generates 2 instructions instead of 4. Since this specific shuffle is widely used in many workloads we have a ~10% performance gain on them. shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14> vmovaps (%rdx), %ymm0 vshufps $8, %ymm0, %ymm0, %ymm0 vmovaps (%rcx), %ymm1 vshufps $8, %ymm0, %ymm1, %ymm1 vunpcklps %ymm0, %ymm1, %ymm0 vmovaps (%rcx), %ymm0 vmovsldup (%rdx), %ymm1 vblendps $85, %ymm0, %ymm1, %ymm0 llvm-svn: 163134	2012-09-04 12:49:02 +00:00
Craig Topper	d6cc4062be	Typos llvm-svn: 163053	2012-09-01 06:33:50 +00:00
Manman Ren	26c5d0f607	SelectionDAG: when constructing VZEXT_LOAD from other loads, make sure its output chain is correctly setup. As an example, if the original load must happen before later stores, we need to make sure the constructed VZEXT_LOAD is constrained to be before the stores. rdar://11457792 llvm-svn: 163036	2012-08-31 23:16:57 +00:00
Michael Liao	3224543bf9	Fix PR12359 - In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as well as PSHUFB will zero elements with negative indices. Patch by Sriram Murali <sriram.murali@intel.com> llvm-svn: 163018	2012-08-31 20:12:31 +00:00
Craig Topper	c30fdbc46c	Add support for converting llvm.fma to fma4 instructions. llvm-svn: 162999	2012-08-31 15:40:30 +00:00
Craig Topper	e39ad7b549	Only perform DAG combine on FMAs of legal types. llvm-svn: 162892	2012-08-30 06:56:15 +00:00
Craig Topper	a999c66292	Convert FMA4 patterns to use target specific nodes instead of intrinsics to align with FMA3. llvm-svn: 162829	2012-08-29 07:18:25 +00:00
Michael Liao	407d659fa5	Add comments on the literal value used. llvm-svn: 162805	2012-08-28 23:42:17 +00:00
Michael Liao	710e1a594b	Explicitly update the number of nodes to be traversed llvm-svn: 162780	2012-08-28 19:20:29 +00:00
Michael Liao	b7d85b6328	Fix PR12312 - Add a target-specific DAG optimization to recognize a pattern PTEST-able. Such a pattern is a OR'd tree with X86ISD::OR as the root node. When X86ISD::OR node has only its flag result being used as a boolean value and all its leaves are extracted from the same vector, it could be folded into an X86ISD::PTEST node. llvm-svn: 162735	2012-08-28 03:34:40 +00:00
Craig Topper	a737ef8964	Remove MMX shift intrinsic handling code that also exists in SelectionDAGBuilder. llvm-svn: 162661	2012-08-27 08:08:30 +00:00
Craig Topper	663d160adb	Custom lower FMA intrinsics to target specific nodes and remove the patterns. llvm-svn: 162534	2012-08-24 04:03:22 +00:00
Michael Liao	10ff96ce8c	fix a case where all operands of BUILD_VECTOR are undefined llvm-svn: 162214	2012-08-20 17:59:18 +00:00
Nadav Rotem	178250ad87	When unsafe math is used, we can use commutative FMAX and FMIN. In some cases this allows for better code generation. Added a new DAGCombine transformation to convert FMAX and FMIN to FMAXC and FMINC, which are commutative. For example: movaps %xmm0, %xmm1 movsd LC(%rip), %xmm0 minsd %xmm1, %xmm0 becomes: minsd LC(%rip), %xmm0 llvm-svn: 162187	2012-08-19 13:06:16 +00:00
Nadav Rotem	a136939fa9	Reapply r162160 with a fix: Optimize Arith->Trunc->SETCC sequence to allow better compare/branch code. llvm-svn: 162172	2012-08-18 17:53:03 +00:00
Craig Topper	0128f9bad7	Refactor code a bit to reduce number of calls in the final compiled code. No functional change intended. llvm-svn: 162166	2012-08-18 06:39:34 +00:00
Nadav Rotem	c324af609e	Revert r162160 because it made a few buildbots fail. llvm-svn: 162164	2012-08-18 05:02:36 +00:00
Nadav Rotem	2cb14a5c4b	The X86 backend has a number of optimizations for SETCC nodes which use arithmetic instructions. However, when small data types are used, a truncate node appears between the SETCC node and the arithmetic operation. This patch adds support for this pattern. Before: xorl %esi, %edi testb %dil, %dil setne %al ret After: xorb %dil, %sil setne %al ret rdar://12081007 llvm-svn: 162160	2012-08-18 02:43:28 +00:00
Craig Topper	31625574db	Use nested switch to select arguments to reduce calls to EmitPCMP. llvm-svn: 162089	2012-08-17 07:15:56 +00:00
Craig Topper	602e1abe0d	Make ReplaceATOMIC_BINARY_64 a static function. Use a nested switch to reduce to only a single call to it thus allowing it to be inlined by the compiler. llvm-svn: 162088	2012-08-17 06:55:11 +00:00
Michael Liao	06f6fe875a	minor fix of X86ISD::VSEXT_MOVL dump llvm-svn: 161902	2012-08-14 22:53:17 +00:00
Michael Liao	34107b9177	fix PR11334 - FP_EXTEND only supports extending from vectors with matching elements. This results in the scalarization of extending to v2f64 from v2f32, which will be legalized to v4f32 not matching with v2f64. - add X86-specific VFPEXT supporting extending from v4f32 to v2f64. - add BUILD_VECTOR lowering helper to recover back the original extending from v4f32 to v2f64. - test case is enhanced to include different vector width. llvm-svn: 161894	2012-08-14 21:24:47 +00:00
Craig Topper	925a281b00	Factor duplicate calls to getUNDEF in several functions. llvm-svn: 161860	2012-08-14 08:18:43 +00:00
Craig Topper	d0d4b11f66	Re-factor intrinsic lowering to combine common parts of similar intrinsics. Reduces compiled code size a little bit. llvm-svn: 161859	2012-08-14 07:43:25 +00:00
Craig Topper	4e5eb72735	Tidy up VSETCC lowering code a bit more by adding an llvm_unreachable and putting a couple of if conditions in a better order. llvm-svn: 161746	2012-08-13 03:42:38 +00:00
Craig Topper	5145a0d967	Refactor code a bit to share commonalities. No functional change intended. llvm-svn: 161745	2012-08-13 02:34:03 +00:00
Craig Topper	ff6e4d1928	Fix an unused variable warning from r161742. llvm-svn: 161743	2012-08-13 01:26:45 +00:00
Craig Topper	a7aaa62d54	Remove the LowerMMXCONCAT_VECTORS function. It could never execute because there are no legal 64-bit vector types that could be used as inputs to a 128-bit concat_vectors. Remove a target specific SDNode and its patterns that become unused as a result. llvm-svn: 161742	2012-08-13 01:23:55 +00:00
Craig Topper	3d2b271362	Remove call to setOperationAction for SETCC of v4f32. SETCC returns an integer type not an FP type. llvm-svn: 161738	2012-08-12 05:31:32 +00:00
Craig Topper	498228d089	Remove unnecessary call to setOperationAction for SETCC of v2i64 under SSE42. It was already called for the same under SSE2. llvm-svn: 161737	2012-08-12 05:15:16 +00:00
Craig Topper	10a8bf3b8c	Replace many calls to getSizeInBits() with is128BitVector/is256BitVector llvm-svn: 161734	2012-08-12 02:23:29 +00:00
Craig Topper	03d2787275	Use MVT.isXBitVector instead of EVT.isXBitVector when setting up operation actions. Compiles to smaller code. llvm-svn: 161733	2012-08-12 00:34:56 +00:00
Michael Liao	e7e828fd64	fix PR13577, an issue introduced by r161687 - FCMOV only supports a subset of X86 conditions. Skip boolean simplification if X86 condition is not valid for FCMOV. - add a minimal test case for PR13577. llvm-svn: 161732	2012-08-11 23:47:06 +00:00
Craig Topper	b5bcf58ba1	Move setOperationAction for CONCAT_VECTORS for 256-bit vectors into loop since all 256-bit types are supported. llvm-svn: 161730	2012-08-11 22:34:26 +00:00
Michael Liao	5248e9913f	add X86-specific DAG optimization to simplify boolean test - if a boolean test (X86ISD::CMP or X86ISD::SUB) checks a boolean value generated from X86ISD::SETCC, try to simplify the boolean value generation and checking by reusing the original EFLAGS with proper condition code - add hooks to X86 specific SETCC/BRCOND/CMOV, the major 3 places consuming EFLAGS part of patches fixing PR12312 llvm-svn: 161687	2012-08-10 19:58:13 +00:00
Michael Liao	ea7d906b0f	remove trailing whitespace and test commit llvm-svn: 161664	2012-08-10 14:39:24 +00:00
Joerg Sonnenberger	aa2f801ca3	Add some missing includes for the build against stdcxx. llvm-svn: 161657	2012-08-10 10:53:56 +00:00
Manman Ren	1be131ba27	X86: enable CSE between CMP and SUB We perform the following: 1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering. 2> Modify MachineCSE to correctly handle implicit defs. 3> Convert SUB back to CMP if possible at peephole. Removed pattern matching of (a>b) ? (a-b):0 and like, since they are handled by peephole now. rdar://11873276 llvm-svn: 161462	2012-08-08 00:51:41 +00:00
Evan Cheng	fbdd25c135	X86 cmp lowering is looking past truncate on the condition node. It should only do so when the high bits are known zero. This caused a subtle miscompilation. rdar://12027825 llvm-svn: 161451	2012-08-07 22:21:00 +00:00
Craig Topper	ab47fe4e16	Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in ISelDAGToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305. llvm-svn: 161318	2012-08-06 06:22:36 +00:00
Craig Topper	6d0408d3a5	Remove custom inserter for MWAIT. It doesn't do anything that couldn't be represented in a pattern. llvm-svn: 161306	2012-08-05 00:36:57 +00:00
Craig Topper	43ee9fae92	Use a COPY node instead of an explicit MOVA opcode in the custom inserter for pcmpestrm/pcmpistrm. Allows the register allocator to handle it better and prevent wasted identity moves. llvm-svn: 161305	2012-08-05 00:17:48 +00:00
Bob Wilson	3e6fa462f3	Fall back to selection DAG isel for calls to builtin functions. Fast isel doesn't currently have support for translating builtin function calls to target instructions. For embedded environments where the library functions are not available, this is a matter of correctness and not just optimization. Most of this patch is just arranging to make the TargetLibraryInfo available in fast isel. <rdar://problem/12008746> llvm-svn: 161232	2012-08-03 04:06:28 +00:00
Chad Rosier	24c19d20c0	Whitespace. llvm-svn: 161122	2012-08-01 18:39:17 +00:00
Elena Demikhovsky	3cb3b0045c	Added FMA functionality to X86 target. llvm-svn: 161110	2012-08-01 12:06:00 +00:00
Rafael Espindola	11c38b9657	When a return struct pointer is passed in registers, the callee has nothing to pop. llvm-svn: 160725	2012-07-25 13:41:10 +00:00
Sylvestre Ledru	35521e2310	Fix a typo (the the => the) llvm-svn: 160621	2012-07-23 08:51:15 +00:00
Evan Cheng	e6a3b03ee0	Back out r160101 and instead implement a dag combine to recover from instcombine transformation. llvm-svn: 160387	2012-07-17 18:54:11 +00:00
Evan Cheng	780f9b5f92	Implement r160312 as target independent dag combine. llvm-svn: 160354	2012-07-17 08:31:11 +00:00
Evan Cheng	f579beca6d	This is another case where instcombine demanded bits optimization created large immediates. Add dag combine logic to recover in case the large immediates don't fit in the cmp immediate operand field. int foo(unsigned long l) { return (l>> 47) == 1; } we produce %shr.mask = and i64 %l, -140737488355328 %cmp = icmp eq i64 %shr.mask, 140737488355328 %conv = zext i1 %cmp to i32 ret i32 %conv which codegens to movq $0xffff800000000000,%rax andq %rdi,%rax movq $0x0000800000000000,%rcx cmpq %rcx,%rax sete %al movzbl %al,%eax ret TargetLowering::SimplifySetCC would transform (X & -256) == 256 -> (X >> 8) == 1 if the immediate fails the isLegalICmpImmediate() test. For x86, that's immediates which are not a signed 32-bit immediate. Based on a patch by Eli Friedman. PR10328 rdar://9758774 llvm-svn: 160346	2012-07-17 06:53:39 +00:00
Evan Cheng	75315b877c	For something like uint32_t hi(uint64_t res) { uint32_t hi = res >> 32; return !hi; } llvm IR looks like this: define i32 @hi(i64 %res) nounwind uwtable ssp { entry: %lnot = icmp ult i64 %res, 4294967296 %lnot.ext = zext i1 %lnot to i32 ret i32 %lnot.ext } The optimizer has optimized away the right shift and truncate but the resulting constant is too large to fit in the 32-bit immediate field. The resulting x86 code is worse as a result: movabsq $4294967296, %rax ## imm = 0x100000000 cmpq %rax, %rdi sbbl %eax, %eax andl $1, %eax This patch teaches the x86 lowering code to handle ult against a large immediate with trailing zeros. It will issue a right shift and a truncate followed by a comparison against a shifted immediate. shrq $32, %rdi testl %edi, %edi sete %al movzbl %al, %eax It also handles a ugt comparison against a large immediate with trailing bits set. i.e. X > 0x0ffffffff -> (X >> 32) >= 1 rdar://11866926 llvm-svn: 160312	2012-07-16 19:35:43 +00:00
Nadav Rotem	eec74c7279	Teach getTargetVShiftNode about TargetConstant nodes. llvm-svn: 160234	2012-07-15 20:27:43 +00:00
Nadav Rotem	9466e81df6	AVX: Fix a bug in getTargetVShiftNode. The shift amount has to be a 128bit vector with the same element type as the input vector. This is needed because of the patterns we have for the VP[SLL/SRA/SRL][W/D/Q] instructions. llvm-svn: 160222	2012-07-14 22:26:05 +00:00
Benjamin Kramer	4d0916788d	Give the rdrand instructions a SideEffect flag and a chain so MachineCSE and MachineLICM don't touch it. I already had the necessary things in place for IR-level passes but missed the machine passes. llvm-svn: 160137	2012-07-12 18:14:57 +00:00
Benjamin Kramer	0ab2794eda	Add intrinsics for Ivy Bridge's rdrand instruction. The rdrand/cmov sequence is the same that is emitted by both GCC and ICC. Fixes PR13284. llvm-svn: 160117	2012-07-12 09:31:43 +00:00
Nadav Rotem	d2bdcebb14	When ext-loading and trunc-storing vectors to memory, on x86 32bit systems, allow loads/stores of 64bit values from xmm registers. llvm-svn: 160044	2012-07-11 13:27:05 +00:00
Nadav Rotem	d908ddc186	Improve the loading of load-anyext vectors by allowing the codegen to load multiple scalars and insert them into a vector. Next, we shuffle the elements into the correct places, as before. Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the migration of bitcasts happened too late in the SelectionDAG process. llvm-svn: 159991	2012-07-10 13:25:08 +00:00
Jakob Stoklund Olesen	d14101e0b9	Make X86 call and return instructions non-variadic. Function argument and return value registers aren't part of the encoding, so they should be implicit operands. llvm-svn: 159728	2012-07-04 23:53:27 +00:00
Jakob Stoklund Olesen	2dee812445	Ensure CopyToReg nodes are always glued to the call instruction. The CopyToReg nodes that set up the argument registers before a call must be glued to the call instruction. Otherwise, the scheduler may emit the physreg copies long before the call, causing long live ranges for the fixed registers. Besides disabling good register allocation, that can also expose problems when EmitInstrWithCustomInserter() splits a basic block during the live range of a physreg. llvm-svn: 159721	2012-07-04 19:28:31 +00:00
Elena Demikhovsky	9af899fa88	Optimization of shuffle node that can fit to the register form of VBROADCAST instruction on AVX2. llvm-svn: 159504	2012-07-01 06:12:26 +00:00
Rafael Espindola	efdfb1e6b2	In the initial exec mode we always do a load to find the address of a variable. Before this patch in pic 32 bit code we would add the global base register and not load from that address. This is a really old bug, but before the introduction of the tls attributes we would never select initial exec for pic code. llvm-svn: 159409	2012-06-29 04:22:35 +00:00
Elena Demikhovsky	863d2d3235	Removed unused variable llvm-svn: 159197	2012-06-26 10:50:07 +00:00
Bill Wendling	8ed44466c2	Rename to match other X86_64* names. llvm-svn: 159196	2012-06-26 10:05:06 +00:00
Elena Demikhovsky	26088d2e24	Shuffle optimization for AVX/AVX2. The current patch optimizes frequently used shuffle patterns and gives this instruction sequence reduction. Before: vshufps $-35, %xmm1, %xmm0, %xmm2 ## xmm2 = xmm0[1,3],xmm1[1,3] vpermilps $-40, %xmm2, %xmm2 ## xmm2 = xmm2[0,2,1,3] vextractf128 $1, %ymm1, %xmm1 vextractf128 $1, %ymm0, %xmm0 vshufps $-35, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[1,3],xmm1[1,3] vpermilps $-40, %xmm0, %xmm0 ## xmm0 = xmm0[0,2,1,3] vinsertf128 $1, %xmm0, %ymm2, %ymm0 After: vshufps $13, %ymm0, %ymm1, %ymm1 ## ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4] vshufps $13, %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[1,3,0,0,5,7,4,4] vunpcklps %ymm1, %ymm0, %ymm0 ## ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5] llvm-svn: 159188	2012-06-26 08:04:10 +00:00
Eli Friedman	bbcd09cc00	Make some ugly hacks for inline asm operands which name a specific register a bit more thorough. PR13196. llvm-svn: 159176	2012-06-25 23:42:33 +00:00
Jakob Stoklund Olesen	2e22e6a361	%RCX is not a function live-out in eh.return functions. The function live-out registers must be live at all function returns, and %RCX is only used by eh.return. When a function also has a normal return, only %RAX holds a return value. This fixes PR13188. llvm-svn: 159116	2012-06-24 15:53:01 +00:00
Pete Cooper	3c680dec8a	Remove code I'd been testing with but didn't mean to commit. Oops llvm-svn: 159094	2012-06-24 00:08:36 +00:00
Pete Cooper	fe212e762f	DAG legalisation can now handle illegal fma vector types by scalarisation llvm-svn: 159092	2012-06-24 00:05:44 +00:00
Rafael Espindola	a3088f09b3	Handle aliases to tls variables in all architectures, not just x86. llvm-svn: 159058	2012-06-23 00:30:03 +00:00
Craig Topper	b9e8e18949	Don't insert 128-bit UNDEF into 256-bit vectors. Just keep the 256-bit vector. Original patch by Elena Demikhovsky. Tweaked by me to allow possibility of covering more cases. llvm-svn: 158792	2012-06-20 05:39:26 +00:00
Rafael Espindola	ca3e0ee8b3	Move the support for using .init_array from ARM to the generic TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM, on X86 it is not easy to find out if .init_array should be used or not, so the decision is made via TargetOptions and defaults to off. Add a command line option to llc that enables it. llvm-svn: 158692	2012-06-19 00:48:28 +00:00
Craig Topper	a54893c662	Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type. llvm-svn: 158279	2012-06-09 17:02:24 +00:00
Craig Topper	3352ba55b9	Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument. llvm-svn: 158278	2012-06-09 16:46:13 +00:00
Manman Ren	6bc2d27073	Enable optimization for integer ABS on X86 if Subtarget has CMOV. llvm-svn: 158220	2012-06-08 18:58:26 +00:00
Manman Ren	2cdc8afccf	X86: optimize generated code for integer ABS This patch will generate the following for integer ABS: movl %edi, %eax negl %eax cmovll %edi, %eax INSTEAD OF movl %edi, %ecx sarl $31, %ecx leal (%rdi,%rcx), %eax xorl %ecx, %eax There exists a target-independent DAG combine for integer ABS, which converts integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. This is implemented in PerformXorCombine. rdar://10695237 llvm-svn: 158175	2012-06-07 22:39:10 +00:00
Nadav Rotem	bbd40f67d8	Do not optimize the used bits of the x86 vselect condition operand, when the condition operand is a vector of 1-bit predicates. This may happen on MIC devices. llvm-svn: 158168	2012-06-07 20:53:48 +00:00
Manman Ren	746e4859d0	PR13046: we can't replace usage of SUB with CMP in the lowering phase. It will cause assertion failure later on. llvm-svn: 158160	2012-06-07 19:27:33 +00:00
Manman Ren	ae02c5a93e	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 llvm-svn: 158126	2012-06-07 00:42:47 +00:00
Benjamin Kramer	bde9176663	Fix typos found by http://github.com/lyda/misspell-check llvm-svn: 157885	2012-06-02 10:20:22 +00:00
Hans Wennborg	789acfb63d	Implement the local-dynamic TLS model for x86 (PR3985) This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. llvm-svn: 157818	2012-06-01 16:27:21 +00:00
Jakob Stoklund Olesen	4f203ea34b	Add support for return value promotion in X86 calling conventions. Patch by Yiannis Tsiouris! llvm-svn: 157757	2012-05-31 17:28:20 +00:00
Justin Holewinski	aa58397b3c	Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall to pass around a struct instead of a large set of individual values. This cleans up the interface and allows more information to be added to the struct for future targets without requiring changes to each and every target. NV_CONTRIB llvm-svn: 157479	2012-05-25 16:35:28 +00:00
Craig Topper	53b4b73be9	Fix constant used for pshufb mask when lowering v16i8 shuffles. Bug introduced in r157043. Fixes PR12908. llvm-svn: 157236	2012-05-22 06:09:38 +00:00
Craig Topper	e88f2fd4f7	Allow 256-bit shuffles to still be split even if only half of the shuffle comes from two 128-bit pieces. llvm-svn: 157175	2012-05-21 06:40:16 +00:00
Nadav Rotem	c93e91da27	On Haswell, prefer storing YMM registers using a single instruction. llvm-svn: 157129	2012-05-19 20:30:08 +00:00
Nadav Rotem	900c7cb7ce	Add support for additional in-reg vbroadcast patterns llvm-svn: 157127	2012-05-19 19:57:37 +00:00
Craig Topper	0cf4038c59	Simplify code a bit. No functional change intended. llvm-svn: 157044	2012-05-18 07:07:36 +00:00
Craig Topper	92db928ee9	Simplify handling of v16i8 shuffles and fix a missed optimization. llvm-svn: 157043	2012-05-18 06:42:06 +00:00
Hans Wennborg	f9d0e44b82	Implement initial-exec TLS model for 32-bit PIC x86 This fixes a TODO from 2007 :) Previously, LLVM would emit the wrong code here (see the update to test/CodeGen/X86/tls-pie.ll). llvm-svn: 156611	2012-05-11 10:11:01 +00:00
Nadav Rotem	1a65397017	Fix merge-typo and cleanup llvm-svn: 156541	2012-05-10 12:50:02 +00:00
Nadav Rotem	15946e50c1	AVX2: Add an additional broadcast idiom. llvm-svn: 156540	2012-05-10 12:39:13 +00:00
Nadav Rotem	b86a3fb8d0	Generate AVX/AVX2 shuffles even when there is a memory op somewhere else in the program. Starting r155461 we are able to select patterns for vbroadcast even when the load op is used by other users. Fix PR11900. llvm-svn: 156539	2012-05-10 12:22:05 +00:00
Chad Rosier	d8287fec17	Fix a regression from r147481. This combine should only happen if there is a single use. rdar://11360370 llvm-svn: 156316	2012-05-07 18:47:44 +00:00
Manman Ren	ef4e0479ec	X86: optimization for -(x != 0) This patch will optimize -(x != 0) on X86 FROM cmpl $0x01,%edi sbbl %eax,%eax notl %eax TO negl %edi sbbl %eax, %eax In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td: def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>; rdar: 10961709 llvm-svn: 156312	2012-05-07 18:06:23 +00:00
Craig Topper	00a1e6d48b	Use MVT instead of EVT as the argument to all the shuffle decode functions. Simplify some of the decode functions. llvm-svn: 156268	2012-05-06 19:46:21 +00:00
Craig Topper	804be3b546	Add VPERMQ/VPERMPD to the list of target specific shuffles that can be looked through for DAG combine purposes. llvm-svn: 156266	2012-05-06 18:54:26 +00:00
Benjamin Kramer	e31f31e5c0	Add a new target hook "predictableSelectIsExpensive". This will be used to determine whether it's profitable to turn a select into a branch when the branch is likely to be predicted. Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM. I'm not entirely happy with the name of this flag, suggestions welcome ;) llvm-svn: 156233	2012-05-05 12:49:14 +00:00
Craig Topper	bdd2e34b1f	Fix some loops to match coding standards. No functional change intended. llvm-svn: 156159	2012-05-04 06:39:13 +00:00
Craig Topper	d4d3237bb8	Fix up some spacing. No functional change. llvm-svn: 156158	2012-05-04 06:18:33 +00:00
Craig Topper	e2ae413746	Simplify broadcast lowering code. No functional change intended. llvm-svn: 156157	2012-05-04 05:49:51 +00:00
Craig Topper	42f2182366	Allow v16i16 and v32i8 shuffles to be rewritten as narrower shuffles. llvm-svn: 156156	2012-05-04 04:44:49 +00:00
Craig Topper	59063c0a3d	Simplify shuffle narrowing code a bit. No functional change intended. llvm-svn: 156154	2012-05-04 04:08:44 +00:00
Craig Topper	242183834a	Use 'unsigned' instead of 'int' in a few places dealing with counts of vector elements. llvm-svn: 156060	2012-05-03 07:26:59 +00:00
Craig Topper	315a5cc789	Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly. Missed in r155982. llvm-svn: 156059	2012-05-03 07:12:59 +00:00
Preston Gurd	926afd7401	For Intel Atom, use ILP scheduling always, instead of ILP for 64 bit and Hybrid for 32 bit, since benchmarks show ILP scheduling is better most of the time. llvm-svn: 156028	2012-05-02 22:02:02 +00:00
Manman Ren	f02efc8731	Revert r155853 The commit is intended to fix rdar://10961709. But it is the root cause of PR12720. Revert it for now. llvm-svn: 155992	2012-05-02 15:24:32 +00:00
Craig Topper	c73bc39c22	Add support for selecting AVX2 vpshuflw and vpshufhw. Add decoding support for AsmPrinter. llvm-svn: 155982	2012-05-02 08:03:44 +00:00
Manman Ren	425a55c1ce	X86: optimization for max-like struct This patch will optimize the following cases on X86 (a > b) ? (a-b) : 0 (a >= b) ? (a-b) : 0 (b < a) ? (a-b) : 0 (b <= a) ? (a-b) : 0 FROM movl %edi, %ecx subl %esi, %ecx cmpl %edi, %esi movl $0, %eax cmovll %ecx, %eax TO xorl %eax, %eax subl %esi, %edi cmovll %eax, %edi movl %edi, %eax rdar: 10734411 llvm-svn: 155919	2012-05-01 17:16:15 +00:00
Manman Ren	4f4d5c8fc8	X86: optimization for -(x != 0) This patch will optimize -(x != 0) on X86 FROM cmpl $0x01,%edi sbbl %eax,%eax notl %eax TO negl %edi sbbl %eax, %eax llvm-svn: 155853	2012-04-30 22:51:25 +00:00
Chad Rosier	d427d51c2b	Tidy up. No functional change intended. llvm-svn: 155832	2012-04-30 17:47:15 +00:00
Craig Topper	55b3990837	No need to normalize index before calling Extract128BitVector llvm-svn: 155811	2012-04-30 05:17:10 +00:00
Jakub Staszak	da03f3ba64	Remove unneeded casts. No functionality change. llvm-svn: 155800	2012-04-29 20:52:53 +00:00
Craig Topper	3b94fa63d6	Simplify code a bit. No functional change intended. llvm-svn: 155798	2012-04-29 20:22:05 +00:00
Craig Topper	0fa6c7e593	Use 'unsigned' instead of 'int' in several places when retrieving number of vector elements. llvm-svn: 155742	2012-04-27 22:54:43 +00:00
Chad Rosier	32c2178ef3	Add x86-specific DAG combine to simplify: x == -y --> x+y == 0 x != -y --> x+y != 0 On x86, the generated code goes from negl %esi cmpl %esi, %edi je .LBB0_2 to addl %esi, %edi je .L4 This case is correctly handled for ARM with "cmn". Patch by Manman Ren. rdar://11245199 PR12545 llvm-svn: 155739	2012-04-27 22:33:25 +00:00
Craig Topper	42cd8d2c00	Tidy up spacing. llvm-svn: 155733	2012-04-27 21:05:09 +00:00
Benjamin Kramer	913da4b261	X86: Don't emit conditional floating point moves when targeting pre-pentiumpro architectures. * Model FPSW (the FPU status word) as a register. * Add ISel patterns for the FUCOM, FNSTSW and SAHF instructions. During Legalize/Lowering, build a node sequence to transfer the comparison result from FPSW into EFLAGS. If you're wondering about the right-shift: That's an implicit sub-register extraction (%ax -> %ah) which is handled later on by the instruction selector. Fixes PR6679. Patch by Christoph Erhardt! llvm-svn: 155704	2012-04-27 12:07:43 +00:00
Craig Topper	5ff6dc34b9	Use vector_shuffles instead of target specific unpack nodes for AVX ZERO_EXTEND/ANY_EXTEND combine. These will be converted to target specific nodes during lowering. This is more consistent with other code. llvm-svn: 155537	2012-04-25 06:39:39 +00:00
Nadav Rotem	7b7b99c74a	AVX2: The PBLENDW instruction selects between vectors of v16i16 using an i8 immediate. We can't use it here because the shuffle code does not check that the lower part of the word is identical to the upper part. llvm-svn: 155440	2012-04-24 11:27:53 +00:00
Craig Topper	0b65c40821	Remove dangling spaces. Fix some other formatting. llvm-svn: 155429	2012-04-24 06:36:35 +00:00
Craig Topper	6f2a535de2	Simplify code a bit and make it compile better. Remove unused parameters. llvm-svn: 155428	2012-04-24 06:02:29 +00:00
Nadav Rotem	3f8acfc3c4	Optimize the vector UINT_TO_FP, SINT_TO_FP and FP_TO_SINT operations where the integer type is i8 (commonly used in graphics). llvm-svn: 155397	2012-04-23 21:53:37 +00:00
Craig Topper	153bb34a3c	Use MVT instead of EVT through all of LowerVECTOR_SHUFFLEtoBlend and not just the switch. Saves a little bit of binary size. llvm-svn: 155339	2012-04-23 07:36:33 +00:00
Craig Topper	0a2c809d09	Make getZeroVector and getOnesVector more alike as far as how they detect 128-bit versus 256-bit vectors. Be explicit about both sizes and use llvm_unreachable. Similar changes to getLegalSplat. llvm-svn: 155337	2012-04-23 07:24:41 +00:00
Craig Topper	2bbe8bcf4e	Tidy up by removing some 'else' after 'return' llvm-svn: 155336	2012-04-23 06:57:04 +00:00
Craig Topper	5c51eeecfc	Tidy up spacing in LowerVECTOR_SHUFFLEtoBlend. Remove code that checks if shuffle operand has a different type than the shuffle result since it can never happen. llvm-svn: 155333	2012-04-23 06:38:28 +00:00
Craig Topper	a52f0d09b6	Add a couple llvm_unreachables. llvm-svn: 155332	2012-04-23 03:42:40 +00:00
Craig Topper	984dc015ae	Remove some tab characters. llvm-svn: 155331	2012-04-23 03:28:34 +00:00
Craig Topper	ea428fd79c	Remove some 'else' after 'return'. No functional change. llvm-svn: 155330	2012-04-23 03:26:18 +00:00
Craig Topper	bf7d5666f0	Make Extract128BitVector and Insert128BitVector take an unsigned instead of a ConstantNode SDValue. getConstant was almost always called just before only to have the functions take it apart and build a new ConstantSDNode. llvm-svn: 155325	2012-04-22 20:55:18 +00:00
Craig Topper	2d474d6d92	Convert getNode(UNDEF) to getUNDEF. llvm-svn: 155321	2012-04-22 19:29:34 +00:00
Craig Topper	860ed0d20a	Make calls to getVectorShuffle more consistent. Use shuffle VT for calls to getUNDEF instead of requerying. Use &Mask[0] instead of Mask.data(). llvm-svn: 155320	2012-04-22 19:17:57 +00:00
Craig Topper	43397c0900	Tidy up. 80 columns and argument alignment. llvm-svn: 155319	2012-04-22 18:51:37 +00:00
Craig Topper	ad56a744f1	Simplify code by converting multiple places that were manually concatenating 128-bit vectors to use either CONCAT_VECTORS or a helper function. CONCAT_VECTORS will itself be lowered to the same pattern as before. The helper function is needed for concats of BUILD_VECTORs since getNode(CONCAT_VECTORS) will just return a large BUILD_VECTOR and we may be trying to lower large BUILD_VECTORS when this occurs. llvm-svn: 155318	2012-04-22 18:15:59 +00:00
Elena Demikhovsky	8d7e56c409	ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2 llvm-svn: 155309	2012-04-22 09:39:03 +00:00
Craig Topper	6eadae8e60	Make some fixed arrays const. Use array_lengthof in a couple places instead of a hardcoded number. llvm-svn: 155294	2012-04-21 18:58:38 +00:00
Craig Topper	2568bf3089	Tidy up. 80 columns and some other spacing issues. llvm-svn: 155291	2012-04-21 18:13:35 +00:00
Craig Topper	abadc660e0	Convert some uses of XXXRegisterClass to &XXXRegClass. No functional change since they are equivalent. llvm-svn: 155186	2012-04-20 06:31:50 +00:00
Craig Topper	d3c9e404ba	Remove AVX vpermil intrinsics. I removed their uses from clang headers and builtins a while back. llvm-svn: 154985	2012-04-18 05:24:00 +00:00
Craig Topper	354103d8ca	Don't decode vperm2i128 or vperm2f128 into a shuffle if bit 3 or 7 of the immediate is set. llvm-svn: 154907	2012-04-17 05:54:54 +00:00
Richard Smith	12da79b859	Fix incorrect atomics codegen introduced in r154705, and extend test to catch it. llvm-svn: 154845	2012-04-16 18:43:53 +00:00
Craig Topper	4badeb3f0d	Replace vpermd/vpermps intrinsic patterns with custom lowering to target specific nodes. llvm-svn: 154801	2012-04-16 07:13:00 +00:00
Craig Topper	26d7a94981	Change type profile for vpermv back to using operand type for the mask argument to match intrinsic behavior. Add a bitcast to the lowering code to convert mask from v8i32 to v8f32 for vpermps. llvm-svn: 154798	2012-04-16 06:43:40 +00:00
Craig Topper	b86fa404d3	Merge vpermps/vpermd and vpermpd/vpermq SD nodes. llvm-svn: 154782	2012-04-16 00:41:45 +00:00
Craig Topper	1f8c9eb925	Spacing fixes and 80 column fixes. Use 0 instead of 0x80 for undef indices in vpermps/vpermd. Hardware only looks at lower 3-bits. llvm-svn: 154780	2012-04-15 23:48:57 +00:00
Elena Demikhovsky	779a72b49e	Added VPERM optimization for AVX2 shuffles llvm-svn: 154761	2012-04-15 11:18:59 +00:00
Richard Smith	3e8f1f6aea	Fix X86 codegen for 'atomicrmw nand' to generate x = ~(x & y), not x = ~x & y. llvm-svn: 154705	2012-04-13 22:47:00 +00:00
Nadav Rotem	372cf15125	remove unused argument llvm-svn: 154494	2012-04-11 11:05:21 +00:00
Nadav Rotem	9bc178ac5c	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154483	2012-04-11 06:40:27 +00:00
Chad Rosier	f7345b027a	Whitespace. llvm-svn: 154427	2012-04-10 19:42:07 +00:00
Chad Rosier	235a7a1746	Revert r154396, which looks to be the real culprit behind the bot failures. llvm-svn: 154426	2012-04-10 19:39:18 +00:00
Eric Christopher	65ada95b84	Temporarily revert this patch to see if it brings the buildbots back. llvm-svn: 154425	2012-04-10 19:33:16 +00:00
David Blaikie	2735136655	Remove unused variable. llvm-svn: 154398	2012-04-10 15:23:13 +00:00
Nadav Rotem	f934f91709	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154396	2012-04-10 14:33:13 +00:00
Evan Cheng	f8bad08001	Fix a long standing tail call optimization bug. When a libcall is emitted legalizer always use the DAG entry node. This is wrong when the libcall is emitted as a tail call since it effectively folds the return node. If the return node's input chain is not the entry (i.e. call, load, or store) use that as the tail call input chain. PR12419 rdar://9770785 rdar://11195178 llvm-svn: 154370	2012-04-10 01:51:00 +00:00
Nadav Rotem	fb7e2ae53c	Lower some x86 shuffle sequences to the vblend family of instructions. llvm-svn: 154313	2012-04-09 08:33:21 +00:00
Nadav Rotem	b801ca3976	Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering. llvm-svn: 154310	2012-04-09 07:45:58 +00:00
Chandler Carruth	16f0ebcbb5	Move the TLSModel information into the TargetMachine rather than hiding in TargetLowering. There was already a FIXME about this location being odd. The interface is simplified as a consequence. This will also make it easier to change TLS models when compiling with PIE. llvm-svn: 154292	2012-04-08 17:20:55 +00:00
Nadav Rotem	82609df647	AVX2: Build splat vectors by broadcasting a scalar from the constant pool. Previously we used three instructions to broadcast an immediate value into a vector register. On Sandybridge we continue to load the broadcasted value from the constant pool. llvm-svn: 154284	2012-04-08 12:54:54 +00:00
Benjamin Kramer	3cacabfb04	Fix narrowing conversion. llvm-svn: 154171	2012-04-06 13:33:52 +00:00
Craig Topper	447417c932	Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413. llvm-svn: 154166	2012-04-06 07:45:23 +00:00
Rafael Espindola	ba0a6cabb8	Always compute all the bits in ComputeMaskedBits. This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. llvm-svn: 154011	2012-04-04 12:51:34 +00:00
Nadav Rotem	b078350872	This commit contains a few changes that had to go in together. 1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) (and also scalar_to_vector). 2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src). Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B)) 3. Optimize swizzles of shuffles: shuff(shuff(x, y), undef) -> shuff(x, y). 4. Fix an X86ISelLowering optimization which was very bitcast-sensitive. Code which was previously compiled to this: movd (%rsi), %xmm0 movdqa .LCPI0_0(%rip), %xmm2 pshufb %xmm2, %xmm0 movd (%rdi), %xmm1 pshufb %xmm2, %xmm1 pxor %xmm0, %xmm1 pshufb .LCPI0_1(%rip), %xmm1 movd %xmm1, (%rdi) ret Now compiles to this: movl (%rsi), %eax xorl %eax, (%rdi) ret llvm-svn: 153848	2012-04-01 19:31:22 +00:00
Craig Topper	9cfc69c779	Spacing fixes and using 'unsigned' instead of 'int' to index to select shuffle elements for consistency with other shuffle code in X86 backend. llvm-svn: 153154	2012-03-21 02:14:01 +00:00
Craig Topper	b34d96c614	Remove code that prevented lowering shuffles if they are used by load and themselves used by a extract_vector_elt. This was done to allow the DAG combiner to collapse to a single element load. Unfortunately, sometimes the extract_vector_elt would disappear before DAG combine could do the transformation leaving a vector_shuffle that isel couldn't handle. New code lets the shuffle be converted to a target specific node, but then adds a combine routine that can convert target specific nodes back to vector_shuffles if the folding criteria are met. llvm-svn: 153080	2012-03-20 07:17:59 +00:00
Craig Topper	cbc96a6e90	Factor out target shuffle mask decoding from getShuffleScalarElt and use a SmallVector of int instead of unsigned for shuffle mask in decode functions. Preparation for another change. llvm-svn: 153079	2012-03-20 06:42:26 +00:00
Craig Topper	129f9ef669	isCommutedMOVLMask should only look at 128-bit vectors to match isMOVLMask. llvm-svn: 153027	2012-03-18 22:50:10 +00:00
Craig Topper	bef78fc2ee	Convert more static tables of registers used by calling convention to uint16_t to reduce space. llvm-svn: 152538	2012-03-11 07:57:25 +00:00
Chad Rosier	9424aa1c51	Address Evan's comments for r151877. Specifically, remove the magic number when checking to see if the copy has a glue operand and simplify the checking logic. rdar://10930395 llvm-svn: 152041	2012-03-05 19:27:12 +00:00
Chad Rosier	f5e086f18e	Prevent obscure and incorrect tail-call optimization. In this instance we are generating the tail-call during legalizeDAG. The 2nd floor call can't be a tail call because it clobbers %xmm1, which is defined by the first floor call. The first floor call can't be a tail-call because it's not in the tail position. The only reasonable way I could think to fix this in a target-independent manner was to check for glue logic on the copy reg. rdar://10930395 llvm-svn: 151877	2012-03-02 02:50:46 +00:00
Evan Cheng	65f9d19c4f	Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call. llvm-svn: 151645	2012-02-28 18:51:51 +00:00
Daniel Dunbar	ee7b899343	Revert r151623 "Some ARM implementations, e.g. A-series, do return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part. llvm-svn: 151630	2012-02-28 15:36:07 +00:00
Evan Cheng	87c7b09d8d	Some ARM implementations, e.g. A-series, do return stack prediction. That is, the processor keeps a return address stack (RAS) which stores the address and the instruction execution state of the instruction after a function-call type branch instruction. Calling a "noreturn" function with normal call instructions (e.g. bl) can corrupt the RAS and cause 100% return misprediction, so LLVM should use an unconditional branch instead. i.e. mov lr, pc b _foo The "mov lr, pc" is issued in order to get proper backtrace. rdar://8979299 llvm-svn: 151623	2012-02-28 06:42:03 +00:00
NAKAMURA Takumi	bdf94879df	Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2 lowering for i386-*-win32 targets. Patch by Joe Groff. [Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems: Clang raised a warning, and X86 LowerOperation would assert out for fptoui f64 to i32 because it improperly lowered to an illegal BUILD_PAIR. Here's a patch that addresses these issues. Let me know if any other changes are necessary. Thanks. llvm-svn: 151432	2012-02-25 03:37:25 +00:00
Michael J. Spencer	248d65e78b	Add WIN_FTOL_* pseudo-instructions to model the unique calling convention used by the Win32 _ftol2 runtime function. Patch by Joe Groff! llvm-svn: 151382	2012-02-24 19:01:22 +00:00
Craig Topper	760b134ffa	Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified. llvm-svn: 151134	2012-02-22 05:59:10 +00:00
Craig Topper	de121a1000	Remove some unneeded includes and fix ordering in X86ISelLowering.cpp. Remove unneeded 'using namespace'. llvm-svn: 150916	2012-02-19 07:15:48 +00:00
Craig Topper	65a4ceea1e	Unify all shuffle mask checking functions take a mask and VT instead of VectorShuffleSDNode. llvm-svn: 150913	2012-02-19 05:41:45 +00:00
Craig Topper	3e5c04e432	Make a bunch of X86ISelLowering shuffle functions static now that they are no longer needed by isel. llvm-svn: 150908	2012-02-19 02:53:47 +00:00
Jakob Stoklund Olesen	97e3115dc2	Use the same CALL instructions for Windows as for everything else. The different calling conventions and call-preserved registers are represented with regmask operands that are added dynamically. llvm-svn: 150708	2012-02-16 17:56:02 +00:00
Jakob Stoklund Olesen	8a450cb2fa	Enable register mask operands for x86 calls. Call instructions no longer have a list of 43 call-clobbered registers. Instead, they get a single register mask operand with a bit vector of call-preserved registers. This saves a lot of memory, 42 x 32 bytes = 1344 bytes per call instruction, and it speeds up building call instructions because those 43 imp-def operands no longer need to be added to use-def lists. (And removed and shifted and re-added for every explicit call operand). Passes like LiveVariables, LiveIntervals, RAGreedy, PEI, and BranchFolding are significantly faster because they can deal with call clobbers in bulk. Overall, clang -O2 is between 0% and 8% faster, uniformly distributed depending on call density in the compiled code. Debug builds using clang -O0 are 0% - 3% faster. I have verified that this patch doesn't change the assembly generated for the LLVM nightly test suite when building with -disable-copyprop and -disable-branch-fold. Branch folding behaves slightly differently in a few cases because call instructions have different hash values now. Copy propagation flushes its data structures when it crosses a register mask operand. This causes it to leave a few dead copies behind, on the order of 20 instruction across the entire nightly test suite, including SPEC. Fixing this properly would require the pass to use different data structures. llvm-svn: 150638	2012-02-16 00:02:50 +00:00
Craig Topper	87119fa37f	Update CanXFormVExtractWithShuffleIntoLoad to ensure bitcasts of loads only have one use. Matches DAGCombiner and prevents vector_shuffles from reaching isel. llvm-svn: 150360	2012-02-13 04:30:38 +00:00
Anton Korobeynikov	c6b4017ce2	Add support for implicit TLS model used with MS VC runtime. Patch by Kai Nacke! llvm-svn: 150307	2012-02-11 17:26:53 +00:00
Craig Topper	11826a6e10	Fix shuffle lowering code to stop creating temporary DAG nodes to do shuffle mask checks on. This seemed to be confusing things such that vector_shuffle ops got through to iselection. This is another step towards removing the vector_shuffle handling patterns from isel. llvm-svn: 150296	2012-02-11 06:24:48 +00:00
Elena Demikhovsky	1adc1d53dd	Fixed a bug in printing "cmp" pseudo ops. > This IR code > %res = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 14) > fails with assertion: > > llc: X86ATTInstPrinter.cpp:62: void llvm::X86ATTInstPrinter::printSSECC(const llvm::MCInst, unsigned int, llvm::raw_ostream&): Assertion `0 && "Invalid ssecc argument!"' failed. > 0 llc 0x0000000001355803 > 1 llc 0x0000000001355dc9 > 2 libpthread.so.0 0x00007f79a30575d0 > 3 libc.so.6 0x00007f79a23a1945 gsignal + 53 > 4 libc.so.6 0x00007f79a23a2f21 abort + 385 > 5 libc.so.6 0x00007f79a239a810 __assert_fail + 240 > 6 llc 0x00000000011858d5 llvm::X86ATTInstPrinter::printSSECC(llvm::MCInst const, unsigned int, llvm::raw_ostream&) + 119 I added the full testing for all possible pseudo-ops of cmp. I extended X86AsmPrinter.cpp and X86IntelInstPrinter.cpp. You'll also see line alignments (unrelated to this fix) in X86IselLowering.cpp from my previous check-in. llvm-svn: 150068	2012-02-08 08:37:26 +00:00
Craig Topper	5405571fe0	Remove GCC builtins for vpermilp* intrinsics as clang no longer needs them. Custom lower the intrinsics to the vpermilp target specific node and remove intrinsic patterns. llvm-svn: 150060	2012-02-08 06:36:57 +00:00
Craig Topper	b27fd77c3f	Add instruction selection for 256-bit VPSHUFD and 128-bit VPERMILPS/VPERMILPD. llvm-svn: 149968	2012-02-07 06:28:42 +00:00
Chris Lattner	8213c8af29	Remove some dead code and tidy things up now that vectors use ConstantDataVector instead of always using ConstantVector. llvm-svn: 149912	2012-02-06 21:56:39 +00:00
Benjamin Kramer	2496717052	X86: Don't call malloc for 4 bits. No functionality change. llvm-svn: 149866	2012-02-06 12:06:18 +00:00
Craig Topper	1f71057747	Add shuffle decoding support for 256-bit pshufd. Merge vpermilp* and pshufd decoding. llvm-svn: 149859	2012-02-06 07:17:51 +00:00
Duncan Sands	ae22c60f90	Persuade GCC that there is nothing worth warning about here (there isn't). llvm-svn: 149834	2012-02-05 14:20:11 +00:00
Craig Topper	4ed7278ff4	Convert assert(0) to llvm_unreachable in X86 Target directory. llvm-svn: 149809	2012-02-05 05:38:58 +00:00
Craig Topper	83f3bdaa45	Convert some assert(0) in default of switch statements to llvm_unreachable. llvm-svn: 149808	2012-02-05 03:43:23 +00:00
Craig Topper	1d471e31ba	Add target specific node for PMULUDQ. Change patterns to use it and custom lower intrinsics to it. Use it instead of intrinsic to handle 64-bit vector multiplies. llvm-svn: 149807	2012-02-05 03:14:49 +00:00
Craig Topper	47e6d26911	Remove getShuffleVPERMILPImmediate function, getShuffleSHUFImmediate performs the same calculation. llvm-svn: 149683	2012-02-03 06:52:33 +00:00
Craig Topper	d5ffe0900d	Remove unnecessary qualification on 256-bit vector handling in LowerBUILD_VECTOR. Condition was already guaranteed by earlier code. llvm-svn: 149680	2012-02-03 06:32:21 +00:00
Lang Hames	bb682450f9	Incorporate Chad, Jakob and Evan's suggestions on r149957. llvm-svn: 149655	2012-02-03 01:13:49 +00:00
Jakob Stoklund Olesen	5e1ac45b93	Require non-NULL register masks. It doesn't seem worthwhile to give meaning to a NULL register mask pointer. It complicates all the code using register mask operands. llvm-svn: 149646	2012-02-02 23:52:57 +00:00
Elena Demikhovsky	6fbb4d2842	Minor change in signature of the getZeroVector() llvm-svn: 149601	2012-02-02 09:20:18 +00:00
Elena Demikhovsky	fb44980b41	Optimization for SIGN_EXTEND operation on AVX. Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32 extensions. llvm-svn: 149600	2012-02-02 09:10:43 +00:00
Francois Pichet	26f302d568	Unbreak the MSVC build. llvm-svn: 149599	2012-02-02 08:36:09 +00:00
Lang Hames	0269caafa6	Set EFLAGS correctly in EmitLoweredSelect on X86. llvm-svn: 149597	2012-02-02 07:48:37 +00:00
Andrew Trick	8523b16ff5	Instruction scheduling itinerary for Intel Atom. Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT. Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches. Adds a test to verify that the scheduler is working. Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP. Patch by Preston Gurd! llvm-svn: 149558	2012-02-01 23:20:51 +00:00
Mon P Wang	9f05206659	Avoid creating an extract element to an illegal type after LegalizeTypes has run. llvm-svn: 149548	2012-02-01 22:15:20 +00:00
Chad Rosier	e273cb08c4	Tidy up. llvm-svn: 149521	2012-02-01 18:45:51 +00:00
Elena Demikhovsky	34cca175ab	Shortened code in shuffle masks llvm-svn: 149493	2012-02-01 10:33:05 +00:00
Elena Demikhovsky	0e48c70ba7	Optimization for "truncate" operation on AVX. Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with set of shuffles. llvm-svn: 149485	2012-02-01 07:56:44 +00:00
Craig Topper	9cdb8bdf04	Don't create VBROADCAST nodes if any nodes use the chain result from the load. Fixes PR11900. llvm-svn: 149478	2012-02-01 06:51:58 +00:00
Craig Topper	b85e40f738	Remove pcmpgt/pcmpeq intrinsics as clang is not using them. llvm-svn: 149367	2012-01-31 06:52:44 +00:00
Benjamin Kramer	396c590818	Fix refacto. llvm-svn: 149269	2012-01-30 20:01:35 +00:00
Douglas Gregor	e577cfe172	Eliminate narrowing conversion in initializer list, to make C++11 happy llvm-svn: 149254	2012-01-30 16:57:18 +00:00
Benjamin Kramer	20af25f47b	X86: Simplify shuffle mask generation code. llvm-svn: 149248	2012-01-30 15:16:21 +00:00
Craig Topper	516cba3380	Fix pattern for memory form of PSHUFD for use with FP vectors to remove bitcast to an integer vector that normal code wouldn't have. Also remove bitcasts from code that turns splat vector loads into a shuffle as it was making the broken pattern necessary. llvm-svn: 149232	2012-01-30 07:50:31 +00:00
Craig Topper	ca29bcfc10	Move some XOP patterns into instruction definition. Replace VPCMOV intrinsic patterns with custom lowering to target specific nodes. llvm-svn: 149216	2012-01-30 01:10:15 +00:00
Craig Topper	b91760eff8	Remove some more patterns by custom lowering intrinsics to target specific nodes. llvm-svn: 149052	2012-01-26 07:18:03 +00:00
Chris Lattner	33633a90a0	fix a bug I introduced in r148929, this is not a splat! Thanks to Eli for noticing. llvm-svn: 148947	2012-01-25 09:56:22 +00:00
Craig Topper	7834900950	Custom lower PSIGN and PSHUFB intrinsics to their corresponding target specific nodes so we can remove the isel patterns. llvm-svn: 148933	2012-01-25 06:43:11 +00:00
Chris Lattner	47a86bdbe2	use ConstantVector::getSplat in a few places. llvm-svn: 148929	2012-01-25 06:02:56 +00:00
Craig Topper	ce4f9c5668	Custom lower phadd and phsub intrinsics to target specific nodes. Remove the patterns that are no longer necessary. llvm-svn: 148927	2012-01-25 05:37:32 +00:00
Elena Demikhovsky	0b0c5d8c4c	ZERO_EXTEND operation is optimized for AVX. v8i16 -> v8i32, v4i32 -> v4i64 - used vpunpck* instructions. llvm-svn: 148803	2012-01-24 13:54:13 +00:00
Craig Topper	edd1d0acfc	Custom lower PCMPEQ/PCMPGT intrinsics to target specific nodes and remove the intrinsic patterns. llvm-svn: 148687	2012-01-23 08:18:28 +00:00
Craig Topper	6b90c5d03e	Update more places to use target specific nodes for vector shifts instead of intrinsics. llvm-svn: 148685	2012-01-23 06:46:22 +00:00
Craig Topper	5e80db4e4f	Custom lower vector shift intrinsics to target specific nodes and remove the patterns that are no longer needed. llvm-svn: 148684	2012-01-23 06:16:53 +00:00
Craig Topper	0b7ad76bd0	Combine X86 CMPPD and CMPPS node types. Simplifies selection code and pattern matching. llvm-svn: 148670	2012-01-22 23:36:02 +00:00
Craig Topper	bd4884371b	Merge PCMPEQB/PCMPEQW/PCMPEQD/PCMPEQQ and PCMPGTB/PCMPGTW/PCMPGTD/PCMPGTQ X86 ISD node types into only two node types. Simplifying opcode selection and pattern matching. llvm-svn: 148667	2012-01-22 22:42:16 +00:00
Craig Topper	094626414d	Add target specific ISD node types for SSE/AVX vector shuffle instructions and change all the code that used to create intrinsic nodes to create the new nodes instead. llvm-svn: 148664	2012-01-22 19:15:14 +00:00
Craig Topper	a4ed5246d8	Make code a little less verbose. llvm-svn: 148651	2012-01-22 03:07:48 +00:00
Craig Topper	cb3433cd58	Remove unused X86 ISD node type defines. llvm-svn: 148644	2012-01-22 01:15:56 +00:00
Craig Topper	39bc1e4d25	Fix PR11819 introduced by r148537. I'd commit the test case, but the generated code is terrible as it gets fully scalarized. Expect a future commit to fix that. llvm-svn: 148632	2012-01-21 08:49:33 +00:00
David Blaikie	46a9f016c5	More dead code removal (using -Wunreachable-code) llvm-svn: 148578	2012-01-20 21:51:11 +00:00
Craig Topper	a409479023	Improve 256-bit shuffle splitting to allow 2 sources in each 128-bit lane. As long as only a single lane of the source is used in the lane in the destination. This makes the splitting match much closer to what happens with 256-bit shuffles when AVX is disabled and only 128-bit XMM is allowed. llvm-svn: 148537	2012-01-20 09:29:03 +00:00
Craig Topper	3469212c82	Add support for selecting 256-bit PALIGNR. llvm-svn: 148532	2012-01-20 05:53:00 +00:00
Eli Friedman	32c7c25dcb	Support MSVC x86-32 sret convention. PR11688. Patch by Joe Groff. llvm-svn: 148513	2012-01-20 00:05:46 +00:00
Craig Topper	80576e8d1f	Merge 128-bit and 256-bit SHUFPS/SHUFPD handling. llvm-svn: 148466	2012-01-19 08:19:12 +00:00
Nick Lewycky	ecc0084f72	Add a TargetOption for disabling tail calls. llvm-svn: 148442	2012-01-19 00:34:10 +00:00
Jakob Stoklund Olesen	ff482f733b	Add experimental -x86-use-regmask command line option. It adds register mask operands to x86 call instructions. Once all the backend passes support register mask operands, this will be permanently enabled. llvm-svn: 148438	2012-01-18 23:52:22 +00:00
Nadav Rotem	86c3807b99	Fix warning. llvm-svn: 148301	2012-01-17 09:31:09 +00:00
Nadav Rotem	86e5390dbf	Fix 11769. In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can be later handled by the DAGCombiner. However, in some cases on AVX, the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which currently is not handled by the DAGCombiner. In this patch I added a check that we only extract from the XMM part. llvm-svn: 148298	2012-01-17 09:13:19 +00:00
Craig Topper	9cafcd8baa	Remove unnecessary AVX check from an assert. hasSSE2 is enough. llvm-svn: 148295	2012-01-17 08:23:44 +00:00
Craig Topper	37b10ef250	Fix a crasher when PerformShiftCombine receives a BUILD_VECTOR of all UNDEF. Probably could use better handling in DAG combine or getNode. Fixes PR11772. llvm-svn: 148285	2012-01-17 04:44:50 +00:00
Nadav Rotem	57935243bd	[AVX] Optimize x86 VSELECT instructions using SimplifyDemandedBits. We know that the blend instructions only use the MSB, so if the mask is sign-extended then we can convert it into a SHL instruction. This is a common pattern because the type-legalizer sign-extends the i1 type which is used by the LLVM-IR for the condition. Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL. llvm-svn: 148225	2012-01-15 19:27:55 +00:00
Benjamin Kramer	339ced4e34	Return an ArrayRef from ShuffleVectorSDNode::getMask and push it through CodeGen. llvm-svn: 148218	2012-01-15 13:16:05 +00:00
Craig Topper	b1c2ebf6ee	use v8i32 as optimal mem type over v8f32 if AVX2 is enabled. Similar to SSE2 vs SSE1. llvm-svn: 148109	2012-01-13 08:32:21 +00:00
Craig Topper	cb7e13d7c0	Make X86 instruction selection use 256-bit VPXOR for build_vector of all ones if AVX2 is enabled. This gives the ExeDepsFix pass a chance to choose FP vs int as appropriate. Also use v8i32 as the type for getZeroVector if AVX2 is enabled. This is consistent with SSE2 preferring v4i32. llvm-svn: 148108	2012-01-13 08:12:35 +00:00
Craig Topper	2aa07f832e	Fix typo in PerformAddCombine that caused any vector type to be checked for horizontal add/sub if AVX2 is enabled. This caused an assert to fail for non 128/256-bit vectors when done before type legalizing. Fixes PR11749. llvm-svn: 148096	2012-01-13 05:04:25 +00:00
Elena Demikhovsky	060f6ccdb8	Fixed a bug in LowerVECTOR_SHUFFLE caused assertion failure lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&& "Op 1 of shuffle should not be undef"' failed. Added a test. llvm-svn: 148044	2012-01-12 20:33:10 +00:00
Nadav Rotem	0a0a829bea	Fix a bug in the AVX 256-bit shuffle code in cases where the splat element is on the boundary of two 128-bit vectors. The attached testcase was stuck in an endless loop. llvm-svn: 148027	2012-01-12 15:31:55 +00:00
Rafael Espindola	6635ae1c17	Explicitly set the scale to 1 on some segstack prologue instrs. Patch by Brian Anderson. llvm-svn: 147952	2012-01-11 18:14:03 +00:00
Nadav Rotem	baae7e4577	Fix a bug in the lowering of BUILD_VECTOR for AVX. SCALAR_TO_VECTOR does not zero untouched elements. Use INSERT_VECTOR_ELT instead. llvm-svn: 147948	2012-01-11 14:07:51 +00:00
Lang Hames	995c63329a	Fixed order of operands in comment to match code. llvm-svn: 147890	2012-01-10 22:53:20 +00:00
Bill Wendling	d5ab02600e	For i386, don't use the generic code. As the comment around 7746 says, it's better to use the x87 extended precision here than SSE. And the generic code doesn't know how to do that. It also regains the speed lost for the uint64_to_float.c testcase. <rdar://problem/10669858> llvm-svn: 147869	2012-01-10 19:41:30 +00:00
Craig Topper	430f3f1bd6	Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside. llvm-svn: 147844	2012-01-10 08:23:59 +00:00
Craig Topper	b0c0f72ae6	Remove hasXMM/hasXMMInt functions. Move callers to hasSSE1/hasSSE2. This is the final piece to remove the AVX hack that disabled SSE. llvm-svn: 147843	2012-01-10 06:54:16 +00:00
Craig Topper	d97bbd7b60	Remove hasSSEorAVX functions and change all callers to use just hasSSE. AVX is now an SSE level and no longer disables SSE checks. llvm-svn: 147842	2012-01-10 06:37:29 +00:00
Craig Topper	210e4f81b3	Change some places that were checking for AVX OR SSE1/2 to use hasXMM/hasXMMInt instead. Also fix one place that checked SSE3, but accidentally excluded AVX to use hasSSE3orAVX. This is a step towards removing the AVX hack from the X86Subtarget.h llvm-svn: 147764	2012-01-09 02:28:15 +00:00
Victor Umansky	540651cf59	Reverted commit #147601 upon Evan's request. llvm-svn: 147748	2012-01-08 17:20:33 +00:00
Benjamin Kramer	6898db6269	Remove VectorExtras. This unused helper was written for a type of API that is discouraged now. llvm-svn: 147738	2012-01-07 19:42:13 +00:00
Craig Topper	ca66bba45e	Remove unnecessary check of hasAVX(). It's already included in hasXMM(). llvm-svn: 147734	2012-01-07 18:48:43 +00:00
Eric Christopher	c206d46709	Make the 'x' constraint work for AVX registers as well. Fixes rdar://10614894 llvm-svn: 147704	2012-01-07 01:02:09 +00:00
Victor Umansky	9255b6d9fe	Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX. Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX) Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov llvm-svn: 147601	2012-01-05 08:46:19 +00:00
Bill Wendling	ac27f0c830	Replace the uint64_t -> double conversion algorithm with one that's more efficient. This small bit of ASM code is sufficient to do what the old algorithm did: movq %rax, %xmm0 punpckldq (c0), %xmm0 // c0: (uint4){ 0x43300000U, 0x45300000U, 0U, 0U } subpd (c1), %xmm0 // c1: (double2){ 0x1.0p52, 0x1.0p52 * 0x1.0p32 } #ifdef __SSE3__ haddpd %xmm0, %xmm0 #else pshufd $0x4e, %xmm0, %xmm1 addpd %xmm1, %xmm0 #endif It's arguably faster. One caveat, the 'haddpd' instruction isn't very fast on all processors. <rdar://problem/7719814> llvm-svn: 147593	2012-01-05 02:13:20 +00:00
Evan Cheng	104dbb0fd1	For x86, canonicalize max (x > y) ? x : y => (x >= y) ? x : y So for something like (x - y) > 0 ? (x - y) : 0 It will be (x - y) >= 0 ? (x - y) : 0 This makes it possible to test sign-bit and eliminate a comparison against zero. e.g. subl %esi, %edi testl %edi, %edi movl $0, %eax cmovgl %edi, %eax => xorl %eax, %eax subl %esi, %edi cmovsl %eax, %edi rdar://10633221 llvm-svn: 147512	2012-01-04 01:41:39 +00:00
Chad Rosier	6ca97df951	Fix 80-column violations. llvm-svn: 147495	2012-01-03 23:19:12 +00:00
Nadav Rotem	6d31bac85e	Revert 147426 because it caused pr11696. llvm-svn: 147485	2012-01-03 22:19:42 +00:00
Chad Rosier	493c1b3152	Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather than a vxorps + vinsertf128 pair if the original vector came from a load. rdar://10594409 llvm-svn: 147481	2012-01-03 21:05:52 +00:00
Craig Topper	5bacb7e9e5	Miscellaneous shuffle lowering cleanup. No functional changes. Primarily converting the indexing loops to unsigned to be consistent across functions. llvm-svn: 147430	2012-01-02 09:17:37 +00:00
Craig Topper	53d559641f	Make CanXFormVExtractWithShuffleIntoLoad reject loads with multiple uses. Also make it return false if there's not even a load at all. This makes the code better match the code in DAGCombiner that it tries to match. These two changes prevent some cases where vector_shuffles were making it to instruction selection and causing the older shuffle selection code to be triggered. Also needed to fix a bad pattern that this change exposed. This is the first step towards getting rid of the old shuffle selection support. No test cases yet because there's no way to tell whether a shuffle was handled in the legalize stage or at instruction selection. llvm-svn: 147428	2012-01-02 08:46:48 +00:00
Nadav Rotem	6c7a0e6c8b	Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit. llvm-svn: 147426	2012-01-02 08:05:46 +00:00
Craig Topper	6e54ba7eee	Merge X86 SHUFPS and SHUFPD node types. llvm-svn: 147394	2011-12-31 23:50:21 +00:00
Craig Topper	0fdf720ded	Make LowerBUILD_VECTOR keep node vector types consistent when creating MOVL for v16i16 and v32i8. llvm-svn: 147337	2011-12-29 03:34:54 +00:00
Craig Topper	862c9b65be	Remove some elses after returns. llvm-svn: 147336	2011-12-29 03:20:51 +00:00
Craig Topper	274e20a499	Remove trailing spaces. Fix an assert to use && instead of || before string. Add same assert on similar code path. llvm-svn: 147335	2011-12-29 03:09:33 +00:00
Eli Friedman	3a01ddb7e9	Fix type-checking for load transformation which is not legal on floating-point types. PR11674. llvm-svn: 147323	2011-12-28 21:24:44 +00:00
Elena Demikhovsky	b3515a8d4b	Fixed a bug in LowerVECTOR_SHUFFLE and LowerBUILD_VECTOR. Matching MOVLP mask for AVX (256-bit vectors) was wrong. The failure was detected by conformance tests. llvm-svn: 147308	2011-12-28 08:14:01 +00:00
Craig Topper	df34d152bd	Add handling of x86_avx2_pmovmskb to computeMaskedBitsForTargetNode for consistency. Add comments and an assert for BMI instructions to PerformXorCombine since the enabling of the combine is conditional on it, but the function itself isn't. llvm-svn: 147287	2011-12-27 06:27:23 +00:00
Chandler Carruth	a3d54fe0ae	Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the LZCNT instructions are available. Force promotion to i32 to get a smaller encoding since the fix-ups necessary are just as complex for either promoted type We can't do standard promotion for CTLZ when lowering through BSR because it results in poor code surrounding the 'xor' at the end of this instruction. Essentially, if we promote the entire CTLZ node to i32, we end up doing the xor on a 32-bit CTLZ implementation, and then subtracting appropriately to get back to an i8 value. Instead, our custom logic just uses the knowledge of the incoming size to compute a perfect xor. I'd love to know of a way to fix this, but so far I'm drawing a blank. I suspect the legalizer could be more clever and/or it could collude with the DAG combiner, but how... ;] llvm-svn: 147251	2011-12-24 12:12:34 +00:00
Chandler Carruth	38ce24455d	Add systematic testing for cttz as well, and fix the bug I spotted by inspection earlier. llvm-svn: 147250	2011-12-24 11:46:10 +00:00
Chandler Carruth	c9fcde2347	Expand more when we have a nice 'tzcnt' instruction, to avoid generating 'bsf' instructions here. This one is actually debatable to my eyes. It's not clear that any chip implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding. Still, this restores the old behavior with 'tzcnt' enabled for now. llvm-svn: 147246	2011-12-24 11:11:38 +00:00
Chandler Carruth	7e9453e916	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)*8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a *lot* to be improved on in this type of code. llvm-svn: 147244	2011-12-24 10:55:54 +00:00
Chad Rosier	00bbedff03	Fix 80-column violations. llvm-svn: 147192	2011-12-22 22:35:21 +00:00
Chad Rosier	3ede414127	No case stmt for BUILD_VECTOR in PerformDAGCombine(), so I assume this isn't necessary. Please chime in if I'm mistaken. llvm-svn: 147065	2011-12-21 19:14:52 +00:00
Chandler Carruth	24680c24d8	Begin teaching the X86 target how to efficiently codegen patterns that use the zero-undefined variants of CTTZ and CTLZ. These are just simple patterns for now, there is more to be done to make real world code using these constructs be optimized and codegen'ed properly on X86. The existing tests are spiffed up to check that we no longer generate unnecessary cmov instructions, and that we generate the very important 'xor' to transform bsr which counts the index of the most significant one bit to the number of leading (most significant) zero bits. Also they now check that when the variant with defined zero result is used, the cmov is still produced. llvm-svn: 146974	2011-12-20 11:19:37 +00:00
Benjamin Kramer	1b54835a10	Another variadics tweak. llvm-svn: 146852	2011-12-18 20:51:31 +00:00
Benjamin Kramer	530b820500	Use the fancy new VariadicFunction template instead of a plain variadic function. Some compilers were complaining about passing StringRef to it. llvm-svn: 146850	2011-12-18 19:59:20 +00:00
Craig Topper	a913dde0ef	Remove an unused X86ISD node type. llvm-svn: 146833	2011-12-17 19:16:44 +00:00
Benjamin Kramer	792edd3c75	X86: Factor the bswap asm matching to be slightly less horrible to read. llvm-svn: 146831	2011-12-17 14:36:05 +00:00
Lang Hames	da07b3ad42	Make sure that the lower bits on the VSELECT condition are properly set. llvm-svn: 146800	2011-12-17 01:08:46 +00:00
Craig Topper	a4d411cb1b	Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes. llvm-svn: 146726	2011-12-16 08:06:31 +00:00
Chad Rosier	75ed9dcbc6	Fix assert in LowerBUILD_VECTOR for v16i16 type on AVX. Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>! llvm-svn: 146684	2011-12-15 21:34:44 +00:00
Lang Hames	c44b5e469b	Fix VSELECT operand order. Was previously backwards, causing bogus vector shift results - <rdar://problem/10559581>. llvm-svn: 146671	2011-12-15 18:57:27 +00:00
Chad Rosier	b7a0b89ff0	Use SmallVector/assign(), rather than std::vector/push_back(). llvm-svn: 146627	2011-12-15 01:16:09 +00:00
Chad Rosier	1940baa76b	Add support for lowering fneg when AVX is enabled. rdar://10566486 llvm-svn: 146625	2011-12-15 01:02:25 +00:00
Chandler Carruth	637cc6a8aa	Initial CodeGen support for CTTZ/CTLZ where a zero input produces an undefined result. This adds new ISD nodes for the new semantics, selecting them when the LLVM intrinsic indicates that the undef behavior is desired. The new nodes expand trivially to the old nodes, so targets don't actually need to do anything to support these new nodes besides indicating that they should be expanded. I've done this for all the operand types that I could figure out for all the targets. Owners of various targets, please review and let me know if any of these are incorrect. Note that the expand behavior is conservatively correct, and exactly matches LLVM's current behavior with these operations. Ideally this patch will not change behavior in any way. For example the regtest suite finds the exact same instruction sequences coming out of the code generator. That's why there are no new tests here -- all of this is being exercised by the existing test suite. Thanks to Duncan Sands for reviewing the various bits of this patch and helping me get the wrinkles ironed out with expanding for each target. Also thanks to Chris for clarifying through all the discussions that this is indeed the approach he was looking for. That said, there are likely still rough spots. Further review much appreciated. llvm-svn: 146466	2011-12-13 01:56:10 +00:00
Craig Topper	1fdfec63a4	Remove some remnants of the old palign pattern fragment that were still hanging around. Also remove a cast from inside getShuffleVPERM2X128Immediate and getShuffleVPERMILPImmediate since the only caller already had done the cast. llvm-svn: 146344	2011-12-11 19:12:35 +00:00
Benjamin Kramer	16bbfbec66	X86: Add patterns for the various rounding ops for SSE4.1 and AVX. llvm-svn: 146257	2011-12-09 15:44:03 +00:00
Owen Anderson	57a7f41d5d	Don't explicitly mark libm rounding ops as legal on SSE4.1/AVX. There don't seem to be patterns for these, so I don't know why they were marked legal in the first place. Fixes failures caused by r146171. llvm-svn: 146180	2011-12-08 20:51:38 +00:00
Owen Anderson	0b9b9da6c8	Teach SelectionDAG to match more calls to libm functions onto existing SDNodes. Mark these nodes as illegal by default, unless the target declares otherwise. llvm-svn: 146171	2011-12-08 19:32:14 +00:00
Craig Topper	83320e03e6	Add X86ISD::HADD/HSUB to getTargetNodeName llvm-svn: 145929	2011-12-06 09:31:36 +00:00
Craig Topper	8d4ba198d6	Merge floating point and integer UNPCK X86ISD node types. llvm-svn: 145926	2011-12-06 08:21:25 +00:00
Craig Topper	3cb802c775	Clean up some of the shuffle decoding code for UNPCK instructions. Add instruction commenting for AVX/AVX2 forms for integer UNPCKs. llvm-svn: 145924	2011-12-06 05:31:16 +00:00
Craig Topper	bf41eb3a98	Merge isSHUFPMask and isCommutedSHUFPMask into single function that can do both. Do the same for the 256-bit version. Use loops to reduce size of isVSHUFPYMask. Fix test cases that were incorrectly passing due to isCommutedSHUFPMask not checking for the vector being 128-bit. This caused some 256-bit shuffles to be incorrectly commuted. llvm-svn: 145921	2011-12-06 04:59:07 +00:00
Jakob Stoklund Olesen	10e1252269	Use logarithmic units for basic block alignment. This was actually a bit of a mess. TLI.setPrefLoopAlignment was clearly documented as taking log2(bytes) units, but the x86 target would still set a preferred loop alignment of '16'. CodePlacementOpt passed this number on to the basic block, and AsmPrinter interpreted it as bytes. Now both MachineFunction and MachineBasicBlock use logarithmic alignments. Obviously, MachineConstantPool still measures alignments in bytes, so we can emulate the thrill of using as. llvm-svn: 145889	2011-12-06 01:26:19 +00:00
Craig Topper	51bec1a37a	Remove some leftover remnants that once tried to create 64-bit MMX PALIGNR instructions. llvm-svn: 145804	2011-12-05 07:27:14 +00:00
Craig Topper	6a55b1dd9f	Clean up and optimizations to the X86 shuffle lowering code. No functional change. llvm-svn: 145803	2011-12-05 06:56:46 +00:00
Nick Lewycky	50f02cb21b	Move global variables in TargetMachine into new TargetOptions class. As an API change, now you need a TargetOptions object to create a TargetMachine. Clang patch to follow. One small functionality change in PTX. PTX had commented out the machine verifier parts in their copy of printAndVerify. That now calls the version in LLVMTargetMachine. Users of PTX who need verification disabled should rely on not passing the command-line flag to enable it. llvm-svn: 145714	2011-12-02 22:16:29 +00:00
Craig Topper	b67440367f	Reduce duplicate code in isHorizontalBinOp and add some asserts to protect assumptions llvm-svn: 145681	2011-12-02 08:18:41 +00:00
Craig Topper	abeb79eee3	Add instruction selection support for horizontal add/sub of 256-bit floating point vectors. Also add the test case for 256-bit integer vectors. llvm-svn: 145680	2011-12-02 07:16:01 +00:00
Nadav Rotem	96923cc2bb	X86: PerformOrCombine introduced a vselect node with a wrong order of operands. This bug was introduced when a dedicated blend sdnode was replaced with the vselect node (in 139479). llvm-svn: 145488	2011-11-30 10:13:37 +00:00
Craig Topper	c4977ba413	Add instruction selection support for AVX2 horizontal add/sub instructions. llvm-svn: 145487	2011-11-30 09:10:50 +00:00
Craig Topper	0a672eaf9e	Merge VPERM2F128/VPERM2I128 ISD node types. llvm-svn: 145485	2011-11-30 07:47:51 +00:00
Craig Topper	bafd224c8b	Merge decoding of VPERMILPD and VPERMILPS shuffle masks. Merge X86ISD node type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128. llvm-svn: 145483	2011-11-30 06:25:25 +00:00
Craig Topper	c16db840be	Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD. llvm-svn: 145390	2011-11-29 07:49:05 +00:00
Craig Topper	818a983e93	Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar. llvm-svn: 145238	2011-11-28 10:14:51 +00:00
Craig Topper	b0456936da	Make isCommutedVSHUFP more like the way isCommutedSHUFP is handled. llvm-svn: 145218	2011-11-28 01:14:24 +00:00
Craig Topper	79ee88a511	Merge detecting and handling for VSHUFPSY and VSHUFPDY since a lot of the code was similar for both. llvm-svn: 145199	2011-11-27 21:41:12 +00:00
Craig Topper	51280d565b	Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD. Simplify some shuffle lowering code since V1 can never be UNDEF due to canonicalizing that occurs when shuffle nodes are created. llvm-svn: 145153	2011-11-26 22:55:48 +00:00
Craig Topper	7704bd7ac3	Collapse X86ISD node types for PUNPCKH, PUNPCKL, UNPCKLP, and UNPCKHP to not be type specific. Now we just have integer high and low and floating point high and low. Pattern matching will choose the correct instruction based on the vector type. llvm-svn: 145148	2011-11-26 20:47:44 +00:00
Craig Topper	d65a444478	Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type distinguish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64. llvm-svn: 145126	2011-11-24 22:57:10 +00:00
Craig Topper	d26466748b	Remove AVX2 specific X86ISD node types for PUNPCKH/L and instead just reuse the 128-bit versions and let the vector type distinguish. llvm-svn: 145125	2011-11-24 22:20:08 +00:00
Benjamin Kramer	ebcb451874	X86: Use btq for bit tests if the immediate can't be encoded in 32 bits. Before: movabsq $4294967296, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00] testq %rax, %rdi ## encoding: [0x48,0x85,0xf8] jne LBB0_2 ## encoding: [0x75,A] After: btq $32, %rdi ## encoding: [0x48,0x0f,0xba,0xe7,0x20] jb LBB0_2 ## encoding: [0x72,A] btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off saving one register and a giant movabsq. llvm-svn: 145103	2011-11-23 13:54:17 +00:00
Elena Demikhovsky	779ba6d7b7	I added several lines in X86 code generator that allow to choose VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask. The patch was reviewed by Bruno. llvm-svn: 145099	2011-11-23 10:23:16 +00:00
Craig Topper	ccb7097509	Fix shuffle decoding logic to handle UNPCKLPS/UNPCKLPD on 256-bit vectors correctly. Add support for decoding UNPCKHPS/UNPCKHPD for AVX 128-bit and 256-bit forms. llvm-svn: 145055	2011-11-22 01:57:35 +00:00
Craig Topper	f563977795	Add methods for querying minimum SSE version along with AVX. Simplifies all the places that had to check a version of SSE and AVX. llvm-svn: 145053	2011-11-22 00:44:41 +00:00
Craig Topper	6270d072c5	Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled. llvm-svn: 145028	2011-11-21 08:26:50 +00:00
Craig Topper	669199ca94	Add support for lowering 256-bit shuffles to VPUNPCKL/H for i16, i32, i64 if AVX2 is enabled. llvm-svn: 145026	2011-11-21 06:57:39 +00:00
Craig Topper	a065238c6e	Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled. llvm-svn: 145022	2011-11-21 01:12:36 +00:00
Craig Topper	e79761df73	Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine. llvm-svn: 145005	2011-11-20 00:12:05 +00:00
Craig Topper	a3a6583694	Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled. llvm-svn: 145004	2011-11-19 22:34:59 +00:00
Craig Topper	3af6ae089f	Custom lower AVX2 variable shift intrinsics to shl/srl/sra nodes and remove the intrinsic patterns. llvm-svn: 144999	2011-11-19 17:46:46 +00:00
Craig Topper	f984efbfce	Synthesize SSSE3/AVX 128-bit horizontal integer add/sub instructions from add/sub of appropriate shuffle vectors. llvm-svn: 144989	2011-11-19 09:02:40 +00:00
Craig Topper	81390be00f	Collapse X86 PSIGNB/PSIGNW/PSIGND node types. llvm-svn: 144988	2011-11-19 07:33:10 +00:00
Craig Topper	de6b73bb4d	Extend VPBLENDVB and VPSIGN lowering to work for AVX2. llvm-svn: 144987	2011-11-19 07:07:26 +00:00
Nadav Rotem	1ec141d0f9	Add AVX2 vpbroadcast support llvm-svn: 144967	2011-11-18 02:49:55 +00:00
Nadav Rotem	37010002f2	AVX: Add support for vbroadcast from BUILD_VECTOR and refactor some of the vbroadcast code. llvm-svn: 144720	2011-11-15 22:50:37 +00:00
Pete Cooper	7c7ba1baa1	Added custom lowering for load->dec->store sequence in x86 when the EFLAGS register is used by later instructions. Only done for DEC64m right now. Fixes <rdar://problem/6172640> llvm-svn: 144705	2011-11-15 21:57:53 +00:00
Jay Foad	0745e645e0	Remove some unnecessary includes of PseudoSourceValue.h. llvm-svn: 144631	2011-11-15 07:24:32 +00:00
Pete Cooper	890e02e854	Changed SSE4/AVX <2 x i64> extract and insert ops to be Custom lowered Constant idx case is still done in tablegen but other cases are then expanded Fixes <rdar://problem/10435460> llvm-svn: 144557	2011-11-14 19:38:42 +00:00
Craig Topper	a331515c82	Add neverHasSideEffects, mayLoad, and mayStore to many patternless SSE/AVX instructions. Remove MMX check from LowerVECTOR_SHUFFLE since MMX vector types won't go through it anyway. llvm-svn: 144522	2011-11-14 06:46:21 +00:00
Craig Topper	b8bcb473e2	Add BLSI, BLSMSK, and BLSR to getTargetNodeName. llvm-svn: 144502	2011-11-13 17:31:07 +00:00
Craig Topper	3dc75f9e3b	Add more AVX2 shift lowering support. Move AVX2 variable shift to use patterns instead of custom lowering code. llvm-svn: 144457	2011-11-12 09:58:49 +00:00
Craig Topper	ea28a34c43	Add lowering for AVX2 shift instructions. llvm-svn: 144380	2011-11-11 07:39:23 +00:00
Nadav Rotem	1938482bfa	AVX2: Add patterns for variable shift operations llvm-svn: 144212	2011-11-09 21:22:13 +00:00
Nadav Rotem	79135d844d	Add AVX2 support for vselect of v32i8 llvm-svn: 144187	2011-11-09 13:21:28 +00:00
Craig Topper	c9eb09d3b8	Add instruction selection for AVX2 integer comparisons. llvm-svn: 144176	2011-11-09 08:06:13 +00:00
Craig Topper	8c8a431057	Add AVX2 instruction lowering for add, sub, and mul. llvm-svn: 144174	2011-11-09 07:28:55 +00:00
Pete Cooper	82cd9e81fc	Added invariant field to the DAG.getLoad method and changed all calls. When this field is true it means that the load is from constant (run-time or compile-time) and so can be hoisted from loops or moved around other memory accesses llvm-svn: 144100	2011-11-08 18:42:53 +00:00
Evan Cheng	91b56e0390	Add x86 isel logic and patterns to match movlps from clang generated IR for _mm_loadl_pi(). rdar://10134392, rdar://10050222 llvm-svn: 144052	2011-11-08 00:31:58 +00:00
Dan Gohman	198b7ffc11	Reapply r143206, with fixes. Disallow physical register lifetimes across calls, and only check for nested dependences on the special call-sequence-resource register. llvm-svn: 143660	2011-11-03 21:49:52 +00:00
Eli Friedman	3f5eccbe7a	Teach the x86 backend a couple tricks for dealing with v16i8 sra by a constant splat value. Fixes PR11289. llvm-svn: 143498	2011-11-01 21:18:39 +00:00
Benjamin Kramer	7402ee6ec2	X86: Emit logical shift by constant splat of <16 x i8> as a <8 x i16> shift and zero out the bits where zeros should've been shifted in. llvm-svn: 143315	2011-10-30 17:31:21 +00:00
Nadav Rotem	c602b2c4de	Fix pr11266. On x86: (shl V, 1) -> add V,V Hardware support for vector-shift is sparse and in many cases we scalarize the result. Additionally, on sandybridge padd is faster than shl. llvm-svn: 143311	2011-10-30 13:24:22 +00:00
Dan Gohman	9b9c970148	Revert r143206, as there are still some failing tests. llvm-svn: 143262	2011-10-29 00:41:52 +00:00
Dan Gohman	73057ad24f	Reapply r143177 and r143179 (reverting r143188), with scheduler fixes: Use a separate register, instead of SP, as the calling-convention resource, to avoid spurious conflicts with actual uses of SP. Also, fix unscheduling of calling sequences, which can be triggered by pseudo-two-address dependencies. llvm-svn: 143206	2011-10-28 17:55:38 +00:00
Duncan Sands	225a7037d6	Speculatively disable Dan's commits 143177 and 143179 to see if it fixes the dragonegg self-host (it looks like gcc is miscompiled). Original commit messages: Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW on every node as it legalizes them. This makes it easier to use hasOneUse() heuristics, since unneeded nodes can be removed from the DAG earlier. Make LegalizeOps visit the DAG in an operands-last order. It previously used operands-first, because LegalizeTypes has to go operands-first, and LegalizeTypes used to be part of LegalizeOps, but they're now split. The operands-last order is more natural for several legalization tasks. For example, it allows lowering code for nodes with floating-point or vector constants to see those constants directly instead of seeing the lowered form (often constant-pool loads). This makes some things somewhat more complicated today, though it ought to allow things to be simpler in the future. It also fixes some bugs exposed by Legalizing using RAUW aggressively. Remove the part of LegalizeOps that attempted to patch up invalid chain operands on libcalls generated by LegalizeTypes, since it doesn't work with the new LegalizeOps traversal order. Instead, define what LegalizeTypes is doing to be correct, and transfer the responsibility of keeping calls from having overlapping calling sequences into the scheduler. Teach the scheduler to model callseq_begin/end pairs as having a physical register definition/use to prevent calls from having overlapping calling sequences. This is also somewhat complicated, though there are ways it might be simplified in the future. This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others. Please direct high-level questions about this patch to management. Delete #if 0 code accidentally left in. llvm-svn: 143188	2011-10-28 09:55:57 +00:00
Dan Gohman	4db3f7dd83	Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW on every node as it legalizes them. This makes it easier to use hasOneUse() heuristics, since unneeded nodes can be removed from the DAG earlier. Make LegalizeOps visit the DAG in an operands-last order. It previously used operands-first, because LegalizeTypes has to go operands-first, and LegalizeTypes used to be part of LegalizeOps, but they're now split. The operands-last order is more natural for several legalization tasks. For example, it allows lowering code for nodes with floating-point or vector constants to see those constants directly instead of seeing the lowered form (often constant-pool loads). This makes some things somewhat more complicated today, though it ought to allow things to be simpler in the future. It also fixes some bugs exposed by Legalizing using RAUW aggressively. Remove the part of LegalizeOps that attempted to patch up invalid chain operands on libcalls generated by LegalizeTypes, since it doesn't work with the new LegalizeOps traversal order. Instead, define what LegalizeTypes is doing to be correct, and transfer the responsibility of keeping calls from having overlapping calling sequences into the scheduler. Teach the scheduler to model callseq_begin/end pairs as having a physical register definition/use to prevent calls from having overlapping calling sequences. This is also somewhat complicated, though there are ways it might be simplified in the future. This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others. Please direct high-level questions about this patch to management. llvm-svn: 143177	2011-10-28 01:29:32 +00:00
Lang Hames	58dba012b6	Rename NonScalarIntSafe to something more appropriate. llvm-svn: 143080	2011-10-26 23:50:43 +00:00
Rafael Espindola	b3285224cd	Fixes an issue reported by -verify-machineinstrs. Patch by Sanjoy Das. llvm-svn: 143064	2011-10-26 21:16:41 +00:00
Nadav Rotem	e649d66552	Fix pr11193. SHL inserts zeros from the right, thus even when the original sign_extend_inreg value was of 1-bit, we need to sra. llvm-svn: 142724	2011-10-22 12:39:25 +00:00
Craig Topper	039a79067a	Remove intrinsics for X86 BLSI, BLSMSK, and BLSR intrinsics and replace with custom isel lowering code. llvm-svn: 142642	2011-10-21 06:55:01 +00:00
Evan Cheng	54d678fff4	Fix TLS lowering bug. The CopyFromReg must be glued to the TLSCALL. rdar://10291355 llvm-svn: 142550	2011-10-19 22:22:54 +00:00
Duncan Sands	d278d35b13	Fix a bunch of unused variable warnings when doing a release build with gcc-4.6. llvm-svn: 142350	2011-10-18 12:44:00 +00:00
Benjamin Kramer	5fb5e3b384	SmallVector -> array llvm-svn: 142073	2011-10-15 13:28:31 +00:00
Craig Topper	965de2c197	Add X86 ANDN instruction. Including instruction selection. llvm-svn: 141947	2011-10-14 07:06:56 +00:00
Craig Topper	3657fe4b17	Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell. llvm-svn: 141939	2011-10-14 03:21:46 +00:00
Bill Wendling	063f55ffdd	Revert r141854 because it was causing failures: http://lab.llvm.org:8011/builders/llvm-x86_64-linux/builds/101 --- Reverse-merging r141854 into '.': U test/MC/Disassembler/X86/x86-32.txt U test/MC/Disassembler/X86/simple-tests.txt D test/CodeGen/X86/bmi.ll U lib/Target/X86/X86InstrInfo.td U lib/Target/X86/X86ISelLowering.cpp U lib/Target/X86/X86.td U lib/Target/X86/X86Subtarget.h llvm-svn: 141857	2011-10-13 07:48:07 +00:00
Craig Topper	8cc9388073	Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell. llvm-svn: 141854	2011-10-13 07:09:14 +00:00
Craig Topper	271064e873	Add X86 LZCNT instruction. Including instruction selection support. llvm-svn: 141651	2011-10-11 06:44:02 +00:00
Eli Friedman	8ec0897db6	Make sure the X86 backend doesn't explode on 128-bit shuffles in AVX mode. Fixes PR11102. llvm-svn: 141585	2011-10-10 22:28:47 +00:00
Nadav Rotem	814598563f	Fix 10892 - When lowering SIGN_EXTEND_INREG do not lower v2i64 because the instruction set has no 64-bit SRA support. llvm-svn: 141570	2011-10-10 19:31:45 +00:00
Evan Cheng	74db300f37	High bits of movmskp{s|d} and pmovmskb are known zero. rdar://10247336 llvm-svn: 141371	2011-10-07 17:21:44 +00:00
Eli Friedman	2fb357a5b0	PR11033: Make sure we don't generate PCMPGTQ and PCMPEQQ if the target CPU does not support them. llvm-svn: 140723	2011-09-28 21:00:25 +00:00

2664 Commits