c53dd2ac01 
								
							 
						 
						
							
							
								
								Fix comment!  
							
							
							
							
							llvm-svn: 137521 
							
						 
						
							2011-08-12 21:54:42 +00:00  
				
					
						
							
							
								 
						
							
								f15dfe5818 
								
							 
						 
						
							
							
								
								The VPERM2F128 is an AVX instruction which permutes between two 256-bit
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
llvm-svn: 137519 
							
						 
						
							2011-08-12 21:48:26 +00:00  
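To make the VPERM2F128 entry above concrete, here is a minimal scalar sketch of the instruction's immediate form, assuming the documented imm8 encoding (bits 1:0 and 5:4 each select a 128-bit half, bits 3 and 7 zero the corresponding destination half). The names `vperm2f128` and `vperm2f128Imm` are hypothetical helpers for illustration, not LLVM functions, and the mask matcher deliberately ignores undef mask elements.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8 = std::array<float, 8>; // one 256-bit vector viewed as 8 floats

// Scalar model of VPERM2F128 (immediate form). Per-half selector values:
// 0 = a[127:0], 1 = a[255:128], 2 = b[127:0], 3 = b[255:128].
V8 vperm2f128(const V8& a, const V8& b, uint8_t imm) {
  V8 r{}; // zero-initialized, so a set zeroing bit needs no extra work
  for (int half = 0; half < 2; ++half) {
    unsigned ctrl = (imm >> (half * 4)) & 0xF;
    if (ctrl & 0x8) continue;             // imm bit 3 / bit 7: zero this half
    const V8& src = (ctrl & 0x2) ? b : a; // pick the source vector
    int base = (ctrl & 0x1) ? 4 : 0;      // pick its low or high half
    for (int i = 0; i < 4; ++i) r[half * 4 + i] = src[base + i];
  }
  return r;
}

// Hypothetical matcher: given an 8-element shuffle mask (indices 0-7 refer
// to a, 8-15 to b), return the imm8 if each half of the result is one whole
// 128-bit half of an input, or -1 if the shuffle is not suitable.
int vperm2f128Imm(const std::array<int, 8>& mask) {
  int imm = 0;
  for (int half = 0; half < 2; ++half) {
    int first = mask[half * 4];
    if (first < 0 || first > 12 || first % 4 != 0) return -1;
    for (int i = 1; i < 4; ++i)
      if (mask[half * 4 + i] != first + i) return -1;
    imm |= (first / 4) << (half * 4);
  }
  return imm;
}
```

For example, the mask `<4,5,6,7,8,9,10,11>` takes the high half of the first input and the low half of the second, which this sketch encodes as imm8 `0x21`.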
				
					
						
							
							
								 
						
							
								8fbf023c9b 
								
							 
						 
						
							
							
								
								Add a dag combine to transform 256-bit shuffles into simple vector
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.
llvm-svn: 137362 
							
						 
						
							2011-08-11 21:50:44 +00:00  
				
					
						
							
							
								 
						
							
								043c820800 
								
							 
						 
						
							
							
								
								Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict.  
							
							
							
							
							llvm-svn: 137324 
							
						 
						
							2011-08-11 18:59:13 +00:00  
				
					
						
							
							
								 
						
							
								efdd183f52 
								
							 
						 
						
							
							
								
								Add a comment, per Bruno's CR.  
							
							
							
							
							llvm-svn: 137313 
							
						 
						
							2011-08-11 17:05:47 +00:00  
				
					
						
							
							
								 
						
							
								1542d5a00a 
								
							 
						 
						
							
							
								
								[AVX] If the data which is going to be saved is already in two XMM registers
(for example, after an integer operation), do not pack the registers into a YMM
before saving. It's better to save as two XMM registers.
Before:
                vinsertf128         $1, %xmm3, %ymm0, %ymm3
                vinsertf128         $0, %xmm1, %ymm3, %ymm1
                vmovaps              %ymm1, 416(%rsp)
After:
                vmovaps              %xmm3, 416+16(%rsp)
                vmovaps              %xmm1, 416(%rsp)
llvm-svn: 137308 
							
						 
						
							2011-08-11 16:41:21 +00:00  
				
					
						
							
							
								 
						
							
								a2d8bb97b9 
								
							 
						 
						
							
							
								
								Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing
infinite recursive calls in legalize. Fix PR10562.
llvm-svn: 137296 
							
						 
						
							2011-08-11 02:49:44 +00:00  
				
					
						
							
							
								 
						
							
								572c9aaf53 
								
							 
						 
						
							
							
								
								Use the splat index to generate the desired shuffle. Otherwise we
could only get undefs and the vector shuffle becomes an undef,
generating wrong code.
llvm-svn: 137295 
							
						 
						
							2011-08-11 02:49:41 +00:00  
				
					
						
							
							
								 
						
							
								3ae39f8ad1 
								
							 
						 
						
							
							
								
								Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases.  This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2).  
							
							
							
							
							Fixes PR9693.
llvm-svn: 137292 
							
						 
						
							2011-08-11 01:48:05 +00:00  
				
					
						
							
							
								 
						
							
								410a11fe82 
								
							 
						 
						
							
							
								
								When performing a truncating store, it is sometimes possible to rearrange the
data in-register prior to saving to memory. Reordering the data in the register
avoids the need to save multiple scalars to memory, so a single regular
store suffices.
llvm-svn: 137238 
							
						 
						
							2011-08-10 19:30:14 +00:00  
				
					
						
							
							
								 
						
							
								278ffd7d8e 
								
							 
						 
						
							
							
								
								Fix a bug in vpermilps mask checking. Fix PR10560  
							
							
							
							
							llvm-svn: 137194 
							
						 
						
							2011-08-10 01:54:17 +00:00  
				
					
						
							
							
								 
						
							
								72323966c8 
								
							 
						 
						
							
							
								
								Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556  
							
							
							
							
							llvm-svn: 137179 
							
						 
						
							2011-08-09 23:27:13 +00:00  
				
					
						
							
							
								 
						
							
								6963062a99 
								
							 
						 
						
							
							
								
								Use fp unpack instructions to unpack int types. Until we have AVX2, this
is the best we can do for these patterns. This fixes PR10554.
llvm-svn: 137161 
							
						 
						
							2011-08-09 22:18:37 +00:00  
				
					
						
							
							
								 
						
							
								24dd1d4a27 
								
							 
						 
						
							
							
								
								Revert r137114  
							
							
							
							
							llvm-svn: 137127 
							
						 
						
							2011-08-09 17:39:01 +00:00  
				
					
						
							
							
								 
						
							
								ad3453cf2d 
								
							 
						 
						
							
							
								
								Handle sitofp between v4f64 <- v4i32. Fix PR10559  
							
							
							
							
							llvm-svn: 137114 
							
						 
						
							2011-08-09 05:48:01 +00:00  
				
					
						
							
							
								 
						
							
								af6a85484c 
								
							 
						 
						
							
							
								
								Make LowerVSETCC aware of AVX types and add patterns to match them.  
							
							
							
							
							llvm-svn: 137090 
							
						 
						
							2011-08-09 00:46:57 +00:00  
				
					
						
							
							
								 
						
							
								c96953c12a 
								
							 
						 
						
							
							
								
								Add support for several vector shift operations while in AVX mode. Fix PR10581
							
							
							
							llvm-svn: 137067 
							
						 
						
							2011-08-08 21:31:08 +00:00  
				
					
						
							
							
								 
						
							
								19e3f80579 
								
							 
						 
						
							
							
								
								Fix an obvious typo. Patch by Ivan Krasin.
							
							
							
							llvm-svn: 136899 
							
						 
						
							2011-08-04 18:38:15 +00:00  
				
					
						
							
							
								 
						
							
								e234f6ae0c 
								
							 
						 
						
							
							
								
								Only access both operands of an INSERT_SUBVECTOR if it is an INSERT_SUBVECTOR.  
							
							
							
							
							Fixes PR10527.
llvm-svn: 136853 
							
						 
						
							2011-08-04 00:32:58 +00:00  
				
					
						
							
							
								 
						
							
								103e2ec2df 
								
							 
						 
						
							
							
								
								Remove unused variables.  
							
							
							
							
							llvm-svn: 136803 
							
						 
						
							2011-08-03 19:53:48 +00:00  
				
					
						
							
							
								 
						
							
								04c5025cd5 
								
							 
						 
						
							
							
								
								Don't create a ridiculous EXTRACT_ELEMENT.  PR10563.  
							
							
							
							
							The testcase looks extremely fragile, so I'm adding an assertion which should catch any cases like this.
llvm-svn: 136711 
							
						 
						
							2011-08-02 18:38:35 +00:00  
				
					
						
							
							
								 
						
							
								5ada908140 
								
							 
						 
						
							
							
								
								Support this kind of lowering with 256-bit instructions:
  shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>
To:
  shuffle (vload ptr), undef, <1, 1, 1, 1>
Fix PR10494
llvm-svn: 136691 
							
						 
						
							2011-08-02 16:06:18 +00:00  
				
					
						
							
							
								 
						
							
								a8e3673816 
								
							 
						 
						
							
							
								
								Add v4f64 -> v2f32 fp_round support. Also add a testcase to exercise
the legalizer. This commit together with the two previous ones fixes
PR10495.
llvm-svn: 136654 
							
						 
						
							2011-08-01 21:54:09 +00:00  
				
					
						
							
							
								 
						
							
								616fe60548 
								
							 
						 
						
							
							
								
								Teach PreprocessISelDAG to be aware of vector types and to not process them.  
							
							
							
							
							llvm-svn: 136653 
							
						 
						
							2011-08-01 21:54:05 +00:00  
				
					
						
							
							
								 
						
							
								bd30a4b584 
								
							 
						 
						
							
							
								
								Lower CONCAT_VECTORS to use two VINSERTF128 instructions instead of
using a stack store.
llvm-svn: 136652 
							
						 
						
							2011-08-01 21:54:02 +00:00  
				
					
						
							
							
								 
						
							
								7513939ddd 
								
							 
						 
						
							
							
								
								Since vectors with all ones can't be created with a 256-bit instruction,
avoid returning early for v8i32 types, which would only be valid for
vectors with all zeros. Also split the handling of zeros and ones into separate
checking logic since they are handled differently. This fixes PR10547.
llvm-svn: 136642 
							
						 
						
							2011-08-01 19:51:53 +00:00  
				
					
						
							
							
								 
						
							
								adec587d5c 
								
							 
						 
						
							
							
								
								Misc optimizer+codegen work for 'cmpxchg' and 'atomicrmw'.  They appear to be
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)
llvm-svn: 136457 
							
						 
						
							2011-07-29 03:05:32 +00:00  
				
					
						
							
							
								 
						
							
								65ce5ea3ba 
								
							 
						 
						
							
							
								
								Fix two tests that I crashed in the previous commits. The mask elts
on the second half must be reindexed.
llvm-svn: 136454 
							
						 
						
							2011-07-29 02:05:28 +00:00  
				
					
						
							
							
								 
						
							
								81eb193f2e 
								
							 
						 
						
							
							
								
								Match VPERMIL masks more strictly and update the target specific mask
generation to always catch the weird cases.
llvm-svn: 136453 
							
						 
						
							2011-07-29 01:31:15 +00:00  
				
					
						
							
							
								 
						
							
								795f558532 
								
							 
						 
						
							
							
								
								Add DecodeShuffle shuffle support for VPERMILPD variants
							
							
							
							llvm-svn: 136452 
							
						 
						
							2011-07-29 01:31:11 +00:00  
				
					
						
							
							
								 
						
							
								c00f6728bc 
								
							 
						 
						
							
							
								
								Fix a bug while generating target specific VPERMIL masks: skip
undef mask elements. This fixes PR10529.
llvm-svn: 136450 
							
						 
						
							2011-07-29 01:31:04 +00:00  
				
					
						
							
							
								 
						
							
								b9ba465de8 
								
							 
						 
						
							
							
								
								Enable usage of SSE4 extracts and inserts in their 128-bit AVX forms.  
							
							
							
							
							Also tidy up code a bit.
llvm-svn: 136449 
							
						 
						
							2011-07-29 01:31:02 +00:00  
				
					
						
							
							
								 
						
							
								6aee388423 
								
							 
						 
						
							
							
								
								Clean up PALIGNR handling and remove the old palign pattern fragment.
Also make PALIGNR masks not match 256-bit vectors, which aren't supported.
It's also a step to solve PR10489
llvm-svn: 136448 
							
						 
						
							2011-07-29 01:30:59 +00:00  
				
					
						
							
							
								 
						
							
								8c19a8b5d5 
								
							 
						 
						
							
							
								
								Invert the subvector insertion to be more likely to be taken as a COPY  
							
							
							
							
							llvm-svn: 136324 
							
						 
						
							2011-07-28 01:26:53 +00:00  
				
					
						
							
							
								 
						
							
								9e2a301216 
								
							 
						 
						
							
							
								
								Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move
a convert pattern close to the instruction definition.
llvm-svn: 136320 
							
						 
						
							2011-07-28 01:26:39 +00:00  
				
					
						
							
							
								 
						
							
								26a484852e 
								
							 
						 
						
							
							
								
								Code generation for 'fence' instruction.  
							
							
							
							
							llvm-svn: 136283 
							
						 
						
							2011-07-27 22:21:52 +00:00  
				
					
						
							
							
								 
						
							
								6381c0100b 
								
							 
						 
						
							
							
								
								Explicitly cast narrowing conversions inside {}s that will become errors in
C++0x.
llvm-svn: 136211 
							
						 
						
							2011-07-27 06:22:51 +00:00  
				
					
						
							
							
								 
						
							
								f9324f4f6b 
								
							 
						 
						
							
							
								
								Move some code around to open opportunity for more shuffle matching  
							
							
							
							
							llvm-svn: 136201 
							
						 
						
							2011-07-27 00:56:37 +00:00  
				
					
						
							
							
								 
						
							
								27a30a7792 
								
							 
						 
						
							
							
								
								The vpermilps and vpermilpd have different behaviour regarding the
usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
used by the low part while in the latter both lanes have independent
masks. Handle this properly and add support for vpermilpd.
llvm-svn: 136200 
							
						 
						
							2011-07-27 00:56:34 +00:00  
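The difference described in the vpermilps/vpermilpd entry above can be sketched with scalar models of the 256-bit immediate forms. This is an illustrative model only, assuming the documented imm8 semantics (four 2-bit selectors reused by both lanes for vpermilps, one selector bit per element for vpermilpd); the function names are hypothetical and do not come from LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 256-bit VPERMILPS, immediate form: four 2-bit selectors in imm8.
// The same four selectors are applied to the low and the high lane.
std::array<float, 8> vpermilps(const std::array<float, 8>& s, uint8_t imm) {
  std::array<float, 8> r{};
  for (int lane = 0; lane < 2; ++lane)
    for (int i = 0; i < 4; ++i)
      r[lane * 4 + i] = s[lane * 4 + ((imm >> (2 * i)) & 3)];
  return r;
}

// 256-bit VPERMILPD, immediate form: one selector bit per result element,
// so bits 1:0 control the low lane and bits 3:2 control the high lane.
std::array<double, 4> vpermilpd(const std::array<double, 4>& s, uint8_t imm) {
  std::array<double, 4> r{};
  for (int i = 0; i < 4; ++i) {
    int lane = i / 2;
    r[i] = s[lane * 2 + ((imm >> i) & 1)];
  }
  return r;
}
```

With imm8 `0x1B`, `vpermilps` reverses the elements inside each lane identically, while `vpermilpd` with imm8 `0x6` leaves the low lane untouched and swaps only the high lane, which a shared-mask scheme could not express.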
				
					
						
							
							
								 
						
							
								124ac2b997 
								
							 
						 
						
							
							
								
								Add a neat little two's complement hack for x86.  
							
							
							
							
							On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from an XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.
This code is generated for __builtin_clz and friends.
llvm-svn: 136167 
							
						 
						
							2011-07-26 22:42:13 +00:00  
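The rewrite in the two's complement entry above rests on the identity C1 - (X ^ C2) == (X ^ ~C2) + (C1 + 1), which follows from -y == ~y + 1 and ~(X ^ C2) == X ^ ~C2. The following is a small sketch that checks the identity on 32-bit unsigned values; the function names are made up for illustration and the 32-bit width is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Original form: an immediate on the left-hand side of a subtraction.
constexpr uint32_t subOfXor(uint32_t c1, uint32_t x, uint32_t c2) {
  return c1 - (x ^ c2);
}

// Rewritten form: fold the negation into the xor constant and add one to
// the immediate, turning the sub into a commutable add (modulo 2^32).
constexpr uint32_t addOfXor(uint32_t c1, uint32_t x, uint32_t c2) {
  return (x ^ ~c2) + (c1 + 1u);
}

static_assert(subOfXor(32u, 5u, 31u) == addOfXor(32u, 5u, 31u),
              "both forms must agree");

// Compare both forms over a small grid of constants and inputs.
bool formsAgree() {
  for (uint32_t c1 = 0; c1 < 64; ++c1)
    for (uint32_t c2 = 0; c2 < 64; ++c2)
      for (uint32_t x : {0u, 1u, 5u, 31u, 0x80000000u, 0xFFFFFFFFu})
        if (subOfXor(c1, x, c2) != addOfXor(c1, x, c2)) return false;
  return true;
}
```

The case `32 - (x ^ 31)` is the shape that bit-scan based `__builtin_clz` expansions tend to produce, which is why the entry mentions that builtin.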
				
					
						
							
							
								 
						
							
								f8fe47bd2b 
								
							 
						 
						
							
							
								
								Recognize unpckh* masks and match 256-bit versions. The new versions are
different from the previous 128-bit ones because they work in lanes.
Update a few comments and add testcases.
llvm-svn: 136157 
							
						 
						
							2011-07-26 22:03:40 +00:00  
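To show what "work in lanes" means for the unpckh entry above, here is a hedged sketch that builds the shuffle mask an unpack-high operation corresponds to, one 128-bit lane at a time. The helper `unpckhMask` is hypothetical; it assumes the usual shuffle-mask convention where indices at or above the element count refer to the second input.

```cpp
#include <cassert>
#include <vector>

// Build the shuffle mask for an unpack-high of two vectors with numElts
// elements each, split into numLanes independent 128-bit lanes. Within each
// lane, the upper half of the first input is interleaved with the upper
// half of the second input.
std::vector<int> unpckhMask(int numElts, int numLanes) {
  int perLane = numElts / numLanes;
  std::vector<int> mask;
  for (int lane = 0; lane < numLanes; ++lane) {
    int start = lane * perLane + perLane / 2; // first "high" element of lane
    for (int i = 0; i < perLane / 2; ++i) {
      mask.push_back(start + i);           // element from the first input
      mask.push_back(start + i + numElts); // element from the second input
    }
  }
  return mask;
}
```

For a 128-bit v4f32 this yields `<2,6,3,7>`, while the 256-bit v8f32 form yields `<2,10,3,11,6,14,7,15>` rather than the `<4,12,5,13,6,14,7,15>` a single-lane reading would suggest.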
				
					
						
							
							
								 
						
							
								93dc04d5ca 
								
							 
						 
						
							
							
								
								Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected).  Fixes a minor isel issue that was breaking the testcase from r136130.  
							
							
							
							
							llvm-svn: 136148 
							
						 
						
							2011-07-26 21:02:58 +00:00  
				
					
						
							
							
								 
						
							
								d77b383199 
								
							 
						 
						
							
							
								
								More movsldup/movshdup cleanup. Rewrite the mask matching function and add
support for 256-bit versions (but no instruction selection yet, coming next).
llvm-svn: 136050 
							
						 
						
							2011-07-26 02:39:28 +00:00  
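A mask matcher of the kind mentioned in the movsldup/movshdup entry above could look roughly like the sketch below. It is not the LLVM implementation: `isDupMask` is a hypothetical helper, written under the assumption that movsldup duplicates even elements (`<0,0,2,2,...>`), movshdup duplicates odd elements (`<1,1,3,3,...>`), and negative mask entries mean undef and match anything.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return true if mask matches the movsldup pattern (odd == 0) or the
// movshdup pattern (odd == 1) for a 4-element (128-bit) or 8-element
// (256-bit) single-precision vector. Negative entries are treated as undef.
bool isDupMask(const std::vector<int>& mask, int odd) {
  if (mask.size() != 4 && mask.size() != 8) return false;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    int want = static_cast<int>(i / 2 * 2) + odd; // pair base, plus 1 for odd
    if (mask[i] >= 0 && mask[i] != want) return false;
  }
  return true;
}
```

Note that this sketch accepts an all-undef mask; a real matcher would likely need extra conditions on top of the pattern check.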
				
					
						
							
							
								 
						
							
								5b268a4b82 
								
							 
						 
						
							
							
								
								More cleanup, subtarget info isn't used here.  
							
							
							
							
							llvm-svn: 136049 
							
						 
						
							2011-07-26 02:39:25 +00:00  
				
					
						
							
							
								 
						
							
								9212bf275d 
								
							 
						 
						
							
							
								
								Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128  
							
							
							
							
							This also fixes PR10452
llvm-svn: 136004 
							
						 
						
							2011-07-25 23:05:32 +00:00  
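The idea behind the all-ones entry above is that comparing a 128-bit register with itself for equality sets every bit, and the 128-bit result can then be placed into the halves of a 256-bit register. Below is a scalar sketch of that idea; the helper names are hypothetical, and it uses two explicit inserts for clarity, which may differ from the exact instruction sequence the commit emits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>; // 128-bit vector of 32-bit integers
using V8 = std::array<uint32_t, 8>; // 256-bit vector of 32-bit integers

// Model of a packed 32-bit equality compare: all ones where equal, else zero.
V4 pcmpeqd(const V4& a, const V4& b) {
  V4 r{};
  for (int i = 0; i < 4; ++i) r[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
  return r;
}

// Model of inserting a 128-bit value into half 0 (low) or 1 (high) of dst.
V8 insertf128(V8 dst, const V4& src, int half) {
  for (int i = 0; i < 4; ++i) dst[half * 4 + i] = src[i];
  return dst;
}

// Build a 256-bit all-ones vector from a 128-bit self-compare.
V8 allOnes256() {
  V4 any{1, 2, 3, 4};          // contents are irrelevant: x == x always holds
  V4 ones = pcmpeqd(any, any); // 128 bits of ones
  V8 r{};
  r = insertf128(r, ones, 0);
  r = insertf128(r, ones, 1);
  return r;
}
```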
				
					
						
							
							
								 
						
							
								123dff0f58 
								
							 
						 
						
							
							
								
								- Handle special scalar_to_vector case: splats. Use a native 128-bit
shuffle before inserting on a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoiding emitting one more instruction.
llvm-svn: 136002 
							
						 
						
							2011-07-25 23:05:25 +00:00  
				
					
						
							
							
								 
						
							
								276eb8debf 
								
							 
						 
						
							
							
								
								Reintroduce r135730, this is indeed the right approach, there is no
native 256-bit vector instruction to do scalar_to_vector.
llvm-svn: 136001 
							
						 
						
							2011-07-25 23:05:16 +00:00  
				
					
						
							
							
								 
						
							
								ea8c66fea5 
								
							 
						 
						
							
							
								
								Get rid of an incorrect optimization for shuffles with PALIGNR and simplify isPALIGNRMask.  
							
							
							
							
							Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle.
llvm-svn: 135980 
							
						 
						
							2011-07-25 21:36:45 +00:00  
				
					
						
							
							
								 
						
							
								77242dd537 
								
							 
						 
						
							
							
								
								Turn shuffles into unpacks for VT == MVT::v2i64 and MVT::v2f64
too. Patch by Jeff Muizelaar.
llvm-svn: 135789 
							
						 
						
							2011-07-22 18:56:05 +00:00  
				
					
						
							
							
								 
						
							
								c535278cf1 
								
							 
						 
						
							
							
								
								Fix x86's XALUO lowering to return its replacement values instead
of doing the RAUW calls for the overflow value itself. This makes
it more consistent with how the rest of LegalizeDAG works.
llvm-svn: 135788 
							
						 
						
							2011-07-22 18:45:15 +00:00