llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	155193c3aa	[x86, AVX] fold 'isPositive' 256-bit vector integer operations (PR26701) This extends the fold introduced with: http://reviews.llvm.org/rL262036 llvm-svn: 262047	2016-02-26 18:42:50 +00:00
Sanjay Patel	4402a32b32	[x86, SSE] fold 'isPositive' vector integer operations (PR26701) This is one of the cases shown in: https://llvm.org/bugs/show_bug.cgi?id=26701 Shift and negate is what InstCombine appears to prefer, so I've started with that pattern. Note that the 'pcmpeq' instructions are always generating the negative one for the actual 'pcmpgt' comparison in each case (side note: why isn't there an alias mnemonic for that?). Differential Revision: http://reviews.llvm.org/D17630 llvm-svn: 262036	2016-02-26 16:56:03 +00:00
Simon Pilgrim	e4178ae510	[X86][SSE3] Added combine support for MOVDDUP/MOVSHDUP/MOVSLDUP target shuffles Now that PerformShuffleCombine can handle unary shuffles. llvm-svn: 261843	2016-02-25 09:12:12 +00:00
Simon Pilgrim	3b6feeaa7c	[X86][SSE41] Combine vector blends with zero Part 2 of 2 This patch adds support for combining target shuffles into blends-with-zero. Differential Revision: http://reviews.llvm.org/D17483 llvm-svn: 261745	2016-02-24 15:14:21 +00:00
Simon Pilgrim	dd01f70085	[X86][SSE41] Combine insertion of zero scalars into vector blends with zero Part 1 of 2 This patch attempts to replace the insertion of zero scalars with a vector blend with zero, avoiding the use of the integer insertion instructions (which are particularly slow on many targets). (Part 2 will add support for combining multiple blends-with-zero). Differential Revision: http://reviews.llvm.org/D17483 llvm-svn: 261743	2016-02-24 14:53:27 +00:00
Simon Pilgrim	c291c03702	[X86][SSE] Don't get target shuffle operands prematurely. PerformShuffleCombine should be usable by unary and binary target shuffles, but was attempting to get the first two operands whatever the instruction type. Since these are only used for VECTOR_SHUFFLE instructions for one particular combine I've moved them inside the relevant if statement. llvm-svn: 261727	2016-02-24 09:07:47 +00:00
Davide Italiano	62b7f7a398	[X86ISelLowering] Stop typing the same return over and over and over. llvm-svn: 261666	2016-02-23 18:39:38 +00:00
Davide Italiano	2ec4717c2c	[X86ISelLowering] Consolidate duplicated code in a single place. llvm-svn: 261573	2016-02-22 21:06:46 +00:00
Simon Pilgrim	e9093adae0	[X86][AVX] Add shuffle masking support for EltsFromConsecutiveLoads Add support for the case where we have a consecutive load (which must include the first + last elements) with a mixture of undef/zero elements. We load the vector and then apply a shuffle to clear the zero'd elements. Differential Revision: http://reviews.llvm.org/D17297 llvm-svn: 261490	2016-02-21 19:15:48 +00:00
Simon Pilgrim	ecb0433599	[X86][SSE] Fixed issue with commutation of 'faux unary' target shuffles (PR26667) Fixed a bug introduced by D16683 when a binary shuffle is simplified to a unary shuffle (with undef/zero sentinel mask indices) - if this resulted in only the second input being used combineX86ShuffleChain failed to take this into account and still referenced the first input. llvm-svn: 261434	2016-02-20 14:39:45 +00:00
Simon Pilgrim	ccf2cce67c	[X86][SSE] Move all undef/zero cases before target shuffle combining. First small step towards fixing PR26667 - we need to ensure that combineX86ShuffleChain only gets called with a valid shuffle input node (a similar issue was found in D17041). llvm-svn: 261433	2016-02-20 12:57:32 +00:00
Davide Italiano	228978c0dc	[X86ISelLowering] Fix TLSADDR lowering when shrink-wrapping is enabled. TLSADDR nodes are lowered into actual calls inside MC. In order to prevent shrink-wrapping from pushing prologue/epilogue past them (which results in TLS variables being accessed before the stack frame is set up), we put markers, so that the stack gets adjusted properly. Thanks to Quentin Colombet for guidance/help on how to fix this problem! llvm-svn: 261387	2016-02-20 00:44:47 +00:00
Davide Italiano	a8f1f2efaf	[X86ISelLowering] Provide a more informative assert message. I stumbled upon this while debugging a lowering bug. llvm-svn: 261371	2016-02-19 22:18:49 +00:00
Davide Italiano	4cfe2a9e38	[X86ISelLowering] Merge two conditions inside a single if. llvm-svn: 261370	2016-02-19 22:01:07 +00:00
Sanjay Patel	0adbea4b5c	[x86] fix initialization of PredictableSelectIsExpensive This is effectively NFC because Atom is the only in-order x86 subtarget currently, but the predicate would have become wrong if any other in-order CPU came along. See related discussion in: http://reviews.llvm.org/D16836 llvm-svn: 261275	2016-02-18 23:08:48 +00:00
Davide Italiano	440a676136	[X86ISelLowering] Use isPowerof2 instead of rewriting it. NFC. llvm-svn: 261255	2016-02-18 20:43:15 +00:00
Hans Wennborg	23cdc643b9	Revert to extend i8/i16 return values on Darwin (PR26665) In r260133, LLVM was changed to no longer extend i8/i16 return values, as it's not required by the ABI. However, code was found in the wild that relies on the old behaviour on Darwin, so this commit reverts back to that old behaviour for Darwin. On other platforms, it's less likely that code would be depending on the old behaviour, as GCC and MSVC haven't been extending such return values. llvm-svn: 261235	2016-02-18 18:17:05 +00:00
Igor Breger	ac02f1bb62	AVX512: Fix LowerMSCATTER() return value. Bug description: The bug was discovered when a test was compiled with -O0. When the scatter result is the DAG root, VectorLegalizer failed (assert) because LowerMSCATTER() returned the kmask as its result. Change LowerMSCATTER() to return the chain, as the original node does. Differential Revision: http://reviews.llvm.org/D17331 llvm-svn: 261090	2016-02-17 14:04:33 +00:00
Simon Pilgrim	c5b5dcb985	[X86][AVX] Support bit-blend integer shuffles for 256-bit integer vectors AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back. This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour. Part 2 of 2 Differential Revision: http://reviews.llvm.org/D17292 llvm-svn: 261082	2016-02-17 10:50:06 +00:00
Simon Pilgrim	a50e8d3627	[X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectors AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back. This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors. Part 1 of 2 Differential Revision: http://reviews.llvm.org/D17292 llvm-svn: 261081	2016-02-17 10:37:49 +00:00
Simon Pilgrim	9904924e6b	[X86][SSE] Tidyup BUILD_VECTOR operand collection. NFCI. Avoid reuse of operand variables, keep them local to a particular lowering - the operand collection is unique to each case anyhow. Renamed from V to Ops to more closely match their purpose. llvm-svn: 261078	2016-02-17 10:12:30 +00:00
Ahmed Bougacha	f3cccab1e0	[X86] Remove the now-unused X86ISD::PSIGN. NFC. llvm-svn: 261025	2016-02-16 22:14:12 +00:00
Ahmed Bougacha	af60a429c9	[X86] Generalize logic blend of (x, -x) combine to match (-x, x). I suspect this is what let PR26110 lie dormant for so long. llvm-svn: 261024	2016-02-16 22:14:07 +00:00
Ahmed Bougacha	132fbf5476	[X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN. Currently, we sometimes miscompile this vector pattern: (c ? -v : v) We lower it to (because "c" is <4 x i1>, lowered as a vector mask): (~c & v) \| (c & -v) When we have SSSE3, we incorrectly lower that to PSIGN, which does: (c < 0 ? -v : c > 0 ? v : 0) in other words, when c is either all-ones or all-zero: (c ? -v : 0) While this is an old bug, it rarely triggers because the PSIGN combine is too sensitive to operand order. This will be improved separately. Note that the PSIGN tests are also incorrect. Consider: %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31> %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1> %1 = and <4 x i32> %a, %0 %2 = and <4 x i32> %b.lobit, %sub %cond = or <4 x i32> %1, %2 ret <4 x i32> %cond if %b is zero: %b.lobit = <4 x i32> zeroinitializer %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1> %1 = <4 x i32> %a %2 = <4 x i32> zeroinitializer %cond = or <4 x i32> %a, zeroinitializer ret <4 x i32> %a whereas we currently generate: psignd %xmm1, %xmm0 retq which returns 0, as %xmm1 is 0. Instead, use a pure logic sequence, as described in: https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate Fixes PR26110. Differential Revision: http://reviews.llvm.org/D17181 llvm-svn: 261023	2016-02-16 22:14:03 +00:00
Ahmed Bougacha	f211f685e4	[X86] Extract PSIGN/BLENDVP combine. NFC. llvm-svn: 261021	2016-02-16 22:13:55 +00:00
Ahmed Bougacha	7502768c6d	[X86] Extract ANDNP combine. NFC. This makes it IMO more readable and reduces indentation. llvm-svn: 261020	2016-02-16 22:13:49 +00:00
Ahmed Bougacha	71c992d853	[X86] Remove now-dead variable and redundant assert. NFC. The variable was made dead in NDEBUG by r260901, but the assert was redundant anyway: get rid of both. llvm-svn: 260908	2016-02-15 19:32:54 +00:00
Ahmed Bougacha	93cff7fb82	[CodeGen] Document and use getConstant's splat-building feature. NFC. Differential Revision: http://reviews.llvm.org/D17229 llvm-svn: 260901	2016-02-15 18:07:29 +00:00
Igor Breger	4dc7d390db	AVX512: Change store size of kmask. Store size of v8i1, v4i1 , v2i1 and i1 are changed to 16 bits. If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878	2016-02-15 08:25:28 +00:00
Simon Pilgrim	08ba012973	[X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations. On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle. This patch has several benefits: * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling. * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure). * Matching the repeating shuffle makes use of a lot of existing shuffle lowering. There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review. Differential Revision: http://reviews.llvm.org/D16537 llvm-svn: 260834	2016-02-13 21:54:04 +00:00
Sanjay Patel	e9bf993cee	[x86-64] allow mfence even with -mno-sse (PR23203) As shown in: https://llvm.org/bugs/show_bug.cgi?id=23203 ...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64, but the instruction def doesn't know that. I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :) Differential Revision: http://reviews.llvm.org/D17219 llvm-svn: 260828	2016-02-13 17:26:29 +00:00
Sanjay Patel	ac42fecf74	[x86] simplify getZeroVector() ; NFCI Let DAG.getConstant() handle the splatting; there's no need to repeat that logic here. See also: http://reviews.llvm.org/rL258833 http://reviews.llvm.org/rL260582 llvm-svn: 260609	2016-02-11 22:17:04 +00:00
Sanjay Patel	d76d4aabdd	[x86] refactor masked load/store combine logic ; NFCI llvm-svn: 260426	2016-02-10 20:02:45 +00:00
JF Bastien	08499a0110	X86: Remove useless semicolon llvm-svn: 260359	2016-02-10 04:04:12 +00:00
Sanjay Patel	c7dde5f502	[x86] convert masked load of exactly one element to scalar load This is the load counterpart to the store optimization that was added in: http://reviews.llvm.org/rL260145 llvm-svn: 260325	2016-02-09 23:44:35 +00:00
Ahmed Bougacha	f8dfb47c02	[CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI. llvm-svn: 260316	2016-02-09 22:54:12 +00:00
Ahmed Bougacha	244cd98474	[X86] Don't reuse an unrelated variable, create a new one. NFC. Using Op makes it look like we're doing something with it. We're really not. llvm-svn: 260315	2016-02-09 22:54:05 +00:00
Ahmed Bougacha	46db084c71	[X86] Remove unnecessary assignment. NFC. llvm-svn: 260314	2016-02-09 22:53:58 +00:00
Sanjay Patel	73200f72de	[SelectionDAG] make getMemBasePlusOffset() accessible; NFCI I reinvented this functionality in http://reviews.llvm.org/D16828 because it was hidden away as a static function. The changes in x86 are not based on a complete audit. I suspect there are other possible uses there, and there are almost certainly more potential users in other targets. llvm-svn: 260295	2016-02-09 21:42:04 +00:00
Sanjay Patel	62dde825d8	[x86] make getOneTrueElt() a helper function ; NFC As mentioned in http://reviews.llvm.org/D16828 , the related masked load transform will need this logic, so I'm moving it out to make that patch smaller. llvm-svn: 260240	2016-02-09 17:39:58 +00:00
Simon Pilgrim	7e671e06a2	[X86][AVX2] Fix SIGN_EXTEND vector handling on AVX2 targets. On AVX2 target we are poorly legalizing SIGN_EXTEND ops for which the input's legalized type doesn't have the same number of elements as the destination, resulting in an ANY_EXTEND followed by a SIGN_EXTEND_INREG. This patch uses the existing SIGN_EXTEND -> SIGN_EXTEND_VECTOR_INREG combine to extend the input to the size of the result and using SIGN_EXTEND_VECTOR_INREG instead. Differential Revision: http://reviews.llvm.org/D16994 llvm-svn: 260210	2016-02-09 08:19:19 +00:00
Simon Pilgrim	a207436b01	[X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents. This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines. Differential Revision: http://reviews.llvm.org/D16956 llvm-svn: 260168	2016-02-08 23:03:46 +00:00
Sanjay Patel	264d7e5b68	[x86] convert masked store of one element to scalar store Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set' transform in InstCombine, but this should be a win for any AVX machine. Code comments note that this transform could be extended for other targets / cases. Differential Revision: http://reviews.llvm.org/D16828 llvm-svn: 260145	2016-02-08 21:05:08 +00:00
Hans Wennborg	850ec6ca18	[X86] Don't zero/sign-extend i1, i8, or i16 return values to 32 bits (PR22532) This matches GCC and MSVC's behaviour, and saves on code size. We were already not extending i1 return values on x86_64 after r127766. This takes that patch further by applying it to x86 target as well, and also for i8 and i16. The ABI docs have been unclear about the required behaviour here. The new i386 psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return values do not need to be extended beyond 8 bits. The x86_64 ABI doc is being updated to say the same [2]. Differential Revision: http://reviews.llvm.org/D16907 [1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf [2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ llvm-svn: 260133	2016-02-08 19:34:30 +00:00
Simon Pilgrim	f116e4acc7	[X86][SSE] Resolve target shuffle inputs to sentinels to permit more combines The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input. This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle. Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if it's not a target shuffle. Differential Revision: http://reviews.llvm.org/D16683 llvm-svn: 260063	2016-02-07 22:51:06 +00:00
Asaf Badouh	ad5c3fc47d	[X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode Differential Revision: http://reviews.llvm.org/D16629 llvm-svn: 260033	2016-02-07 14:59:13 +00:00
Simon Pilgrim	73fc26b44a	[X86][SSE] Pulled out repeated target shuffle decodes into helper functions. NFCI. Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions. The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively. llvm-svn: 260032	2016-02-07 14:33:03 +00:00
Simon Pilgrim	9e369f2a51	[X86][SSE] Don't replace an existing 32-bit load with its duplicate If we are already loading a single 32-bit float/integer then just reuse it. Fix for regression in D16729 llvm-svn: 259991	2016-02-06 15:37:09 +00:00
Simon Pilgrim	11e4d1146f	Comment fix llvm-svn: 259990	2016-02-06 14:21:49 +00:00
Simon Pilgrim	7823fd2535	[X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type. This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain..... llvm-svn: 259816	2016-02-04 19:27:51 +00:00
Simon Pilgrim	6788f33cf2	[X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load. Differential Revision: http://reviews.llvm.org/D16729 llvm-svn: 259796	2016-02-04 16:12:56 +00:00
Michael Zuckerman	7d73360479	[AVX512] add vfmadd132ss and vfmadd132sd Intrinsic Differential Revision: http://reviews.llvm.org/D16589 llvm-svn: 259789	2016-02-04 14:41:08 +00:00
Simon Pilgrim	1d2d6c5a57	[X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC. llvm-svn: 259771	2016-02-04 09:27:19 +00:00
Sanjay Patel	460ce9cd9b	clean up; NFC llvm-svn: 259720	2016-02-03 22:37:37 +00:00
Simon Pilgrim	18bcf93efb	[X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads Follow up to D16217 and D16729 This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD Differential Revision: http://reviews.llvm.org/D16768 llvm-svn: 259635	2016-02-03 09:41:59 +00:00
Asaf Badouh	5a3a0231f4	[X86][AVX512VBMI] add encoding and intrinsics for Multishift Differential Revision: http://reviews.llvm.org/D16399 llvm-svn: 259363	2016-02-01 15:48:21 +00:00
Igor Breger	56b039ea17	AVX512: fix mask handling for gather/scatter/prefetch intrinsics. Differential Revision: http://reviews.llvm.org/D16755 llvm-svn: 259346	2016-02-01 09:57:15 +00:00
Simon Pilgrim	1358d86659	[X86][SSE] Find source of the inserted element of INSERTPS Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle. Differential Revision: http://reviews.llvm.org/D16652 llvm-svn: 259343	2016-02-01 08:59:30 +00:00
Igor Breger	6cc9115cec	AVX512 : Fix SETCCE lowering for KNL 32 bit. Differential Revision: http://reviews.llvm.org/D16752 llvm-svn: 259342	2016-02-01 07:56:09 +00:00
Mitch Bodart	e5cadbbcdd	[X86] Test commit, fixed typos in comments. NFC. llvm-svn: 259057	2016-01-28 16:40:51 +00:00
Simon Pilgrim	de16172d9d	[x86] Merge multiple calls to DAG.getTargetLoweringInfo(). NFC. llvm-svn: 259050	2016-01-28 15:29:11 +00:00
Igor Breger	fca0a34398	AVX512: Fix truncate v32i8 to v32i1 lowering implementation. Enable truncate 128/256bit packed byte/word with AVX512BW but without AVX512VL, use 512bit instructions. Differential Revision: http://reviews.llvm.org/D16531 llvm-svn: 259044	2016-01-28 13:19:25 +00:00
Simon Pilgrim	d3b78430d1	[X86][SSE] Move setTargetShuffleZeroElements closer to getTargetShuffleMask. NFCI. Keep target shuffle mask helper functions closer together. llvm-svn: 259034	2016-01-28 09:45:01 +00:00
Igor Breger	d6c187b038	AVX512: Add store mask patterns. Differential Revision: http://reviews.llvm.org/D16596 llvm-svn: 258914	2016-01-27 08:43:25 +00:00
Sanjay Patel	06fe9183b0	[x86] make the subtarget member a const reference, not a pointer ; NFCI It's passed in as a reference; it's not optional; it's not a pointer. llvm-svn: 258867	2016-01-26 22:08:58 +00:00
Simon Pilgrim	00adc1e105	[X86] Add support for zeroed shuffle elements to getShuffleScalarElt Enable handling of SM_SentinelZero shuffle elements to getShuffleScalarElt. Improves VZEXT_LOAD matches in EltsFromConsecutiveLoads. llvm-svn: 258865	2016-01-26 21:39:25 +00:00
Sanjay Patel	3e1701da29	[x86] add materializeVectorConstant() helper function; NFC LowerBUILD_VECTOR is still over 300 lines long, but it's a start... llvm-svn: 258858	2016-01-26 21:05:00 +00:00
Sanjay Patel	70fa79fdf2	[x86] simplify getOnesVector() ; NFCI Let DAG.getConstant() handle the splatting; there's no need to repeat that logic here. llvm-svn: 258833	2016-01-26 18:49:36 +00:00
Simon Pilgrim	46696ef93c	[X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to EltsFromConsecutiveLoads This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load). It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads. After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion. Differential Revision: http://reviews.llvm.org/D16217 llvm-svn: 258798	2016-01-26 09:30:08 +00:00
Matthias Braun	4e67e5c91a	X86ISelLowering: Fix cmov(cmov) special lowering bug There's a special case in EmitLoweredSelect() that produces an improved lowering for cmov(cmov) patterns. However this special lowering is currently broken if the inner cmov has multiple users so this patch stops using it in this case. If you wonder why this wasn't fixed by continuing to use the special lowering and inserting a 2nd PHI for the inner cmov: I believe this would incur additional copies/register pressure so the special lowering does not improve upon the normal one anymore in this case. This fixes http://llvm.org/PR26256 (= rdar://24329747) llvm-svn: 258729	2016-01-25 22:08:25 +00:00
Asaf Badouh	655822ab7e	[X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned 52bit integer VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Integers and Add High 52-bit Products to 64-bit Accumulators Differential Revision: http://reviews.llvm.org/D16407 llvm-svn: 258680	2016-01-25 11:14:24 +00:00
Igor Breger	1e5bafbc82	AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16137 llvm-svn: 258657	2016-01-24 08:04:33 +00:00
Simon Pilgrim	0423b382d3	[X86][SSE] Generalised TRUNC -> PACKSS/PACKUS code. NFC. Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types. llvm-svn: 258645	2016-01-23 22:02:48 +00:00
Simon Pilgrim	ead22d095e	Added missing comment. NFC. llvm-svn: 258624	2016-01-23 14:38:02 +00:00
Simon Pilgrim	fd66169341	[X86][SSE] Remove INSERTPS dependencies from unreferenced operands. If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies. llvm-svn: 258622	2016-01-23 13:37:07 +00:00
Sanjay Patel	c4efadb665	fix typos; NFC llvm-svn: 258567	2016-01-22 22:09:41 +00:00
Simon Pilgrim	5ba1c127fc	[X86][SSE] Improve i16 splatting shuffles Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector. Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles. Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU. Differential Revision: http://reviews.llvm.org/D14901 llvm-svn: 258440	2016-01-21 22:07:41 +00:00
Igor Breger	d3341f5021	AVX512: Store (MOVNTPD, MOVNTPS, MOVNTDQ) using non-temporal hint intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16350 llvm-svn: 258309	2016-01-20 13:11:47 +00:00
Simon Pilgrim	4b919b2ab3	[X86][SSE] Add VZEXT_MOVL target shuffle decoding. Add support for decoding VZEXT_MOVL target shuffle masks, allowing it to be used as a source in target shuffle combines. llvm-svn: 258215	2016-01-19 23:04:56 +00:00
Simon Pilgrim	e74653b67a	[X86][SSE] Add INSERTPS target shuffle combines. As vector shuffles can only reference two inputs many (V)INSERTPS patterns end up being split over two target shuffles. This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements. Differential Revision: http://reviews.llvm.org/D16072 llvm-svn: 258205	2016-01-19 22:24:12 +00:00
Asaf Badouh	d4a0d9a78c	[X86][AVX512] fix dag & add intrinsics for fixupimm cover all width and types (pd/ps/sd/ss) of fixupimm instruction and intrinsics Differential Revision: http://reviews.llvm.org/D16313 llvm-svn: 258124	2016-01-19 14:21:39 +00:00
Simon Pilgrim	3e5fb61978	[X86][AVX2] Broadcast subvectors AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector it's advantageous to extract the subvector, broadcast from that and avoid the loading of shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception being when the source type is 4f64 or 4i64 which can use the immediate shuffle VPERMPD/VPERMQ directly. Differential Revision: http://reviews.llvm.org/D16050 llvm-svn: 258081	2016-01-18 20:59:04 +00:00
Igor Breger	239fda676c	AVX512: Masked store intrinsic implementation. Implemented intrinsic for the following instructions (store): VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD. Differential Revision: http://reviews.llvm.org/D16271 llvm-svn: 258047	2016-01-18 13:52:57 +00:00
Igor Breger	e1f273d900	AVX512: Use MemIntrinsicSDNode to implement load/store intrinsic. Differential Revision: http://reviews.llvm.org/D16184 llvm-svn: 258009	2016-01-17 12:10:24 +00:00
Simon Pilgrim	20f31fa31a	[X86][AVX] Enable extraction of upper 128-bit subvectors for 'half undef' shuffle lowering Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks. Minor follow up to D15477. llvm-svn: 258000	2016-01-16 22:30:20 +00:00
NAKAMURA Takumi	33ff1dda6a	[Cygwin] Use -femulated-tls by default since r257718 introduced the new pass. FIXME: Add more targets to use emutls into clang/test/Driver/emulated-tls.cpp. FIXME: Add cygwin tests into llvm/test/CodeGen/X86. Work in progress. llvm-svn: 257984	2016-01-16 03:44:52 +00:00
Manman Ren	4fe01bd8f9	CXX_FAST_TLS calling convention: fix issue on X86-64. When we have a single basic block, the explicit copy-back instructions should be inserted right before the terminator. Before this fix, they were wrongly placed at the beginning of the basic block. I will commit fixes to other platforms as well. PR26136 llvm-svn: 257925	2016-01-15 19:35:42 +00:00
David Majnemer	3463e696fb	[X86] Don't alter HasOpaqueSPAdjustment after we've relied on it We rely on HasOpaqueSPAdjustment not changing after we've calculated things based on it. Things like whether or not we can use 'rep;movs' to copy bytes around, that sort of thing. If it changes, invariants in the backend will quietly break. This situation arose when we had a call to memcpy and a COPY of the FLAGS register where we would attempt to reference local variables using %esi, a register that was clobbered by the 'rep;movs'. This fixes PR26124. llvm-svn: 257730	2016-01-14 01:20:03 +00:00
Michael Zuckerman	6b35f460ac	Fixing warning by adding the X86ISD::VROTRI case. Differential Revision: http://reviews.llvm.org/D16052 llvm-svn: 257607	2016-01-13 15:48:42 +00:00
Michael Zuckerman	2ddcbcf464	[AVX512] adding PROLQ and PROLD Intrinsics Differential Revision: http://reviews.llvm.org/D16048 llvm-svn: 257523	2016-01-12 21:19:17 +00:00
Igor Breger	ea8e8e9f97	AVX512: VPMOVAPS/PD and VPMOVUPS/PD (load) intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16042 llvm-svn: 257463	2016-01-12 10:02:32 +00:00
Manman Ren	ed967f3752	CXX_FAST_TLS calling convention: performance improvement for x86-64. This is the same change on x86-64 as r255821 on AArch64. rdar://9001553 llvm-svn: 257428	2016-01-12 01:08:46 +00:00
Elena Demikhovsky	542dfcf44c	Optimized instruction sequence for sitofp operation on X86-32 Optimized sitofp i64 %x to double. The current sequence movl %ecx, 8(%esp) movl %edx, 12(%esp) fildll 8(%esp) is replaced with: movd %ecx, %xmm0 movd %edx, %xmm1 punpckldq %xmm1, %xmm0 movq %xmm0, 8(%esp) Differential Revision: http://reviews.llvm.org/D15946 llvm-svn: 257285	2016-01-10 09:41:22 +00:00
Simon Pilgrim	c7bebcbfd8	[X86][AVX] Match broadcast loads through a bitcast AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur. This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars. llvm-svn: 257266	2016-01-09 20:59:39 +00:00
Simon Pilgrim	2e7a1849c9	[X86][AVX] Add support for i64 broadcast loads on 32-bit targets Added 32-bit AVX1/AVX2 broadcast tests. llvm-svn: 257264	2016-01-09 19:59:27 +00:00
Nico Weber	4324b9b236	Revert r257055, it caused PR26064. llvm-svn: 257066	2016-01-07 15:01:46 +00:00
Simon Pilgrim	bcc11a059e	[X86][AVX] Match broadcast loads through a bitcast AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur. Follow up to D15310 llvm-svn: 257055	2016-01-07 11:34:27 +00:00
Simon Pilgrim	83e44c66ae	[X86][SSE] Add INSERTPS as a target shuffle Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS. llvm-svn: 257046	2016-01-07 10:24:19 +00:00
Simon Pilgrim	bc82dedd26	[X86] Determine if target shuffle can contain zero elements getTargetShuffleMask may return shuffle masks with SM_SentinelZero (-2) values (currently just for PSHUFB but VPERM2X128 as well with this patch). Although some calling functions can make use of this (mainly for shuffle combining), others can not and their inclusion makes shuffle mask comparisons more difficult. This patch adds a flag to getTargetShuffleMask to indicate if the calling function can't handle SM_SentinelZero; getTargetShuffleMask will then return false if it occurs to make handling much easier. I've tidied up some uses of getTargetShuffleMask to better indicate what is going on - more could be done but at present I don't have test cases to demonstrate it. Some upcoming patches will make use of this to both support more uses where SM_SentinelZero is not permitted (e.g. combineShuffleToAddSub), and also will allow us to add INSERTPS support to getTargetShuffleMask as part of better zero handling discussed in D14261. Differential Revision: http://reviews.llvm.org/D15378 llvm-svn: 256992	2016-01-06 23:24:40 +00:00
Quentin Colombet	eb61e8e6b0	[X86] Correctly model TLS calls w.r.t. frame requirements. TLS calls need the stack frame to be properly set up and this implies that such calls need ADJUSTSTACK_xxx markers. Fixes PR25820. llvm-svn: 256959	2016-01-06 19:09:26 +00:00
Sanjay Patel	ab69e9f497	refactor divrem8 lowering; NFCI The code duplication contributed to PR25754: https://llvm.org/bugs/show_bug.cgi?id=25754 llvm-svn: 256957	2016-01-06 18:47:09 +00:00
Artyom Skrobov	51f2d11be9	PR25754: avoid generating UDIVREM8_ZEXT_HREG nodes with i64 result Reviewers: spatel, srking Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15331 llvm-svn: 256924	2016-01-06 09:41:10 +00:00
Simon Pilgrim	267163e713	[X86][SSE] There is no zmm addsubpd/addsubps instruction. Replace the assert in combineShuffleToAddSub with an early out. llvm-svn: 256922	2016-01-06 09:08:49 +00:00
Simon Pilgrim	eaabd64a11	[X86][SSE] An empty target shuffle mask is always a failure. As discussed on D15378, move the mask.empty() tests to after the switch statement and consider any shuffle decode where the extracted target shuffle mask is empty as a failure. llvm-svn: 256921	2016-01-06 08:59:32 +00:00
David Majnemer	861a0ae349	[X86] Determine if we have an OpaqueSPAdjustment earlier We queried hasFP before we hit ExpandISelPseudos. ExpandISelPseudos manipulated state that hasFP relied on, potentially changing the result after it has been queried elsewhere. While I am not aware of any particular bug due to this state of affairs, it seems best to avoid it entirely by changing the state during DAG construction. llvm-svn: 256849	2016-01-05 17:46:36 +00:00
Simon Pilgrim	d47ac60f00	[X86][SSE] Merge PerformBLENDICombine into PerformShuffleCombine PBLEND/BLENDPD/BLENDPS are no different to the other target shuffles and this will make future improvements to the target shuffle combines more straightforward. llvm-svn: 256819	2016-01-05 09:12:17 +00:00
Simon Pilgrim	e6955f3211	[X86][SSE] Ensure BLENDPD/BLENDPS/PBLEND inputs are both of the correct input type llvm-svn: 256782	2016-01-04 21:41:11 +00:00
David Majnemer	ca1c9f074f	[X86] Make hasFP constant time We need a frame pointer if there is a push/pop sequence after the prologue in order to unwind the stack. Scanning the instructions to figure out if this happened made hasFP not constant-time which is a violation of expectations. Let's compute this up-front and reuse that computation when we need it. llvm-svn: 256730	2016-01-04 04:49:41 +00:00
David Majnemer	011980cd50	[X86] Add intrinsics for reading and writing to the flags register LLVM's targets need to know if stack pointer adjustments occur after the prologue. This is needed to correctly determine if the red-zone is appropriate to use or if a frame pointer is required. Normally, LLVM can figure this out very precisely by reasoning about the contents of the MachineFunction. There is an interesting corner case: inline assembly. The vast majority of inline assembly which will perform a push or pop is done so to pair up with pushf or popf as appropriate. Unfortunately, this inline assembly doesn't mark the stack pointer as clobbered because, well, it isn't. The stack pointer is decremented and then immediately incremented. Because of this, LLVM was changed in r256456 to conservatively assume that inline assembly contains a sequence of stack operations. This is unfortunate because the vast majority of inline assembly will not end up manipulating the stack pointer in any way at all. Instead, let's provide a more principled solution: an intrinsic. FWIW, other compilers (MSVC and GCC among them) also provide this functionality as an intrinsic. llvm-svn: 256685	2016-01-01 06:50:01 +00:00
Craig Topper	69653af748	[X86] Move shuffle decoding for constant pool into the X86CodeGen library to remove a layering violation in the Util library. llvm-svn: 256680	2015-12-31 22:40:45 +00:00
Asaf Badouh	af6569afd2	[X86][PKU] Add {RD,WR}PKRU intrinsics Differential Revision: http://reviews.llvm.org/D15808 llvm-svn: 256670	2015-12-31 08:31:13 +00:00
Sanjay Patel	b3c53e512f	[x86] lower calls to fmin and llvm.minnum.* using minss/minsd/minps/minpd (PR24475) This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 http://reviews.llvm.org/rL256510 llvm-svn: 256522	2015-12-28 21:16:55 +00:00
Sanjay Patel	9da2b647c7	[x86] lower calls to fmax and llvm.maxnum.* using maxps/maxpd (PR24475) This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 llvm-svn: 256510	2015-12-28 19:20:19 +00:00
Michael Kuperstein	2ea81baf3a	[X86] Better support for the MCU psABI (LLVM part) This adds support for the MCU psABI in a way different from r251223 and r251224, basically reverting most of these two patches. The problem with the approach taken in r251223/4 is that it only handled libcalls that originated from the backend. However, the mid-end also inserts quite a few libcalls and assumes these use the platform's default calling convention. The previous patch tried to insert inregs when necessary both in the FE and, somewhat hackily, in the CG. Instead, we now define a new default calling convention for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does. Differential Revision: http://reviews.llvm.org/D15054 llvm-svn: 256494	2015-12-28 14:39:21 +00:00
Asaf Badouh	fba562004b	[X86][AVX512] Lower broadcast sub vector to vector intrinsics Lower broadcast<type>x<vector> to shuffles. There are two cases: 1. src is 128 bits and dest is 512 bits: in this case we will lower it to shuffle with imm = 0. 2. src is 256 bits and dest is 512 bits: in this case we will lower it to shuffle with imm = 01000100b (0x44), which broadcasts the 256-bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3]. The result is then masked with the passthru value (in case it's a mask op). Differential Revision: http://reviews.llvm.org/D15790 llvm-svn: 256490	2015-12-28 08:26:26 +00:00
Craig Topper	f3ed5c115c	[AVX512] Remove separate instruction and patterns for lowering ctlz_zero_undef. Change the operation for CTLZ_ZERO_UNDEF to Expand so SelectionDAG will convert them to CTLZ before lowering. llvm-svn: 256477	2015-12-27 21:33:50 +00:00
Igor Breger	756c289dd8	AVX512: Change VPMOVB2M DAG lowering, use CVT2MASK node instead of TRUNCATE. Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB. Implement VPMOVB/W/D/Q2M intrinsic. Differential Revision: http://reviews.llvm.org/D15675 llvm-svn: 256470	2015-12-27 13:56:16 +00:00
Sanjay Patel	bcff3f7d92	[x86] lower calls to llvm.maxnum.v4f32 using maxps This is a follow-on to: http://reviews.llvm.org/rL255700 llvm-svn: 256454	2015-12-26 21:44:55 +00:00
Craig Topper	fa5f35e6ad	[X86] Fold some variable declarations and initializations into if statements. NFC llvm-svn: 256451	2015-12-26 19:48:37 +00:00
Craig Topper	91dab7baee	[X86] Replace MVT::SimpleValueType in the AsmParser library and getX86SubSuperRegister with just an unsigned representing size. This is a step towards fixing a layering violation so the X86 AsmParser won't depend on CodeGen types. llvm-svn: 256425	2015-12-25 22:09:45 +00:00
Igor Breger	268f6f53c5	AVX512: VPMOVM2B/W/D/Q intrinsic implementation. Differential Revision: http://reviews.llvm.org//D15747 llvm-svn: 256364	2015-12-24 07:11:53 +00:00
Simon Pilgrim	17377bdd45	[X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151. As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops. Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well. Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions). Differential Revision: http://reviews.llvm.org/D15477 llvm-svn: 256332	2015-12-23 13:10:07 +00:00
Igor Breger	7b46b4e798	AVX512BW: Enable packed word shift for 512bit vector. Enable lowering scalar immediate shift v64i8. Fix predicate for AVX1/2 shifts. Differential Revision: http://reviews.llvm.org/D15713 llvm-svn: 256324	2015-12-23 08:06:50 +00:00
Cong Hou	8df93ce455	[X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. This patch transforms truncation between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in lowering phase because after type legalization, the original truncation will be turned into a BUILD_VECTOR with each element that is extracted from a vector and then truncated, and from them it is difficult to do this optimization. This greatly improves the performance of truncations on some specific types. Cost table is updated accordingly. Differential revision: http://reviews.llvm.org/D14588 llvm-svn: 256194	2015-12-21 20:42:43 +00:00
Igor Breger	44b60a3687	AVX512BW: Enable AND/OR/XOR vector byte/word packed operations by promoting to qword, which is natively supported. llvm-svn: 256157	2015-12-21 14:40:36 +00:00
Amjad Aboud	60b5e1b6c0	Implemented Support of IA interrupt and exception handlers: http://lists.llvm.org/pipermail/cfe-dev/2015-September/045171.html Differential Revision: http://reviews.llvm.org/D15567 llvm-svn: 256155	2015-12-21 14:07:14 +00:00
NAKAMURA Takumi	9ec6a826dd	[Cygwin] Enable TLS as emutls. It resolves clang selfhosting with std::once() for Cygwin. FIXME: It may be EmulatedTLS-generic also for X86-Android. FIXME: Pass EmulatedTLS to LLVM CodeGen from Clang with -femulated-tls. llvm-svn: 256134	2015-12-21 02:37:23 +00:00
Michael Kuperstein	e75e6e2a23	[X86] Improve shift combining This folds (ashr (shl a, [56,48,32,24,16]), SarConst) into (shl, (sext (a), [56,48,32,24,16] - SarConst)) or into (lshr, (sext (a), SarConst - [56,48,32,24,16])) depending on sign of (SarConst - [56,48,32,24,16]) sexts in X86 are MOVs. The MOVs have the same code size as above SHIFTs (only SHIFT by 1 has lower code size). However the MOVs have 2 advantages to SHIFTs on x86: 1. MOVs can write to a register that differs from source. 2. MOVs accept memory operands. This fixes PR24373. Patch by: evgeny.v.stupachenko@intel.com Differential Revision: http://reviews.llvm.org/D13161 llvm-svn: 255761	2015-12-16 11:22:37 +00:00
Reid Kleckner	7850c9f5ca	[WinEH] Make llvm.x86.seh.recoverfp work on x64 It adjusts from RSP-after-prologue to RBP, which is what SEH filters need to do before they can use llvm.localrecover. Fixes SEH filter captures, which were broken in r250088. Issue reported by Alex Crichton. llvm-svn: 255707	2015-12-15 23:40:58 +00:00
Sanjay Patel	271efcdf20	[x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475) This patch improves on the suggested codegen from PR24475: https://llvm.org/bugs/show_bug.cgi?id=24475 but only for the fmaxf() case to start, so we can sort out any bugs before extending to fmin, f64, and vectors. The fmax / maxnum definitions provide us flexibility for signed zeros, so the only thing we have to worry about in this replacement sequence is NaN handling. Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes a problem: SelectionDAGBuilder::visitSelect() transforms compare/select instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps that should be checking for NaN inputs or global unsafe-math before transforming? As it stands, that bypasses a big set of optimizations that the x86 backend already has in PerformSELECTCombine(). Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so we have completely unnecessary operations happening on undef elements of the vector. Differential Revision: http://reviews.llvm.org/D15294 llvm-svn: 255700	2015-12-15 23:11:43 +00:00
Reid Kleckner	d7045faa10	[WinEH] Remove unused intrinsic llvm.x86.seh.restoreframe We can clean this up now that we have the X86 CATCHRET instruction to restore the FP, SP, and BP. llvm-svn: 255677	2015-12-15 21:41:34 +00:00
Elena Demikhovsky	6015f5c823	Type legalizer for masked gather and scatter intrinsics. Full type legalizer that works with all vector lengths - from 2 to 16, (i32, i64, float, double). This intrinsic, for example void @llvm.masked.scatter.v2f32(<2 x float>%data , <2 x float*>%ptrs , i32 align , <2 x i1>%mask ) requires type widening for data and type promotion for mask. Differential Revision: http://reviews.llvm.org/D13633 llvm-svn: 255629	2015-12-15 08:40:41 +00:00
Chih-Hung Hsieh	7993e18e80	[X86] Part 2 to fix x86-64 fp128 calling convention. Part 1 was submitted in http://reviews.llvm.org/D15134. Changes in this part: * X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class. * X86CallingConv.td: Pass f128 values in XMM registers or on stack. * X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td: Add instruction selection patterns for f128. * X86ISelLowering.cpp: When target has MMX registers, configure MVT::f128 in FR128RegClass, with TypeSoftenFloat action, and custom actions for some opcodes. Add missed cases of MVT::f128 in places that handle f32, f64, or vector types. Add TODO comment to support f128 type in inline assembly code. * SelectionDAGBuilder.cpp: Fix infinite loop when f128 type can have VT == TLI.getTypeToTransformTo(Ctx, VT). * Add unit tests for x86-64 fp128 type. Differential Revision: http://reviews.llvm.org/D11438 llvm-svn: 255558	2015-12-14 22:08:36 +00:00
Chen Li	1b26b9ec9d	[X86ISelLowering] Add additional support for multiplication-to-shift conversion. Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planning to add another patch to support negative cases after this. Reviewers: craig.topper, RKSimon Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D14603 llvm-svn: 255415	2015-12-12 01:04:15 +00:00
Chen Li	02ef2e1385	Revert rL255391: [X86ISelLowering] Add additional support for multiplication-to-shift conversion. because it broke buildbot. llvm-svn: 255395	2015-12-12 00:08:37 +00:00
Chen Li	e8f9387e0c	[X86ISelLowering] Add additional support for multiplication-to-shift conversion. Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planning to add another patch to support negative cases after this. Reviewers: craig.topper, RKSimon Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D14603 llvm-svn: 255391	2015-12-11 23:39:32 +00:00
Simon Pilgrim	323e00d9c7	[X86][AVX] Fold loads + splats into broadcast instructions On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector. This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zero'th element, which is now generalised to any element. Fix for PR23022 Differential Revision: http://reviews.llvm.org/D15310 llvm-svn: 255061	2015-12-08 22:17:11 +00:00
Elena Demikhovsky	291fe0159f	AVX-512: Fixed a bug in FP logic operation lowering FP logic instructions are supported in DQ extension on AVX-512 target. I use integer operations instead. Added tests. I also enabled FABS in this patch in order to check ANDPS. The operations are FOR, FXOR, FAND, FANDN. The instructions that are supported for 512-bit vectors under DQ are: VORPS/PD, VXORPS/PD, VANDPS/PD, VANDNPS/PD. Differential Revision: http://reviews.llvm.org/D15110 llvm-svn: 254913	2015-12-07 14:33:34 +00:00
Elena Demikhovsky	33e61eceb4	AVX-512: Fixed masked load / store instruction selection for KNL. Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store. This intrinsic comes with all legal types: <8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru), but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only. All data operands should be widened to 512-bit vector. The mask operand should be widened to v16i1 with zeroes. Differential Revision: http://reviews.llvm.org/D15265 llvm-svn: 254909	2015-12-07 13:39:24 +00:00
Igor Breger	3ab6f17530	AVX-512: implement kunpck intrinsics. Differential Revision: http://reviews.llvm.org/D14821 llvm-svn: 254908	2015-12-07 13:25:18 +00:00
Igor Breger	076dfe5c12	AVX512: support AVX512BW Intrinsic in 32bit mode. Differential Revision: http://reviews.llvm.org/D15076 llvm-svn: 254873	2015-12-06 11:35:18 +00:00
Hans Wennborg	5000ce8a63	X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported These instructions are not supported by all CPUs in 64-bit mode. Emitting them causes Chromium to crash on start-up for users with such chips. (GCC puts these instructions behind -msahf on 64-bit for the same reason.) This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering from before r244503 when the instructions are not available. Differential Revision: http://reviews.llvm.org/D15240 llvm-svn: 254793	2015-12-04 23:00:33 +00:00
Reid Kleckner	93fc520339	[X86] Put no-op ADJCALLSTACK markers around all dynamic lowerings Summary: These ADJCALLSTACK markers don't generate code, but they keep dynamic alloca code that calls chkstk out of the prologue. This slightly pessimizes inalloca calls by preventing some register copy coalescing, but I can live with that. Reviewers: qcolombet Subscribers: hans, llvm-commits Differential Revision: http://reviews.llvm.org/D15200 llvm-svn: 254645	2015-12-03 20:46:59 +00:00
David Majnemer	70497c696a	Move EH-specific helper functions to a more appropriate place No functionality change is intended. llvm-svn: 254562	2015-12-02 23:06:39 +00:00
Simon Pilgrim	3fc3454a0c	[X86][FMA] Optimize FNEG(FMUL) Patterns On FMA targets, we can avoid having to load a constant to negate a float/double multiply by instead using a FNMSUB (-(X*Y)-0) Fix for PR24366 Differential Revision: http://reviews.llvm.org/D14909 llvm-svn: 254495	2015-12-02 09:07:55 +00:00
Asaf Badouh	2489f350c0	[X86][AVX512] add comi with Sae add builtin_ia32_vcomisd and builtin_ia32_vcomiss Differential Revision: http://reviews.llvm.org/D14331 llvm-svn: 254493	2015-12-02 08:17:51 +00:00
Craig Topper	f419a1f69a	[X86] Change getZeroVector to take an MVT instead of EVT. One minor change needed to only try to perform 256-bit shuffle combines on legal vector types. llvm-svn: 254490	2015-12-02 06:39:19 +00:00
Craig Topper	6164297f46	[X86] Fix weird indentation. NFC llvm-svn: 254487	2015-12-02 05:24:38 +00:00
Sanjay Patel	60216f6943	[x86] add a convenience method to check for FMA capability; NFCI llvm-svn: 254425	2015-12-01 17:27:55 +00:00
Sanjay Patel	239be1fb0d	fix formatting; NFC llvm-svn: 254310	2015-11-30 17:52:02 +00:00
Aaron Ballman	33c95f08b0	Silencing a 32-bit to 64-bit implicit conversion warning; NFC. llvm-svn: 254302	2015-11-30 14:52:33 +00:00
Craig Topper	aad5f11e5f	[AVX512] The vpermi2 instructions require an integer vector for the index vector. This is reflected correctly in the intrinsics, but was not reflected in the isel patterns. For the floating point types, this requires adding a bitcast to the index vector when it's passed through to the output. llvm-svn: 254277	2015-11-30 00:13:24 +00:00
Craig Topper	ecae476e4c	[X86] int_x86_avx2_permps and X86ISD::VPERMV should take an integer vector for its shuffle indices. llvm-svn: 254269	2015-11-29 22:53:22 +00:00
Simon Pilgrim	88aa627c0b	[X86][SSE] Added support for lowering to ADDSUBPS/ADDSUBPD with commuted inputs We could already recognise shuffle(FSUB, FADD) -> ADDSUB, this allows us to recognise shuffle(FADD, FSUB) -> ADDSUB by commuting the shuffle mask prior to matching. llvm-svn: 254259	2015-11-29 16:41:04 +00:00
Craig Topper	0009656335	[X86] Split ISD node for Vfpclass and Vfpclasss so that we can write strong type constraints for each that don't cause ambiguous isel. llvm-svn: 254172	2015-11-26 19:41:34 +00:00
Artyom Skrobov	314ee04268	Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC) Summary: Many target lowerings copy-paste the code to test SDValues for known constants. This code can instead be shared in SelectionDAG.cpp, and reused in the targets. Reviewers: MatzeB, andreadb, tstellarAMD Subscribers: arsenm, jyknight, llvm-commits Differential Revision: http://reviews.llvm.org/D14945 llvm-svn: 254085	2015-11-25 19:41:11 +00:00
Elena Demikhovsky	f07df9fcac	AVX-512: Fixed a bug in VPERMT2* intrinsic. The order of operands was wrong (from intrinsic to DAG node). I added stricter type specification for instruction selection. Differential Revision: http://reviews.llvm.org/D14942 llvm-svn: 254059	2015-11-25 08:17:56 +00:00
Kaelyn Takata	d0955312d9	Fix an asan error where NumElements > 32 for at least one case in test/CodeGen/X86/avg.ll. llvm-svn: 254043	2015-11-25 00:03:29 +00:00
Simon Pilgrim	1b4fecb098	[X86][FMA] Optimize FNEG(FMA) Patterns X86 needs to use its own FMA opcodes, preventing the standard FNEG(FMA) pattern table recognition method used by other platforms. This patch adds support for lowering FNEG(FMA(X,Y,Z)) into a single suitably negated FMA instruction. Fix for PR24364 Differential Revision: http://reviews.llvm.org/D14906 llvm-svn: 254016	2015-11-24 20:31:46 +00:00
Cong Hou	db6220f84d	[X86] Fix several issues related to X86's psadbw instruction. This patch fixes the following issues: 1. Fix the return type of X86psadbw: it should not be the same type of inputs. For vNi8 inputs the output should be vMi64, where M = N/8. 2. Fix the return type of int_x86_avx512_psad_bw_512 accordingly. 3. Fix the definition of PSADBW, VPSADBW, and VPSADBWY accordingly. 4. Adjust the return type when building a DAG node of X86ISD::PSADBW type. 5. Update related tests. Differential revision: http://reviews.llvm.org/D14897 llvm-svn: 254010	2015-11-24 19:51:26 +00:00
Cong Hou	bed60d35ed	[X86][SSE] Detect AVG pattern during instruction combine for SSE2/AVX2/AVX512BW. This patch detects the AVG pattern in vectorized code, which is simply c = (a + b + 1) / 2, where a, b, and c have the same type which are vectors of either unsigned i8 or unsigned i16. In the IR, i8/i16 will be promoted to i32 before any arithmetic operations. The following IR shows such an example: %1 = zext <N x i8> %a to <N x i32> %2 = zext <N x i8> %b to <N x i32> %3 = add nuw nsw <N x i32> %1, <i32 1 x N> %4 = add nuw nsw <N x i32> %3, %2 %5 = lshr <N x i32> %4, <i32 1 x N> %6 = trunc <N x i32> %5 to <N x i8> and with this patch it will be converted to an X86ISD::AVG instruction. The pattern recognition is done when combining instructions just before type legalization during instruction selection. We do it here because after type legalization, it is much more difficult to do pattern recognition based on many instructions that are doing type conversions. Therefore, for target-specific instructions (like X86ISD::AVG), we need to take care of type legalization by ourselves. However, as X86ISD::AVG behaves similarly to ISD::ADD, I am wondering if there is a way to legalize operands and result types of X86ISD::AVG together with ISD::ADD. It seems that the current design doesn't support this idea. Tests are added for SSE2, AVX2, and AVX512BW and both i8 and i16 types of various vector sizes. Differential revision: http://reviews.llvm.org/D14761 llvm-svn: 253952	2015-11-24 05:44:19 +00:00
Elena Demikhovsky	0fd11526e2	AVX-512: Optimized INSERT_SUBVECTOR for i1 vector types INSERT_SUBVECTOR for i1 vectors may be done with shifts, when we insert into the lower part, or into the upper part, or into an all-zero vector. CONCAT_VECTORS uses INSERT_SUBVECTOR. Differential Revision: http://reviews.llvm.org/D14815 llvm-svn: 253819	2015-11-22 13:57:38 +00:00
Sanjay Patel	8066d906f1	fix formatting; NFC llvm-svn: 253802	2015-11-22 00:03:16 +00:00
Simon Pilgrim	a9912617c8	[X86][SSE4A] Fix issue with EXTRQI shuffles not starting at the correct start index. Found during stress testing. llvm-svn: 253611	2015-11-19 22:13:56 +00:00
Hans Wennborg	dcc2500452	X86: More efficient legalization of wide integer compares In particular, this makes the code for 64-bit compares on 32-bit targets much more efficient. Example: define i32 @test_slt(i64 %a, i64 %b) { entry: %cmp = icmp slt i64 %a, %b br i1 %cmp, label %bb1, label %bb2 bb1: ret i32 1 bb2: ret i32 2 } Before this patch: test_slt: movl 4(%esp), %eax movl 8(%esp), %ecx cmpl 12(%esp), %eax setae %al cmpl 16(%esp), %ecx setge %cl je .LBB2_2 movb %cl, %al .LBB2_2: testb %al, %al jne .LBB2_4 movl $1, %eax retl .LBB2_4: movl $2, %eax retl After this patch: test_slt: movl 4(%esp), %eax movl 8(%esp), %ecx cmpl 12(%esp), %eax sbbl 16(%esp), %ecx jge .LBB1_2 movl $1, %eax retl .LBB1_2: movl $2, %eax retl Differential Revision: http://reviews.llvm.org/D14496 llvm-svn: 253572	2015-11-19 16:35:08 +00:00
Asaf Badouh	0d957b8b09	[X86][AVX512CD] add mask broadcast intrinsics Differential Revision: http://reviews.llvm.org/D14573 llvm-svn: 253450	2015-11-18 09:42:45 +00:00
Reid Kleckner	c20276d0b2	[WinEH] Move WinEHFuncInfo from MachineModuleInfo to MachineFunction Summary: Now that there is a one-to-one mapping from MachineFunction to WinEHFuncInfo, we don't need to use a DenseMap to select the right WinEHFuncInfo for the current funclet. The main challenge here is that X86WinEHStatePass is an IR pass that doesn't have access to the MachineFunction. I gave it its own WinEHFuncInfo object that it uses to calculate state numbers, which it then throws away. As long as nobody creates or removes EH pads between this pass and SDAG construction, we will get the same state numbers. The other thing X86WinEHStatePass does is to mark the EH registration node. Instead of communicating which alloca was the registration through WinEHFuncInfo, I added the llvm.x86.seh.ehregnode intrinsic. This intrinsic generates no code and simply marks the alloca in use. Reviewers: JCTremoulet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14668 llvm-svn: 253378	2015-11-17 21:10:25 +00:00
Simon Pilgrim	cbba348ae7	[X86][SSE] Tidyup with implicit SDValue bool check. NFC. llvm-svn: 253171	2015-11-15 14:57:07 +00:00
Cong Hou	ef4074bac2	[X86][SSE] Combine UNPCKL with vector_shuffle into UNPCKH to save one instruction for sext from v16i8 to v16i16 and v8i16 to v8i32. This patch is enabling combining UNPCKL with vector_shuffle that moves the upper half of a vector into the lower half, into a UNPCKH instruction. For example: t2: v16i8 = vector_shuffle<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u> t1, undef:v16i8 t3: v16i8 = X86ISD::UNPCKL undef:v16i8, t2 will be combined to: t3: v16i8 = X86ISD::UNPCKH undef:v16i8, t1 Differential revision: http://reviews.llvm.org/D14399 llvm-svn: 253067	2015-11-13 19:47:43 +00:00
Reid Kleckner	94b57065c6	[WinEH] Make UnwindHelp a fixed stack object allocated after XMM CSRs Now the offset of UnwindHelp in our EH tables and the offset that we store to in the prologue agree. llvm-svn: 253059	2015-11-13 19:06:01 +00:00
Joseph Tremoulet	149c433bcc	[WinEH] Find root frame correctly in CLR funclets Summary: The value that the CoreCLR personality passes to a funclet for the establisher frame may be the root function's frame or may be the parent funclet's (mostly empty) frame in the case of nested funclets. Each funclet stores a pointer to the root frame in its own (mostly empty) frame, as does the root function itself. All frames allocate this slot at the same offset, measured from the post-prolog stack pointer, so that the same sequence can accept any ancestor as an establisher frame parameter value, and so that a single offset can be reported to the GC, which also looks at this slot. This change allocates the slot when processing function entry, and records its frame index on the WinEHFuncInfo object, then inserts the code to set/copy it during prolog emission. Reviewers: majnemer, AndyAyers, pgavlin, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14614 llvm-svn: 252983	2015-11-13 00:39:23 +00:00
Manman Ren	3f2b9c18e2	[TLS on Darwin] use a different mask for tls calls on x86-64. Calls involved in thread-local variable lookup save more registers than normal calls. rdar://problem/23073171 llvm-svn: 252837	2015-11-12 00:54:04 +00:00
Joseph Tremoulet	9f467353a5	[WinEH] Only generate UnwindHelp slot for MSVCXX Summary: Other personalities don't use this special frame slot. Reviewers: majnemer, andrew.w.kaylor, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14580 llvm-svn: 252778	2015-11-11 19:21:09 +00:00
Reid Kleckner	7f84a939ed	[WinEH] Insert the MBB for EH_RESTORE after the catchret Inserting it before the target block could be bad, we might already have a fallthrough edge to it. llvm-svn: 252670	2015-11-10 23:22:20 +00:00
Michael Kuperstein	a01a5ee72f	[X86] Do not try to custom-lower sitofp/fptosi in soft-float mode Differential Revision: http://reviews.llvm.org/D14495 llvm-svn: 252621	2015-11-10 17:37:49 +00:00
David Blaikie	578a31fe0a	Remove another variable unused in -Asserts build llvm-svn: 252582	2015-11-10 04:10:04 +00:00
David Blaikie	e35168f008	Remove some unused variables to clean up the -Werror build llvm-svn: 252580	2015-11-10 03:16:28 +00:00
Andy Ayers	809cbe9ea0	Support for emitting inline stack probes For CoreCLR on Windows, stack probes must be emitted as inline sequences that probe successive stack pages between the current stack limit and the desired new stack pointer location. This implements support for the inline expansion on x64. For in-body alloca probes, expansion is done during instruction lowering. For prolog probes, a stub call is initially emitted during prolog creation, and expanded after epilog generation, to avoid complications that arise when introducing new machine basic blocks during prolog and epilog creation. Added a new test case, modified an existing one to exclude non-x64 coreclr (for now). Add test case Fix tests llvm-svn: 252578	2015-11-10 01:50:49 +00:00
David Majnemer	2652b75700	[WinEH] Don't emit CATCHRET from visitCatchPad Instead, emit a CATCHPAD node which will get selected to a target specific sequence. llvm-svn: 252528	2015-11-09 23:07:48 +00:00
David Majnemer	e35244cf63	[WinEH] Update PHIs of CATCHRET successors The TailDuplication machine pass ran across a malformed CFG: a PHI node referred to its predecessor's predecessor instead of its predecessor. This occurred because we split the edge in X86ISelLowering when we processed the CATCHRET but forgot to do something about the PHI nodes. This fixes PR25444. llvm-svn: 252413	2015-11-08 02:36:00 +00:00
Joseph Tremoulet	f748c8937e	[WinEH] Update exception pointer registers Summary: The CLR's personality routine passes these in rdx/edx, not rax/eax. Make getExceptionPointerRegister a virtual method parameterized by personality function to allow making this distinction. Similarly make getExceptionSelectorRegister a virtual method parameterized by personality function, for symmetry. Reviewers: pgavlin, majnemer, rnk Subscribers: jyknight, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D14344 llvm-svn: 252383	2015-11-07 01:11:31 +00:00
Ahmed Bougacha	05a0514b12	[X86] SRL non-LSB extracts when folding to truncating broadcasts. Now that we recognize this, we can support it instead of bailing out. That is, we can fold: (v8i16 (shufflevector (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))), <1,1,...,1>)) into: (v8i16 (vbroadcast (i16 (trunc (srl Y, 16))))) llvm-svn: 252362	2015-11-06 23:16:43 +00:00
Ahmed Bougacha	68614a36d1	[X86] Don't fold non-LSB extracts into truncating broadcasts. We used to incorrectly assume that the offset we're extracting from was a multiple of the element size. So, we'd fold: (v8i16 (shufflevector (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))), <1,1,...,1>)) into: (v8i16 (vbroadcast (i16 (trunc Y)))) whereas we should have extracted the higher bits from X. Instead, bail out if the assumption doesn't hold. llvm-svn: 252361	2015-11-06 23:16:38 +00:00
Reid Kleckner	51460c139e	[WinEH] Split EH_RESTORE out of CATCHRET for 32-bit EH This adds the EH_RESTORE x86 pseudo instr, which is responsible for restoring the stack pointers: EBP and ESP, and ESI if stack realignment is involved. We only need this on 32-bit x86, because on x64 the runtime restores CSRs for us. Previously we had to keep the CATCHRET instruction around during SEH so that we could convince X86FrameLowering to restore our frame pointers. Now we can split these instructions earlier. This was confusing, because we had a return instruction which wasn't really a return and was ultimately going to be removed by X86FrameLowering. This change also simplifies X86FrameLowering, which really shouldn't be building new MBBs. No observable functional change currently, but with the new register mask stuff in D14407, CATCHRET will become a register allocator barrier, and our existing tests rely on us having reasonable register allocation around SEH. llvm-svn: 252266	2015-11-06 01:49:05 +00:00
Reid Kleckner	6ddae31045	[WinEH] Fix funclet prologues with stack realignment We already had a test for this for 32-bit SEH catchpads, but those don't actually create funclets. We had a bug that only appeared in funclet prologues, where we would establish EBP and ESI as our FP and BP, and then downstream prologue code would overwrite them. While I was at it, I fixed Win64+funclets+stackrealign. This issue doesn't come up as often there due to the ABI requiring 16 byte stack alignment, but now we can rest easy that AVX and WinEH will work well together =P. llvm-svn: 252210	2015-11-05 21:09:49 +00:00
Asaf Badouh	f99c054ebc	revert rev. 252153 due to build failure on ubuntu [X86][AVX512] add comi with Sae llvm-svn: 252154	2015-11-05 08:55:54 +00:00
Asaf Badouh	7fdabf0a35	[X86][AVX512] add comi with Sae add builtin_ia32_vcomisd and builtin_ia32_vcomisd Differential Revision: http://reviews.llvm.org/D14331 llvm-svn: 252153	2015-11-05 08:45:06 +00:00
Simon Pilgrim	7e6606f4f1	[X86][SSE] Add general memory folding for (V)INSERTPS instruction This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction. The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening. This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done. It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted. Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll Differential Revision: http://reviews.llvm.org/D13988 llvm-svn: 252074	2015-11-04 20:48:09 +00:00
Michael Kuperstein	b34de72269	[X86] DAGCombine should not introduce FILD in soft-float mode The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode. llvm-svn: 252042	2015-11-04 11:17:53 +00:00
Craig Topper	45e83b8ba7	[X86] Remove assertions that check for valid scale values on scatter/gather intrinsics. Nothing upstream prevented illegal values from getting here. llvm-svn: 251780	2015-11-02 07:24:40 +00:00
Craig Topper	e69eb78510	[X86] Fold 'if' followed by just an llvm_unreachable into an assert. llvm-svn: 251778	2015-11-02 07:24:34 +00:00
Craig Topper	aebab7c03f	[X86] Use isa instead of dyn_cast in a bool context. NFC llvm-svn: 251777	2015-11-02 07:24:32 +00:00
Craig Topper	c70af642a2	[X86] Remove some llvm_unreachables after switches that already have an unreachable in their default case. llvm-svn: 251776	2015-11-02 07:24:30 +00:00
Craig Topper	d6a77ca4bb	[X86] Remove a 'break' after an llvm_unreachable. llvm-svn: 251775	2015-11-02 07:24:27 +00:00
Craig Topper	d49a41793c	[X86] Use cast instead of dyn_cast and a null check marked unreachable. llvm-svn: 251774	2015-11-02 07:24:25 +00:00
Craig Topper	95ceb5a60a	[X86] Use MVT instead of EVT when the type is known to be simple. NFC llvm-svn: 251772	2015-11-02 05:24:22 +00:00
Elena Demikhovsky	db738d9cc3	AVX-512: Optimized SIMD truncate operations for AVX512F set. Optimized <8 x i32> to <8 x i16>, <4 x i64> to <4 x i32>, and <16 x i16> to <16 x i8>. All these operations now use the AVX512F set (KNL). Before this change they were implemented with the AVX2 set. Differential Revision: http://reviews.llvm.org/D14108 llvm-svn: 251764	2015-11-01 11:45:47 +00:00
Craig Topper	ec2ea4817e	[X86] Replace getScalarType with getVectorElementType when the type is already known to be a vector. This should result in slightly less code. NFC llvm-svn: 251751	2015-10-31 21:44:52 +00:00
Craig Topper	476be8f94a	[X86] Convert to MVT instead of calling EVT functions since we already know the type is simple. NFC llvm-svn: 251745	2015-10-31 18:14:17 +00:00
Craig Topper	0fec4d8ce7	[X86] Call getScalarSizeInBits() instead of getScalarType().getScalarSizeInBits(). NFC llvm-svn: 251744	2015-10-31 18:14:15 +00:00

3836 Commits