llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	be37e62e0c	fix function names; NFC llvm-svn: 263646	2016-03-16 18:00:09 +00:00
Igor Breger	0ba7b04f5f	AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512-bit vectors because PCMPGT is supported only for 128/256-bit vectors. Differential Revision: http://reviews.llvm.org/D18204 llvm-svn: 263624	2016-03-16 08:48:26 +00:00
Eric Christopher	da8b3f1914	Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware" as it seems to be causing crashes during code generation in halide. PR forthcoming. This reverts commit r263303. llvm-svn: 263512	2016-03-14 23:59:57 +00:00
Sanjay Patel	7506852709	[DAG] use !isUndef() ; NFCI llvm-svn: 263453	2016-03-14 18:09:43 +00:00
Sanjay Patel	5719584129	[DAG] use isUndef() ; NFCI llvm-svn: 263448	2016-03-14 17:28:46 +00:00
Sanjay Patel	62d707c8d9	[x86, AVX] replace masked load with full vector load when possible Converting masked vector loads to regular vector loads for x86 AVX should always be a win. I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any objections. 1. x86 already does this kind of optimization for multiple scalar loads -> vector load. 2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner. Differential Revision: http://reviews.llvm.org/D18094 llvm-svn: 263446	2016-03-14 16:54:43 +00:00
Igor Breger	a949100532	AVX512: The icmp operation should always be lowered to the CMPM (AVX-512) instruction on SKX. Implemented by delena. Differential Revision: http://reviews.llvm.org/D18054 llvm-svn: 263417	2016-03-14 10:26:39 +00:00
Simon Pilgrim	035b19ecf5	[X86][SSE41] Avoid variable blend for constant v8i16 shifts The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering. llvm-svn: 263383	2016-03-13 18:35:59 +00:00
Quentin Colombet	cf9732b417	[X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer. cmpxchg[8|16]b uses RBX as one of its arguments. In other words, using this instruction clobbers RBX, as it is defined to hold one of the inputs. When the backend uses dynamic stack allocation, RBX is used as a reserved register for the base pointer. Reserved registers have special semantics that only the target understands and enforces; because of that, the register allocator does not use them, but it also does not try to make sure they are used properly (remember, it does not know how they are supposed to be used). Therefore, when RBX is used as a reserved register but defined by something that is not compatible with that use, the register allocator will not fix the surrounding code to make sure it gets saved and restored properly around the broken code. It is the responsibility of the target to do the right thing with its reserved register. To fix that, when the base pointer needs to be preserved, we use a different pseudo instruction for cmpxchg that saves RBX. That pseudo takes two more arguments than the regular instruction: - One is the value to be copied into RBX to set the proper value for the comparison. - The other is the virtual register holding the saved value of RBX as the base pointer. This saving is done as part of isel (i.e., we emit a copy from RBX). cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp This gets expanded into: rbx = copy input_for_rbx_reg; cmpxchg <regular cmpxchg args>; rbx = save_of_rbx_as_bp. Note: The actual modeling of the pseudo is a bit more complicated, to make sure the interferences that appear after the pseudo gets expanded are properly modeled before that expansion. This fixes PR26883. llvm-svn: 263325	2016-03-12 02:25:27 +00:00
Simon Pilgrim	33d57c7547	[X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware Improve vector extension on hardware without dedicated VSEXT/VZEXT instructions. We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine, as this only properly helps for lowering to VSEXT/VZEXT. Removes a lot of unnecessary any_extend + mask patterns (fix for PR25718). Differential Revision: http://reviews.llvm.org/D17932 llvm-svn: 263303	2016-03-11 22:18:05 +00:00
Simon Pilgrim	7b2164ffe0	Fix spelling. llvm-svn: 263266	2016-03-11 17:31:43 +00:00
Simon Pilgrim	7ca9614c71	[X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction. It's not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for 256/512-bit vector cases. llvm-svn: 263239	2016-03-11 14:39:10 +00:00
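The rule in this commit message (a single PSHUFB needs SSSE3 at 128 bits, AVX2 at 256 bits and AVX512BW at 512 bits) can be captured as a small predicate. This is a minimal sketch; the `Features` struct and `canUsePSHUFB` are hypothetical names, not LLVM's X86Subtarget interface.

```cpp
#include <cassert>

// Hypothetical feature flags named after the ISA extensions in the
// commit message; not LLVM's X86Subtarget API.
struct Features {
  bool SSSE3 = false;
  bool AVX2 = false;
  bool AVX512BW = false;
};

// Is a single (V)PSHUFB available for a vector of this width?
bool canUsePSHUFB(unsigned VectorBits, const Features &F) {
  switch (VectorBits) {
  case 128:
    return F.SSSE3;    // PSHUFB xmm
  case 256:
    return F.AVX2;     // VPSHUFB ymm
  case 512:
    return F.AVX512BW; // VPSHUFB zmm
  default:
    return false;
  }
}
```

Checking SSSE3 alone would wrongly accept the 256-bit and 512-bit cases, which is the bug the commit describes.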
Sanjay Patel	0181943b89	[x86] don't use a shuffle when a vselect will do; NFCI Looking at the IR definition of a masked load made me realize there was no reason to use a shuffle here, so we don't need to convert the format of the mask at all. llvm-svn: 263167	2016-03-10 22:35:33 +00:00
Simon Pilgrim	61eb49e437	[X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion). Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 263159	2016-03-10 20:40:26 +00:00
Elena Demikhovsky	cd9967d160	AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512) (failed in the instruction selection phase) Differential Revision: http://reviews.llvm.org/D17924 llvm-svn: 263111	2016-03-10 13:44:22 +00:00
Simon Pilgrim	13d4056795	[X86][AVX] Improve target shuffle combining of BLEND+zero The BLEND+zero combine was failing to combine equivalent BLEND masks. Follow up to D17483 and D17858 llvm-svn: 263105	2016-03-10 11:50:15 +00:00
Simon Pilgrim	16d11785a5	[X86][SSE] Basic combining of unary target shuffles of binary target shuffles. This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask. This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining. There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for. Differential Revision: http://reviews.llvm.org/D17858 llvm-svn: 263102	2016-03-10 11:23:51 +00:00
Elena Demikhovsky	38f78a2b92	AVX-512: Fixed a bug in shuffle for v64i8 type Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on". Differential Revision: http://reviews.llvm.org/D17994 llvm-svn: 263097	2016-03-10 08:32:09 +00:00
Sanjay Patel	4a8dd89128	[x86, AVX] optimize masked loads with constant masks Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper. Differential Revision: http://reviews.llvm.org/D17899 llvm-svn: 263067	2016-03-09 22:12:08 +00:00
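Here is a rough scalar model of the idea: with a constant mask, the blend selector can be encoded as an immediate instead of living in a vector register. The sketch assumes BLENDPS-style semantics (bit i set selects lane i from the second operand) and that the masked-load result is that second operand. `blendImmFromMask` and `blendWithImm` are illustrative names, not LLVM functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Builds an 8-bit blend immediate from a constant masked-load mask.
// Assumption: bit i set means "take lane i from the loaded vector",
// bit i clear means "keep the pass-through lane".
template <std::size_t N>
uint8_t blendImmFromMask(const std::array<bool, N> &Mask) {
  static_assert(N <= 8, "an 8-bit immediate covers at most 8 lanes");
  uint8_t Imm = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (Mask[I])
      Imm |= static_cast<uint8_t>(1u << I);
  return Imm;
}

// Scalar model of an immediate blend: lane i comes from B when bit i of
// Imm is set, otherwise from A.
template <typename T, std::size_t N>
std::array<T, N> blendWithImm(const std::array<T, N> &A,
                              const std::array<T, N> &B, uint8_t Imm) {
  std::array<T, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = ((Imm >> I) & 1) ? B[I] : A[I];
  return R;
}
```

For the mask {true, false, true, false}, the immediate is 0x5, so lanes 0 and 2 come from the load and lanes 1 and 3 keep the pass-through values.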
Quentin Colombet	4340b55593	Revert r262759 and r262760. The fix, which consisted of using the library call for atomic compare and swap when the instruction is not safe to use, may be incorrect. Indeed, the library call may not exist on all platforms. In other words, we need a better fix! llvm-svn: 262943	2016-03-08 17:29:11 +00:00
Hans Wennborg	e00b6e7249	Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG" This caused PR26870. llvm-svn: 262935	2016-03-08 16:21:41 +00:00
Simon Pilgrim	253ca348b2	[X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target shuffle combining. Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles. This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed. Removed non-constant pool mask decode path as we have no way of testing it right now. Differential Revision: http://reviews.llvm.org/D17916 llvm-svn: 262809	2016-03-06 21:54:52 +00:00
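For reference, a two-source permute such as VPERMT2D over N elements reads only the low log2(2N) bits of each index. Indices below N select from the first table, the rest from the second, and any higher bits are ignored. Below is a minimal decoding sketch of that rule. It assumes a plain array of raw index values rather than a constant-pool load, and `decodeTwoSourcePermute` is a hypothetical name, not LLVM's decoder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Decodes a VPERMT2/VPERMI2-style two-source shuffle mask. Each raw index
// only uses its low log2(2 * NumElts) bits: [0, NumElts) picks from the
// first table, [NumElts, 2 * NumElts) from the second. Higher bits are
// ignored by the hardware, so they are masked off here.
std::vector<int> decodeTwoSourcePermute(const std::vector<uint64_t> &RawMask) {
  const uint64_t NumElts = RawMask.size();
  assert(NumElts != 0 && (NumElts & (NumElts - 1)) == 0 &&
         "element count must be a power of two");
  std::vector<int> Mask;
  Mask.reserve(RawMask.size());
  for (uint64_t Raw : RawMask)
    Mask.push_back(static_cast<int>(Raw & (2 * NumElts - 1)));
  return Mask;
}
```

The mask width depends on the element count. A v16i32 permute keeps 5 bits of each index, while a v8i64 permute keeps 4, so a hard-coded width breaks for element types other than 64 bits.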
Igor Breger	4d94d4d5f7	AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element types on SKX Differential Revision: http://reviews.llvm.org/D17913 llvm-svn: 262803	2016-03-06 12:38:58 +00:00
Igor Breger	f1bd761e00	AVX512: Remove VSHRI kmask patterns from TD file. It is incorrect to use kshiftw to implement VSHRI v4i1: bits 15-4 are undefined, so the upper bits of a v4i1 may not be zeroed. A v4i1 should be zero-extended to v16i1 (or any natively supported vector) first. Differential Revision: http://reviews.llvm.org/D17763 llvm-svn: 262797	2016-03-06 07:46:03 +00:00
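The hazard can be modeled with ordinary integers. Below, a v4i1 lives in the low 4 bits of a 16-bit mask register whose upper bits hold garbage. Shifting the whole register (as KSHIFTRW would) drags bit 4 into lane 3, while zero-extending first gives the intended result. Both functions are illustrative sketches, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// A v4i1 value occupies bits 3-0 of a 16-bit mask register; bits 15-4 are
// undefined (modeled here as arbitrary garbage).

// Wrong: shift the full 16-bit register, then read lanes 3-0. Garbage in
// bit 4 moves down into lane 3.
uint16_t shrV4i1Naive(uint16_t K, unsigned Amt) {
  return static_cast<uint16_t>(K >> Amt) & 0xF;
}

// Right: zero-extend the v4i1 (clear bits 15-4) before shifting.
uint16_t shrV4i1ZeroExtended(uint16_t K, unsigned Amt) {
  return static_cast<uint16_t>((K & 0xF) >> Amt);
}
```

With K = 0x16 (lanes 1 and 2 set, garbage in bit 4), a shift by 1 yields 0xB naively but the correct 0x3 after zero-extension.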
Simon Pilgrim	40e1a71cdd	[X86][AVX] Improved VPERMILPS variable shuffle mask decoding. Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool. Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization Followup to D17681 llvm-svn: 262784	2016-03-05 22:53:31 +00:00
Quentin Colombet	2a7676b442	[X86] Fix the lowering of setjmp intrinsic on i386. When the lowering of the setjmp intrinsic requires a global base pointer to be set, make sure such pointer gets defined by the CGBR pass. This fixes PR26742. llvm-svn: 262762	2016-03-05 00:31:04 +00:00
Quentin Colombet	13b524597d	[X86] Do not use cmpxchgXXb when we need the base pointer (RBX). cmpxchgXXb uses RBX as one of its implicit arguments, i.e., when we use that instruction we need to clobber RBX. This is generally fine, except when RBX is a reserved register, because in that case the register allocator will not track its value and will not save and restore it when interferences occur. rdar://problem/24851412 llvm-svn: 262759	2016-03-04 23:29:39 +00:00
David Majnemer	d2f767d2f6	[X86] Support cleaning more than 2**16 bytes of stack The x86 ret instruction has a 16-bit immediate indicating how many bytes to pop off of the stack beyond the return address. There is a problem when extremely large structs are passed by value: we might not be able to fit the number of bytes to pop into the return instruction. To fix this, expand RET_FLAG a little later and use a special sequence to clean the stack: pop %ecx ; return address is now in %ecx add $n, %esp ; clean the stack push %ecx ; bring the return address back on the stack ret ; pop the return address and jmp to its value llvm-svn: 262755	2016-03-04 22:56:17 +00:00
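The choice between the two epilogues reduces to whether the byte count fits in RET's 16-bit immediate. The sketch below emits the commit's sequence as strings. `emitCalleePopReturn` is a hypothetical helper, and it assumes, as the commit's sequence does, that %ecx is free to clobber at the return point.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Picks an x86-32 return sequence for a callee that must pop NumBytes of
// arguments. RET imm16 can encode at most 0xFFFF bytes; larger counts use
// the pop/add/push/ret sequence from the commit message.
std::vector<std::string> emitCalleePopReturn(uint64_t NumBytes) {
  if (NumBytes == 0)
    return {"ret"};
  if (NumBytes <= 0xFFFF)
    return {"ret $" + std::to_string(NumBytes)};
  return {"pop %ecx",                                    // return address -> %ecx
          "add $" + std::to_string(NumBytes) + ", %esp", // clean the stack
          "push %ecx",                                   // put return address back
          "ret"};                                        // pop it and jump to it
}
```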
Simon Pilgrim	abcee45b7a	[X86][AVX] Better support for the variable mask form of VPERMILPD/VPERMILPS The variable mask forms of VPERMILPD/VPERMILPS were only partially implemented, with much of the work still performed as an intrinsic. This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle. Differential Revision: http://reviews.llvm.org/D17681 llvm-svn: 262635	2016-03-03 18:13:53 +00:00
Simon Pilgrim	022afe2538	[X86] Tidied up 256-bit -> 2 x 128-bit vector shift extraction. lowerShift was manually splitting BUILD_VECTOR cases when it could just call Extract128BitVector which does this anyway. llvm-svn: 262633	2016-03-03 17:54:35 +00:00
Simon Pilgrim	0107d24810	[X86] Pulled out repeated code testing for constant vector shift amount. NFCI. llvm-svn: 262631	2016-03-03 17:35:43 +00:00
Ahmed Bougacha	671795a985	[X86] Don't assume that shuffle non-mask operands start at #0. That's not the case for VPERMV/VPERMV3, which cover all possible combinations (the C intrinsics use a different order; the AVX vs AVX512 intrinsics are different still). Since: r246981 AVX-512: Lowering for 512-bit vector shuffles. VPERMV is recognized in getTargetShuffleMask. This breaks assumptions in most callers, as they expect the non-mask operands to start at index 0. VPERMV has the mask as operand #0; VPERMV3 has it in the middle. Instead of the faulty assumption, have getTargetShuffleMask return its operands as well. One alternative we considered was to change the operand order of VPERMV, but we agreed to stick to the instruction order, as there is more AVX512 weirdness to cover (vpermt2/vpermi2 in particular). Differential Revision: http://reviews.llvm.org/D17041 llvm-svn: 262627	2016-03-03 16:53:50 +00:00
Igor Breger	639fde79b0	AVX512: Combine AND + TESTM instructions. Differential Revision: http://reviews.llvm.org/D17844 llvm-svn: 262621	2016-03-03 14:18:38 +00:00
Simon Pilgrim	91dd0a796c	[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 262599	2016-03-03 09:43:28 +00:00
Hans Wennborg	153e4b0f11	[X86] Enable forwarding bool arguments in tail calls (PR26305) The code was previously not able to track a boolean argument at a call site back to the formal argument of the caller. Differential Revision: http://reviews.llvm.org/D17786 llvm-svn: 262575	2016-03-03 02:06:32 +00:00
David Majnemer	1ef654024f	[X86] Don't give catch objects a displacement of zero Catch objects with a displacement of zero do not initialize a catch object. The displacement is relative to %rsp at the end of the function's prologue for x86_64 targets. If we place an object at the top-of-stack, we will end up with a displacement of zero, resulting in our catch object remaining uninitialized. Address this by creating our catch objects as fixed objects. We will ensure that the UnwindHelp object is created after the catch objects so that no catch object will have a displacement of zero. Differential Revision: http://reviews.llvm.org/D17823 llvm-svn: 262546	2016-03-03 00:01:25 +00:00
Reid Kleckner	65f9d9cd32	Revert "[X86] Elide references to _chkstk for dynamic allocas" This reverts commit r262370. It turns out there is code out there that does sequences of allocas greater than 4K: http://crbug.com/591404 The goal of this change was to improve the code size of inalloca call sequences, but we got tangled up in the mess of dynamic allocas. Instead, we should come back later with a separate MI pass that uses dominance to optimize the full sequence. This should also be able to remove the often unneeded stacksave/stackrestore pairs around the call. llvm-svn: 262505	2016-03-02 19:20:59 +00:00
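The failure mode behind this revert is a per-allocation check that every alloca passes while their sum still exceeds the probe size. This sketch assumes a 4096-byte probe size (one page, the usual Windows guard-page stride) and that nothing touches the new memory between allocas. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed probe size: one 4 KiB page.
constexpr uint64_t kProbeSize = 4096;

// Per-allocation test in the spirit of r262370: a fixed-size alloca
// smaller than the probe size was allowed to skip the _chkstk call.
bool perAllocaCheckAllowsElision(uint64_t Size) { return Size < kProbeSize; }

// Returns true if a run of un-probed allocas moves the stack pointer by at
// least one probe size without touching memory, i.e. it may step past the
// guard page even though every individual alloca passed the check above.
bool sequenceMaySkipGuardPage(const std::vector<uint64_t> &Sizes) {
  uint64_t Untouched = 0;
  for (uint64_t S : Sizes) {
    Untouched += S;
    if (Untouched >= kProbeSize)
      return true;
  }
  return false;
}
```

Two 3000-byte allocas each pass the per-allocation test, yet together they move the stack pointer by more than a page.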
Simon Pilgrim	c02b72627a	[X86][SSE] Lower 128-bit MOVDDUP with existing VBROADCAST mechanisms We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of. This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well. It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts. Differential Revision: http://reviews.llvm.org/D17680 llvm-svn: 262478	2016-03-02 11:43:05 +00:00
David Majnemer	5aadde1ecc	[X86] Permit reading of the FLAGS register without it being previously defined We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS. While technically correct, this is not desirable for folks who want to examine aspects of the FLAGS register which are not related to computation, like whether or not CPUID is a valid instruction. Differential Revision: http://reviews.llvm.org/D17782 llvm-svn: 262465	2016-03-02 06:46:52 +00:00
Sanjay Patel	18988ae66c	[x86] use getBitcast() This isn't quite NFC because some of the SDLocs may change which could cause scheduling differences. But no regression tests are affected and there is no functional change intended. llvm-svn: 262391	2016-03-01 20:47:02 +00:00
David Majnemer	791b88b6da	[X86] Elide references to _chkstk for dynamic allocas The _chkstk function is called by the compiler to probe the stack in an order consistent with Windows' expectations. However, it is possible to elide the call to _chkstk and manually adjust the stack pointer if we can prove that the allocation is fixed size and smaller than the probe size. This shrinks chrome.dll, chrome_child.dll and chrome.exe by a cumulative ~133 KB. Differential Revision: http://reviews.llvm.org/D17679 llvm-svn: 262370	2016-03-01 19:20:23 +00:00
Sanjay Patel	8a37a988e6	fix function names; NFC llvm-svn: 262367	2016-03-01 19:14:09 +00:00
Matt Arsenault	03dac8d8e4	DAGCombiner: Turn extract of bitcasted integer into truncate This reduces the number of bitcast nodes and generally cleans up the DAG when bitcasting between integers and vectors everywhere. llvm-svn: 262358	2016-03-01 18:01:37 +00:00
Hans Wennborg	e64cf9dddb	[X86] Check that attribute parameters match for tail calls (PR26590) In the code below on 32-bit targets, x would previously get forwarded to g() without sign-extension to 32 bits as required by the parameter attribute. void g(signed short); void f(unsigned short x) { g(x); } llvm-svn: 262352	2016-03-01 17:45:23 +00:00
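Here is what goes wrong for x = 0xFFFF. The callee expects a sign-extended 32-bit value (-1), but the caller's zero-extended incoming value (65535) was forwarded unchanged. The model below shows the two 32-bit slot contents. It assumes, as the commit describes, that the 32-bit calling convention extends i16 arguments according to their attribute.

```cpp
#include <cassert>
#include <cstdint>

// What f(unsigned short x) holds in its 32-bit argument slot (zeroext).
int32_t slotAsZeroExtended(uint16_t X) { return static_cast<int32_t>(X); }

// What g(signed short) expects in its 32-bit argument slot (signext).
int32_t slotAsSignExtended(uint16_t X) {
  // Conversion to int16_t wraps modulo 2^16 (guaranteed since C++20, and
  // what mainstream compilers also do in earlier language modes).
  return static_cast<int32_t>(static_cast<int16_t>(X));
}
```

The two slots agree only when the sign bit of the 16-bit value is clear, which is why forwarding the slot as-is went unnoticed for small values.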
Sanjay Patel	2ca144f14c	fix documentation comments; NFC llvm-svn: 262351	2016-03-01 17:25:35 +00:00
Sanjay Patel	9fea531fec	function names start with a lowercase letter; NFC llvm-svn: 262347	2016-03-01 16:17:48 +00:00
Ahmed Bougacha	bb5d7d7ed8	[X86] Move the ATOMIC_LOAD_OP ISel from DAGToDAG to ISelLowering. NFCI. This is long-standing dirtiness, as acknowledged by r77582: The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. Doing this before selection will let us combine away some constructs. Differential Revision: http://reviews.llvm.org/D17659 llvm-svn: 262244	2016-02-29 19:28:07 +00:00
Simon Pilgrim	9e10b1655c	Tidyup for loops - don't repeat upper limit evaluation if you don't have to. NFCI. llvm-svn: 262137	2016-02-27 13:26:58 +00:00
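The tidy-up follows the usual LLVM loop idiom of computing the bound once in the loop initializer. The commit's code is not shown here, so this is a generic illustration, and `sumAll` is a made-up example.

```cpp
#include <cassert>
#include <vector>

// Evaluate the upper limit once (E) instead of calling size() on every
// iteration.
int sumAll(const std::vector<int> &Ops) {
  int Sum = 0;
  for (unsigned I = 0, E = Ops.size(); I != E; ++I)
    Sum += Ops[I];
  return Sum;
}
```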
Simon Pilgrim	4d1a088323	Strip trailing whitespace. NFCI. llvm-svn: 262083	2016-02-26 22:28:50 +00:00
Sanjay Patel	51488ed2d5	[x86] refactor to eliminate duplicated code; NFCI llvm-svn: 262062	2016-02-26 20:59:05 +00:00
Sanjay Patel	155193c3aa	[x86, AVX] fold 'isPositive' 256-bit vector integer operations (PR26701) This extends the fold introduced with: http://reviews.llvm.org/rL262036 llvm-svn: 262047	2016-02-26 18:42:50 +00:00
Sanjay Patel	4402a32b32	[x86, SSE] fold 'isPositive' vector integer operations (PR26701) This is one of the cases shown in: https://llvm.org/bugs/show_bug.cgi?id=26701 Shift and negate is what InstCombine appears to prefer, so I've started with that pattern. Note that the 'pcmpeq' instructions are always generating the negative one for the actual 'pcmpgt' comparison in each case (side note: why isn't there an alias mnemonic for that?). Differential Revision: http://reviews.llvm.org/D17630 llvm-svn: 262036	2016-02-26 16:56:03 +00:00
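Per lane, the fold relies on an identity. Inverting the result of an arithmetic shift right by 31 gives all-ones exactly when the sign bit is clear, and that is what a PCMPGTD against an all-ones vector (materialized by PCMPEQD) computes. The scalar sketch below assumes the source pattern is `xor (ashr x, 31), -1`. Note that "positive" here includes zero.

```cpp
#include <cassert>
#include <cstdint>

// Assumed source pattern for one i32 lane: arithmetic shift right by 31,
// then invert all bits. X >> 31 is an arithmetic shift for negative X
// (guaranteed since C++20).
int32_t isPositiveShiftNot(int32_t X) { return ~(X >> 31); }

// Folded form: a PCMPGTD-style compare against -1, producing -1 (all bits
// set) where X > -1 and 0 otherwise.
int32_t isPositiveCompare(int32_t X) { return X > -1 ? -1 : 0; }
```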
Simon Pilgrim	e4178ae510	[X86][SSE3] Added combine support for MOVDDUP/MOVSHDUP/MOVSLDUP target shuffles Now that PerformShuffleCombine can handle unary shuffles. llvm-svn: 261843	2016-02-25 09:12:12 +00:00
Simon Pilgrim	3b6feeaa7c	[X86][SSE41] Combine vector blends with zero Part 2 of 2 This patch adds support for combining target shuffles into blends-with-zero. Differential Revision: http://reviews.llvm.org/D17483 llvm-svn: 261745	2016-02-24 15:14:21 +00:00
Simon Pilgrim	dd01f70085	[X86][SSE41] Combine insertion of zero scalars into vector blends with zero Part 1 of 2 This patch attempts to replace the insertion of zero scalars with a vector blend with zero, avoiding the use of the integer insertion instructions (which are particularly slow on many targets). (Part 2 will add support for combining multiple blends-with-zero). Differential Revision: http://reviews.llvm.org/D17483 llvm-svn: 261743	2016-02-24 14:53:27 +00:00
Simon Pilgrim	c291c03702	[X86][SSE] Don't get target shuffle operands prematurely. PerformShuffleCombine should be usable by unary and binary target shuffles, but was attempting to get the first two operands whatever the instruction type. Since these are only used for VECTOR_SHUFFLE instructions for one particular combine I've moved them inside the relevant if statement. llvm-svn: 261727	2016-02-24 09:07:47 +00:00
Davide Italiano	62b7f7a398	[X86ISelLowering] Stop typing the same return over and over and over. llvm-svn: 261666	2016-02-23 18:39:38 +00:00
Davide Italiano	2ec4717c2c	[X86ISelLowering] Consolidate duplicated code in a single place. llvm-svn: 261573	2016-02-22 21:06:46 +00:00
Simon Pilgrim	e9093adae0	[X86][AVX] Add shuffle masking support for EltsFromConsecutiveLoads Add support for the case where we have a consecutive load (which must include the first + last elements) with a mixture of undef/zero elements. We load the vector and then apply a shuffle to clear the zero'd elements. Differential Revision: http://reviews.llvm.org/D17297 llvm-svn: 261490	2016-02-21 19:15:48 +00:00
Simon Pilgrim	ecb0433599	[X86][SSE] Fixed issue with commutation of 'faux unary' target shuffles (PR26667) Fixed a bug introduced by D16683 when a binary shuffle is simplified to a unary shuffle (with undef/zero sentinel mask indices) - if this resulted in only the second input being used combineX86ShuffleChain failed to take this into account and still referenced the first input. llvm-svn: 261434	2016-02-20 14:39:45 +00:00
Simon Pilgrim	ccf2cce67c	[X86][SSE] Move all undef/zero cases before target shuffle combining. First small step towards fixing PR26667 - we need to ensure that combineX86ShuffleChain only gets called with a valid shuffle input node (a similar issue was found in D17041). llvm-svn: 261433	2016-02-20 12:57:32 +00:00
Davide Italiano	228978c0dc	[X86ISelLowering] Fix TLSADDR lowering when shrink-wrapping is enabled. TLSADDR nodes are lowered into actual calls inside MC. In order to prevent shrink-wrapping from pushing the prologue/epilogue past them (which would result in TLS variables being accessed before the stack frame is set up), we put markers, so that the stack gets adjusted properly. Thanks to Quentin Colombet for guidance/help on how to fix this problem! llvm-svn: 261387	2016-02-20 00:44:47 +00:00
Davide Italiano	a8f1f2efaf	[X86ISelLowering] Provide a more informative assert message. I stumbled upon this while debugging a lowering bug. llvm-svn: 261371	2016-02-19 22:18:49 +00:00
Davide Italiano	4cfe2a9e38	[X86ISelLowering] Merge two conditions inside a single if. llvm-svn: 261370	2016-02-19 22:01:07 +00:00
Sanjay Patel	0adbea4b5c	[x86] fix initialization of PredictableSelectIsExpensive This is effectively NFC because Atom is the only in-order x86 subtarget currently, but the predicate would have become wrong if any other in-order CPU came along. See related discussion in: http://reviews.llvm.org/D16836 llvm-svn: 261275	2016-02-18 23:08:48 +00:00
Davide Italiano	440a676136	[X86ISelLowering] Use isPowerof2 instead of rewriting it. NFC. llvm-svn: 261255	2016-02-18 20:43:15 +00:00
Hans Wennborg	23cdc643b9	Revert to extend i8/i16 return values on Darwin (PR26665) In r260133, LLVM was changed to no longer extend i8/i16 return values, as it's not required by the ABI. However, code was found in the wild that relies on the old behaviour on Darwin, so this commit reverts back to that old behaviour for Darwin. On other platforms, it's less likely that code would be depending on the old behaviour, as GCC and MSVC haven't been extending such return values. llvm-svn: 261235	2016-02-18 18:17:05 +00:00
Igor Breger	ac02f1bb62	AVX512: Fix LowerMSCATTER() return value. Bug description: The bug was discovered when a test was compiled with -O0. When the scatter result is the DAG root, VectorLegalizer failed (assert) because LowerMSCATTER() returned the kmask as its result. Change LowerMSCATTER() to return the chain, as the original node does. Differential Revision: http://reviews.llvm.org/D17331 llvm-svn: 261090	2016-02-17 14:04:33 +00:00
Simon Pilgrim	c5b5dcb985	[X86][AVX] Support bit-blend integer shuffles for 256-bit integer vectors AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back. This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour. Part 2 of 2 Differential Revision: http://reviews.llvm.org/D17292 llvm-svn: 261082	2016-02-17 10:50:06 +00:00
Simon Pilgrim	a50e8d3627	[X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectors AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back. This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors. Part 1 of 2 Differential Revision: http://reviews.llvm.org/D17292 llvm-svn: 261081	2016-02-17 10:37:49 +00:00
Simon Pilgrim	9904924e6b	[X86][SSE] Tidyup BUILD_VECTOR operand collection. NFCI. Avoid reuse of operand variables, keep them local to a particular lowering - the operand collection is unique to each case anyhow. Renamed from V to Ops to more closely match their purpose. llvm-svn: 261078	2016-02-17 10:12:30 +00:00
Ahmed Bougacha	f3cccab1e0	[X86] Remove the now-unused X86ISD::PSIGN. NFC. llvm-svn: 261025	2016-02-16 22:14:12 +00:00
Ahmed Bougacha	af60a429c9	[X86] Generalize logic blend of (x, -x) combine to match (-x, x). I suspect this is what let PR26110 lie dormant for so long. llvm-svn: 261024	2016-02-16 22:14:07 +00:00
Ahmed Bougacha	132fbf5476	[X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN. Currently, we sometimes miscompile this vector pattern: (c ? -v : v) We lower it to (because "c" is <4 x i1>, lowered as a vector mask): (~c & v) | (c & -v) When we have SSSE3, we incorrectly lower that to PSIGN, which does: (c < 0 ? -v : c > 0 ? v : 0) in other words, when c is either all-ones or all-zeros: (c ? -v : 0) While this is an old bug, it rarely triggers because the PSIGN combine is too sensitive to operand order. This will be improved separately. Note that the PSIGN tests are also incorrect. Consider: %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31> %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1> %1 = and <4 x i32> %a, %0 %2 = and <4 x i32> %b.lobit, %sub %cond = or <4 x i32> %1, %2 ret <4 x i32> %cond if %b is zero: %b.lobit = <4 x i32> zeroinitializer %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1> %1 = <4 x i32> %a %2 = <4 x i32> zeroinitializer %cond = or <4 x i32> %a, zeroinitializer ret <4 x i32> %a whereas we currently generate: psignd %xmm1, %xmm0 retq which returns 0, as %xmm1 is 0. Instead, use a pure logic sequence, as described in: https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate Fixes PR26110. Differential Revision: http://reviews.llvm.org/D17181 llvm-svn: 261023	2016-02-16 22:14:03 +00:00
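A one-lane scalar model makes the miscompile concrete. With a zero mask, PSIGND returns 0 where the select should return v, while the bithacks conditional negate handles both mask values. This sketch uses the `(v ^ m) - m` form of that trick, with m being 0 or -1. It is only an illustration, and the exact instruction sequence LLVM now emits may differ.

```cpp
#include <cassert>
#include <cstdint>

// M is a lane mask that is either 0 or all-ones (-1), as produced by a
// vector compare. Tests avoid INT32_MIN, whose negation overflows.

// Intended semantics of the vector select: (M ? -V : V).
int32_t selectNegate(int32_t M, int32_t V) { return M ? -V : V; }

// PSIGND semantics for one lane: negate if the control is negative, zero
// if it is zero, pass through if it is positive.
int32_t psignd(int32_t V, int32_t M) { return M < 0 ? -V : (M == 0 ? 0 : V); }

// Branch-free conditional negate: (V ^ M) - M is -V when M == -1 and V
// when M == 0. Unsigned arithmetic keeps the wraparound well defined.
int32_t conditionalNegate(int32_t M, int32_t V) {
  uint32_t UV = static_cast<uint32_t>(V);
  uint32_t UM = static_cast<uint32_t>(M);
  return static_cast<int32_t>((UV ^ UM) - UM);
}
```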
Ahmed Bougacha	f211f685e4	[X86] Extract PSIGN/BLENDVP combine. NFC. llvm-svn: 261021	2016-02-16 22:13:55 +00:00
Ahmed Bougacha	7502768c6d	[X86] Extract ANDNP combine. NFC. This makes it IMO more readable and reduces indentation. llvm-svn: 261020	2016-02-16 22:13:49 +00:00
Ahmed Bougacha	71c992d853	[X86] Remove now-dead variable and redundant assert. NFC. The variable was made dead in NDEBUG by r260901, but the assert was redundant anyway: get rid of both. llvm-svn: 260908	2016-02-15 19:32:54 +00:00
Ahmed Bougacha	93cff7fb82	[CodeGen] Document and use getConstant's splat-building feature. NFC. Differential Revision: http://reviews.llvm.org/D17229 llvm-svn: 260901	2016-02-15 18:07:29 +00:00
Igor Breger	4dc7d390db	AVX512: Change store size of kmask. The store sizes of v8i1, v4i1, v2i1 and i1 are changed to 16 bits. If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878	2016-02-15 08:25:28 +00:00
Simon Pilgrim	08ba012973	[X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations. On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle. This patch has several benefits: * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling. * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure). * Matching the repeating shuffle makes use of a lot of existing shuffle lowering. There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review. Differential Revision: http://reviews.llvm.org/D16537 llvm-svn: 260834	2016-02-13 21:54:04 +00:00
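The core test of whether a 256-bit shuffle repeats the same pattern in each 128-bit lane can be sketched as follows. It is loosely modeled on the check described above, not copied from LLVM's is128BitLaneRepeatedShuffleMask. Undef handling is simplified, and `isLaneRepeatedMask` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Mask entries are -1 (undef), [0, Size) for the first input, or
// [Size, 2 * Size) for the second. On success, Repeated holds the
// per-lane mask, with second-input elements encoded as LaneSize + i.
bool isLaneRepeatedMask(const std::vector<int> &Mask, int LaneSize,
                        std::vector<int> &Repeated) {
  const int Size = static_cast<int>(Mask.size());
  Repeated.assign(static_cast<std::vector<int>::size_type>(LaneSize), -1);
  for (int I = 0; I < Size; ++I) {
    int M = Mask[I];
    if (M < 0)
      continue; // undef matches anything
    // The element must come from the same 128-bit lane of its input.
    if ((M % Size) / LaneSize != I / LaneSize)
      return false;
    int Local = M % LaneSize + (M >= Size ? LaneSize : 0);
    int &Slot = Repeated[I % LaneSize];
    if (Slot < 0)
      Slot = Local;
    else if (Slot != Local)
      return false; // the lanes disagree
  }
  return true;
}
```

For v8f32 (LaneSize 4), the mask {0, 8, 1, 9, 4, 12, 5, 13} repeats as {0, 4, 1, 5}, while the lane swap {4, 5, 6, 7, 0, 1, 2, 3} is rejected because it crosses lanes.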
Sanjay Patel	e9bf993cee	[x86-64] allow mfence even with -mno-sse (PR23203) As shown in: https://llvm.org/bugs/show_bug.cgi?id=23203 ...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64, but the instruction def doesn't know that. I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :) Differential Revision: http://reviews.llvm.org/D17219 llvm-svn: 260828	2016-02-13 17:26:29 +00:00
Sanjay Patel	ac42fecf74	[x86] simplify getZeroVector() ; NFCI Let DAG.getConstant() handle the splatting; there's no need to repeat that logic here. See also: http://reviews.llvm.org/rL258833 http://reviews.llvm.org/rL260582 llvm-svn: 260609	2016-02-11 22:17:04 +00:00
Sanjay Patel	d76d4aabdd	[x86] refactor masked load/store combine logic ; NFCI llvm-svn: 260426	2016-02-10 20:02:45 +00:00
JF Bastien	08499a0110	X86: Remove useless semicolon llvm-svn: 260359	2016-02-10 04:04:12 +00:00
Sanjay Patel	c7dde5f502	[x86] convert masked load of exactly one element to scalar load This is the load counterpart to the store optimization that was added in: http://reviews.llvm.org/rL260145 llvm-svn: 260325	2016-02-09 23:44:35 +00:00
Ahmed Bougacha	f8dfb47c02	[CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI. llvm-svn: 260316	2016-02-09 22:54:12 +00:00
Ahmed Bougacha	244cd98474	[X86] Don't reuse an unrelated variable, create a new one. NFC. Using Op makes it look like we're doing something with it. We're really not. llvm-svn: 260315	2016-02-09 22:54:05 +00:00
Ahmed Bougacha	46db084c71	[X86] Remove unnecessary assignment. NFC. llvm-svn: 260314	2016-02-09 22:53:58 +00:00
Sanjay Patel	73200f72de	[SelectionDAG] make getMemBasePlusOffset() accessible; NFCI I reinvented this functionality in http://reviews.llvm.org/D16828 because it was hidden away as a static function. The changes in x86 are not based on a complete audit. I suspect there are other possible uses there, and there are almost certainly more potential users in other targets. llvm-svn: 260295	2016-02-09 21:42:04 +00:00
Sanjay Patel	62dde825d8	[x86] make getOneTrueElt() a helper function ; NFC As mentioned in http://reviews.llvm.org/D16828 , the related masked load transform will need this logic, so I'm moving it out to make that patch smaller. llvm-svn: 260240	2016-02-09 17:39:58 +00:00
Simon Pilgrim	7e671e06a2	[X86][AVX2] Fix SIGN_EXTEND vector handling on AVX2 targets. On AVX2 target we are poorly legalizing SIGN_EXTEND ops for which the input's legalized type doesn't have the same number of elements as the destination, resulting in an ANY_EXTEND followed by a SIGN_EXTEND_INREG. This patch uses the existing SIGN_EXTEND -> SIGN_EXTEND_VECTOR_INREG combine to extend the input to the size of the result and use SIGN_EXTEND_VECTOR_INREG instead. Differential Revision: http://reviews.llvm.org/D16994 llvm-svn: 260210	2016-02-09 08:19:19 +00:00
Simon Pilgrim	a207436b01	[X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents. This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines. Differential Revision: http://reviews.llvm.org/D16956 llvm-svn: 260168	2016-02-08 23:03:46 +00:00
Sanjay Patel	264d7e5b68	[x86] convert masked store of one element to scalar store Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set' transform in InstCombine, but this should be a win for any AVX machine. Code comments note that this transform could be extended for other targets / cases. Differential Revision: http://reviews.llvm.org/D16828 llvm-svn: 260145	2016-02-08 21:05:08 +00:00
Hans Wennborg	850ec6ca18	[X86] Don't zero/sign-extend i1, i8, or i16 return values to 32 bits (PR22532) This matches GCC and MSVC's behaviour, and saves on code size. We were already not extending i1 return values on x86_64 after r127766. This takes that patch further by applying it to the x86 target as well, and also for i8 and i16. The ABI docs have been unclear about the required behaviour here. The new i386 psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return values do not need to be extended beyond 8 bits. The x86_64 ABI doc is being updated to say the same [2]. Differential Revision: http://reviews.llvm.org/D16907 [1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf [2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ llvm-svn: 260133	2016-02-08 19:34:30 +00:00
Simon Pilgrim	f116e4acc7	[X86][SSE] Resolve target shuffle inputs to sentinels to permit more combines The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input. This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle. Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if it's not a target shuffle. Differential Revision: http://reviews.llvm.org/D16683 llvm-svn: 260063	2016-02-07 22:51:06 +00:00
Asaf Badouh	ad5c3fc47d	[X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode Differential Revision: http://reviews.llvm.org/D16629 llvm-svn: 260033	2016-02-07 14:59:13 +00:00
Simon Pilgrim	73fc26b44a	[X86][SSE] Pulled out repeated target shuffle decodes into helper functions. NFCI. Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions. The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively. llvm-svn: 260032	2016-02-07 14:33:03 +00:00
Simon Pilgrim	9e369f2a51	[X86][SSE] Don't replace an existing 32-bit load with its duplicate If we are already loading a single 32-bit float/integer then just reuse it. Fix for regression in D16729 llvm-svn: 259991	2016-02-06 15:37:09 +00:00
Simon Pilgrim	11e4d1146f	Comment fix llvm-svn: 259990	2016-02-06 14:21:49 +00:00
Simon Pilgrim	7823fd2535	[X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type. This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain..... llvm-svn: 259816	2016-02-04 19:27:51 +00:00
Simon Pilgrim	6788f33cf2	[X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load. Differential Revision: http://reviews.llvm.org/D16729 llvm-svn: 259796	2016-02-04 16:12:56 +00:00
Michael Zuckerman	7d73360479	[AVX512] add vfmadd132ss and vfmadd132sd Intrinsic Differential Revision: http://reviews.llvm.org/D16589 llvm-svn: 259789	2016-02-04 14:41:08 +00:00
Simon Pilgrim	1d2d6c5a57	[X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC. llvm-svn: 259771	2016-02-04 09:27:19 +00:00
Sanjay Patel	460ce9cd9b	clean up; NFC llvm-svn: 259720	2016-02-03 22:37:37 +00:00
Simon Pilgrim	18bcf93efb	[X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads Follow up to D16217 and D16729 This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD Differential Revision: http://reviews.llvm.org/D16768 llvm-svn: 259635	2016-02-03 09:41:59 +00:00
Asaf Badouh	5a3a0231f4	[X86][AVX512VBMI] add encoding and intrinsics for Multishift Differential Revision: http://reviews.llvm.org/D16399 llvm-svn: 259363	2016-02-01 15:48:21 +00:00
Igor Breger	56b039ea17	AVX512: fix mask handling for gather/scatter/prefetch intrinsics. Differential Revision: http://reviews.llvm.org/D16755 llvm-svn: 259346	2016-02-01 09:57:15 +00:00
Simon Pilgrim	1358d86659	[X86][SSE] Find source of the inserted element of INSERTPS Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle. Differential Revision: http://reviews.llvm.org/D16652 llvm-svn: 259343	2016-02-01 08:59:30 +00:00
Igor Breger	6cc9115cec	AVX512 : Fix SETCCE lowering for KNL 32 bit. Differential Revision: http://reviews.llvm.org/D16752 llvm-svn: 259342	2016-02-01 07:56:09 +00:00
Mitch Bodart	e5cadbbcdd	[X86] Test commit, fixed typos in comments. NFC. llvm-svn: 259057	2016-01-28 16:40:51 +00:00
Simon Pilgrim	de16172d9d	[x86] Merge multiple calls to DAG.getTargetLoweringInfo(). NFC. llvm-svn: 259050	2016-01-28 15:29:11 +00:00
Igor Breger	fca0a34398	AVX512: Fix truncate v32i8 to v32i1 lowering implementation. Enable truncate 128/256bit packed byte/word with AVX512BW but without AVX512VL, use 512bit instructions. Differential Revision: http://reviews.llvm.org/D16531 llvm-svn: 259044	2016-01-28 13:19:25 +00:00
Simon Pilgrim	d3b78430d1	[X86][SSE] Move setTargetShuffleZeroElements closer to getTargetShuffleMask. NFCI. Keep target shuffle mask helper functions closer together. llvm-svn: 259034	2016-01-28 09:45:01 +00:00
Igor Breger	d6c187b038	AVX512: Add store mask patterns. Differential Revision: http://reviews.llvm.org/D16596 llvm-svn: 258914	2016-01-27 08:43:25 +00:00
Sanjay Patel	06fe9183b0	[x86] make the subtarget member a const reference, not a pointer ; NFCI It's passed in as a reference; it's not optional; it's not a pointer. llvm-svn: 258867	2016-01-26 22:08:58 +00:00
Simon Pilgrim	00adc1e105	[X86] Add support for zeroed shuffle elements to getShuffleScalarElt Enable handling of SM_SentinelZero shuffle elements to getShuffleScalarElt. Improves VZEXT_LOAD matches in EltsFromConsecutiveLoads. llvm-svn: 258865	2016-01-26 21:39:25 +00:00
Sanjay Patel	3e1701da29	[x86] add materializeVectorConstant() helper function; NFC LowerBUILD_VECTOR is still over 300 lines long, but it's a start... llvm-svn: 258858	2016-01-26 21:05:00 +00:00
Sanjay Patel	70fa79fdf2	[x86] simplify getOnesVector() ; NFCI Let DAG.getConstant() handle the splatting; there's no need to repeat that logic here. llvm-svn: 258833	2016-01-26 18:49:36 +00:00
Simon Pilgrim	46696ef93c	[X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to EltsFromConsecutiveLoads This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load). It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads. After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion. Differential Revision: http://reviews.llvm.org/D16217 llvm-svn: 258798	2016-01-26 09:30:08 +00:00
Matthias Braun	4e67e5c91a	X86ISelLowering: Fix cmov(cmov) special lowering bug There's a special case in EmitLoweredSelect() that produces an improved lowering for cmov(cmov) patterns. However this special lowering is currently broken if the inner cmov has multiple users so this patch stops using it in this case. If you wonder why this wasn't fixed by continuing to use the special lowering and inserting a 2nd PHI for the inner cmov: I believe this would incur additional copies/register pressure so the special lowering does not improve upon the normal one anymore in this case. This fixes http://llvm.org/PR26256 (= rdar://24329747) llvm-svn: 258729	2016-01-25 22:08:25 +00:00
Asaf Badouh	655822ab7e	[X86][IFMA] adding intrinsics and encoding for multiply and add of unsigned 52bit integer VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit Accumulators Differential Revision: http://reviews.llvm.org/D16407 llvm-svn: 258680	2016-01-25 11:14:24 +00:00
Igor Breger	1e5bafbc82	AVX512: VMOVDQU8/16/32/64 (load) intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16137 llvm-svn: 258657	2016-01-24 08:04:33 +00:00
Simon Pilgrim	0423b382d3	[X86][SSE] Generalised TRUNC -> PACKSS/PACKUS code. NFC. Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types. llvm-svn: 258645	2016-01-23 22:02:48 +00:00
Simon Pilgrim	ead22d095e	Added missing comment. NFC. llvm-svn: 258624	2016-01-23 14:38:02 +00:00
Simon Pilgrim	fd66169341	[X86][SSE] Remove INSERTPS dependencies from unreferenced operands. If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies. llvm-svn: 258622	2016-01-23 13:37:07 +00:00
Sanjay Patel	c4efadb665	fix typos; NFC llvm-svn: 258567	2016-01-22 22:09:41 +00:00
Simon Pilgrim	5ba1c127fc	[X86][SSE] Improve i16 splatting shuffles Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector. Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles. Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU. Differential Revision: http://reviews.llvm.org/D14901 llvm-svn: 258440	2016-01-21 22:07:41 +00:00
Igor Breger	d3341f5021	AVX512: Store (MOVNTPD, MOVNTPS, MOVNTDQ) using non-temporal hint intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16350 llvm-svn: 258309	2016-01-20 13:11:47 +00:00
Simon Pilgrim	4b919b2ab3	[X86][SSE] Add VZEXT_MOVL target shuffle decoding. Add support for decoding VZEXT_MOVL target shuffle masks, allowing it to be used as a source in target shuffle combines. llvm-svn: 258215	2016-01-19 23:04:56 +00:00
Simon Pilgrim	e74653b67a	[X86][SSE] Add INSERTPS target shuffle combines. As vector shuffles can only reference two inputs, many (V)INSERTPS patterns end up being split over two target shuffles. This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements. Differential Revision: http://reviews.llvm.org/D16072 llvm-svn: 258205	2016-01-19 22:24:12 +00:00
Asaf Badouh	d4a0d9a78c	[X86][AVX512] fix DAG & add intrinsics for fixupimm: cover all widths and types (pd/ps/sd/ss) of the fixupimm instructions and intrinsics Differential Revision: http://reviews.llvm.org/D16313 llvm-svn: 258124	2016-01-19 14:21:39 +00:00
Simon Pilgrim	3e5fb61978	[X86][AVX2] Broadcast subvectors AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector it's advantageous to extract the subvector, broadcast from that and avoid the loading of shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception is when the source type is v4f64 or v4i64, which can use the immediate shuffle VPERMPD/VPERMQ directly. Differential Revision: http://reviews.llvm.org/D16050 llvm-svn: 258081	2016-01-18 20:59:04 +00:00
Igor Breger	239fda676c	AVX512: Masked store intrinsic implementation. Implemented intrinsics for the following instructions (store): VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD. Differential Revision: http://reviews.llvm.org/D16271 llvm-svn: 258047	2016-01-18 13:52:57 +00:00
Igor Breger	e1f273d900	AVX512: Use MemIntrinsicSDNode to implement load/store intrinsic. Differential Revision: http://reviews.llvm.org/D16184 llvm-svn: 258009	2016-01-17 12:10:24 +00:00
Simon Pilgrim	20f31fa31a	[X86][AVX] Enable extraction of upper 128-bit subvectors for 'half undef' shuffle lowering Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks. Minor follow up to D15477. llvm-svn: 258000	2016-01-16 22:30:20 +00:00
NAKAMURA Takumi	33ff1dda6a	[Cygwin] Use -femulated-tls by default since r257718 introduced the new pass. FIXME: Add more targets to use emutls into clang/test/Driver/emulated-tls.cpp. FIXME: Add cygwin tests into llvm/test/CodeGen/X86. Work in progress. llvm-svn: 257984	2016-01-16 03:44:52 +00:00
Manman Ren	4fe01bd8f9	CXX_FAST_TLS calling convention: fix issue on X86-64. When we have a single basic block, the explicit copy-back instructions should be inserted right before the terminator. Before this fix, they were wrongly placed at the beginning of the basic block. I will commit fixes to other platforms as well. PR26136 llvm-svn: 257925	2016-01-15 19:35:42 +00:00
David Majnemer	3463e696fb	[X86] Don't alter HasOpaqueSPAdjustment after we've relied on it We rely on HasOpaqueSPAdjustment not changing after we've calculated things based on it. Things like whether or not we can use 'rep;movs' to copy bytes around, that sort of thing. If it changes, invariants in the backend will quietly break. This situation arose when we had a call to memcpy and a COPY of the FLAGS register where we would attempt to reference local variables using %esi, a register that was clobbered by the 'rep;movs'. This fixes PR26124. llvm-svn: 257730	2016-01-14 01:20:03 +00:00
Michael Zuckerman	6b35f460ac	Fixing warning by adding the X86ISD::VROTRI case. Differential Revision: http://reviews.llvm.org/D16052 llvm-svn: 257607	2016-01-13 15:48:42 +00:00
Michael Zuckerman	2ddcbcf464	[AVX512] adding PROLQ and PROLD Intrinsics Differential Revision: http://reviews.llvm.org/D16048 llvm-svn: 257523	2016-01-12 21:19:17 +00:00
Igor Breger	ea8e8e9f97	AVX512: VPMOVAPS/PD and VPMOVUPS/PD (load) intrinsic implementation. Differential Revision: http://reviews.llvm.org/D16042 llvm-svn: 257463	2016-01-12 10:02:32 +00:00
Manman Ren	ed967f3752	CXX_FAST_TLS calling convention: performance improvement for x86-64. This is the same change on x86-64 as r255821 on AArch64. rdar://9001553 llvm-svn: 257428	2016-01-12 01:08:46 +00:00
Elena Demikhovsky	542dfcf44c	Optimized instruction sequence for sitofp operation on X86-32 Optimized sitofp i64 %x to double. The current sequence movl %ecx, 8(%esp) movl %edx, 12(%esp) fildll 8(%esp) is replaced with: movd %ecx, %xmm0 movd %edx, %xmm1 punpckldq %xmm1, %xmm0 movq %xmm0, 8(%esp) Differential Revision: http://reviews.llvm.org/D15946 llvm-svn: 257285	2016-01-10 09:41:22 +00:00
Simon Pilgrim	c7bebcbfd8	[X86][AVX] Match broadcast loads through a bitcast AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur. This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars. llvm-svn: 257266	2016-01-09 20:59:39 +00:00
Simon Pilgrim	2e7a1849c9	[X86][AVX] Add support for i64 broadcast loads on 32-bit targets Added 32-bit AVX1/AVX2 broadcast tests. llvm-svn: 257264	2016-01-09 19:59:27 +00:00
Nico Weber	4324b9b236	Revert r257055, it caused PR26064. llvm-svn: 257066	2016-01-07 15:01:46 +00:00
Simon Pilgrim	bcc11a059e	[X86][AVX] Match broadcast loads through a bitcast AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur. Follow up to D15310 llvm-svn: 257055	2016-01-07 11:34:27 +00:00
Simon Pilgrim	83e44c66ae	[X86][SSE] Add INSERTPS as a target shuffle Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS. llvm-svn: 257046	2016-01-07 10:24:19 +00:00
Simon Pilgrim	bc82dedd26	[X86] Determine if target shuffle can contain zero elements getTargetShuffleMask may return shuffle masks with SM_SentinelZero (-2) values (currently just for PSHUFB but VPERM2X128 as well with this patch). Although some calling functions can make use of this (mainly for shuffle combining), others can not and their inclusion makes shuffle mask comparisons more difficult. This patch adds a flag to getTargetShuffleMask to indicate if the calling function can't handle SM_SentinelZero; getTargetShuffleMask will then return false if it occurs to make handling much easier. I've tidied up some uses of getTargetShuffleMask to better indicate what is going on - more could be done but at present I don't have test cases to demonstrate it. Some upcoming patches will make use of this to both support more uses where SM_SentinelZero is not permitted (e.g. combineShuffleToAddSub), and also will allow us to add INSERTPS support to getTargetShuffleMask as part of better zero handling discussed in D14261. Differential Revision: http://reviews.llvm.org/D15378 llvm-svn: 256992	2016-01-06 23:24:40 +00:00
Quentin Colombet	eb61e8e6b0	[X86] Correctly model TLS calls w.r.t. frame requirements. TLS calls need the stack frame to be properly set up and this implies that such calls need ADJUSTSTACK_xxx markers. Fixes PR25820. llvm-svn: 256959	2016-01-06 19:09:26 +00:00

3836 Commits