llvm-project

Commit Graph

Author	SHA1	Message	Date
Nico Weber	1e058160dd	Revert 273848, it caused PR28329 llvm-svn: 273879	2016-06-27 14:36:46 +00:00
Simon Pilgrim	a45da385f8	[X86][AVX] Peek through bitcasts to find the source of broadcasts AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors. This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected. Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node. Differential Revision: http://reviews.llvm.org/D21660 llvm-svn: 273848	2016-06-27 07:44:32 +00:00
Rafael Espindola	ae0d866f56	Refactor a duplicated predicate. NFC. llvm-svn: 273826	2016-06-26 22:13:55 +00:00
Craig Topper	8f577fd5b5	[X86] Rewrite lowerVectorShuffleWithPSHUFB to not require a ZeroableMask to be created. We can do everything with the starting mask and zeroable bit vector. This removes the last usage of isSingleInputShuffleMask. NFC llvm-svn: 273804	2016-06-26 05:10:56 +00:00
Craig Topper	8bba749a48	[X86] Replace calls to isSingleInputShuffleMask with just checking if V2 is UNDEF. Canonicalization and creation of shuffle vector ensures this is equivalent. llvm-svn: 273803	2016-06-26 05:10:53 +00:00
Craig Topper	9a2e979b3d	[X86] Convert ==/!= comparisons with -1 for checking undef in shuffle lowering to comparisons of <0 or >=0. While there do the same for other kinds of index checks that can just check for greater than 0. No functional change intended. llvm-svn: 273788	2016-06-25 19:05:29 +00:00
Craig Topper	53a39d1a63	[X86] Pull similar bitcasts on different paths to earlier shared point. NFC llvm-svn: 273787	2016-06-25 19:05:23 +00:00
Ahmed Bougacha	0851ecd1b0	[X86] Remove dead ISD opcodes. NFC. llvm-svn: 273716	2016-06-24 20:37:55 +00:00
David Majnemer	d770877328	Switch more loops to be range-based This makes the code a little more concise, no functional change is intended. llvm-svn: 273644	2016-06-24 04:05:21 +00:00
Craig Topper	024402dcdf	[X86] Combine two nearby calls to isSingleInputShuffleVector. NFC llvm-svn: 273643	2016-06-24 03:06:11 +00:00
Kyle Butt	991df7889b	Codegen: [X86] preserve memory refs for folded umul_lohi Memory references were not being propagated for this folded load. This prevented optimizations like LICM from hoisting the load. Added test to verify that this allows LICM to proceed. llvm-svn: 273617	2016-06-23 21:40:35 +00:00
Michael Kuperstein	0194d30e09	[X86] Extract HiPE prologue constants into metadata X86FrameLowering::adjustForHiPEPrologue() contains a hard-coded offset into an Erlang Runtime System-internal data structure (the PCB). As the layout of this data structure is prone to change, this poses problems for maintaining compatibility. To address this problem, the compiler can produce this information as module-level named metadata. For example (where P_NSP_LIMIT is the offending offset): !hipe.literals = !{ !2, !3, !4 } !2 = !{ !"P_NSP_LIMIT", i32 152 } !3 = !{ !"X86_LEAF_WORDS", i32 24 } !4 = !{ !"AMD64_LEAF_WORDS", i32 24 } Patch by Magnus Lang Differential Revision: http://reviews.llvm.org/D20363 llvm-svn: 273593	2016-06-23 18:17:25 +00:00
Craig Topper	597aa42fec	[AVX512] Remove masked unpack intrinsics and autoupgrade to vectorshuffle and selects. llvm-svn: 273543	2016-06-23 07:37:33 +00:00
Craig Topper	8f8bd37dd3	[X86] Add assert to ensure only 128-bit vector types are used. 256 or 512-bit would require lane handling which is missing. llvm-svn: 273542	2016-06-23 07:37:26 +00:00
Reid Kleckner	5340b279ae	[codeview] Add EFLAGS to the list of CodeView register numbers llvm-svn: 273516	2016-06-22 23:50:19 +00:00
Krzysztof Parzyszek	e116d500a7	[SDAG] Remove FixedArgs parameter from CallLoweringInfo::setCallee The setCallee function will set the number of fixed arguments based on the size of the argument list. The FixedArgs parameter was often explicitly set to 0, leading to a lack of consistent value for non- vararg functions. Differential Revision: http://reviews.llvm.org/D20376 llvm-svn: 273403	2016-06-22 12:54:25 +00:00
Etienne Bergeron	f6be62f2c8	[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4 Summary: Fix the computation of the offsets present in the scopetable when using the SEH (__except_handler4). This patch added an intrinsic to track the position of the allocation on the stack of the EHGuard. This position is needed when producing the ScopeTable. ``` struct _EH4_SCOPETABLE { DWORD GSCookieOffset; DWORD GSCookieXOROffset; DWORD EHCookieOffset; DWORD EHCookieXOROffset; _EH4_SCOPETABLE_RECORD ScopeRecord[1]; }; struct _EH4_SCOPETABLE_RECORD { DWORD EnclosingLevel; long (FilterFunc)(); union { void (HandlerAddress)(); void (*FinallyFunc)(); }; }; ``` The code to generate the EHCookie is added in `X86WinEHState.cpp`. Which is adding these instructions when using SEH4. ``` Lfunc_begin0: # BB#0: # %entry pushl %ebp movl %esp, %ebp pushl %ebx pushl %edi pushl %esi subl $28, %esp movl %ebp, %eax <<-- Loading FramePtr movl %esp, -36(%ebp) movl $-2, -16(%ebp) movl $L__ehtable$use_except_handler4_ssp, %ecx xorl ___security_cookie, %ecx movl %ecx, -20(%ebp) xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie movl %eax, -40(%ebp) <<-- Storing EHGuard leal -28(%ebp), %eax movl $__except_handler4, -24(%ebp) movl %fs:0, %ecx movl %ecx, -28(%ebp) movl %eax, %fs:0 movl $0, -16(%ebp) calll _may_throw_or_crash LBB1_1: # %cont movl -28(%ebp), %eax movl %eax, %fs:0 addl $28, %esp popl %esi popl %edi popl %ebx popl %ebp retl ``` And the corresponding offset is computed: ``` Luse_except_handler4_ssp$parent_frame_offset = -36 .p2align 2 L__ehtable$use_except_handler4_ssp: .long -2 # GSCookieOffset .long 0 # GSCookieXOROffset .long -40 # EHCookieOffset <<---- .long 0 # EHCookieXOROffset .long -2 # ToState .long _catchall_filt # FilterFunction .long LBB1_2 # ExceptionHandler ``` Clang is not yet producing function using SEH4, but it's a work in progress. This patch is a step toward having a valid implementation of SEH4. Unfortunately, it is not yet fully working. 
The EH registration block is not allocated at the right offset on the stack. Reviewers: rnk, majnemer Subscribers: llvm-commits, chrisha Differential Revision: http://reviews.llvm.org/D21231 llvm-svn: 273281	2016-06-21 15:58:55 +00:00
Craig Topper	283418fbb6	[AVX512] Add patterns for any-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. llvm-svn: 273253	2016-06-21 07:37:32 +00:00
Craig Topper	0a0fb0fda1	[AVX512] Remove the masked vpcmpeq/vcmpgt intrinsics and autoupgrade them to native icmps. llvm-svn: 273240	2016-06-21 03:53:24 +00:00
Craig Topper	e4cf09ad07	[X86] Pre-allocate SmallVector instead of using push_back in a loop. NFC llvm-svn: 273234	2016-06-21 03:05:40 +00:00
Rafael Espindola	0d34826218	Simplify PICStyles. The main difference is that StubDynamicNoPIC is gone. The dynamic-no-pic mode as the name implies is simply not pic. It is just conservative about what it assumes to be dso local. llvm-svn: 273222	2016-06-20 23:41:56 +00:00
Simon Pilgrim	356e823b51	[X86][SSE] Add cost model for BSWAP of vectors The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217	2016-06-20 23:08:21 +00:00
Simon Pilgrim	225b2e37a0	[X86][X87] Fix issue with sitofp i64 -> fp128 on 32-bit targets Fix for PR27726 - sitofp i64 to fp128 was loading the merged load i64 to a x87 register preventing legalization for conversion to fp128. Added 32-bit tests for fp128 cast/conversions. llvm-svn: 273210	2016-06-20 22:41:17 +00:00
Rafael Espindola	94eb31a7a9	Delete dead code. NFC. llvm-svn: 273206	2016-06-20 22:08:35 +00:00
Igor Breger	e59165ca63	[AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering. Differential Revision: http://reviews.llvm.org/D20897 llvm-svn: 273138	2016-06-20 07:05:43 +00:00
Craig Topper	4296c025c0	[X86] Pass the SDLoc and Mask ArrayRef down from lowerVectorShuffle through all of the other routines instead of recreating them in the handlers for each type. NFC llvm-svn: 273137	2016-06-20 04:00:55 +00:00
Craig Topper	ddf5d2a4a5	[X86] Use existing ArrayRef variable instead of calling SVOp->getMask() repeatedly. Remove nearby else after return as well. NFC llvm-svn: 273136	2016-06-20 04:00:53 +00:00
Craig Topper	01ef65dd79	[X86] Avoid making a copy of a shuffle mask until we're sure we really need to. And just use a SmallVector to do the copy because it's easy. llvm-svn: 273135	2016-06-20 04:00:50 +00:00
Simon Pilgrim	0887d5b02e	[X86][AVX512] Added 512-bit BITREVERSE tests and enabled AVX512BW lowering support llvm-svn: 273125	2016-06-19 20:59:19 +00:00
Simon Pilgrim	0c62bc0324	Strip trailing whitespace. NFCI. llvm-svn: 273124	2016-06-19 20:22:43 +00:00
Simon Pilgrim	2b007189b0	Fixed signed/unsigned warning. llvm-svn: 273120	2016-06-19 18:20:44 +00:00
Simon Pilgrim	3d881a0230	[X86][SSE] Allow target shuffle combining to match masks with SM_Sentinel values We currently only allow exact matches of shuffle mask patterns during target shuffle combining. This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value. I've adjusted some tests that were requiring exact shuffle masks to now include undef values. Differential Revision: http://reviews.llvm.org/D21495 llvm-svn: 273119	2016-06-19 18:03:52 +00:00
Craig Topper	bbb9a8d255	[X86] Add an assert to ensure that a routine is only used with 128-bit vectors. Reduce SmallVector size accordingly. llvm-svn: 273117	2016-06-19 15:37:39 +00:00
Craig Topper	969457e0e3	[X86] Make is128BitLaneRepeatedShuffleMask correct the indices of the second vector for the smaller mask. This removes some custom correction code and can potentially provide other benefits in the future. llvm-svn: 273116	2016-06-19 15:37:37 +00:00
Craig Topper	54ec3d6b1b	[X86] Remove a dead path through one of the shuffle lowering routines. It's only called on single input shuffles masks already. Add an assert instead to verify. llvm-svn: 273115	2016-06-19 15:37:35 +00:00
Craig Topper	ae21810ce4	[X86] Pre-allocate a SmallVector instead of using push_back in a loop. NFC llvm-svn: 273114	2016-06-19 15:37:33 +00:00
Craig Topper	4181c03c54	[X86] Use SmallVector::assign instead of resize to ensure we really start with a vector of all -1s. Otherwise we're trusting the caller to pass the right thing. This should be no functional change with current code. llvm-svn: 273113	2016-06-19 15:37:30 +00:00
Joerg Sonnenberger	2298203056	doesSetDirectiveSuppressesReloc -> doesSetDirectiveSuppressReloc, the former is grammatically incorrect. llvm-svn: 273100	2016-06-18 23:25:37 +00:00
Zvi Rackover	b346eaa647	test commit: remove trailing whitespace llvm-svn: 273094	2016-06-18 19:13:38 +00:00
Simon Pilgrim	f4b2af1b9f	[X86][SSE4A] Autoupgrade and remove MOVNTSD/MOVNTSS intrinsics Required better annotation of the instruction defs upon removal of the builtin intrinsic pattern. llvm-svn: 273077	2016-06-18 02:38:26 +00:00
Davide Italiano	ef5d8bead1	[X86Subtarget] Use isPositionIndependent(). NFC. Differential Revision: http://reviews.llvm.org/D21480 llvm-svn: 273071	2016-06-18 00:03:20 +00:00
Michael Kuperstein	18d6d3d95e	[X86] Add missing AVX512 anyext patterns. Add AVX512 anyext patterns for i16 and i64, modeled on the existing i8 and i32 patterns. llvm-svn: 273038	2016-06-17 20:21:17 +00:00
Craig Topper	1f083543c9	[X86] Pre-size several SmallVectors instead of calling push_back in a loop. NFC llvm-svn: 272997	2016-06-17 12:20:50 +00:00
Craig Topper	07984f2068	[X86] Fix formatting. NFC llvm-svn: 272996	2016-06-17 12:20:48 +00:00
Rafael Espindola	498b9e06c8	Refactor more duplicated code. llvm-svn: 272939	2016-06-16 19:30:55 +00:00
Sanjoy Das	0ebc9616b4	NFC; refactor getFrameIndexReferenceFromSP Summary: ... into getFrameIndexReferencePreferSP. This change folds the fail-then-retry logic into getFrameIndexReferencePreferSP. There is a non-functional but behavioral change in WinException -- earlier if `getFrameIndexReferenceFromSP` failed we'd trip an assert, but now we'll silently use the (wrong) offset from the base pointer. I could not write the assert I'd like to write ("FrameReg == StackRegister", like I've done in X86FrameLowering) since there is no easy way to get to the stack register from WinException (happy to be proven wrong here). One solution to this is to add a `bool OnlyStackPointer` parameter to `getFrameIndexReferenceFromSP` that asserts if it could not satisfy its promise of returning an offset from a stack pointer, but that seems overkill. Reviewers: rnk Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D21427 llvm-svn: 272938	2016-06-16 18:54:06 +00:00
Rafael Espindola	ed44cf6ccd	Refactor duplicated code. llvm-svn: 272936	2016-06-16 18:50:12 +00:00
Sanjay Patel	0e9afea3c8	[x86] autoupgrade and remove AVX2 integer min/max intrinsics This will (hopefully very temporarily) break clang. The clang side of this should be the next commit. llvm-svn: 272932	2016-06-16 18:44:20 +00:00
Sanjay Patel	51ab757941	[x86] autoupgrade and remove SSE2/SSE41 integer min/max intrinsics Follow-up to: http://reviews.llvm.org/rL272806 http://reviews.llvm.org/rL272807 llvm-svn: 272907	2016-06-16 15:48:30 +00:00
Craig Topper	97b1fc92e8	[X86] Pre-size some SmallVectors using the constructor in the shuffle lowering code instead of using push_back. Some of these already did this but used resize or assign instead of the constructor. NFC llvm-svn: 272872	2016-06-16 03:58:45 +00:00
Craig Topper	66f1a8b608	[X86] Remove else after return. NFC llvm-svn: 272871	2016-06-16 03:58:42 +00:00
Craig Topper	ceda65bdc4	[X86] Inline a couple lambdas into their callers since they are only used once and it all fits on a single line. NFC llvm-svn: 272869	2016-06-16 03:11:00 +00:00
Kevin B. Smith	4f81990049	[X86]: Fix for uninitialized access introduced in r272797. llvm-svn: 272835	2016-06-15 20:52:19 +00:00
Sanjay Patel	1a4569df54	[x86] add folds for x86 vector compare nodes (PR27924) Ideally, we can get rid of most x86 LLVM intrinsics by transforming them to IR (and some of that happened with http://reviews.llvm.org/rL272807), but it doesn't cost much to have some simple folds in the backend too while we're working on that and as a backstop. This fixes: https://llvm.org/bugs/show_bug.cgi?id=27924 Differential Revision: http://reviews.llvm.org/D21356 llvm-svn: 272828	2016-06-15 20:26:58 +00:00
Kevin B. Smith	acbda9ef30	[X86]: Updated r272801 to promote 16 bit compares with immediate operand to 32 bits. This is in response to a comment by Eli Friedman. llvm-svn: 272814	2016-06-15 18:18:05 +00:00
Sanjay Patel	30e0456562	[x86] fix function name; NFC llvm-svn: 272805	2016-06-15 17:12:29 +00:00
Kevin B. Smith	54566a0e9a	[X86]: Quit promoting 8 and 16 bit compares to 32 bit. Differential Revision: http://reviews.llvm.org/D21144 llvm-svn: 272801	2016-06-15 16:37:46 +00:00
Kevin B. Smith	c3c82cdbd0	[X86]: Improve Liveness checking for X86FixupBWInsts.cpp Differential Revision: http://reviews.llvm.org/D21085 llvm-svn: 272797	2016-06-15 16:03:06 +00:00
Igor Breger	64cfd3a442	[AVX512] Fix BLENDM lowering patterns. Operands should be swapped to match SELECT behavior. Use BLENDM instead of masked move instruction. Differential Revision: http://reviews.llvm.org/D21001 llvm-svn: 272763	2016-06-15 07:30:38 +00:00
Sanjoy Das	4f7a86c74d	Push a dependent computation into the assert that uses it; NFC ... instead of explicitly conditioning on NDEBUG. Also use an easier to read conditional expression. (Addresses post-commit review from David Blaikie.) llvm-svn: 272762	2016-06-15 07:27:04 +00:00
Sanjoy Das	3f59c0c3ab	Fix unused variable warning; NFC TailCallReturnAddrDelta is used only in an assert, so put it under defined(NDEBUG). llvm-svn: 272760	2016-06-15 06:53:59 +00:00
Sanjoy Das	0272be206a	Don't force SP-relative addressing for statepoints Summary: ... when the offset is not statically known. Prioritize addresses relative to the stack pointer in the stackmap, but fallback gracefully to other modes of addressing if the offset to the stack pointer is not a known constant. Patch by Oscar Blumberg! Reviewers: sanjoy Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm Differential Revision: http://reviews.llvm.org/D21259 llvm-svn: 272756	2016-06-15 05:35:14 +00:00
David Majnemer	cbf614a93b	Remove the ScalarReplAggregates pass Nearly all the changes to this pass have been done while maintaining and updating other parts of LLVM. LLVM has had another pass, SROA, which has superseded ScalarReplAggregates for quite some time. Differential Revision: http://reviews.llvm.org/D21316 llvm-svn: 272737	2016-06-15 00:19:09 +00:00
Wei Mi	b799a625f9	[X86] Reduce the width of multiplication when its operands are extended from i8 or i16 For <N x i32> type mul, pmuludq will be used for targets without SSE41, which often introduces many extra pack and unpack instructions in vectorized loop body because pmuludq generates <N/2 x i64> type value. However when the operands of <N x i32> mul are extended from smaller size values like i8 and i16, the type of mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which generates better code. For targets with SSE41, pmulld is supported so no shrinking is needed. Differential Revision: http://reviews.llvm.org/D20931 llvm-svn: 272694	2016-06-14 18:53:20 +00:00
Simon Pilgrim	cf1165b86e	[X86][SSE4A] Added patterns for nontemporal stores of scalar float/doubles using MOVNTSD/MOVNTSS llvm-svn: 272651	2016-06-14 09:43:38 +00:00
Craig Topper	34d9707825	[AVX512] Use AND32ri8 instead of AND32ri when anding with 1 to create single bit masks. This results in a smaller encoding. llvm-svn: 272627	2016-06-14 03:13:03 +00:00
Craig Topper	99e30e6a66	[AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks when KMOVB is not available. This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR. llvm-svn: 272626	2016-06-14 03:13:00 +00:00
Craig Topper	ddab395397	[AVX512] Add patterns for zero-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. llvm-svn: 272625	2016-06-14 03:12:54 +00:00
David Majnemer	248190ba69	[X86] Remove llvm.x86.bit.scan.{forward,reverse}.32 The need for these intrinsics has been obviated by r272564 which reimplements their functionality using generic IR. llvm-svn: 272566	2016-06-13 17:33:13 +00:00
Haojian Wu	7900ca1e7e	Fix an enumeral mismatch warning. Summary: The "-Werror=enum-compare" shows that the statement is using two different enums: enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType' A follow-up fix on D21235. Reviewers: klimek Subscribers: spatel, cfe-commits Differential Revision: http://reviews.llvm.org/D21278 llvm-svn: 272539	2016-06-13 09:03:45 +00:00
Craig Topper	13cf7cac07	[AVX512] Remove masked pshufd, pshuflw, and pshufhw intrinsics and autoupgrade them to selects and shufflevector. llvm-svn: 272527	2016-06-13 02:36:48 +00:00
Benjamin Kramer	4ca41fd09e	Run clang-tidy's performance-unnecessary-copy-initialization over LLVM. No functionality change intended. llvm-svn: 272516	2016-06-12 17:30:47 +00:00
Benjamin Kramer	bdc4956bac	Pass DebugLoc and SDLoc by const ref. This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512	2016-06-12 15:39:02 +00:00
Sanjay Patel	977530a8c9	[x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044) This patch is intended to solve: https://llvm.org/bugs/show_bug.cgi?id=28044 By changing the definition of X86ISD::CMPP to use float types, we allow it to be created and pass legalization for an SSE1-only target where v4i32 is not legal. The motivational trail for this change includes: https://llvm.org/bugs/show_bug.cgi?id=28001 and eventually makes this trigger: http://reviews.llvm.org/D21190 Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86 intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86 intrinsics, a big pile of generic optimizations can trigger. Differential Revision: http://reviews.llvm.org/D21235 llvm-svn: 272511	2016-06-12 15:03:25 +00:00
Craig Topper	1067986c5b	[X86] Remove sse2 pshufd/pshuflw/pshufhw intrinsics and upgrade them to shufflevector. llvm-svn: 272510	2016-06-12 14:11:32 +00:00
Craig Topper	251030babe	[AVX512] Remove the masked palignr intrinsics that I forgot to remove when I added auto-upgrade code to turn them into shufflevectors and selects. llvm-svn: 272497	2016-06-12 04:14:13 +00:00
Simon Pilgrim	3fc09f7be6	[CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets To account for the fast PSHUFB implementation now available llvm-svn: 272484	2016-06-11 19:23:02 +00:00
Simon Pilgrim	5b9bade8dd	[X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach. llvm-svn: 272477	2016-06-11 15:44:13 +00:00
Simon Pilgrim	b13961d25b	Strip trailing whitespace. NFCI. llvm-svn: 272476	2016-06-11 14:34:10 +00:00
Craig Topper	504fba5c8a	[AVX512] Lower v8i64 and v16i32 to pshufd when possible. llvm-svn: 272473	2016-06-11 13:43:21 +00:00
Simon Pilgrim	6800a45790	[X86][SSE] Added PSLLDQ/PSRLDQ as a target shuffle type Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining llvm-svn: 272471	2016-06-11 13:38:28 +00:00
Simon Pilgrim	255fdd0666	[X86][SSE] Use vXi8 return type for PSLLDQ/PSRLDQ instructions These are byte shift instructions and it will make shuffle combining a lot more straightforward if we can assume a vXi8 vector of bytes so decoded shuffle masks match the return type's number of elements llvm-svn: 272468	2016-06-11 12:54:37 +00:00
Simon Pilgrim	d386941676	[X86][AVX512] Tidied up VSHUFF32x4/VSHUFF64x2/VSHUFI32x4/VSHUFI64x2 comment generation Now matches other shuffles llvm-svn: 272464	2016-06-11 11:18:38 +00:00
Chandler Carruth	4c0e94dce6	Try a bit harder to remove the signed and unsigned comparison warning. Hopefully this time it actually works and stays away. llvm-svn: 272463	2016-06-11 09:13:00 +00:00
Chandler Carruth	306e270b83	Compare to an unsigned literal to avoid a -Wsign-compare warning. llvm-svn: 272459	2016-06-11 08:02:01 +00:00
Craig Topper	40abd1cc61	[AVX512] Add support for lowering v32i16 shuffles with repeated lanes. This allows us to create 512-bit PSHUFLW/PSHUFHW. llvm-svn: 272450	2016-06-11 03:27:42 +00:00
Craig Topper	b9b86fcfff	[AVX512] No need to check for BWI being enabled before lowering v32i16 and v64i8 shuffles. If we get this far the types are already legal which means BWI must be enabled. llvm-svn: 272449	2016-06-11 03:27:37 +00:00
Sanjoy Das	39c226fdba	[STLExtras] Introduce and use llvm::count_if; NFC (This is split out from D21115) llvm-svn: 272435	2016-06-10 21:18:39 +00:00
Sanjay Patel	b114fd65fc	[x86] enable bitcasted fabs/fneg transforms The vector cases don't change because we already have folds in X86ISelLowering to look through and remove bitcasts. llvm-svn: 272427	2016-06-10 20:33:50 +00:00
Michael Kuperstein	9a0542a792	[X86] Add costs for SSE zext/sext to v4i64 to TTI The costs are somewhat hand-wavy, but should be much closer to the truth than what we get from BasicTTI. Differential Revision: http://reviews.llvm.org/D21156 llvm-svn: 272406	2016-06-10 17:01:05 +00:00
Roman Shirokiy	d93998f606	Test commit llvm-svn: 272393	2016-06-10 13:12:48 +00:00
Craig Topper	200d237e57	[AVX512] Add shuffle comment printing for masked VPERMPD/VPERMQ. llvm-svn: 272371	2016-06-10 05:12:40 +00:00
Craig Topper	89c1761474	[AVX512] Fix shuffle comment printing to handle the masked versions of some shuffles. Previously we were printing the mask operands as the register names. llvm-svn: 272367	2016-06-10 04:48:05 +00:00
Simon Pilgrim	643734c565	[X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction comments llvm-svn: 272319	2016-06-09 22:03:15 +00:00
Simon Pilgrim	f718682eb9	[X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsics Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ llvm-svn: 272308	2016-06-09 21:09:03 +00:00
Simon Pilgrim	47c76e201a	[X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR llvm-svn: 272307	2016-06-09 20:53:12 +00:00
Simon Pilgrim	0ab9d3026a	[X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts 512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget llvm-svn: 272300	2016-06-09 20:13:58 +00:00
Igor Breger	f635367e2b	[AVX512] Remove masked_move/blendm intrinsic from back-end. This is complement patch to D21060. Differential Revision: http://reviews.llvm.org/D21174 llvm-svn: 272257	2016-06-09 11:46:55 +00:00
Craig Topper	6f7288dc44	[AVX512] Fix shuffle decode printing for several instructions with write masks. There are still more bugs here with UNPCK and PALIGN for sure. But these were the easiest ones to fix. llvm-svn: 272252	2016-06-09 07:49:08 +00:00
Craig Topper	7a2993093e	[X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions. llvm-svn: 272249	2016-06-09 07:06:38 +00:00
Craig Topper	565a5b5451	[X86] Fix bad comment in assert. NFC llvm-svn: 272248	2016-06-09 07:06:33 +00:00
Benjamin Kramer	46e38f3678	Avoid copies of std::strings and APInt/APFloats where we only read from it As suggested by clang-tidy's performance-unnecessary-copy-initialization. This can easily hit lifetime issues, so I audited every change and ran the tests under asan, which came back clean. llvm-svn: 272126	2016-06-08 10:01:20 +00:00
Igor Breger	982e4003a6	[AVX512] Fix cvtusi2sd instruction Opcode, it should be 0x7B instead of 0x2A. llvm-svn: 272122	2016-06-08 07:48:23 +00:00
Etienne Bergeron	22bfa83208	[stack-protection] Add support for MSVC buffer security check Summary: This patch is adding support for the MSVC buffer security check implementation The buffer security check is turned on with the '/GS' compiler switch. * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx * To be added to clang here: http://reviews.llvm.org/D20347 Some overview of buffer security check feature and implementation: * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/ * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html For the following example: ``` int example(int offset, int index) { char buffer[10]; memset(buffer, 0xCC, index); return buffer[index]; } ``` The MSVC compiler is adding these instructions to perform stack integrity check: ``` push ebp mov ebp,esp sub esp,50h [1] mov eax,dword ptr [__security_cookie (01068024h)] [2] xor eax,ebp [3] mov dword ptr [ebp-4],eax push ebx push esi push edi mov eax,dword ptr [index] push eax push 0CCh lea ecx,[buffer] push ecx call _memset (010610B9h) add esp,0Ch mov eax,dword ptr [index] movsx eax,byte ptr buffer[eax] pop edi pop esi pop ebx [4] mov ecx,dword ptr [ebp-4] [5] xor ecx,ebp [6] call @__security_check_cookie@4 (01061276h) mov esp,ebp pop ebp ret ``` The instrumentation above is: * [1] is loading the global security canary, * [3] is storing the local computed ([2]) canary to the guard slot, * [4] is loading the guard slot and ([5]) re-compute the global canary, * [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling. Overview of the current stack-protection implementation: * lib/CodeGen/StackProtector.cpp * There is a default stack-protection implementation applied on intermediate representation. * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie. 
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast). * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and error handling. * Guard manipulation and comparison are added directly to the intermediate representation. * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp * There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls). * see long comment above 'class StackProtectorDescriptor' declaration. * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr). * 'getSDagStackGuard' returns the appropriate stack guard (security cookie) * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'. * include/llvm/Target/TargetLowering.h * Contains function to retrieve the default Guard 'Value'; should be overridden by each target to select which implementation is used and provide Guard 'Value'. * lib/Target/X86/X86ISelLowering.cpp * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm. Function-based Instrumentation: * The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction. * To support function-based instrumentation, this patch is * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h), * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue. 
* modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation, * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp), * if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp). Modifications * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp) * adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h) Results * IR generated instrumentation: ``` clang-cl /GS test.cc /Od /c -mllvm -print-isel-input ``` ``` * Final LLVM Code input to ISel * ; Function Attrs: nounwind sspstrong define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 { entry: %StackGuardSlot = alloca i8* <<<-- Allocated guard slot %0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot) %index.addr = alloca i32, align 4 %offset.addr = alloca i32, align 4 %buffer = alloca [10 x i8], align 1 store i32 %index, i32* %index.addr, align 4 store i32 %offset, i32* %offset.addr, align 4 %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0 %1 = load i32, i32* %index.addr, align 4 call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false) %2 = load i32, i32* %index.addr, align 4 %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2 %3 = load i8, i8* %arrayidx, align 1 %conv = sext i8 %3 to i32 %4 = load volatile i8, i8* %StackGuardSlot <<<-- Loading Guard slot call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check ret i32 %conv } ``` * SelectionDAG generated instrumentation: ``` clang-cl /GS test.cc /O1 /c /FA ``` ``` "?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z" # BB#0: # %entry pushl %esi subl $16, %esp movl ___security_cookie, %eax <<<-- Loading Stack Guard 
value movl 28(%esp), %esi movl %eax, 12(%esp) <<<-- Store to Guard slot leal 2(%esp), %eax pushl %esi pushl $204 pushl %eax calll _memset addl $12, %esp movsbl 2(%esp,%esi), %esi movl 12(%esp), %ecx <<<-- Loading Guard slot calll @__security_check_cookie@4 <<<-- Epilogue function-based check movl %esi, %eax addl $16, %esp popl %esi retl ``` Reviewers: kcc, pcc, eugenis, rnk Subscribers: majnemer, llvm-commits, hans, thakis, rnk Differential Revision: http://reviews.llvm.org/D20346 llvm-svn: 272053	2016-06-07 20:15:35 +00:00
Simon Pilgrim	35c06a0282	[X86][SSE] Add general lowering of nontemporal vector loads (fixed bad merge) Currently the only way to use the (V)MOVNTDQA nontemporal vector load instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware, so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272011	2016-06-07 13:47:23 +00:00
Simon Pilgrim	9a89623b57	[X86][SSE] Add general lowering of nontemporal vector loads Currently the only way to use the (V)MOVNTDQA nontemporal vector load instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware, so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272010	2016-06-07 13:34:24 +00:00
Igor Breger	61e628591f	[AVX512] Fix load opcode for fast isel. Differential Revision: http://reviews.llvm.org/D21067 llvm-svn: 272006	2016-06-07 13:08:45 +00:00
Simon Pilgrim	ca1da1bf07	[X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened. This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases. llvm-svn: 272003	2016-06-07 12:20:14 +00:00
Craig Topper	2f90c1fedf	[AVX512] Allow avx2 and sse41 nontemporal load intrinsics to select EVEX encoded instructions when VLX is enabled. llvm-svn: 271988	2016-06-07 07:27:57 +00:00
Craig Topper	e1cac15feb	[AVX512] Remove unnecessary mayLoad, mayStore, hasSideEffects flags from instructions that have patterns that imply them. Add the same set of flags to instructions that don't have patterns to imply them. llvm-svn: 271987	2016-06-07 07:27:54 +00:00
Craig Topper	0fcf925699	[AVX512] Add NoVLX to a couple patterns that have VLX equivalents. Ordering of the patterns in the .td file protects this, but it's better to be explicit. llvm-svn: 271986	2016-06-07 07:27:51 +00:00
Igor Breger	edafb0595e	[KNL] Fix UMULO lowering. Differential Revision: http://reviews.llvm.org/D21013 llvm-svn: 271891	2016-06-06 12:24:52 +00:00
Filipe Cabecinhas	6e7d5467c0	[NFC] Silence gcc warning (-Wsign-compare) llvm-svn: 271882	2016-06-06 10:49:56 +00:00
Craig Topper	143446d5c1	[AVX512] Add PALIGNR shuffle lowering for v32i16 and v16i32. llvm-svn: 271870	2016-06-06 05:39:10 +00:00
Simon Pilgrim	64c6de4525	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS raw mask decoding for target shuffle combines llvm-svn: 271834	2016-06-05 15:21:30 +00:00
Simon Pilgrim	478295dadd	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS as a target shuffle type llvm-svn: 271831	2016-06-05 15:01:45 +00:00
Simon Pilgrim	163987a235	[X86][XOP] Tidied up DecodeVPERMIL2PMask to more closely match DecodeVPERMILPMask. llvm-svn: 271830	2016-06-05 14:33:43 +00:00
Craig Topper	8eeda57a40	[AVX512] Add support for lowering PALIGNR for v64i8. Could do this for other types too, but this is what's needed to replace the intrinsic with native IR in clang. llvm-svn: 271828	2016-06-05 06:29:12 +00:00
Craig Topper	9f51c9ef15	[AVX512] Fix PANDN combining for v4i32/v8i32 when VLX is enabled. v4i32/v8i32 ANDs aren't promoted to v2i64/v4i64 when VLX is enabled. llvm-svn: 271826	2016-06-05 05:35:11 +00:00
Simon Pilgrim	2ead861d07	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS shuffle mask comment decoding llvm-svn: 271809	2016-06-04 21:44:28 +00:00
Craig Topper	e609bd6600	[X86] Add the VR128L/H and VR256L/H to the list of vector register classes for inline asm constraints. Also fix the comment on the function. llvm-svn: 271802	2016-06-04 20:15:08 +00:00
Saleem Abdulrasool	1fcdc23a6e	X86: enable TLS on Windows itanium Windows itanium is nearly identical to windows-msvc (MS ABI for C, itanium for C++). Enable the TLS support for the target similar to the MSVC model. llvm-svn: 271797	2016-06-04 18:27:22 +00:00
Simon Pilgrim	fd2eda4f64	[X86][AVX2] Fix v16i16 SHL lowering (PR27730) The AVX2 v16i16 shift lowering works by unpacking to 2 x v8i32, performing the shift and then truncating the result. The unpacking is used to place the values in the upper 16-bits so that we can correctly sign-extend for SRA shifts. Unfortunately we weren't ensuring that the lower 16-bits were zero to ensure that SHL correctly shifts in zero bits. llvm-svn: 271796	2016-06-04 16:45:33 +00:00
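The SHL problem described in the commit above can be illustrated with a scalar model of a single 32-bit lane. This is only a sketch under stated assumptions: `shlViaWideLane` and its parameters are hypothetical names, not LLVM code, and it assumes the 16-bit value sits in the upper half of the lane, the shift amount is below 16, and the result is read back from the upper half.

```cpp
#include <cstdint>

// Hypothetical scalar model of one 32-bit lane (not LLVM code).
// The 16-bit value is placed in the upper half of the lane, the lane is
// shifted left, and the upper half is read back as the 16-bit result.
// Precondition for this sketch: amount < 16.
std::uint16_t shlViaWideLane(std::uint16_t value, std::uint16_t lowHalf,
                             unsigned amount) {
  std::uint32_t lane = (static_cast<std::uint32_t>(value) << 16) | lowHalf;
  std::uint32_t shifted = lane << amount;
  // Bits of lowHalf that crossed into the upper half corrupt the result
  // unless lowHalf was zero.
  return static_cast<std::uint16_t>(shifted >> 16);
}
```

With `lowHalf` equal to zero the result matches a plain 16-bit shift; with a non-zero `lowHalf` its top bits leak into the low bits of the result.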
Craig Topper	6ae375c9ba	[X86] Use smaller types to shrink the intrinsic lowering tables by about 12K. llvm-svn: 271776	2016-06-04 04:32:17 +00:00
Craig Topper	5250634334	[X86] Use X86ISD::ABS for lowering pabs SSSE3/AVX intrinsics to match AVX512. Should allow those intrinsics to use the EVEX encoded instructions and get the extra registers when available. llvm-svn: 271775	2016-06-04 04:32:15 +00:00
Simon Pilgrim	e85506b6e0	[X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage. The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions), an auto upgrade path is added to convert old calls. Mask decoding/target shuffle support will be added in future patches. Differential Revision: http://reviews.llvm.org/D20049 llvm-svn: 271633	2016-06-03 08:06:03 +00:00
Craig Topper	5e3d314488	[X86] Fix some isel patterns to remove an operand from some multiclasses. NFC llvm-svn: 271631	2016-06-03 05:58:52 +00:00
Craig Topper	e7ae106147	[AVX512] Ensure EVEX vpshufd, vpshuflw, and vpshufhw have isel priority over the VEX encoded ones. llvm-svn: 271629	2016-06-03 05:31:04 +00:00
Craig Topper	01f53b1773	[AVX512] Fix shuffle comment printing for EVEX encoded PSHUFD, PSHUFHW, and PSHUFLW. llvm-svn: 271628	2016-06-03 05:31:00 +00:00
Craig Topper	dc70d8a4b7	[X86] Simplify a multiclass to remove a parameter. NFC llvm-svn: 271627	2016-06-03 05:30:56 +00:00
Craig Topper	2388b4610a	[X86] Remove unnecessary pattern predicates from the vector bit cast patterns. The types have to be legal and there are no alternative patterns. Saves almost 200 bytes in isel table. llvm-svn: 271625	2016-06-03 04:15:27 +00:00
Craig Topper	19462f02bb	[X86] Cleanup formatting a bit to align similar parts of adjacent lines. llvm-svn: 271624	2016-06-03 04:15:25 +00:00
Craig Topper	895897f85b	[X86] Remove redundant bitcast patterns for 128/256-bit vectors. These only differ from the SSE/AVX versions by the register class, but register class has no bearing on isel. llvm-svn: 271623	2016-06-03 04:15:22 +00:00
Ahmed Bougacha	63f78b0206	[X86] Define segment MI operands as regs instead of i8imm. We've been pretending that segments are i8imm since the initial support (r68645), predating the addition of the SEGMENT_REG class (r81895). That happens to work, but is wrong, and inconsistent with how we print (e.g., X86ATTInstPrinter::printMemReference) and parse them (e.g., X86Operand::addMemOperands). This change shouldn't affect any tool users, but is visible to library users or out-of-tree tablegen backends: this causes MCOperandInfo for the segment op to have an RC instead of "unknown", and TII::getRegClass to actually return something. As the registers are reserved and no vregs of the class are ever created, that shouldn't change anything. No test change; no suspicious getRegClass() in X86 and CodeGen. llvm-svn: 271559	2016-06-02 18:29:15 +00:00
Dimitry Andric	6a482a73d6	Only attempt to detect AVG if SSE2 is available Summary: In PR29973 Sanjay Patel reported an assertion failure when a certain loop was optimized, for a target without SSE2 support. It turned out this was because of the AVG pattern detection introduced in rL253952. Prevent the assertion failure by bailing out early in `detectAVGPattern()`, if the target does not support SSE2. Also add a minimized test case. Reviewers: congh, eli.friedman, spatel Subscribers: emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D20905 llvm-svn: 271548	2016-06-02 17:30:49 +00:00
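For readers unfamiliar with the AVG pattern mentioned above, a scalar sketch follows. It assumes the pattern is the rounding average `(a + b + 1) >> 1` computed in a wider type and truncated back, which is what the SSE2 PAVGB instruction computes per byte; `roundingAverage` is a hypothetical name, not a function from the patch.

```cpp
#include <cstdint>

// Hypothetical scalar form of the rounding average (not code from the patch).
// The sum is formed in a wider type so that a + b + 1 cannot overflow.
std::uint8_t roundingAverage(std::uint8_t a, std::uint8_t b) {
  unsigned wide = static_cast<unsigned>(a) + b + 1u;
  return static_cast<std::uint8_t>(wide >> 1);
}
```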
Simon Pilgrim	0afd5a4d80	[X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (llvm) This patch removes the llvm intrinsics (V)CVTTPS2DQ and VCVTTPD2DQ truncation (round to zero) conversions and auto-upgrades to FP_TO_SINT calls instead. Note: I looked at updating CVTTPD2DQ as well but this still requires a lot more work to correctly lower. Differential Revision: http://reviews.llvm.org/D20860 llvm-svn: 271510	2016-06-02 10:55:21 +00:00
Craig Topper	048a08af66	[AVX512] Add 512-bit load/stores to fast isel. llvm-svn: 271486	2016-06-02 04:51:37 +00:00
Craig Topper	292a86db5b	[X86] No need to use 256-bit VMOVNTPS for integer types when only AVX1 is supported. VMOVNTDQ is available with AVX1. We were getting this right for v4i64 but not the other integer types. llvm-svn: 271482	2016-06-02 04:19:48 +00:00
Craig Topper	ca9c0801e1	[X86] Add AVX 256-bit load and stores to fast isel. I'm not sure why this was missing for so long. This also exposed that we were picking floating point 256-bit VMOVNTPS for some integer types in normal isel for AVX1 even though VMOVNTDQ is available. In practice it doesn't matter due to the execution dependency fix pass, but it required extra isel patterns. Fixing that in a follow up commit. llvm-svn: 271481	2016-06-02 04:19:45 +00:00
Craig Topper	6611188633	[X86] Use uint16_t for a couple arrays of instruction opcodes. NFC llvm-svn: 271480	2016-06-02 04:19:42 +00:00
Craig Topper	5bb9cda620	[AVX512] Remove LOADA/LOADU/STOREA/STOREU intrinsic types now that they are unused. llvm-svn: 271479	2016-06-02 04:19:40 +00:00
Craig Topper	f10fbfa738	[AVX512] Remove masked load intrinsics. Clang now emits generic masked load intrinsics instead. The intrinsics will be autoupgraded to the same generic masked loads. llvm-svn: 271478	2016-06-02 04:19:36 +00:00
Michael Zuckerman	6a894956fc	Adding back-end support to two bit scanning intrinsics Adding LLVM back-end support to two intrinsics dealing with bit scan: _bit_scan_forward and _bit_scan_reverse. Their functionality is as described in Intel intrinsics guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_forward&expand=371,370 https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_bit_scan_reverse&expand=371,370 Commit on behalf of Omer Paparo Bivas Differential Revision: http://reviews.llvm.org/D19915 llvm-svn: 271386	2016-06-01 12:02:37 +00:00
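A portable sketch of what the two bit-scan operations compute may help here: the index of the lowest set bit and the index of the highest set bit of a 32-bit value. The names `bitScanForward` and `bitScanReverse` are hypothetical, and returning -1 for a zero input is this sketch's own convention, not documented behaviour of the intrinsics.

```cpp
#include <cstdint>

// Hypothetical portable equivalents (not the back-end implementation).
// Returns the index of the least significant set bit, or -1 for zero.
int bitScanForward(std::uint32_t v) {
  if (v == 0) return -1;  // sketch-only convention for zero input
  int index = 0;
  while ((v & 1u) == 0) { v >>= 1; ++index; }
  return index;
}

// Returns the index of the most significant set bit, or -1 for zero.
int bitScanReverse(std::uint32_t v) {
  if (v == 0) return -1;  // sketch-only convention for zero input
  int index = 31;
  while ((v & 0x80000000u) == 0) { v <<= 1; --index; }
  return index;
}
```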
Craig Topper	4f2d5a68d3	Revert r271362 "[AVX512] Remove masked load intrinsics. Clang now emits generic masked load intrinsics instead." Looks like something isn't quite right still. Also forgot to move the test cases to an autoupgrade test. llvm-svn: 271363	2016-06-01 05:57:55 +00:00
Craig Topper	dacd9d2bac	[AVX512] Remove masked load intrinsics. Clang now emits generic masked load intrinsics instead. The intrinsics will be autoupgraded to the same generic masked loads. llvm-svn: 271362	2016-06-01 05:35:16 +00:00
Kevin B. Smith	ed0b620a65	[X86]: Add a pattern that uses GR16_ABCD rather than GR32_ABCD to avoid falsely marking whole 32 bit register as live. Differential Revision: http://reviews.llvm.org/D20649 llvm-svn: 271341	2016-05-31 22:00:12 +00:00
Yaron Keren	a34bfa000f	Do not modify a std::vector while looping it. Introduced in r271244, this is probably undefined behaviour and asserts when compiled with Visual C++ debug mode. On further note, the loop is quadratic with regard to the number of successors since removeSuccessor is linear and could probably be modified to linear time. llvm-svn: 271278	2016-05-31 13:45:05 +00:00
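One common way to avoid mutating a vector during iteration is to loop over a snapshot, sketched below. `SuccessorList`, `removeMatching` and `isEven` are hypothetical stand-ins rather than the LLVM classes touched by the commit, and the linear `remove` mirrors the quadratic cost the message points out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a container with a linear-time remove.
struct SuccessorList {
  std::vector<int> items;
  void remove(int id) {
    auto it = std::find(items.begin(), items.end(), id);
    if (it != items.end()) items.erase(it);
  }
};

bool isEven(int id) { return id % 2 == 0; }

// Iterates over a copy so that erasing from list.items never invalidates
// the iterators used by the loop. Returns the number of removed entries.
std::size_t removeMatching(SuccessorList &list, bool (*pred)(int)) {
  const std::vector<int> snapshot = list.items;
  std::size_t removed = 0;
  for (int id : snapshot) {
    if (pred(id)) {
      list.remove(id);
      ++removed;
    }
  }
  return removed;
}
```

The copy costs extra memory; a single `std::remove_if` pass would also bring the loop down to linear time.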
Simon Pilgrim	e05dc45897	[X86][SSE] Add load-folding patterns for (V)CVTDQ2PD (PR27291) Added patterns for (V)CVTDQ2PD -> 2f64 loading from a 64-bit source. llvm-svn: 271269	2016-05-31 12:04:35 +00:00
Igor Breger	73ee8ba9b0	[AVX512] Fix intrinsic vcvtps2ph lowering. Differential Revision: http://reviews.llvm.org/D20788 llvm-svn: 271255	2016-05-31 08:04:21 +00:00
Igor Breger	52bd1d5fcc	Fix intrinsic vbroadcast{i32\|f32}x2 lowering. Differential Revision: http://reviews.llvm.org/D20780 llvm-svn: 271254	2016-05-31 07:43:39 +00:00
Craig Topper	50f85c22c5	[AVX512] Remove masked store intrinsics. Clang now emits generic masked store intrinsics instead. The intrinsics will be autoupgraded to the same generic masked stores. llvm-svn: 271245	2016-05-31 01:50:02 +00:00
Saleem Abdulrasool	d2f705ddf9	X86: permit using SjLj EH on x86 targets as an option This adds support to the backend for SjLj EH as an exception model. This is NOT the default model, and requires explicitly opting into it from the frontend. GCC supports this model and for MinGW can still be enabled via the `--using-sjlj-exceptions` option. Addresses PR27749! llvm-svn: 271244	2016-05-31 01:48:07 +00:00
Craig Topper	8287fd8abd	[X86] Remove SSE/AVX unaligned store intrinsics as clang no longer uses them. Auto upgrade to native unaligned store instructions. llvm-svn: 271236	2016-05-30 23:15:56 +00:00
Rafael Espindola	4f1062adb8	Fix a crash when producing COFF. llvm-svn: 271229	2016-05-30 20:18:53 +00:00
Rafael Espindola	9768d73c74	Move RelaxELFRel out to llvm-mc. llvm-svn: 271160	2016-05-29 01:11:00 +00:00
Simon Pilgrim	9602d678cb	[X86][SSE] (Reapplied) Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm) This patch removes the llvm VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago so much of that implementation can be reused. Reapplied now that the companion patch (D20684), which removes/auto-upgrades the clang intrinsics, has been committed. Differential Revision: http://reviews.llvm.org/D20686 llvm-svn: 271131	2016-05-28 18:03:41 +00:00
Rafael Espindola	52bd330500	Fix production of R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX. We were producing R_X86_64_GOTPCRELX for invalid instructions and sometimes producing R_X86_64_GOTPCRELX instead of R_X86_64_REX_GOTPCRELX. llvm-svn: 271118	2016-05-28 15:51:38 +00:00
Sanjay Patel	97c2c108fd	[x86] avoid printing unnecessary sign bits of hex immediates in asm comments (PR20347) It would be better to check the valid/expected size of the immediate operand, but this is generally better than what we print right now. Differential Revision: http://reviews.llvm.org/D20385 llvm-svn: 271114	2016-05-28 14:58:37 +00:00
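The idea of dropping redundant sign bits can be sketched as follows. This is an illustration under assumptions, not the printer code from the patch: it picks the narrowest of 16, 32 or 64 bits that can represent the signed value and prints only that many bits. `hexImmediateComment` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not the LLVM instruction printer).
// Prints imm in hex using the narrowest of 16/32/64 bits that holds it,
// so a small negative value is not padded out to 64 bits of 'f' digits.
std::string hexImmediateComment(std::int64_t imm) {
  unsigned long long bits;
  if (imm >= INT16_MIN && imm <= INT16_MAX)
    bits = static_cast<std::uint16_t>(imm);
  else if (imm >= INT32_MIN && imm <= INT32_MAX)
    bits = static_cast<std::uint32_t>(imm);
  else
    bits = static_cast<std::uint64_t>(imm);
  char buffer[32];
  std::snprintf(buffer, sizeof buffer, "0x%llx", bits);
  return std::string(buffer);
}
```

As the commit message notes, basing the width on the operand's expected size would be more precise than basing it on the value.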
Ahmed Bougacha	a3dc1ba142	[X86] Try to zero elts when lowering 256-bit shuffle with PSHUFB. Otherwise we fallback to a blend of PSHUFBs later on. Differential Revision: http://reviews.llvm.org/D19661 llvm-svn: 271113	2016-05-28 14:38:04 +00:00
Rafael Espindola	2d39bb3c6a	Simplify and clang-format a table. llvm-svn: 271112	2016-05-28 11:13:34 +00:00
Michael Kuperstein	a75c77b127	[X86] Detect SAD patterns and emit psadbw instructions. This recommits r267649 with a fix for PR27539. Differential Revision: http://reviews.llvm.org/D20598 llvm-svn: 271033	2016-05-27 18:53:22 +00:00
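The SAD (sum of absolute differences) computation that PSADBW performs on groups of bytes can be written as a scalar loop. The sketch below shows only the arithmetic being matched; `sumOfAbsoluteDifferences` is a hypothetical name and the code is not taken from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical scalar reference (not code from the patch): adds up
// |a[i] - b[i]| over n unsigned bytes, widening before the subtraction.
std::uint32_t sumOfAbsoluteDifferences(const std::uint8_t *a,
                                       const std::uint8_t *b,
                                       std::size_t n) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    int diff = static_cast<int>(a[i]) - static_cast<int>(b[i]);
    sum += static_cast<std::uint32_t>(std::abs(diff));
  }
  return sum;
}
```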
Ahmed Bougacha	346b98011c	[X86] Clarify PSHUFB+blend lowering function name. NFC. Also guard against v32i8 users. llvm-svn: 271024	2016-05-27 17:58:17 +00:00
Simon Pilgrim	4642a57fbf	Revert: r270973 - [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm) llvm-svn: 270976	2016-05-27 09:02:25 +00:00
Simon Pilgrim	c013e5737b	[X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm) This patch removes the llvm VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago so much of that implementation can be reused. A companion patch (D20684) removes/auto-upgrades the clang intrinsics. Differential Revision: http://reviews.llvm.org/D20686 llvm-svn: 270973	2016-05-27 08:49:15 +00:00
Simon Pilgrim	cf340bd9c1	[X86][SSE] When lowering a 256-bit shuffle as PMOVZX, reduce the input vector to the lower 128-bit subvector. More often than not this is what it started out as; the extraction is zero-cost on AVX and the PMOVZX/PMOVSX folding logic is based around 128-bit loads. llvm-svn: 270858	2016-05-26 15:40:36 +00:00
Rafael Espindola	a224de06bc	Use shouldAssumeDSOLocal on AArch64. This reduces code duplication and now AArch64 also handles PIE. llvm-svn: 270844	2016-05-26 12:42:55 +00:00
Igor Breger	8437bb70fd	[AVX512] Fix intrinsic cmp{sd\|ss} lowering. Differential Revision: http://reviews.llvm.org/D20615 llvm-svn: 270843	2016-05-26 12:42:25 +00:00
Simon Pilgrim	4810683ddf	Simplify std::all_of/any_of predicates by using llvm::all_of/any_of. NFCI. llvm-svn: 270753	2016-05-25 20:41:11 +00:00
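Range-based wrappers of this kind take a whole container instead of an iterator pair. A minimal sketch of the shape is shown below in a hypothetical `sketch` namespace; the real helpers live in LLVM's own headers and may differ in detail, and `allUndef` is an invented usage example.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

namespace sketch {
// Hypothetical range wrappers around the standard algorithms.
template <typename Range, typename Pred>
bool all_of(Range &&range, Pred pred) {
  return std::all_of(std::begin(range), std::end(range), pred);
}

template <typename Range, typename Pred>
bool any_of(Range &&range, Pred pred) {
  return std::any_of(std::begin(range), std::end(range), pred);
}
} // namespace sketch

// Invented usage: true if every mask element is negative (treated as undef).
bool allUndef(const std::vector<int> &mask) {
  return sketch::all_of(mask, [](int m) { return m < 0; });
}
```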
Rafael Espindola	84f0562064	Fix shouldAssumeDSOLocal for private linkage. llvm-svn: 270746	2016-05-25 19:55:16 +00:00
Sanjay Patel	aedc347b29	[x86] avoid code explosion from LoopVectorizer for gather loop (PR27826) By making pointer extraction from a vector more expensive in the cost model, we avoid the vectorization of a loop that is very likely to be memory-bound: https://llvm.org/bugs/show_bug.cgi?id=27826 There are still bugs related to this, so we may need a more general solution to avoid vectorizing obviously memory-bound loops when we don't have HW gather support. Differential Revision: http://reviews.llvm.org/D20601 llvm-svn: 270729	2016-05-25 17:27:54 +00:00
Sanjay Patel	3955360b24	[x86, AVX] allow explicit calls to VZERO* to modify state in VZeroUpperInserter pass (PR27823) As noted in the review, there are still problems, so this doesn't fix the bug completely. Differential Revision: http://reviews.llvm.org/D20529 llvm-svn: 270718	2016-05-25 16:39:47 +00:00
Simon Pilgrim	4298d06d0f	[X86][SSE] Replace (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) lossless conversion intrinsics with generic IR Followup to D20528 clang patch, this removes the (V)CVTDQ2PD(Y) and (V)CVTPS2PD(Y) llvm intrinsics and auto-upgrades to sitofp/fpext instead. Differential Revision: http://reviews.llvm.org/D20568 llvm-svn: 270678	2016-05-25 08:59:18 +00:00
Craig Topper	12e322a8cf	[X86] Remove the llvm.x86.sse2.storel.dq intrinsic. It hasn't been used in a long time. llvm-svn: 270677	2016-05-25 06:56:32 +00:00
Igor Breger	23c2090606	[llvm][AVX512][intrinsics] Fix vperm{b\|w\|d\|q\|ps\|pd} intrinsics. The index is the second argument to the builtin function but the first instruction operand. Differential Revision: http://reviews.llvm.org/D20515 llvm-svn: 270548	2016-05-24 11:06:22 +00:00
Simon Pilgrim	14000b3cea	[CostModel][X86][XOP] Added XOP costmodel for BITREVERSE Now that we have a nice fast VPPERM solution. Added framework for future intrinsic costs as well. llvm-svn: 270537	2016-05-24 08:17:50 +00:00
Sanjay Patel	8099fb7e0a	fix typo; NFC llvm-svn: 270469	2016-05-23 18:01:20 +00:00
Sanjay Patel	13a0d49813	use range-loop; NFCI llvm-svn: 270467	2016-05-23 18:00:50 +00:00
Aaron Ballman	a81264ba09	Removing a switch statement that contains only a default label; NFC. llvm-svn: 270444	2016-05-23 15:52:59 +00:00
Craig Topper	a6d0104823	[X86] Use instruction aliases to replace custom asm parser code for optimizing moves to use 2 byte VEX prefix. llvm-svn: 270394	2016-05-23 04:02:27 +00:00
Craig Topper	95bdabd338	[AVX512] Add patterns to implement stores of extracts of least significant subvectors using XMM or YMM stores instead of the vector extract instructions. The same is already done for AVX, and we had lost it going to AVX512VL. llvm-svn: 270383	2016-05-22 23:44:33 +00:00
Sanjay Patel	2959ff4a88	[x86, AVX] don't add a vzeroupper if that's what the code is already doing (PR27823) This isn't the complete fix, but it handles the trivial examples of duplicate vzero* ops in PR27823: https://llvm.org/bugs/show_bug.cgi?id=27823 ...and amusingly, the bogus cases already exist as regression tests, so let's take this baby step. We'll need to do more in the general case where there's legitimate AVX usage in the function + there's already a vzero in the code. Differential Revision: http://reviews.llvm.org/D20477 llvm-svn: 270378	2016-05-22 20:22:47 +00:00
Igor Breger	2ba64ab9ae	[AVX512] Implement missing patterns for any_extend load lowering. Differential Revision: http://reviews.llvm.org/D20513 llvm-svn: 270357	2016-05-22 10:21:04 +00:00
Craig Topper	5f3fef884f	[AVX512] The AVX512 file only needs extract_subvector index 0 patterns where the source is 512-bits. The 256-bit source patterns were redundant with AVX. llvm-svn: 270356	2016-05-22 07:40:58 +00:00
Craig Topper	a1041ff001	[AVX512] Add an AddedComplexity line to the 512-bit insert_subvector undef index 0 patterns. This gives them higher priority than the memory patterns. This matches AVX1/2. llvm-svn: 270355	2016-05-22 07:40:40 +00:00
Craig Topper	de5498546e	[AVX512] Change the AddedComplexity on some patterns to match their AVX/SSE equivalents. This helps group them close together in the isel tables and enable table compression. llvm-svn: 270354	2016-05-22 06:09:34 +00:00
Craig Topper	33c550cb95	[AVX512] Add a couple patterns to fix some cases where two vector mask inversions could appear in a row. llvm-svn: 270344	2016-05-22 00:39:30 +00:00
Craig Topper	dbac1ff9c1	[AVX512] Remove seemingly unnecessary AddedComplexity adjustment. llvm-svn: 270343	2016-05-22 00:39:27 +00:00
Craig Topper	3fc3b4453d	[X86] Remove unnecessary alignment check on patterns that use VEXTRACTF128 for integer types when only AVX1 is supported. llvm-svn: 270335	2016-05-21 22:50:18 +00:00
Craig Topper	db960eddfa	[AVX512] Add patterns for extracting subvectors and storing to memory. llvm-svn: 270334	2016-05-21 22:50:14 +00:00
Craig Topper	03b849eb44	[AVX512] Capitalize the Z in VEXTRACTPSzmr. Lowercase z has been primarily used to indicate the zero masking behavior, which is not the case here. NFC llvm-svn: 270333	2016-05-21 22:50:11 +00:00
Craig Topper	d5da6a39f2	[AVX512] Rename vector extract instructions to use 'mr' instead of 'rm' to reflect the fact that memory is the destination. llvm-svn: 270332	2016-05-21 22:50:09 +00:00
Craig Topper	08a6857c82	[AVX512] Fix copy/paste mistake I made in a comment. llvm-svn: 270331	2016-05-21 22:50:04 +00:00
Michael Zuckerman	a63a129749	[Clang][AVX512][intrinsics] Fix rcp and sqrt intrinsics. Differential Revision: http://reviews.llvm.org/D20438 llvm-svn: 270322	2016-05-21 14:44:18 +00:00
Michael Zuckerman	11b55b29d1	[Clang][AVX512][intrinsics] Fix vscalef intrinsics. Differential Revision: http://reviews.llvm.org/D20324 llvm-svn: 270321	2016-05-21 11:09:53 +00:00
Craig Topper	02626c076b	[AVX512] Add patterns for VEXTRACT v16i16->v8i16 and v32i8->v16i8. Disable AVX2 versions of vector extract when AVX512VL is enabled. llvm-svn: 270318	2016-05-21 07:08:56 +00:00
Craig Topper	22ae353207	[AVX512] Disable AVX2 VPERMD, VPERMQ, VPERMPS, and VPERMPD patterns when AVX512VL is enabled. Also add shuffle comment printing for AVX512VL VPERMPD/VPERMQ to keep some tests that now use these instructions instead of the AVX2 ones. llvm-svn: 270317	2016-05-21 06:07:18 +00:00
Craig Topper	6be70deda3	[AVX512] Disable AVX/AVX2 VBROADCASTSS/VBROADCASTSD patterns when AVX512VL is enabled. llvm-svn: 270316	2016-05-21 05:47:25 +00:00
Craig Topper	97565ded80	[AVX512] Disable AVX/AVX2 patterns for VPSADBW and VPMULUDQ when the AVX512VL/AVX512BWI equivalents are available. llvm-svn: 270311	2016-05-21 03:52:32 +00:00
Craig Topper	b395105584	[X86] Convert some SSE2/AVX2 intrinsics to ISD opcodes during lowering instead of pattern matching the intrinsics. This unifies handling with AVX512 and allows these intrinsics to select EVEX encoded instructions to increase available registers. llvm-svn: 270310	2016-05-21 03:52:28 +00:00
David Majnemer	498f2fd11b	Address post-review for r270246 This gets rid of some unnecessary SmallStrings in X86TargetMachine::getSubtargetImpl. No functionality change is intended. llvm-svn: 270270	2016-05-20 20:41:24 +00:00
David Majnemer	ca29023b02	[X86] Reduce memory allocations in X86TargetMachine::getSubtargetImpl We performed a number of memory allocations each time getTTI was called, remove them by using SmallString. No functionality change intended. llvm-svn: 270246	2016-05-20 18:16:06 +00:00
Sanjay Patel	5496a23316	fix comments; NFC llvm-svn: 270237	2016-05-20 17:07:19 +00:00
Sanjay Patel	1dc57cb944	use range-loops; NFCI llvm-svn: 270236	2016-05-20 17:00:10 +00:00
Sanjay Patel	8bc63b2f47	fix documentation comments; NFC llvm-svn: 270234	2016-05-20 16:46:01 +00:00
Simon Pilgrim	55ef3da27b	[X86][AVX] Generalized matching for target shuffle combines This patch is a first step towards a more extendible method of matching combined target shuffle masks. Initially this just pulls out the existing basic mask matches and adds support for some 256/512 bit equivalents. Future patterns will require a number of features to be added but I wanted to keep this patch simple. I hope we can avoid duplication between shuffle lowering and combining and share more complex pattern match functions in future commits. Differential Revision: http://reviews.llvm.org/D19198 llvm-svn: 270230	2016-05-20 16:19:30 +00:00
Rafael Espindola	c7e9813228	Refactor X86 symbol access classification. This refactors the logic in X86 to avoid code duplication. It also splits it in two steps: it first decides if a symbol is local to the DSO and then uses that information to decide how to access it. The first part is implemented by shouldAssumeDSOLocal. It is not in any way specific to X86. In a followup patch I intend to move it to somewhere common and reuse it in other backends. llvm-svn: 270209	2016-05-20 12:20:10 +00:00
Craig Topper	b182715a52	[X86] Fix another AVX pattern to only be disabled if VLX and BWI are supported. llvm-svn: 270182	2016-05-20 05:10:27 +00:00
Craig Topper	0a7a8dee2b	[X86] Fix some AVX patterns to only be disabled if VLX and BWI are supported. Without this we get isel failures on the avx-intrinsics-x86.ll test in AVX512VL. llvm-svn: 270174	2016-05-20 02:00:08 +00:00
Rafael Espindola	ab03eb007c	Record a TargetMachine instead of a Reloc::Model. Addresses r270095's code review. llvm-svn: 270147	2016-05-19 22:07:57 +00:00
Hans Wennborg	172eee9cfc	X86: Don't reset the stack after calls that don't return (PR27117) Since the calls don't return, the instruction afterwards will never run, and is just taking up unnecessary space in the binary. Differential Revision: http://reviews.llvm.org/D20406 llvm-svn: 270109	2016-05-19 20:15:33 +00:00
Rafael Espindola	46107b9e62	Remember the relocation model. NFC. This avoids passing a TargetMachine in a few places. llvm-svn: 270095	2016-05-19 18:49:29 +00:00
Rafael Espindola	cb2d266360	Style fixes. NFC. llvm-svn: 270093	2016-05-19 18:34:20 +00:00
Andrey Turetskiy	45b22a4aff	[X86] Enable RRL part of the LEA optimization pass for -O2. Enable "Remove Redundant LEAs" part of the LEA optimization pass for -O2. This gives a 6.4% performance improvement on Broadwell on the nnet benchmark from Coremark-pro. There is no significant effect on other benchmarks (Geekbench, Spec2000, Spec2006). Differential Revision: http://reviews.llvm.org/D19659 llvm-svn: 270036	2016-05-19 10:18:29 +00:00
Craig Topper	19e04b6430	[X86] Generalize and combine some similar type constraints and node types. No changes to the isel table size so the separation wasn't buying us anything. llvm-svn: 270026	2016-05-19 06:13:58 +00:00
Craig Topper	9152f5fcdf	[X86] Simplify some type constraints by removing parts that were already implied. llvm-svn: 270025	2016-05-19 06:13:48 +00:00
Craig Topper	4fcff19ff5	[X86] Remove some type constraint classes and use already existing stricter classes. llvm-svn: 270013	2016-05-19 02:05:58 +00:00
Craig Topper	7ee092a268	[AVX512] Strengthen type constraints for VFIXUPIMM patterns and combine the type constraints for vector and scalar. llvm-svn: 270012	2016-05-19 02:05:55 +00:00
Rafael Espindola	8c34dd8257	Delete Reloc::Default. Having an enum member named Default is quite confusing: Is it distinct from the others? This patch removes that member and instead uses Optional<Reloc> in places where we have a user input that still hasn't been mapped to the default value, which now clearly has to be one of the remaining 3 options. llvm-svn: 269988	2016-05-18 22:04:49 +00:00
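The design point of the commit above, an optional value in place of a `Default` enumerator, can be sketched with C++17 `std::optional` standing in for LLVM's `Optional`. Everything here is hypothetical: the `RelocModel` enum only mirrors the three remaining options, and `effectiveRelocModel` with its `targetPrefersPIC` flag is an invented rule for choosing the default.

```cpp
#include <optional>

// Hypothetical enum with no "Default" member: every value is a real model.
enum class RelocModel { Static, PIC_, DynamicNoPIC };

// Maps a possibly-absent user request to a concrete model. The fallback rule
// (PIC_ when the target prefers it, otherwise Static) is invented for
// illustration and is not taken from LLVM.
RelocModel effectiveRelocModel(std::optional<RelocModel> requested,
                               bool targetPrefersPIC) {
  if (requested)
    return *requested;
  return targetPrefersPIC ? RelocModel::PIC_ : RelocModel::Static;
}
```

An empty optional makes "the user did not choose" explicit in the type, so downstream code never has to handle a fourth pseudo-model.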
Sanjay Patel	e99014d471	clean up; NFCI llvm-svn: 269962	2016-05-18 17:23:38 +00:00
Hans Wennborg	8eb336c14e	Re-commit r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions" with an additional fix to make RegAllocFast ignore undef physreg uses. It would previously get confused about the "push %eax" instruction's use of eax. That method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate as well, but since that runs after register-allocation, we didn't run into the RegAllocFast issue before. llvm-svn: 269949	2016-05-18 16:10:17 +00:00
Rafael Espindola	38af4d6347	Trivial cleanups. This just clang formats and cleans comments in an area I am about to post a patch for review. llvm-svn: 269946	2016-05-18 16:00:24 +00:00
Ashutosh Nema	348af9cc6b	Add new flag and intrinsic support for MWAITX and MONITORX instructions Summary: MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT pair while adding a timer function, such that another termination of the MWAITX instruction occurs when the timer expires. The presence of the MONITORX and MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29. The MONITORX and MWAITX instructions are intercepted by the same bits that intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be monitored. MWAITX instruction causes the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is "0F 01 FB". This opcode information is used in adding tests for the disassembler. These instructions are enabled for AMD's bdver4 architecture. Patch by Ganesh Gopalasubramanian! Reviewers: echristo, craig.topper, RKSimon Subscribers: RKSimon, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19795 llvm-svn: 269911	2016-05-18 11:59:12 +00:00
Craig Topper	095fc41523	[AVX512] Strengthen type constraints on my rounding mode inputs and some immediate inputs. llvm-svn: 269886	2016-05-18 06:56:01 +00:00
Craig Topper	74ed087b0b	[AVX512] Strengthen type checks on the X86ISD::SELECT node. Saves over 800 bytes in the DAG isel table by removing type checks for the condition operand which is always a vector or scalar of i1 matching the number of elements in the other operands. llvm-svn: 269885	2016-05-18 06:55:59 +00:00
Hans Wennborg	759af30109	Revert r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions" Seems to have broken the Windows ASan bot. Reverting while investigating. llvm-svn: 269833	2016-05-17 20:38:56 +00:00
Hans Wennborg	c3fb51171e	X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions This patch moves the expansion of WIN_ALLOCA pseudo-instructions into a separate pass that walks the CFG and lowers the instructions based on a conservative estimate of the offset between the stack pointer and the lowest accessed stack address. The goal is to reduce binary size and run-time costs by removing calls to _chkstk. While it doesn't fix all the code quality problems with inalloca calls, it's an incremental improvement for PR27076. Differential Revision: http://reviews.llvm.org/D20263 llvm-svn: 269828	2016-05-17 20:13:29 +00:00
Rafael Espindola	712f957cae	Simplify handling of hidden stub. Since r207518 they are printed exactly like non-hidden stubs on x86 and since r207517 on ARM. This means we can use a single set for all stubs in those platforms. llvm-svn: 269776	2016-05-17 16:01:32 +00:00
David L Kreitzer	e7c583e06f	Fix for PR27750. Correctly handle the case where the fallthrough block and target block are the same in getFallThroughMBB. Differential Revision: http://reviews.llvm.org/D20288 llvm-svn: 269760	2016-05-17 12:47:46 +00:00
Michael Kuperstein	ac2088d122	[X86] Remove transformVSELECTtoBlendVECTOR_SHUFFLE The new X86 shuffle lowering can do just fine without transforming vselects into vector_shuffles. It looks like the only thing this code does right now is cause trouble - in particular, it can lead to combine/legalization infinite loops. Note that it's not completely NFC, since some of the shuffle masks get inverted, which may cause slight differences further down the line. We may want to find a way to invert those masks, but that's orthogonal to this commit. This fixes the hang in PR27689. llvm-svn: 269676	2016-05-16 18:27:00 +00:00
Simon Pilgrim	0d05484db6	Fixed unused variable warning llvm-svn: 269650	2016-05-16 11:48:54 +00:00
Simon Pilgrim	265995ef53	[X86][SSSE3] Lower vector CTLZ with PSHUFB lookups This patch uses PSHUFB to lower vector CTLZ and avoid (slower) scalarizations. The leading zero count of each 4-bit nibble of the vector is determined by using a PSHUFB lookup. Pairs of results are then repeatedly combined up to the original element width. Differential Revision: http://reviews.llvm.org/D20016 llvm-svn: 269646	2016-05-16 11:19:11 +00:00
Simon Pilgrim	2d4bf1042b	[X86][SSE] Simplify zero'th index extract element matching llvm-svn: 269615	2016-05-15 20:22:50 +00:00
Simon Pilgrim	fbe97bc15a	[X86][SSE] Removed duplicate variables. NFCI. Removed duplicate getOperand / getSimpleValueType calls. llvm-svn: 269614	2016-05-15 20:11:10 +00:00
Craig Topper	258f874bb9	[AVX512] Make the permd intrinsics take a 32-bit immediate to match the software spec. llvm-svn: 269579	2016-05-14 21:13:20 +00:00
Elena Demikhovsky	e79b716daf	Fixed lowering of _comi_ intrinsics from all sets - SSE/SSE2/AVX/AVX-512 Differential revision http://reviews.llvm.org/D19261 llvm-svn: 269569	2016-05-14 15:06:09 +00:00
Craig Topper	d8a9c0d120	[AVX512] Fix types for pshufd intrinsics. The immediate is the second argument and the mask is the 4th argument. Also move the 128/256 tests to the right test file. Prior to this the immediate was a strange 16-bits and the 512-bit intrinsic couldn't receive the full 16 mask bits it needs. llvm-svn: 269526	2016-05-14 00:47:18 +00:00
Justin Bogner	9b6b9c7f99	SDAG: Clean up a dead node I missed earlier in X86 H.J. Lu pointed out that I missed this in r269236. Thanks! llvm-svn: 269516	2016-05-13 23:26:28 +00:00
Amjad Aboud	78b1fb0146	Assure calling "cld" instruction in prologue of X86 interrupt handler function. Differential Revision: http://reviews.llvm.org/D18725 llvm-svn: 269413	2016-05-13 12:46:57 +00:00
Amjad Aboud	f29608265d	Fixed the callee saved registers list for X86 AllRegs calling convention. 32-bit AllRegs: SSE: xmm0-xmm7 AVX: ymm0-ymm7 AVX512: zmm0-zmm7 + k0-k7 64-bit AllRegs: SSE: xmm0-xmm15 AVX: ymm0-ymm15 AVX512: zmm0-zmm31 + k0-k7 Differential Revision: http://reviews.llvm.org/D20142 llvm-svn: 269337	2016-05-12 19:58:32 +00:00
Amjad Aboud	f85452426c	Fixed dwarf X86-32 register mapping for k0-k7 registers. llvm-svn: 269333	2016-05-12 19:49:24 +00:00
Justin Bogner	fde9f2e51d	SDAG: Use ReplaceNode here, not ReplaceUses This was a typo in an earlier commit - there's no point in keeping the old node around here. Noticed by Meador Inge. Thanks! llvm-svn: 269245	2016-05-11 22:21:50 +00:00
Justin Bogner	31d7da3b5f	SDAG: Add a helper to replace and remove a node during ISel It's very common to want to replace a node and then remove it since it's dead, especially as we port backends from the SDNode *Select API to the void Select one. This helper makes this sequence a bit less verbose. llvm-svn: 269236	2016-05-11 21:13:17 +00:00
Simon Pilgrim	6ce35dd9ea	[X86][AVX512] Fixed VPERMILPD/VPERMILPS shuffle comments. Fixed incorrect operands indices used to access src registers llvm-svn: 269221	2016-05-11 18:53:44 +00:00
Justin Bogner	c200ad7e3b	SDAG: Minor cleanup in X86 Don't bother returning a result we don't use here. I've also renamed this from selectGather to tryGather to better indicate that it may not do anything. llvm-svn: 269215	2016-05-11 17:46:03 +00:00
Simon Pilgrim	3016d9e9e1	[X86][SSE] Avoid repeatedly calling MCInst::getNumOperands(). NFCI. llvm-svn: 269209	2016-05-11 17:36:32 +00:00
Simon Pilgrim	41c05c019e	[X86][AVX512] Updated shuffle comments instruction macros to split writemask instructions. NFC This will make it easier to support the different writemask cases in shuffle comments llvm-svn: 269174	2016-05-11 11:55:12 +00:00
Justin Bogner	593741d354	SDAG: Implement Select instead of SelectImpl in X86 This is part of the work to have Select return void instead of an SDNode *, which is in turn part of llvm.org/pr26808. llvm-svn: 269144	2016-05-10 23:55:37 +00:00
Quentin Colombet	220f7da488	[X86] Properly check that EAX is dead when copying EFLAGS. This fixes a bug introduced in r267623, where we got smarter and avoided saving EAX before using it. However, we failed to check if any of the subregisters of EAX were alive and thus missed cases where we have to save EAX before using it. The problem may happen on every X86/i386/... platform. This fixes llvm.org/PR27624 llvm-svn: 269115	2016-05-10 20:49:46 +00:00
Jonas Paulsson	8e5b0c65cc	[foldMemoryOperand()] Pass LiveIntervals to enable liveness check. SystemZ (and probably other targets as well) can fold a memory operand by changing the opcode into a new instruction that as a side-effect also clobbers the CC-reg. In order to do this, liveness of that reg must first be checked. When LIS is passed, getRegUnit() can be called on it and the right LiveRange is computed on demand. Reviewed by Matthias Braun. http://reviews.llvm.org/D19861 llvm-svn: 269026	2016-05-10 08:09:37 +00:00
Craig Topper	3e0c038a84	[X86][AVX512] Strengthen the assertions from r269001. We need VLX to use the 128/256-bit move opcodes for extended registers. llvm-svn: 269019	2016-05-10 05:28:04 +00:00
Craig Topper	9f8e50cdb4	[X86] Add ZMM registers to the X86_INTR calling convention preserved mask when AVX512 is enabled. llvm-svn: 269018	2016-05-10 05:28:02 +00:00
Craig Topper	3fef1de785	[X86] Update X86_INTR calling convention to save ZMM registers instead of YMM registers when AVX512 is enabled. llvm-svn: 269017	2016-05-10 05:27:56 +00:00
Matthias Braun	31d19d43c7	CodeGen: Move TargetPassConfig from Passes.h to an own header; NFC Many files include Passes.h but only a fraction needs to know about the TargetPassConfig class. Move it into an own header. Also rename Passes.cpp to TargetPassConfig.cpp while we are at it. llvm-svn: 269011	2016-05-10 03:21:59 +00:00
Quentin Colombet	ee5f36bd54	[X86][AVX512] Use the proper load/store for AVX512 registers. When loading or storing AVX512 registers we were not using the AVX512 variant of the load and store for VR128 and VR256 like registers. Thus, we ended up with the wrong encoding and actually were dropping the high bits of the instruction. The result was that we load or store the wrong register. The effect is visible only when we emit the object file directly and disassemble it. Then, the output of the disassembler does not match the assembly input. This is related to llvm.org/PR27481. llvm-svn: 269001	2016-05-10 01:09:14 +00:00
Quentin Colombet	739614839f	[X86] Fix the AllRegs AVX calling convention. We used to list registers that were not in the AVX space. In other words, we were pushing registers that the ISA cannot encode (YMM16-YMM31). This is part of llvm.org/PR27481. llvm-svn: 268983	2016-05-09 22:37:05 +00:00
Quentin Colombet	b47b9b2de7	[X86] Strengthen the setting of inline asm constraints for fp regclasses. This is similar to r268953, but for floating point and vector register classes. Explanations: The setting of the inline asm constraints was implicitly relying on the order of the register classes in the file generated by tablegen. Since, we do not have any control on that order, make sure we do not depend on it anymore. llvm-svn: 268973	2016-05-09 21:24:31 +00:00
Simon Pilgrim	eec3a95f95	[X86][SSE] Improve cost model for i64 vector comparisons on pre-SSE42 targets As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorization. This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42. Technically SSE41 provides a PCMPEQQ v2i64 equality test, but as getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types we can't easily make use of this, otherwise we could split the cost of integer equality and greater-than tests to give better costings of each. Differential Revision: http://reviews.llvm.org/D20057 llvm-svn: 268972	2016-05-09 21:14:38 +00:00
Quentin Colombet	3126db6fd7	[X86] Drop the 64-bit alignment for LOW32_ADDR_ACCESS register class. The only 64-bit register in that register class is RIP and it will not get spilled in the current ABIs. llvm-svn: 268963	2016-05-09 19:50:30 +00:00
Quentin Colombet	86098ab10b	Reapply [X86] Add a new LOW32_ADDR_ACCESS_RBP register class. This reapplies commit r268796, with a fix for the setting of the inline asm constraints. I.e., "mark" LOW32_ADDR_ACCESS_RBP as a GR variant, so that the regular processing of the GR operands (setting of the subregisters) happens. Original commit log: [X86] Add a new LOW32_ADDR_ACCESS_RBP register class. ABIs like NaCl use 32-bit addresses but have a 64-bit frame. The new register class reflects those constraints when choosing a register class for an address access. llvm-svn: 268955	2016-05-09 19:01:46 +00:00
Quentin Colombet	bb15ce3d1f	[X86] Strengthen the setting of inline asm constraints. The setting of the inline asm constraints was implicitly relying on the order of the register classes in the file generated by tablegen. Since, we do not have any control on that order, make sure we do not depend on it anymore. llvm-svn: 268953	2016-05-09 19:01:35 +00:00
Simon Pilgrim	af742d51ad	[X86][SSE] Added TODO comment to add support for AVX512 mask registers to shuffle comments This came up in discussion on D19198 llvm-svn: 268915	2016-05-09 13:30:16 +00:00
Craig Topper	a5d0bf5c36	[X86] Strengthen some type constraints for floating point round and extend. llvm-svn: 268892	2016-05-09 05:34:14 +00:00
Craig Topper	a58abd1cc6	[AVX512] Fix up types for arguments of int_x86_avx512_mask_cvtsd2ss_round and int_x86_avx512_mask_cvtss2sd_round. Only the argument being converted should be a different type. The other 2 argument should have the same type as the result. llvm-svn: 268891	2016-05-09 05:34:12 +00:00
Craig Topper	707c89c00d	[AVX512] Add non-temporal store patterns for v16i32/v32i16/v64i8. llvm-svn: 268889	2016-05-08 23:43:17 +00:00
Craig Topper	c41320d700	[AVX512] Add missing patterns for non-temporal stores of 128/256-bit vXi8/vXi16/vXi32 when VLX is enabled. The equivalent AVX1/2 patterns are disabled by VLX. This caused regular stores to be emitted instead. llvm-svn: 268886	2016-05-08 23:08:45 +00:00
Craig Topper	906f397137	[AVX512] Change predicates on some vXi16/vXi8 AVX store patterns so they stay enabled unless VLX and BWI instructions are supported. Without this we could fail instruction selection if VLX was enabled, but BWI wasn't. llvm-svn: 268885	2016-05-08 23:08:40 +00:00
Craig Topper	e5ce84a33c	[AVX512] Add VLX 128/256-bit SET0 operations that encode to 128/256-bit EVEX encoded VPXORD so all 32 registers can be used. llvm-svn: 268884	2016-05-08 21:33:53 +00:00
Craig Topper	9d9251b86f	[X86] Remove extra patterns that check for BUILD_VECTOR of all 0s. These are always canonicalized to v4i32/v8i32/v16i32 except for in SSE1 only when only v4f32 is supported. llvm-svn: 268880	2016-05-08 20:10:20 +00:00
David Majnemer	eac58d8f68	[X86] Promote several single precision FP libcalls on Windows A number of libcalls don't exist in any particular lib but are, instead, defined in math.h as inline functions (even in C mode!). Don't rely on their existence when lowering @llvm.{cos,sin,floor,..}.f32, promote them instead. N.B. We had logic to handle FREM but were missing out on a number of others. This change generalizes the FREM handling. llvm-svn: 268875	2016-05-08 08:15:50 +00:00
Craig Topper	d681e23336	[X86] Lower 256-bit vector all-zero constants to v8i32 even with AVX1 only. Either way a 256-bit VXORPS will be used. llvm-svn: 268873	2016-05-08 07:10:54 +00:00
Craig Topper	3d6722910c	[X86] Add patterns for 256-bit non-temporal stores when only AVX1 is supported. While there, add a predicate to the SSE2 patterns to avoid an ordering dependency. llvm-svn: 268872	2016-05-08 07:10:50 +00:00
Craig Topper	d788498411	[X86] No need to avoid selecting AVX_SET0 for 256-bit integer types when only AVX1 is supported. AVX_SET0 just expands to 256-bit VXORPS which is legal in AVX1. llvm-svn: 268871	2016-05-08 07:10:47 +00:00
Craig Topper	6502975cf5	[X86] Fix InstAliases to not allow FARCALL32i/FARCALL16i/FARJMP32i/FARJMP16i in 64-bit mode. llvm-svn: 268863	2016-05-07 19:25:56 +00:00
Simon Pilgrim	96e5307d4e	[X86] Pulled out duplicate mask width calculation. NFCI. llvm-svn: 268861	2016-05-07 18:04:24 +00:00
Sanjay Patel	c2751e7050	[x86, BMI] add TLI hook for 'andn' and use it to simplify comparisons For the sake of minimalism, this patch is x86 only, but I think that at least PPC, ARM, AArch64, and Sparc probably want to do this too. We might want to generalize the hook and pattern recognition for a target like PPC that has a full assortment of negated logic ops (orc, nand). Note that http://reviews.llvm.org/D18842 will cause this transform to trigger more often. For reference, this relates to: https://llvm.org/bugs/show_bug.cgi?id=27105 https://llvm.org/bugs/show_bug.cgi?id=27202 https://llvm.org/bugs/show_bug.cgi?id=27203 https://llvm.org/bugs/show_bug.cgi?id=27328 Differential Revision: http://reviews.llvm.org/D19087 llvm-svn: 268858	2016-05-07 15:03:40 +00:00
Ahmed Bougacha	04a8fc2e37	[X86] Teach X86FixupBWInsts to promote MOV8rr/MOV16rr to MOV32rr. This re-applies r268760, reverted in r268794. Fixes http://llvm.org/PR27670 The original imp-defs assertion was way overzealous: forward all implicit operands, except imp-defs of the new super-reg def (r268787 for GR64, but also possible for GR16->GR32), or imp-uses of the new super-reg use. While there, mark the source use as Undef, and add an imp-use of the old source reg: that should cover any case of dead super-regs. At the stage the pass runs, flags are unlikely to matter anyway; still, let's be as correct as possible. Also add MIR tests for the various interesting cases. Original commit message: Codesize is less (16) or equal (8), and we avoid partial dependencies. Differential Revision: http://reviews.llvm.org/D19999 llvm-svn: 268831	2016-05-07 01:11:17 +00:00
Ahmed Bougacha	068ac4af39	[X86] Register and initialize the FixupBW pass. That lets us use it in MIR tests. llvm-svn: 268830	2016-05-07 01:11:10 +00:00
Quentin Colombet	a09f050dc1	Revert "[X86] Add a new LOW32_ADDR_ACCESS_RBP register class." This reverts commit r268796. I believe it breaks test/CodeGen/X86/asm-mismatched-types.ll with: Cannot emit physreg copy instruction llvm-svn: 268799	2016-05-06 21:21:50 +00:00
Quentin Colombet	2728074e3c	[X86] Add a new LOW32_ADDR_ACCESS_RBP register class. ABIs like NaCl use 32-bit addresses but have a 64-bit frame. The new register class reflects those constraints when choosing a register class for an address access. llvm-svn: 268796	2016-05-06 21:10:53 +00:00
Quentin Colombet	377fc2aa3d	[X86] Rename the X32_ADDR_ACCESS register class into LOW32_ADDR_ACCESS. This register class may be used by any ABI that uses the x86_64 ISA with 32-bit addresses, not just in X32 cases. Make sure the name reflects that. llvm-svn: 268795	2016-05-06 21:10:43 +00:00
Nico Weber	9b32b4fbee	Revert r268760, it caused PR27670. llvm-svn: 268794	2016-05-06 21:07:02 +00:00
Ahmed Bougacha	505984b466	[X86] Accept imp-defs of GR64 super-registers in FixupBW MOVrr. Testcase will follow shortly. llvm-svn: 268787	2016-05-06 20:03:03 +00:00
Quentin Colombet	a065ac45ee	[X86] Get rid of X32_NOREX_ADDR_ACCESS register class. According to H.J. Lu <hjl.tools@gmail.com>, this register class is never used. llvm-svn: 268771	2016-05-06 18:22:48 +00:00
Ahmed Bougacha	258426ca7a	[X86] Teach X86FixupBWInsts to promote MOV8rr/MOV16rr to MOV32rr. Codesize is less (16) or equal (8), and we avoid partial dependencies. Differential Revision: http://reviews.llvm.org/D19999 llvm-svn: 268760	2016-05-06 17:42:57 +00:00
Ahmed Bougacha	04200a7c86	[X86] Remove \brief in FixupBW. NFC. llvm-svn: 268754	2016-05-06 17:28:47 +00:00
Ahmed Bougacha	cfd9e55e90	[X86] Simplify FixupBW sub_8bit_hi-related logic. NFC. Instead of passing around sizes and asking for subregs, we can check the subreg indices we care about: sub_8bit_hi and sub_8bit. Differential Revision: http://reviews.llvm.org/D20006 llvm-svn: 268753	2016-05-06 17:28:42 +00:00
Justin Bogner	b012699741	SDAG: Rename Select->SelectImpl and repurpose Select as returning void This is a step towards removing the rampant undefined behaviour in SelectionDAG, which is a part of llvm.org/PR26808. We rename SelectionDAGISel::Select to SelectImpl and update targets to match, and then change Select to return void and consolidate the sketchy behaviour we're trying to get away from there. Next, we'll update backends to implement `void Select(...)` instead of SelectImpl and eventually drop the base Select implementation. llvm-svn: 268693	2016-05-05 23:19:08 +00:00
Hans Wennborg	501e739d8a	X86CallFrameOptimization: make adjustCallSequence's return type void It always returned the same value (true). No functionality change. llvm-svn: 268645	2016-05-05 16:39:31 +00:00
Marcin Koscielnicki	0275fac2c9	[X86] Extend some Linux special cases to cover kFreeBSD. Both Linux and kFreeBSD use glibc, so follow similar code paths. Add isTargetGlibc to check for this, and use it instead of isTargetLinux in a few places. Fixes PR22248 for kFreeBSD. Differential Revision: http://reviews.llvm.org/D19104 llvm-svn: 268624	2016-05-05 11:35:51 +00:00
David Majnemer	911d0e3c21	[X86] Use the right type when folding xor (truncate (shift)) -> setcc The result type of setcc is dependent on whether or not AVX512 is present. We had an X86-specific DAG-combine which assumed that the result type should be i8 when it could be i1. This meant that we would generate illegal setccs which LowerSETCC did not like. Instead, use an appropriate type and zero extend to i8. Also, there were some scenarios where the fold should have fired but didn't because we were overly cautious about the types. This meant that we generated: shrl $31, %edi andl $1, %edi kmovw %edi, %k0 kxnorw %k0, %k0, %k1 kshiftrw $15, %k1, %k1 kxorw %k1, %k0, %k0 kmovw %k0, %eax instead of: testl %edi, %edi setns %al This fixes PR27638. llvm-svn: 268609	2016-05-05 06:00:56 +00:00
Quentin Colombet	0c5bfd0514	[X86] Add a few register classes for x32 address accesses. The new register classes allow to tell the machine verifier that it is fine to use RIP for address accesses in x32 mode. Prior to that patch, we would complain that we are using a GR64 in place of GR32, whereas it is actually fine to use GR64 for x32 as long as the 32 high bits are 0s. RIP has this property and is used for RIP-relative addressing. This partially fixes http://llvm.org/PR27481. llvm-svn: 268567	2016-05-04 22:45:31 +00:00
David Majnemer	2c5aeabedd	[X86] Lower zext i1 arguments i1 is now a legal type for X86 with AVX512. There were some paths in X86FastISel which were not quite ready to see an i1 value: they were not quite sure how to deal with sign/zero extends for call arguments. DTRT by extending to i8 for zeroext and bailing out of FastISel for signext. This fixes PR27591. llvm-svn: 268470	2016-05-04 00:22:23 +00:00
Simon Pilgrim	be439d7f1a	[X86] Tidied up SDValue's SDNode referencing. NFCI. llvm-svn: 268445	2016-05-03 21:44:45 +00:00
Tim Northover	d2ecbccf27	X86-Darwin: start emitting data-region directives for jump-tables. The surrounding tools can cope these days, and they were invented for a reason. llvm-svn: 268437	2016-05-03 21:03:41 +00:00
David L Kreitzer	c9fbf1018a	Add an address space for the X86 SS segment. Patch by Michael LeMay (michael.lemay@intel.com) Differential Revision: http://reviews.llvm.org/D17093 llvm-svn: 268431	2016-05-03 20:16:08 +00:00
Simon Pilgrim	d2752708a3	[X86][SSE] Added target shuffle combine to MOVQ llvm-svn: 268391	2016-05-03 15:05:13 +00:00
Igor Breger	58c07806ae	[AVX512] Add support for commutative MAX/MIN. In general, the VMAX{PS,PD} and VMIN{PS,PD} instructions are not commutative. In the combine pass, VMAX/VMIN are converted to the commutative nodes VMAXC/VMINC only if UnsafeFPMath is used. Differential Revision: http://reviews.llvm.org/D19860 llvm-svn: 268375	2016-05-03 11:51:45 +00:00
Igor Breger	ab076c683c	[AVX512] Fix lowerV4X128VectorShuffle to correctly select input operands. Differential Revision: http://reviews.llvm.org/D19803 llvm-svn: 268368	2016-05-03 08:08:44 +00:00
Matthias Braun	d1aabb2813	livePhysRegs: Pass MBB by reference in addLive{Ins\|Outs}(); NFC The block must not be nullptr for the addLiveIns()/addLiveOuts() functions. llvm-svn: 268340	2016-05-03 00:24:32 +00:00
Matthias Braun	24f26e6d91	LivePhysRegs: Automatically determine presence of pristine regs. Remove the AddPristinesAndCSRs parameters from addLiveIns()/addLiveOuts(). We need to respect pristine registers after prologue epilogue insertion, Seeing that we got this wrong in at least two commits already, we should rather pay the small price to query MachineFrameInfo for it. There are three cases that did not set AddPristineAndCSRs to true even after register allocation: - ExecutionDepsFix: live-out registers are used as a hint that the register is used soon. This is not true for pristine registers so use the new addLiveOutsNoPristines() to maintain this behaviour. - SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like a bug, should do the right thing automatically now. - StackMapLivenessAnalysis: Not adding pristine registers looks like a bug to me. Added a FIXME comment but maintain the current behaviour as a change may need to get coordinated with GC runtimes. llvm-svn: 268336	2016-05-03 00:08:46 +00:00
Quentin Colombet	4e1d389ac5	[X86] Model FAULTING_LOAD_OP as a terminator and branch. This operation may branch to the handler block and we do not want it to happen anywhere within the basic block. Moreover, by marking it "terminator and branch" the machine verifier does not wrongly assume (because of AnalyzeBranch not knowing better) the branch is analyzable. Indeed, the target was seeing only the unconditional branch and not the faulting load op and thought it was a simple unconditional block. The machine verifier was complaining because of that and moreover, other optimizations could have performed wrong transformations! In the process, simplify the representation of the handler block in the faulting load op. Now, we directly reference the handler block instead of using a label. This has the benefits of: 1. MC knows how to issue a label for a BB, so leave that to it. 2. Accessing the target BB from its label is painful, whereas it is direct from a MBB operand. Note: The 2 bytes offset in implicit-null-check.ll comes from the fact the unconditional jumps are not removed anymore, as the whole terminator sequence is not analyzable anymore. Will fix it in a subsequent commit. llvm-svn: 268327	2016-05-02 22:58:54 +00:00
Simon Pilgrim	52f8693263	[X86][SSE] Added placeholder for 128/256-bit wide shuffle combines Begun adding placeholder for future support for vperm2f128/vshuff64x2 style 128/256-bit wide shuffles llvm-svn: 268306	2016-05-02 21:12:48 +00:00
Simon Pilgrim	e5e04baf95	[X86][SSE] Dropped X86ISD::FGETSIGNx86 and use MOVMSK instead for FGETSIGN lowering movmsk.ll tests are unchanged. llvm-svn: 268237	2016-05-02 14:58:22 +00:00
David L Kreitzer	0fe4632bd7	Enable the X86 call frame optimization for the 64-bit targets that allow it. Fixes PR27241. Differential Revision: http://reviews.llvm.org/D19688 llvm-svn: 268227	2016-05-02 13:45:25 +00:00
Craig Topper	7b5925a5b6	[X86] Fix a bug in LOCK arithmetic operation pattern matching where the wrong immediate predicate check was being used for 64-bit instructions with 8-bit immediates. This didn't cause a bug because the order of the patterns ensured that the 64-bit instructions with 32-bit immediates were selected first. llvm-svn: 268212	2016-05-02 05:44:21 +00:00
Craig Topper	b6da65403a	[AVX512] VPACKUSWB/VPACKSSWB should not be encoded with EVEX.W=1. While there fix the execution domain for VPACKSSDW/VPACKUSDW. llvm-svn: 268200	2016-05-01 17:38:32 +00:00
Igor Breger	131008fbcb	Change AVX512 broadcastsd/ss patterns interaction with spilling. The new implementation takes a scalar register and generates a vector without COPY_TO_REGCLASS (which turns it into a VR128 register). The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth. Differential Revision: http://reviews.llvm.org/D19579 llvm-svn: 268190	2016-05-01 08:40:00 +00:00
Craig Topper	e430de8be6	[AVX512] Prefer AVX512 VPACK instructions over AVX/AVX2 instructions when VLX and BWI are supported. llvm-svn: 268189	2016-05-01 06:52:19 +00:00
Craig Topper	5acb5a1caf	[AVX512] Add HasVLX to the 128/256-bit versions of VPACKSSDW/USDW/SSWB/USWB and VPMADDUBSW/VPMADDWD. llvm-svn: 268188	2016-05-01 06:24:57 +00:00
Craig Topper	db290664f6	[AVX512] Make sure 128/256-bit DQI versions of VAND/VANDN/VOR/VXOR are also marked as requiring VLX. llvm-svn: 268186	2016-05-01 05:57:06 +00:00
Craig Topper	f77ca947ce	[X86] Add an AddedComplexity to another pattern to put it near similar in the output file. llvm-svn: 268184	2016-05-01 05:22:15 +00:00
Craig Topper	742977ede8	[X86] Remove a seemingly unused pattern. The same pattern appears elsewhere with an AddedComplexity that made this unreachable. llvm-svn: 268183	2016-05-01 05:22:13 +00:00
Craig Topper	eb9a87918b	[X86] Add AddedComplexity to keep some similar patterns near each other in the output file. llvm-svn: 268181	2016-05-01 04:59:49 +00:00
Craig Topper	7ed84d826e	[X86] Remove some redundant selection patterns. llvm-svn: 268180	2016-05-01 04:59:46 +00:00
Craig Topper	c9b1923358	[AVX512] Replace vector_extract with extractelt in some patterns. They mean the same thing but vector_extract is deprecated. NFC llvm-svn: 268179	2016-05-01 04:59:44 +00:00
Craig Topper	99f6b620cc	[AVX512] Add hasSideEffects/mayLoad/mayStore flags to some instructions. llvm-svn: 268174	2016-05-01 01:03:56 +00:00
Craig Topper	e012ede137	[X86] Reduce memory usage of MemOp2RegOp and RegOp2MemOp folding maps. llvm-svn: 268164	2016-04-30 17:59:49 +00:00
Sriraman Tallam	7da9b445ea	Differential Revision: http://reviews.llvm.org/D19733 llvm-svn: 268106	2016-04-29 21:19:16 +00:00
Filipe Cabecinhas	0da9937517	Unify XDEBUG and EXPENSIVE_CHECKS (into the latter), and add an option to the cmake build to enable them. Summary: Historically, we had a switch in the Makefiles for turning on "expensive checks". This has never been ported to the cmake build, but the (dead-ish) code is still around. This will also make it easier to turn it on in buildbots. Reviewers: chandlerc Subscribers: jyknight, mzolotukhin, RKSimon, gberry, llvm-commits Differential Revision: http://reviews.llvm.org/D19723 llvm-svn: 268050	2016-04-29 15:22:48 +00:00
Craig Topper	b805723294	[X86] Remove unnecessary header file containing a small class. It was only included in one place. Just define the class directly in the cpp file. NFC llvm-svn: 267985	2016-04-29 04:22:28 +00:00
Craig Topper	e7c1cd18d3	[X86] Include X86MCTargetDesc.h directly in X86Disassembler.cpp instead of duplicating parts of it. NFC llvm-svn: 267984	2016-04-29 04:22:26 +00:00
Craig Topper	184310d6a9	[X86] Use nested switches to vary the operand to helper functions that were previously called in multiple cases. This seems to help the inliner reduce code. NFC llvm-svn: 267964	2016-04-29 00:51:30 +00:00
Craig Topper	477649a4c0	[X86] Remove unused operand from a function and all its callers. NFC llvm-svn: 267854	2016-04-28 05:58:46 +00:00
Craig Topper	33772c5375	[CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior. llvm-svn: 267853	2016-04-28 03:34:31 +00:00
Mitch Bodart	e60465ddf7	[X86] Enable the post-RA-scheduler for clang's default 32-bit cpu. For compilations with no explicit cpu specified, this exhibits nice gains on Silvermont, with neutral performance on big cores. Differential Revision: http://reviews.llvm.org/D19138 llvm-svn: 267809	2016-04-27 22:52:35 +00:00
Quentin Colombet	bf200688de	[X86][FastISel] Make sure we use the right register class when we select stores. llvm-svn: 267806	2016-04-27 22:33:42 +00:00
Quentin Colombet	d6dbec4c6f	[X86] Fix the lowering of TLS calls. The callseq_end node must be glued with the TLS calls, otherwise, the generic code will miss the uses of the returned value and will mark it dead. Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI, the pseudo uses the symbol address at this point not RDI and the lowering will do the right thing. llvm-svn: 267797	2016-04-27 21:37:37 +00:00
Kevin B. Smith	c378a99ba5	[X86]: Quit promoting 16 bit loads to 32 bit. Differential Revision: http://reviews.llvm.org/D19592 llvm-svn: 267773	2016-04-27 19:58:03 +00:00
Nico Weber	e69b9548b8	Revert r267649, it caused PR27539. llvm-svn: 267723	2016-04-27 15:16:54 +00:00
Ahmed Bougacha	9a0c9adade	[X86] Set AddPristinesAndCSRs to FixupBW LivePhysRegs. NFC. We run after PEI, so we need to AddPristinesAndCSRs. In practice, that makes no difference here, because we only ask about liveness of super-registers of defined GR8/GR16 registers, so they can't be pristine. Still, it's the correct thing to do. Thanks to Quentin for noticing! Follow-up to r267495. llvm-svn: 267658	2016-04-27 01:51:38 +00:00
Ahmed Bougacha	19a2ee591a	[X86] Don't assume that MMX extractelts are from index 0. It's probably the case for all 3 MMX users out there, but with hand-crafted IR, you can trigger selection failures. Fix that. llvm-svn: 267652	2016-04-27 01:35:29 +00:00
Ahmed Bougacha	e68363a03c	[X86] Re-enable MMX i32 extractelt combine. This effectively adds back the extractelt combine removed by r262358: the direct case can still occur (because x86_mmx is special, see r262446), but it's the indirect case that's now superseded by the generic combine. llvm-svn: 267651	2016-04-27 01:35:25 +00:00
Cong Hou	6f879d9eb1	Detects the SAD pattern on X86 so that much better code will be emitted once the pattern is matched. Differential revision: http://reviews.llvm.org/D14840 llvm-svn: 267649	2016-04-27 01:29:18 +00:00
Quentin Colombet	4ff3cfb673	[X86] Make sure it is safe to clobber EFLAGS, if need be, when choosing the prologue. Do not use basic blocks that have EFLAGS live-in as prologue if we need to realign the stack. Realigning the stack uses an AND instruction, which clobbers EFLAGS. An alternative would have been to save and restore EFLAGS around the stack realignment code, but this is likely inefficient. Fixes PR27531. llvm-svn: 267634	2016-04-26 23:44:14 +00:00
Quentin Colombet	2b3a4e787e	[X86] Teach the expansion of copy instructions how to do proper liveness. When the simple analysis provided by MachineBasicBlock::computeRegisterLiveness fails, fall back on the LivePhysReg utility. llvm-svn: 267623	2016-04-26 23:14:32 +00:00
Andrew Kaylor	2bee5ef462	Optimization bisect support in X86-specific passes Differential Revision: http://reviews.llvm.org/D19439 llvm-svn: 267608	2016-04-26 21:44:24 +00:00
Ahmed Bougacha	128f8732a5	[CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI. Differential Revision: http://reviews.llvm.org/D17176 llvm-svn: 267606	2016-04-26 21:15:30 +00:00
Manman Ren	1c3f65a18c	Swift Calling Convention: use %RAX for sret. We don't need to copy the sret argument into %rax upon return. rdar://25671494 llvm-svn: 267579	2016-04-26 18:08:06 +00:00
Andrey Turetskiy	b405606432	[X86] PR27502: Fix the LEA optimization pass. Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass. Differential Revision: http://reviews.llvm.org/D19409 llvm-svn: 267551	2016-04-26 12:18:12 +00:00
Ahmed Bougacha	5cf735a5b1	[X86] Use LivePhysRegs in X86FixupBWInsts. Kill-flags, which computeRegisterLiveness uses, are not reliable. LivePhysRegs is. Differential Revision: http://reviews.llvm.org/D19472 llvm-svn: 267495	2016-04-26 00:00:48 +00:00
Craig Topper	03734c7ce1	[X86] Replace a SmallVector used to pass 2 values to an ArrayRef parameter with a fixed size array. NFC llvm-svn: 267377	2016-04-25 04:30:29 +00:00
Simon Pilgrim	dd748b83aa	[X86][SSE] getTargetShuffleMaskIndices - dropped (unused) UNDEF handling We aren't currently making use of this in any successful mask decode and it's actually incorrect as it inserts the wrong number of SM_SentinelUndef mask elements. llvm-svn: 267350	2016-04-24 16:49:53 +00:00
Simon Pilgrim	7c25ef92a3	[X86][SSE] Use range loop. NFCI. llvm-svn: 267349	2016-04-24 16:33:35 +00:00
Simon Pilgrim	f379a6c684	[X86][XOP] Fixed VPPERM permute op decoding (PR27472). Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask. llvm-svn: 267346	2016-04-24 15:05:04 +00:00
Simon Pilgrim	9f5697ef68	[X86][SSE] Improved support for decoding target shuffle masks through bitcasts Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transferred to xmm. This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472. llvm-svn: 267343	2016-04-24 14:53:54 +00:00
Craig Topper	dbc981f71f	[X86] Merge LowerCTLZ and LowerCTLZ_ZERO_UNDEF into a single function that branches internally for the one difference, allowing the rest of the code to be common. NFC llvm-svn: 267331	2016-04-24 06:27:39 +00:00
Craig Topper	6469a39f51	[X86] No need to check if AVX512 is supported when lowering vector CTLZ. The CTLZ operation is only Custom for vectors if AVX512 is enabled so if a vector gets here AVX512 is implied. NFC llvm-svn: 267330	2016-04-24 06:27:35 +00:00
Craig Topper	e78eac1d31	[X86] Remove isel patterns for selecting tzcnt/lzcnt from cmove/ne+cttz/ctlz. These are folded by DAG combine now. llvm-svn: 267326	2016-04-24 04:38:34 +00:00
Craig Topper	601b6c69bc	[X86] Fix patterns that turn cmove/cmovne+ctlz/cttz into lzcnt/tzcnt instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly. llvm-svn: 267311	2016-04-24 02:01:22 +00:00
Davide Italiano	f59b0da654	[MC/ELF] Implement support for GOTPCRELX/REX_GOTPCRELX. The option to control the emission of the new relocations is -relax-relocations (blatantly copied from GNU as). It can't be enabled by default because it breaks relatively recent versions of ld.bfd/ld.gold (late 2015). llvm-svn: 267307	2016-04-24 01:03:57 +00:00
Davide Italiano	4652c59568	[MC/ELF] Pass Fixup to getRelocType64. In preparation for other changes. llvm-svn: 267300	2016-04-23 22:26:31 +00:00
Sriraman Tallam	3cb773431d	Differential Revision: http://reviews.llvm.org/D19040 llvm-svn: 267229	2016-04-22 21:41:58 +00:00
Peter Collingbourne	265ebd7d70	CodeGen: Use PLT relocations for relative references to unnamed_addr functions. The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual functions defined in other DSOs. The unnamed_addr attribute means that the function's address is not significant, so we're allowed to substitute it with the address of a PLT entry. Also includes a bonus feature: addends for COFF image-relative references. Differential Revision: http://reviews.llvm.org/D17938 llvm-svn: 267211	2016-04-22 20:40:10 +00:00
Nirav Dave	9a878c4930	Emit code16 in assembly in 16-bit mode Summary: When generating assembly using -m16 we must explicitly mark it as 16-bit. Emit .code16 at beginning of file. Fixes wrong results when using -fno-integrated-as. Reviewers: dwmw2 Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19392 llvm-svn: 267152	2016-04-22 13:36:11 +00:00
Ashutosh Nema	468558a061	[X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode. Summary: rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS operations during DAG combine. This generates better code for truncate, so the cost of truncate needs to be changed, but it looks like it was changed only in the SSE2 table. This change is also applicable to SSE4.1, so the cost of truncate needs to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE v16i16 to v16i8” should be the same in the SSE4.1 & SSE2 tables. Removing their cost from SSE4.1, so it will fall back to SSE2. Reviewers: Simon Pilgrim llvm-svn: 267123	2016-04-22 08:34:05 +00:00
Craig Topper	59479e7208	[AVX512] Teach lowering to use vplzcntd/q to implement 128/256-bit CTTZ_ZERO_UNDEF even without VLX support. We can just extend to 512-bits and extract like we do for CTLZ. llvm-svn: 267100	2016-04-22 03:22:38 +00:00
Craig Topper	21690db05a	[AVX512] Add CTTZ support for v8i64 and v16i32 vectors. llvm-svn: 266968	2016-04-21 07:30:06 +00:00
Craig Topper	340ad0a0c9	[AVX512] Add support for lowering CTTZ v64i8 and v32i16 with BWI instructions. llvm-svn: 266963	2016-04-21 06:39:34 +00:00
Craig Topper	7dedfdc60a	[X86] Remove redundant calls to setOperationAction for EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT from SSE41 block. They were already done in an earlier block. NFC llvm-svn: 266962	2016-04-21 06:39:32 +00:00
Craig Topper	032e985cbc	[X86] Remove some operations from the default Expand all vector ops loop. Instead let them stay Legal and mark them Expand for specific types where needed. Reduces overall number of calls to setOperationAction. NFC llvm-svn: 266961	2016-04-21 06:39:29 +00:00
Craig Topper	98c855d480	[X86] Remove old leftover MMX code that sets various 64-bit vector operations to Expand. These vector types aren't legal so these operations would never make it far enough to need to expand. NFC llvm-svn: 266960	2016-04-21 06:39:26 +00:00
Craig Topper	3e6be4c27a	[X86] Remove unnecessary setting of CTTZ_ZERO_UNDEF to Custom for vector types where we can't do any better than the Custom lowering of CTTZ. LegalizeVectorOps will expand to CTTZ since it's marked Custom. CTTZ_ZERO_UNDEF can be custom lowered specially if CTLZ is supported. Otherwise CTTZ and CTTZ_ZERO_UNDEF are handled the same way by using CTPOP and bitmath. llvm-svn: 266952	2016-04-21 04:44:00 +00:00
Craig Topper	3dd625ce79	[AVX512] Add support for popcount of v8i64 and v16i32 with and without BWI instructions. Without BWI we have to split the vectors into 256-bit vectors so we can use AVX2 pshufb and then concatenate the results. llvm-svn: 266950	2016-04-21 03:57:24 +00:00
Davide Italiano	bf4df85ba7	[MC] Silence warning due to unused variable in !Debug builds. llvm-svn: 266901	2016-04-20 18:45:31 +00:00
Davide Italiano	8a8f24b098	[MC] EmitNop: Make an assertion more useful. Differential Revision: http://reviews.llvm.org/D19334 llvm-svn: 266895	2016-04-20 17:53:21 +00:00
Asaf Badouh	89406d1815	[X86] enable PIE for functions Call locally defined function directly for PIE/fPIE Differential Revision: http://reviews.llvm.org/D19226 llvm-svn: 266863	2016-04-20 08:32:57 +00:00
Craig Topper	99e60e9f1f	[AVX512] Add popcount support for v32i16 and v64i8. llvm-svn: 266858	2016-04-20 05:18:55 +00:00
Craig Topper	3e8f1e483c	[X86] Mark some floating point operations that are always expanded for vector types as Expand in a floating point only loop instead of looping through all vector types. llvm-svn: 266850	2016-04-20 01:57:44 +00:00
Craig Topper	7f28d55a00	[X86] Don't mark vector loads and shifts Expand in advance. Loads are always marked Legal or Promote for all the legal types later. Shifts are always marked custom. NFC llvm-svn: 266849	2016-04-20 01:57:42 +00:00
Craig Topper	ab7497dd6e	[X86] Merge the two different SSE2 blocks in the X86TargetLowering constructor. Also qualify the XOP block with !useSoftFloat to match the other vector blocks. llvm-svn: 266848	2016-04-20 01:57:40 +00:00
Craig Topper	397968ea16	[X86] Don't set vector FADD,FSUB,FMUL,FDIV,FNEG,FSQRT to Expand early. For every legal FP type we either set them to Legal or Custom anyway. So let them stay defaulted to Legal and only change when they need to be Custom. llvm-svn: 266847	2016-04-20 01:57:38 +00:00
Tim Shen	a1d8bc5597	[PPC, SSP] Support PowerPC Linux stack protection. llvm-svn: 266809	2016-04-19 20:14:52 +00:00
Tim Shen	e885d5e4d3	[SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARD With this change, ideally an IR pass can always generate an llvm.stackguard call to get the stack guard; but for now there are still IR form stack guard customizations around (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD. There is a behavior change: stack guard values are not CSEed anymore, since we should never reuse the value in case it has been spilled (and corrupted). See ssp-guard-spill.ll. This also causes changes to stack size and codegen in X86 and AArch64 test cases. Ideally we'd like to know if the guard created in llvm.stackprotector() gets spilled or not. If the value is spilled, discard the value and reload stack guard; otherwise reuse the value. This can be done by teaching the register allocator how to rematerialize LOAD_STACK_GUARD and force a rematerialization (which seems hard), or by checking for spilling in expandPostRAPseudo. It only makes sense when the stack guard is a global variable, which requires more instructions to load. Anyway, this seems to be out of scope for the current patch. llvm-svn: 266806	2016-04-19 19:40:37 +00:00
Sanjoy Das	2effffd456	[X86] Simplify StackMapShadowTracker; NFC - Elide trivial constructor and destructor - Move implementation out of an unnecessary explicit llvm namespace scope llvm-svn: 266794	2016-04-19 18:48:16 +00:00
Sanjoy Das	6ecfae61dc	[X86MCInstLower] Clean up EmitNops; NFC Instead of having a conditional assert inside EmitNops, refactor so that the caller can have the assert instead. llvm-svn: 266793	2016-04-19 18:48:13 +00:00
David L Kreitzer	d5cb34118d	Preliminary changes for fixing PR27241. Generalized/restructured some things in preparation for enabling the outgoing parameter store-to-push optimization for 64-bit targets. Differential Revision: http://reviews.llvm.org/D19222 llvm-svn: 266774	2016-04-19 17:43:44 +00:00
Simon Pilgrim	32b1c9fe7f	[X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not. Differential Revision: http://reviews.llvm.org/D19228 llvm-svn: 266728	2016-04-19 12:26:40 +00:00
Sanjoy Das	c0441c29df	Introduce a "patchable-function" function attribute Summary: The `"patchable-function"` attribute can be used by an LLVM client to influence LLVM's code generation in ways that makes the generated code easily patchable at runtime (for instance, to redirect control). Right now only one patchability scheme is supported, `"prologue-short-redirect"`, but this can be expanded in the future. Reviewers: joker.eph, rnk, echristo, dberris Subscribers: joker.eph, echristo, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19046 llvm-svn: 266715	2016-04-19 05:24:47 +00:00
Mehdi Amini	b550cb1750	[NFC] Header cleanup Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595	2016-04-18 09:17:29 +00:00
Craig Topper	221e1c2b1f	[X86] Be explicit about calls to setOperationAction for AVX2 and AVX512 rather than just looping over all vector types and conditionally matching them. NFC llvm-svn: 266577	2016-04-17 22:49:46 +00:00
Simon Pilgrim	dd153476fd	[X86] Added TODO comment for target shuffle mask decoding of bitcasted masks llvm-svn: 266559	2016-04-17 11:34:18 +00:00
Asaf Badouh	aec79651c1	[X86] Remove unneeded variables, no functional change. ExtraLoad and WrapperKind are used only if (OpFlags == X86II::MO_GOTPCREL). Differential Revision: http://reviews.llvm.org/D18942 llvm-svn: 266557	2016-04-17 08:28:40 +00:00
Craig Topper	75869d5701	[AVX512] ISD::MUL v2i64/v4i64 should only be legal if DQI and VLX features are enabled. llvm-svn: 266554	2016-04-17 07:25:39 +00:00
Craig Topper	1663e7a472	[X86] Use ternary operator to reduce code slightly. NFC llvm-svn: 266534	2016-04-16 19:09:32 +00:00
Simon Pilgrim	fd4b9b02a3	[X86][XOP] Added VPPERM constant mask decoding and target shuffle combining support Added additional test that peeks through bitcast to v16i8 mask llvm-svn: 266533	2016-04-16 17:52:07 +00:00
Craig Topper	ea46b592ab	Add a setOperationPromotedToType convenience method that sets an operation to promoted and sets the type in one call. Use it to save code in X86. llvm-svn: 266413	2016-04-15 06:20:18 +00:00
Craig Topper	13e9dc66e4	[X86] AND, OR, and XOR of vectors are always legal no need to set them legal explicitly. llvm-svn: 266412	2016-04-15 06:20:14 +00:00
Craig Topper	5e20fd3e7c	[X86] Combine an if and else block that had the same set of calls to setOperationAction that only varied in Legal/Custom. Use the ternary operator on that argument instead. NFC llvm-svn: 266410	2016-04-15 04:57:09 +00:00
Reid Kleckner	28865809fe	Sink DI metadata usage out of MachineInstr.h and MachineInstrBuilder.h MachineInstr.h and MachineInstrBuilder.h are very popular headers, widely included across all LLVM backends. It turns out that there are only a handful of TUs that actually care about DI operands on MachineInstrs. After this change, touching DebugInfoMetadata.h and rebuilding llc only needs 112 actions instead of 542. llvm-svn: 266351	2016-04-14 18:29:59 +00:00
Mehdi Amini	867e91468b	Do not use getGlobalContext()... ever. This code was creating a new type in the global context, regardless of which context the user is sitting in, what can possibly go wrong? From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266275	2016-04-14 04:36:40 +00:00
Matthias Braun	46b0f03e12	TargetLowering: Factor out common code for tail call eligibility checking; NFC llvm-svn: 266270	2016-04-14 01:10:42 +00:00
Matthias Braun	588d1cdad4	X86: Use a callee save register for the swiftself parameter. It is very likely that the swiftself parameter is alive throughout most functions, so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers; I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callee's parameter. Differential Revision: http://reviews.llvm.org/D18902 llvm-svn: 266252	2016-04-13 21:43:21 +00:00
David L Kreitzer	99775c1b6e	Fixed a few typos and formatting problems. NFCI. llvm-svn: 266135	2016-04-12 21:45:09 +00:00
Justin Bogner	32ad24d4ef	X86: Avoid accessing SDValues after they've been RAUW'd This fixes two use-after-frees in selectLEA64_32Addr. If matchAddress matches an ADD with an AND as an operand, and that AND hits one of the "heroic transforms" that folds masks and shifts, we end up with N pointing to an SDNode that was deleted. Make sure we're done accessing it before that. Found by ASan with the recycling allocator changes in llvm.org/PR26808. llvm-svn: 266130	2016-04-12 21:34:24 +00:00
Sanjay Patel	e6a0a23e08	fix indentation; NFC llvm-svn: 266097	2016-04-12 18:01:48 +00:00
Manman Ren	5751814eda	Swift Calling Convention: swifterror target support. Differential Revision: http://reviews.llvm.org/D18716 llvm-svn: 265997	2016-04-11 21:08:06 +00:00
Sriraman Tallam	f39e190ad8	Test commit. llvm-svn: 265976	2016-04-11 18:40:50 +00:00
Andrey Turetskiy	9df334c28e	[X86] Restrict max long nop length for Lakemont. Restrict the max length of long nops for Lakemont to 7. Experiments on MCU benchmarks (Dhrystone, Coremark) show that this is the optimal length. Differential Revision: http://reviews.llvm.org/D18897 llvm-svn: 265924	2016-04-11 10:07:36 +00:00
Simon Pilgrim	d263fdc512	[X86][AVX512BW] Add support for v64i8 multiplies Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets. I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL. Differential Revision: http://reviews.llvm.org/D18937 llvm-svn: 265902	2016-04-10 17:02:48 +00:00
Craig Topper	35db8ecb50	[X86] Use for loops over types to reduce code for setting up operation actions. llvm-svn: 265893	2016-04-10 05:39:32 +00:00
Craig Topper	dcc8f49bf0	[X86] Remove unnecessary setOperationAction for SRA v2i64/v4i64 when VLX is supported. This is already done for SSE2/AVX2 which VLX implies. NFC llvm-svn: 265892	2016-04-10 05:39:28 +00:00
Davide Italiano	7aa47094b2	[MC] support TLSDESC and TLSCALL / GNU2 tls dialect Differential Revision: http://reviews.llvm.org/D18885 llvm-svn: 265881	2016-04-09 20:32:33 +00:00
Sanjay Patel	4abae4e0fa	[x86] use BMI 'andn' for logic + compare ops With BMI, we can use 'andn' to save an instruction when the result is only used in a compare. This is related to one of the potential sequences to check 'isfinite' in: https://llvm.org/bugs/show_bug.cgi?id=27164 Differential Revision: http://reviews.llvm.org/D18910 llvm-svn: 265875	2016-04-09 16:02:52 +00:00
Simon Pilgrim	1cc5712763	[X86][XOP] Support for VPPERM 2-input shuffle mask decoding This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle. The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp. Differential Revision: http://reviews.llvm.org/D18441 llvm-svn: 265874	2016-04-09 14:51:26 +00:00
Craig Topper	f027107094	[X86] Use for loops over types to reduce code for setting up operation actions. NFC llvm-svn: 265871	2016-04-09 06:31:02 +00:00
Craig Topper	e801ed9e15	[X86] Remove calls to setOperationAction that set CTLZ_ZERO_UNDEF for some vector types to Expand. Expand is already set for all operations for all vector types earlier so this is redundant. NFC llvm-svn: 265870	2016-04-09 05:53:48 +00:00
Tim Shen	0012756489	[SSP] Remove llvm.stackprotectorcheck. This is a cleanup patch for SSP support in LLVM. There is no functional change. llvm.stackprotectorcheck is not needed, because SelectionDAG isn't actually lowering it in SelectBasicBlock; rather, it adds check code in FinishBasicBlock, ignoring the position where the intrinsic is inserted (See FindSplitPointForStackProtector()). llvm-svn: 265851	2016-04-08 21:26:31 +00:00
Hans Wennborg	e25b65bdb7	Rangeify a loop. NFC. llvm-svn: 265846	2016-04-08 20:46:09 +00:00
Hans Wennborg	74ff770670	Remove some redundant variables from X86TargetLowering::LowerDYNAMIC_STACKALLOC These are already defined, with the same values, a few lines up. NFC. llvm-svn: 265845	2016-04-08 20:46:00 +00:00
Kevin B. Smith	e0a6fc3bcc	[X86] Fix PR23155 by turning on X86FixupBWInsts by default. Differential Revision: http://reviews.llvm.org/D18866 llvm-svn: 265830	2016-04-08 18:58:29 +00:00
Simon Pilgrim	476170384f	[X86] Tidied up shuffle decode function doxygen descriptions As discussed on D18441 - auto brief is used so we don't need /brief, we don't need to include the function name and added some missing descriptions. llvm-svn: 265785	2016-04-08 14:17:07 +00:00
Kevin B. Smith	3802c4af59	[X86]: Fix for PR27251. Differential Revision: http://reviews.llvm.org/D18850 llvm-svn: 265690	2016-04-07 16:15:34 +00:00
Simon Pilgrim	d54bae6525	[X86][SSE] Add support for VZEXT constant folding llvm-svn: 265646	2016-04-07 07:52:45 +00:00
Ahmed Bougacha	1cf67fb9cb	[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. Re-apply r265450 which caused PR27245 and was reverted in r265559 because of a wrong generalization: the fetch_and_add->add_and_fetch combine only works in specific, but pretty common, cases: (icmp slt x, 0) -> (icmp sle (add x, 1), 0) (icmp sge x, 0) -> (icmp sgt (add x, 1), 0) (icmp sle x, 0) -> (icmp slt (sub x, 1), 0) (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0) Original Message: We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fall back to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244. Differential Revision: http://reviews.llvm.org/D17633 llvm-svn: 265636	2016-04-07 02:07:10 +00:00
Hans Wennborg	ab16be799c	Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" Third time's the charm? The previous attempt (r265345) caused ASan test failures on X86, as broken CFI caused stack traces to not work. This version of the patch makes sure not to merge with stack adjustments that have CFI, and to not add merged instructions' offsets to the CFI about to be generated. This is already covered by the lit tests; I just got the expectations wrong previously. llvm-svn: 265623	2016-04-07 00:05:49 +00:00
JF Bastien	800f87a871	NFC: make AtomicOrdering an enum class Summary: In the context of http://wg21.link/lwg2445 C++ uses the concept of 'stronger' ordering but doesn't define it properly. This should be fixed in C++17 barring a small question that's still open. The code currently plays fast and loose with the AtomicOrdering enum. Using an enum class is one step towards tightening things. I later also want to tighten related enums, such as clang's AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI' enum). This change touches a few lines of code which can be improved later, I'd like to keep it as NFC for now as it's already quite complex. I have related changes for clang. As a follow-up I'll add: bool operator<(AtomicOrdering, AtomicOrdering) = delete; bool operator>(AtomicOrdering, AtomicOrdering) = delete; bool operator<=(AtomicOrdering, AtomicOrdering) = delete; bool operator>=(AtomicOrdering, AtomicOrdering) = delete; This is separate so that clang and LLVM changes don't need to be in sync. Reviewers: jyknight, reames Subscribers: jyknight, llvm-commits Differential Revision: http://reviews.llvm.org/D18775 llvm-svn: 265602	2016-04-06 21:19:33 +00:00
Hans Wennborg	6849f8f15f	Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC." It caused ASan 32-bit tests to hang (PR27245). llvm-svn: 265559	2016-04-06 16:44:38 +00:00
Hans Wennborg	a7e396b5ef	Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"" It seems to be causing ASan tests to crash, probably due to miscompiling the run-time somehow. llvm-svn: 265551	2016-04-06 16:10:20 +00:00
Evgeniy Stepanov	dde29e2799	Faster stack-protector for Android/AArch64. Bionic has a defined thread-local location for the stack protector cookie. Emit a direct load instead of going through __stack_chk_guard. llvm-svn: 265481	2016-04-05 22:41:50 +00:00
Manman Ren	f8bdd88cd9	Swift Calling Convention: add swiftcc. Differential Revision: http://reviews.llvm.org/D17863 llvm-svn: 265480	2016-04-05 22:41:47 +00:00
Duncan P. N. Exon Smith	91d3cfed78	Revert "Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes." This reverts commit r265454 since it broke the build. E.g.: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_build/22413/ llvm-svn: 265459	2016-04-05 20:45:04 +00:00
Eugene Zelenko	1760dc2a23	Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes. Some Include What You Use suggestions were used too. Use anonymous namespaces in source files. Differential revision: http://reviews.llvm.org/D18778 llvm-svn: 265454	2016-04-05 20:19:49 +00:00
Ahmed Bougacha	50e6cd4a3a	[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fall back to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244. Differential Revision: http://reviews.llvm.org/D17633 llvm-svn: 265450	2016-04-05 20:02:57 +00:00
Ahmed Bougacha	629446ba03	[X86] Simplify early-exit check. NFC. llvm-svn: 265447	2016-04-05 20:02:22 +00:00
Sanjay Patel	4c7d094451	fix typo; NFC llvm-svn: 265442	2016-04-05 19:27:39 +00:00
Hans Wennborg	a47a692341	Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" The original commit miscompiled things on 32-bit Windows, e.g. a Clang bootstrap. It turns out that mergeSPUpdates() was a bit too generous in what it interpreted as a stack adjustment, causing the following code: addl $12, %esp leal -4(%ebp), %esp To be "optimized" into simply: addl $8, %esp This commit tightens up mergeSPUpdates() and includes a new test (test14 in movtopush.ll) for this situation. llvm-svn: 265345	2016-04-04 21:02:46 +00:00
Matthias Braun	870c34f0cf	ARM, AArch64, X86: Check preserved registers for tail calls. We can only perform a tail call to a callee that preserves all the registers that the caller needs to preserve. This situation happens with calling conventions like preserve_mostcc or cxx_fast_tls. It was explicitly handled for fast_tls and failing for preserve_most. This patch generalizes the check to any calling convention. Related to rdar://24207743 Differential Revision: http://reviews.llvm.org/D18680 llvm-svn: 265329	2016-04-04 18:56:13 +00:00
Derek Schuff	1dbf7a571f	Add MachineFunctionProperty checks for AllVRegsAllocated for target passes Summary: This adds the same checks that were added in r264593 to all target-specific passes that run after register allocation. Reviewers: qcolombet Subscribers: jyknight, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18525 llvm-svn: 265313	2016-04-04 17:09:25 +00:00
Elena Demikhovsky	e99c561391	AVX-512: Truncating store for i1 vectors Implemented truncstore for KNL and skylake-avx512. Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes. Differential Revision: http://reviews.llvm.org/D18740 llvm-svn: 265283	2016-04-04 07:17:47 +00:00
Simon Pilgrim	0edd3d771a	[X86] Removed duplicate code. llvm-svn: 265274	2016-04-03 20:40:35 +00:00
Simon Pilgrim	cd0dfc93eb	[X86][SSE] Support for MOVMSK signbit extraction instructions Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR. This is an early step towards more optimal handling of vector comparison results. Differential Revision: http://reviews.llvm.org/D18741 llvm-svn: 265266	2016-04-03 18:22:03 +00:00
Simon Pilgrim	20d1d4f045	[X86] Tidied up X86ISD instruction nodes. NFCI. Tidied up comments, stripped trailing whitespace, split apart nodes that aren't related. No change in ordering although there is definitely some scope for it. llvm-svn: 265263	2016-04-03 14:14:32 +00:00
Elena Demikhovsky	5e426f7356	AVX-512: Load and Extended Load for i1 vectors Implemented load+{sign|zero}_extend for i1 vectors Fixed failures in i1 vector load. Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX. Differential Revision: http://reviews.llvm.org/D18737 llvm-svn: 265259	2016-04-03 08:41:12 +00:00
Sanjay Patel	9f413364d5	[x86] avoid intermediate splat for non-zero memsets (PR27100) Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. Note that the SSE1 path is not changed in this patch. That can be a follow-up. This patch should resolve PR27100. llvm-svn: 265161	2016-04-01 17:36:45 +00:00
Sanjay Patel	a05e0ff223	[x86] avoid intermediate splat for non-zero memsets (PR27100) Follow-up to D18566 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The tests that were added in the last patch are now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. In the new tests, the splat via shuffling looks ok to me, but there might be some room for improvement depending on uarch there. Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up. This patch should resolve PR27100. Differential Revision: http://reviews.llvm.org/D18676 llvm-svn: 265148	2016-04-01 16:27:14 +00:00
Andrea Di Biagio	8c48841907	[x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type. Since revision 235394, we no longer perform target specific combines on build_vector nodes. No functional change intended. llvm-svn: 265138	2016-04-01 12:25:44 +00:00
Andrey Turetskiy	958eb46443	[X86] Introduce Lakemont CPU. Add a new Intel MCU CPU Lakemont, which doesn't support X87. Differential Revision: http://reviews.llvm.org/D18650 llvm-svn: 265128	2016-04-01 10:16:15 +00:00
Michael Kuperstein	7bab713188	Use range-based for loops. NFC. llvm-svn: 265105	2016-04-01 03:45:08 +00:00
Hans Wennborg	649159df3c	Follow-up to r265036: I got these iterators mixed up llvm-svn: 265076	2016-03-31 23:55:16 +00:00
Hans Wennborg	132cd62121	Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" I think it might have caused these build breakages: http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio llvm-svn: 265046	2016-03-31 20:27:30 +00:00
Hans Wennborg	e97fb414e8	[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140) For code such as: void f(int, int); void g() { f(1, 2); } compiled for 32-bit X86 Linux, Clang would previously generate: subl $12, %esp subl $8, %esp pushl $2 pushl $1 calll f addl $16, %esp addl $12, %esp retl This patch fixes that by merging adjacent stack adjustments in eliminateCallFramePseudoInstr(). Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265039	2016-03-31 19:26:24 +00:00
Hans Wennborg	e1a2e90ffa	Change eliminateCallFramePseudoInstr() to return an iterator This will become necessary in a subsequent change to make this method merge adjacent stack adjustments, i.e. it might erase the previous and/or next instruction. It also greatly simplifies the calls to this function from PrologEpilogInserter. Previously, that had a bunch of logic to resume iteration after the call; now it just continues with the returned iterator. Note that this changes the behaviour of PEI a little. Previously, it attempted to re-visit the new instruction created by eliminateCallFramePseudoInstr(). That code was added in r36625, but I can't see any reason for it: the new instructions will obviously not be pseudo instructions, they will not have FrameIndex operands, and we have already accounted for the stack adjustment. Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265036	2016-03-31 18:33:38 +00:00
Sanjay Patel	92d5ea5e07	[x86] use SSE/AVX ops for non-zero memsets (PR27100) Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping into a codegen sinkhole while trying to splat a byte into an XMM reg. Follow-on bugs exposed by the current codegen are: https://llvm.org/bugs/show_bug.cgi?id=27141 https://llvm.org/bugs/show_bug.cgi?id=27143 Differential Revision: http://reviews.llvm.org/D18566 llvm-svn: 265029	2016-03-31 17:30:06 +00:00
Nirav Dave	83ce54aac2	Prevent X86ISelLowering from merging volatile loads Change isConsecutiveLoads to check that loads are non-volatile as this is a requirement for any load merges. Propagate change to two callers. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18546 llvm-svn: 265013	2016-03-31 13:40:55 +00:00
Craig Topper	d2aa03a60a	[X86] Use MVT instead of EVT in code called after legalization. llvm-svn: 264992	2016-03-31 04:37:41 +00:00
Hans Wennborg	6596977130	[X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325) The size savings are significant, and from what I can tell, both ICC and GCC do this. Differential Revision: http://reviews.llvm.org/D18573 llvm-svn: 264966	2016-03-30 23:38:01 +00:00
Matthias Braun	8d41436004	CodeGen: Factor out code for tail call result compatibility check; NFC llvm-svn: 264959	2016-03-30 22:46:04 +00:00
Aaron Ballman	ef0fe1eed8	Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929	2016-03-30 21:30:00 +00:00
Simon Pilgrim	c49bd2ede0	[X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros Fix for issue introduced in D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible. llvm-svn: 264922	2016-03-30 20:52:24 +00:00
Nirav Dave	8dd66e5753	Remove HasFnAttribute guards to getFnAttribute calls These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872	2016-03-30 15:41:12 +00:00

13764 Commits