llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	3546156122	[X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL. This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns. llvm-svn: 295723	2017-02-21 15:09:00 +00:00
Igor Breger	812f319794	[AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index. Differential Revision: https://reviews.llvm.org/D30189 llvm-svn: 295718	2017-02-21 14:01:25 +00:00
Craig Topper	d88389aa7e	[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697	2017-02-21 06:39:13 +00:00
Craig Topper	16d9730b86	[X86] Fix formatting. NFC llvm-svn: 295695	2017-02-21 06:27:13 +00:00
Craig Topper	d9fe664868	[AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns. llvm-svn: 295693	2017-02-21 04:26:10 +00:00
Craig Topper	d890db6952	[AVX-512] Fix the ExeDomain for vcmpss/vcmpsd. llvm-svn: 295691	2017-02-21 04:26:04 +00:00
Sanjoy Das	90208720e3	Add a wrapper around copy_if in STLExtras; NFC I will add one more use for this in a later change. llvm-svn: 295685	2017-02-21 00:38:44 +00:00
Craig Topper	2012dda9a0	[AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0. llvm-svn: 295673	2017-02-20 17:44:09 +00:00
Simon Pilgrim	2967ed1c7e	[X86] Tidyup combineExtractVectorElt. NFCI. Pull out repeated code for extraction index operand and source vector value type. Use isNullConstant helper to check for zero extraction index. llvm-svn: 295670	2017-02-20 16:09:45 +00:00
Igor Breger	fda32d266a	[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector. Its more profitable to go through memory (1 cycles throughput) than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index. IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer) For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. Removing the VINSERT node, we don't need it any more. Differential Revision: https://reviews.llvm.org/D29690 llvm-svn: 295660	2017-02-20 14:16:29 +00:00
Simon Pilgrim	5910ebe720	[X86][AVX512] Add support for ASHR v2i64/v4i64 support without VLX Use v8i64 ASHR instructions if we don't have VLX. Differential Revision: https://reviews.llvm.org/D28537 llvm-svn: 295656	2017-02-20 12:16:38 +00:00
Ayman Musa	51ffeab8c8	[X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value. Add WIG value to all of AVX instructions which ignore the W-bit in their encoding, instead of giving them the default value of 0. This patch is needed for a follow up work on EVEX2VEX pass (replacing EVEX encoded instructions with their corresponding VEX version when possible). Differential Revision: https://reviews.llvm.org/D29876 llvm-svn: 295643	2017-02-20 08:27:54 +00:00
Craig Topper	c6c68f5958	[AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0. llvm-svn: 295640	2017-02-20 07:00:40 +00:00
Craig Topper	a5fa2e40f9	[AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns. llvm-svn: 295638	2017-02-20 07:00:34 +00:00
Craig Topper	5b4e36aafa	[AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2. llvm-svn: 295634	2017-02-20 02:47:42 +00:00
Craig Topper	c184b671d9	[X86] Use memory form of shift right by 1 when the rotl immediate is one less than the operation size. An earlier commit already did this for the register form. llvm-svn: 295626	2017-02-20 00:37:23 +00:00
Craig Topper	63801df251	[AVX-512] Remove AddedComplexity from masked operations. The size of the patterns already increases their priority. llvm-svn: 295619	2017-02-19 21:44:35 +00:00
Simon Pilgrim	14a7eee0b4	[X86] Use peekThroughOneUseBitcasts helper. NFCI. llvm-svn: 295618	2017-02-19 21:40:51 +00:00
Davide Italiano	16b476ffcc	[X86] Prefer static_cast<> to C-style cast. NFCI. llvm-svn: 295617	2017-02-19 21:35:41 +00:00
Craig Topper	489057715e	[AVX-512] Disable peephole optimizations on the VPTERNLOG commute test. Add new patterns to enable isel to fold the loads on it own. llvm-svn: 295616	2017-02-19 21:32:15 +00:00
Simon Pilgrim	d590de2998	[X86][SSE] Use getTargetConstantBitsFromNode to find zeroable shuffle elements. Replaces existing approach that could only search BUILD_VECTOR nodes. Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element - this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements. llvm-svn: 295613	2017-02-19 19:40:31 +00:00
Craig Topper	4e794c71a6	[AVX-512] Add patterns to recognize masked vpternlog when the passthrough operand is not operand 0. This uses a SDNodeXForm to swizzle the appropriate immediate bits to allow this to be matched. llvm-svn: 295612	2017-02-19 19:36:58 +00:00
Simon Pilgrim	4271186f9c	[X86][SSE] Enable initial support for domain crossing at high shuffle combine depths. As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths. We currently set the required depth at 4 or more combined shuffles, this is probably too high for most targets but is a good starting point and already helps avoid a number of costly variable shuffles. llvm-svn: 295608	2017-02-19 17:19:38 +00:00
Simon Pilgrim	6d07d514de	[X86][SSE] Generalize INSERTPS/SHUFPS/SHUFPD combines across domains. Relax the INSERTPS/SHUFPS/SHUFPD combines to support integer inputs if permitted. llvm-svn: 295606	2017-02-19 15:15:40 +00:00
Simon Pilgrim	b4460cf5a9	[X86][SSE] Add domain crossing support for target shuffle combines. Add the infrastructure to flag whether float and/or int domains are permitable. A future patch will enable domain crossing based off shuffle depth and the value types of the source vectors. llvm-svn: 295604	2017-02-19 14:12:25 +00:00
Craig Topper	218d1a020e	[AVX-512] Add broadcast VPTERNLOG instructions to special case commuting switch. The instructions are marked commutable, but without special handling we don't get the immediate correct. While here also remove the masked memory forms that aren't commutable. llvm-svn: 295602	2017-02-19 08:03:26 +00:00
Craig Topper	007c93b2b9	[X86] Remove patterns for MOVSD with v4i32 types. We don't appear to really need them and if we do we should just use a bitcast to a 64-bit element type. llvm-svn: 295589	2017-02-19 02:08:48 +00:00
Craig Topper	06ae5e821c	[X86] Tighten up some of the SDNode type constraints. llvm-svn: 295588	2017-02-19 01:54:47 +00:00
Simon Pilgrim	599b872ca2	[X86] Fix enumeral/non-enumeral conditional expression warning. gcc only allows you to mix enums / ints if they have the same signedness. llvm-svn: 295586	2017-02-19 00:04:30 +00:00
Simon Pilgrim	2f2d8dc630	Fix signed/unsigned comparison warning. llvm-svn: 295580	2017-02-18 22:56:17 +00:00
Craig Topper	811756b4dc	[X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously. This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions. llvm-svn: 295579	2017-02-18 22:53:43 +00:00
Simon Pilgrim	7a87eebcad	[X86] Fix enumeral/non-enumeral comparison warning. gcc only allows you to mix enums / ints if they have the same signedness. llvm-svn: 295576	2017-02-18 22:40:58 +00:00
Simon Pilgrim	2e78c94ea5	[X86][SSE] Avoid repeated calls to SDValue::getValueType. Added assertion to check input type of X86ISD::VZEXT during target known bits calculation. llvm-svn: 295575	2017-02-18 22:25:27 +00:00
Craig Topper	de10312bea	Recommit "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR." Clang has now been fixed to not use these intrinsics. llvm-svn: 295571	2017-02-18 21:50:58 +00:00
Sanjay Patel	12c2093e1e	[x86] fold sext (xor Bool, -1) --> sub (zext Bool), 1 This is the same transform that is current used for: select Bool, 0, -1 llvm-svn: 295568	2017-02-18 21:03:28 +00:00
Craig Topper	ba2a726cc6	Revert "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR." This reverts r295564. I missed that clang was still using the intrinsics despite our half implemented autoupgrade support. llvm-svn: 295565	2017-02-18 20:14:20 +00:00
Craig Topper	884db3f85d	[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR. It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses. llvm-svn: 295564	2017-02-18 19:51:25 +00:00
Craig Topper	a505169ca5	[AVX-512] Remove 128/256-bit masked fp max/min intrinsics. Upgrade them to legacy unmasked intrinsics and select instructions. llvm-svn: 295543	2017-02-18 07:07:50 +00:00
Simon Pilgrim	7db8f42fe3	[X86] Simplify by pulling out valuetype. NFCI. llvm-svn: 295502	2017-02-17 22:10:10 +00:00
Simon Pilgrim	a4c350ff17	[X86][SSE] Add (V)MOVD folding pattern with zextloadi64i32 load node. Fixes PRPR31309 llvm-svn: 295492	2017-02-17 20:43:32 +00:00
Hans Wennborg	35905d6a67	Re-apply r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)" The original commit was reverted in r283329 due to a miscompile in Chromium. That turned out to be the same issue as PR31257, which was fixed in r295262. llvm-svn: 295357	2017-02-16 19:04:42 +00:00
Andrea Di Biagio	42f7712e23	x86 interrupt calling convention: only save xmm registers if the target supports SSE The existing code always saves the xmm registers for 64-bit targets even if the target doesn't support SSE (which is common for kernels). Thus, the compiler inserts movaps instructions which lead to CPU exceptions when an interrupt handler is invoked. This commit fixes this bug by returning a register set without xmm registers from getCalleeSavedRegs and getCallPreservedMask for such targets. Patch by Philipp Oppermann. Differential Revision: https://reviews.llvm.org/D29959 llvm-svn: 295347	2017-02-16 18:25:37 +00:00
Simon Pilgrim	2fe568c95e	[X86] Remove local areOnlyUsersOf helper and use SDNode::areOnlyUsersOf instead. llvm-svn: 295326	2017-02-16 15:11:49 +00:00
Craig Topper	715873ead3	[AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked intrinsics with select instructions. For 512-bit add new unmasked intrinsics. The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO. llvm-svn: 295290	2017-02-16 06:31:54 +00:00
Hans Wennborg	a468601e0e	[X86] Re-enable conditional tail calls and fix PR31257. This reverts r294348, which removed support for conditional tail calls due to the PR above. It fixes the PR by marking live registers as implicitly used and defined by the now predicated tailcall. This is similar to how IfConversion predicates instructions. Differential Revision: https://reviews.llvm.org/D29856 llvm-svn: 295262	2017-02-16 00:04:05 +00:00
Simon Pilgrim	5b4c30fb32	[X86][SSE] Don't call EltsFromConsecutiveLoads if any element is missing. Minor performance speedup - if any call to getShuffleScalarElt fails to get a result, don't both calling for the remaining elements as EltsFromConsecutiveLoads will fail anyhow. llvm-svn: 295235	2017-02-15 21:09:00 +00:00
Simon Pilgrim	da25d5c7b6	[X86][SSE] Propagate undef upper elements from scalar_to_vector during shuffle combining Only do this for integer types currently - floats types (in particular insertps) load folding often fails with this. llvm-svn: 295208	2017-02-15 17:41:33 +00:00
Simon Pilgrim	0f0e5bd3c6	[X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise ZERO inputs Add support for specifying an UNPCK input as ZERO, particularly improves ZEXT cases with non-zero offsets llvm-svn: 295169	2017-02-15 11:46:15 +00:00
Ayman Musa	b8a4f255dd	[X86][AVX] Remove REX_W from AVX instructions. There is no meaning for REX_W in VEX encoded AVX instruction. Differential Revision: https://reviews.llvm.org/D29894 llvm-svn: 295157	2017-02-15 08:12:16 +00:00
Craig Topper	fbc7805e25	[X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types Summary: We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs. As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast. I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512-bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable. This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect but was enough to fix a few test failures that were being caused. Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases were we don't remove a useless insert into element 1 before broadcasting element 0. Reviewers: delena, RKSimon, zvi Reviewed By: zvi Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D28747 llvm-svn: 295155	2017-02-15 06:58:47 +00:00
Craig Topper	ec5df5f4aa	[AVX-512] Add PACKSS/PACKUS instructions to load folding tables. llvm-svn: 295154	2017-02-15 06:51:39 +00:00
Diego Novillo	8adfc8ef3a	Remove unused variable. llvm-svn: 295065	2017-02-14 16:39:54 +00:00
Simon Pilgrim	6f732e026d	[X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise UNDEF inputs Add support for specifying an UNPCK input as UNDEF llvm-svn: 295061	2017-02-14 16:22:04 +00:00
Simon Pilgrim	a0878dea9e	[X86][SSE] Move unary inputs handling inside matchVectorShuffleWithUNPCK. llvm-svn: 295053	2017-02-14 13:47:17 +00:00
Simon Pilgrim	3efdffcb27	[X86][SSE] Tidyup matchVectorShuffleWithUNPCK helper function call. Don't bother setting the V1/V2 operands again for unary shuffles. Don't bother legalizing the value type unless the match succeeds. llvm-svn: 295051	2017-02-14 12:54:39 +00:00
Craig Topper	d2d50cba2a	[AVX-512] Add PAVGB/PAVGW to load folding tables. llvm-svn: 295035	2017-02-14 06:54:57 +00:00
Andrew Kaylor	709f1c2a9b	[X86] Add MXCSR register This adds MXCSR to the set of recognized registers for X86 targets and updates the instructions that read or write it. I do not intend for all of the various floating point instructions that implicitly use the control bits or update the status bits of this register to ever have that usage modeled by default. However, when constrained floating point modes (such as strict FP exception status modeling or dynamic rounding modes) are enabled, implicit use/def information for MXCSR will be added to those instructions. Until those additional updates are made this should cause (almost?) no functional changes. Theoretically, this will prevent instructions like LDMXCSR and STMXCSR from being moved past one another, but that should be prevented anyway and I haven't found a case where it is happening now. Differential Revision: https://reviews.llvm.org/D29903 llvm-svn: 295004	2017-02-13 23:38:52 +00:00
Simon Pilgrim	fd6a84fbaa	Fix indentation. NFCI. llvm-svn: 294959	2017-02-13 15:31:08 +00:00
Simon Pilgrim	828dee1f70	[X86][SSE] Create matchVectorShuffleWithUNPCK helper function. Currently only used by target shuffle combining - will use it for lowering as well in a future patch. llvm-svn: 294943	2017-02-13 11:52:58 +00:00
Ayman Musa	f77219e035	[X86][AVX512] Fix operand classes for some AVX512 instructions to keep consistency between VEX/EVEX versions of the same instruction. Differential Revision: https://reviews.llvm.org/D29873 llvm-svn: 294937	2017-02-13 09:55:48 +00:00
Craig Topper	680c73e7ab	[X86] Genericize the handling of INSERT_SUBVECTOR from an EXTRACT_SUBVECTOR to support 512-bit vectors with 128-bit or 256-bit subvectors. We now detect that both the extract and insert indices are non-zero and convert to a shuffle. This will be lowered as a blend for 256-bit vectors or as a vshuf operations for 512-bit vectors. llvm-svn: 294931	2017-02-13 04:53:29 +00:00
Craig Topper	53eafa8ea4	[X86] Don't let LowerEXTRACT_SUBVECTOR call getNode for EXTRACT_SUBVECTOR. This results in the simplifications inside of getNode running while we're legalizing nodes popped off the worklist during the final DAG combine. This basically makes a DAG combine like operation occur during this legalize step, but we don't handle something quite the same way. I think we don't recursively added the removed nodes to the DAG combiner worklist. llvm-svn: 294929	2017-02-12 23:49:46 +00:00
Simon Pilgrim	cc9242bd1c	[X86] Fix typo in function name. NFCI. convertBitVectorToUnsiged - convertBitVectorToUnsigned llvm-svn: 294914	2017-02-12 20:53:44 +00:00
Craig Topper	cfe8ce3a58	[AVX-512] Add various EVEX move instructions to load folding tables using the VEX equivalents as a guide. llvm-svn: 294908	2017-02-12 18:47:46 +00:00
Craig Topper	5971b5488e	[AVX-512] Add VMOV64toSDZrm CodeGenOnly instruction based on the same instruction from AVX/SSE. I can't prove that we can select this instruction or the AVX/SSE version, but I'm adding it for consistency for now so I can continue matching the load folding tables. llvm-svn: 294907	2017-02-12 18:47:44 +00:00
Craig Topper	ec26801483	[X86] Fix a couple instruction names to use 'mr' instead of 'rm' to indicate they are stores. AVX-512 version was already named with 'mr'. llvm-svn: 294906	2017-02-12 18:47:40 +00:00
Craig Topper	6eca3170a8	[AVX-512] Add VPEXTRD/Q to load folding tables. llvm-svn: 294905	2017-02-12 18:47:37 +00:00
Simon Pilgrim	04ec0f2b2a	[X86][SSE] Update argument names to match function name. NFCI. The target shuffle match function arguments were using the term 'Ops' but the function names referred to them as 'Inputs' - use 'Inputs' consistently. llvm-svn: 294900	2017-02-12 16:46:41 +00:00
Simon Pilgrim	4cd841757a	[X86][AVX2] Add support for combining target shuffles to VPMOVZX Initial 256-bit vector support - 512-bit support requires extra checks for AVX512BW support (PMOVZXBW) that will be handled in a future patch. llvm-svn: 294896	2017-02-12 14:31:23 +00:00
Elena Demikhovsky	5d91ab46c0	AVX-512: Fixed DWARF register numbers for XMM16-31 The reference is here: https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf llvm-svn: 294890	2017-02-12 07:56:50 +00:00
Craig Topper	1c37e991e6	[X86] Move code for using blendi for insert_subvector out to an isel pattern. This gives the DAG combiner more opportunity to optimize without needing to dig through the blend. llvm-svn: 294876	2017-02-11 22:57:12 +00:00
Simon Pilgrim	755d9127f5	[X86][SSE] Use VSEXT/VZEXT constant folding for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG Preparatory step for PR31712 llvm-svn: 294874	2017-02-11 22:47:06 +00:00
Simon Pilgrim	437d64c49e	[X86][SSE] Improve VSEXT/VZEXT constant folding. Generalize VSEXT/VZEXT constant folding to work with any target constant bits source not just BUILD_VECTOR . llvm-svn: 294873	2017-02-11 21:55:24 +00:00
Simon Pilgrim	4ef9672f0f	[X86][SSE] Add early-out when trying to match blend shuffle. NFCI. llvm-svn: 294864	2017-02-11 18:06:24 +00:00
Amaury Sechet	58ce15aba1	Fix indentation in X86ISelLowering. NFC llvm-svn: 294859	2017-02-11 17:48:48 +00:00
Craig Topper	255343483d	[AVX-512] Add VPMINS/MINU/MAXS/MAXU instructions to load folding tables. llvm-svn: 294858	2017-02-11 17:35:28 +00:00
Craig Topper	b2fa216dd5	[X86] Improve alphabetizing of load folding tables. NFC llvm-svn: 294857	2017-02-11 17:35:25 +00:00
Simon Pilgrim	0e6945e48a	[X86][SSE] Convert getTargetShuffleMaskIndices to use getTargetConstantBitsFromNode. Removes duplicate constant extraction code in getTargetShuffleMaskIndices. getTargetConstantBitsFromNode - adds support for VZEXT_MOVL(SCALAR_TO_VECTOR) and fail if the caller doesn't support undef bits. llvm-svn: 294856	2017-02-11 17:27:21 +00:00
Simon Pilgrim	d59fa0e38a	[X86] Merge repeated getScalarValueSizeInBits calls. NFCI. llvm-svn: 294852	2017-02-11 16:42:07 +00:00
Simon Pilgrim	6411a0ebed	[X86][3DNow!] Enable PFSUB<->PFSUBR commutation llvm-svn: 294847	2017-02-11 13:51:14 +00:00
Simon Pilgrim	4ead1d4aa9	[X86][3DNow!] Enable commutation for PFADD/PFMUL/PFCMPEQ/PAVGUSB/PMULHRW All commutations confirmed to give identical results - note PFMAX/PFMIN do not PFSUB<->PFSUBR should be commutable as well llvm-svn: 294846	2017-02-11 13:32:55 +00:00
Benjamin Kramer	efcf06f5f2	Move symbols from the global namespace into (anonymous) namespaces. NFC. llvm-svn: 294837	2017-02-11 11:06:55 +00:00
Craig Topper	1f6153bab4	[AVX-512] Add VPINSRB/W/D/Q instructions to load folding tables. llvm-svn: 294830	2017-02-11 07:01:40 +00:00
Craig Topper	a9818aadab	[AVX-512] Fix apparent typo in instruction name VMOVSSDrr_REV->VMOVSDZrr_REV. llvm-svn: 294829	2017-02-11 07:01:38 +00:00
Craig Topper	3afa777f10	[AVX-512] Add VPSADBW instructions to load folding tables. llvm-svn: 294827	2017-02-11 06:24:03 +00:00
Craig Topper	464b8cb244	[X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available. Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available. This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp. Overall I think this produces better results in the modified test cases. llvm-svn: 294824	2017-02-11 05:32:57 +00:00
Ahmed Bougacha	2e275e272f	[X86] Bitcast subvector before broadcasting it. Since r274013, we've been looking through bitcasts on broadcast inputs. In the scalar-folding case (from a load, build_vector, or sc2vec), the input type didn't matter, as we'd simply bitcast the resulting scalar back. However, when broadcasting a 128-bit-lane-aligned element, we create an EXTRACT_SUBVECTOR. Use proper types, by creating an extract_subvector of the original input type. llvm-svn: 294774	2017-02-10 19:51:47 +00:00
Simon Pilgrim	8c8b10389d	[X86][SSE] Use SDValue::getConstantOperandVal helper. NFCI. Also reordered an if statement to test low cost comparisons first llvm-svn: 294748	2017-02-10 14:27:59 +00:00
Simon Pilgrim	c371159aac	[X86][SSE] Add support for extracting target constants from BUILD_VECTOR In some cases we call getTargetConstantBitsFromNode for nodes that haven't been lowered from BUILD_VECTOR yet Note: We're getting very close to being able to move most of the constant extraction code from getTargetShuffleMaskIndices into getTargetConstantBitsFromNode llvm-svn: 294746	2017-02-10 14:04:11 +00:00
Simon Pilgrim	1140281413	[X86][SSE] Add missing comment describing combing to SHUFPS. NFCI llvm-svn: 294745	2017-02-10 13:16:01 +00:00
Igor Breger	6677999e17	add #ifdef, fix compilation error in case LLVM_BUILD_GLOBAL_ISEL=OFF llvm-svn: 294726	2017-02-10 07:33:14 +00:00
Igor Breger	b4442f34cd	[X86][GlobalISel] Add general-purpose Register Bank Summary: [X86][GlobalISel] Add general-purpose Register Bank. Add trivial handling of G_ADD legalization . Add Regestry Bank selection for COPY and G_ADD instructions Reviewers: rovka, zvi, ab, t.p.northover, qcolombet Reviewed By: qcolombet Subscribers: qcolombet, mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29771 llvm-svn: 294723	2017-02-10 07:05:56 +00:00
Eric Christopher	0824096cc0	Temporarily revert "For X86-64 linux and PPC64 linux align int128 to 16 bytes." until we can get better TargetMachine::isCompatibleDataLayout to compare - otherwise we can't code generate existing bitcode without a string equality data layout. This reverts commit r294702. llvm-svn: 294709	2017-02-10 04:35:32 +00:00
Eric Christopher	42b9248803	For X86-64 linux and PPC64 linux align int128 to 16 bytes. For other platforms we should find out what they need and likely make the same change, however, a smaller additional change is easier for platforms we know have it specified in the ABI. As part of this rewrite some of the handling in the backends for data layout and update a bunch of testcases. Based on a patch by Simonas Kazlauskas! llvm-svn: 294702	2017-02-10 03:32:21 +00:00
Simon Pilgrim	7f0d7e08b2	[X86] Remove duplicate call to getValueType. NFCI. llvm-svn: 294640	2017-02-09 22:35:59 +00:00
Peter Collingbourne	ef089bdb4b	X86: Introduce relocImm-based patterns for cmp. Differential Revision: https://reviews.llvm.org/D28690 llvm-svn: 294636	2017-02-09 22:02:28 +00:00
Peter Collingbourne	d7dd65ad7c	X86: Teach X86InstrInfo::analyzeCompare to recognize compares of symbols. This requires that we communicate to X86InstrInfo::optimizeCompareInstr that the second operand is neither a register nor an immediate. The way we do that is by setting CmpMask to zero. Note that there were already instructions where the second operand was not a register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero for those instructions. This seems like a latent bug, but I was unable to trigger it. Differential Revision: https://reviews.llvm.org/D28621 llvm-svn: 294634	2017-02-09 21:58:24 +00:00
Simon Pilgrim	e0b5c2acbd	Convert to for-range loop. NFCI. llvm-svn: 294610	2017-02-09 18:52:24 +00:00
Simon Pilgrim	6bf1bd3ed6	[X86][MMX] Remove the (long time) unused MMX_PINSRW ISD opcode. llvm-svn: 294596	2017-02-09 17:08:47 +00:00
Pierre Gousseau	6953b32475	[X86][btver2] PR31902: Fix a crash in combineOrCmpEqZeroToCtlzSrl under fast math. In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants. Differential Revision: https://reviews.llvm.org/D29756 llvm-svn: 294588	2017-02-09 14:43:58 +00:00
Simon Pilgrim	563e23e66e	[X86][SSE] Attempt to break register dependencies during lowerBuildVector LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register. This patch attempts to break the register dependency by either always zeroing the vector before hand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD. On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.) Differential Revision: https://reviews.llvm.org/D29720 llvm-svn: 294581	2017-02-09 11:50:19 +00:00
Craig Topper	3cac763532	[X86] Remove the HLE feature flag. We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line. llvm-svn: 294562	2017-02-09 06:51:02 +00:00
Craig Topper	86576bd921	[X86] Remove INVPCID and SMAP feature flags. They aren't currently used by any instructions and not tested. If we implement intrinsics for their instructions in the future, the feature flags can be added back with proper testing. llvm-svn: 294561	2017-02-09 06:50:59 +00:00
Craig Topper	50f3d1452c	[X86] Clzero intrinsic and its addition under znver1 This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558	2017-02-09 04:27:34 +00:00
Simon Pilgrim	dcd10344a3	[X86][SSE] Tidyup LowerBuildVectorv16i8 and LowerBuildVectorv8i16. NFCI. Run clang-format and standardized variable names between functions. llvm-svn: 294456	2017-02-08 14:44:45 +00:00
Craig Topper	3fd463a15a	[X86] Add test for clflushopt intrinsic and only enable it to be selected if the feature flag is set. llvm-svn: 294407	2017-02-08 05:45:46 +00:00
Craig Topper	6c05192018	[X86] Remove the VMFUNC feature flag. It was only partially implemented and we have no support for codegening vmfunc instructions today. If that support ever gets added, the full feature flag support should come along with it. llvm-svn: 294406	2017-02-08 05:45:42 +00:00
Craig Topper	e0ac7f3beb	[X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it. Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction llvm-svn: 294405	2017-02-08 05:45:39 +00:00
Hans Wennborg	819e3e02a9	[X86] Disable conditional tail calls (PR31257) They are currently modelled incorrectly (as calls, which clobber registers, confusing e.g. Machine Copy Propagation). Reverting until we figure out the proper solution. llvm-svn: 294348	2017-02-07 20:37:45 +00:00
Sanjay Patel	b0cee9b273	[x86] improve comments for SHRUNKBLEND node creation; NFC llvm-svn: 294344	2017-02-07 19:54:16 +00:00
Sanjoy Das	2f63cbcc0c	[ImplicitNullCheck] Extend Implicit Null Check scope by using stores Summary: This change allows usage of store instruction for implicit null check. Memory Aliasing Analisys is not used and change conservatively supposes that any store and load may access the same memory. As a result re-ordering of store-store, store-load and load-store is prohibited. Patch by Serguei Katkov! Reviewers: reames, sanjoy Reviewed By: sanjoy Subscribers: atrick, llvm-commits Differential Revision: https://reviews.llvm.org/D29400 llvm-svn: 294338	2017-02-07 19:19:49 +00:00
Sanjay Patel	ef6d573f67	[x86] use range-for loops; NFCI llvm-svn: 294337	2017-02-07 19:18:25 +00:00
Sanjay Patel	633ecbf3c4	[x86] use getSignBit() for clarity; NFCI llvm-svn: 294333	2017-02-07 19:01:35 +00:00
Simon Pilgrim	8c0f62d293	[X86][SSE] Ensure that vector shift-by-immediate inputs are correctly bitcast to the result type vXi8/vXi64 vector shifts are often shifted as vYi16/vYi32 types but we weren't always remembering to bitcast the input. Tested with a new assert as we don't currently manipulate these shifts enough for test cases to catch them. llvm-svn: 294308	2017-02-07 14:22:25 +00:00
Craig Topper	9191c3324a	[AVX-512] Add masked and unmasked shift by immediate instructions to load folding tables. llvm-svn: 294287	2017-02-07 07:31:00 +00:00
Craig Topper	62304d80e3	[AVX-512] Add masked shift instructions to load folding tables. This adds the masked versions of everything, but the shift by immediate instructions. llvm-svn: 294286	2017-02-07 07:30:57 +00:00
Craig Topper	45d9ddc687	[AVX-512] Add some of the shift instructions to the load folding tables. This includes unmasked forms of variable shift and shifting by the lower element of a register. Still need to do shift by immediate which was not foldable prior to avx512 and all the masked forms. llvm-svn: 294285	2017-02-07 07:30:54 +00:00
Craig Topper	39d86bb688	[X86] Change the Defs list for VZEROALL/VZEROUPPER back to not including YMM16-31. llvm-svn: 294277	2017-02-07 04:10:57 +00:00
Eugene Zelenko	90562dfb50	[X86] Fix some Include What You Use warnings; other minor fixes (NFC). This is preparation to reduce MCExpr.h dependencies.(vlsj-clangbuild)[622] llvm-svn: 294246	2017-02-06 21:55:43 +00:00
Simon Pilgrim	bfd4495512	[X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined. Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines. We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree. This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list. Differential Revision: https://reviews.llvm.org/D29399 llvm-svn: 294183	2017-02-06 13:44:45 +00:00
Igor Breger	5c31a4c9a3	[X86][GlobalISel] Add limited ret lowering support to the IRTranslator. Summary: Support return lowering for i8/i16/i32/i64/float/double, vector type supported for 64bit platform only. Support argument lowering for float/double types. Reviewers: t.p.northover, zvi, ab, rovka Reviewed By: zvi Subscribers: dberris, kristof.beyls, delena, llvm-commits Differential Revision: https://reviews.llvm.org/D29261 llvm-svn: 294173	2017-02-06 08:37:41 +00:00
Craig Topper	5d9ecd23e8	[AVX-512] Add VPSLLDQ/VPSRLDQ to load folding tables. llvm-svn: 294170	2017-02-06 05:12:14 +00:00
Craig Topper	f0eb60a6f3	[AVX-512] Add VPABSB/D/Q/W to load folding tables. llvm-svn: 294169	2017-02-06 03:18:01 +00:00
Craig Topper	864b1a5376	[AVX-512] Add VSHUFPS/PD to load folding tables. llvm-svn: 294168	2017-02-06 03:17:58 +00:00
Craig Topper	75218fb6b1	[AVX-512] Add VPMULLD/Q/W instructions to load folding tables. llvm-svn: 294164	2017-02-06 01:19:26 +00:00
Craig Topper	452a7770e6	[AVX-512] Add all masked and unmasked versions of VPMULDQ and VPMULUDQ to load folding tables. llvm-svn: 294163	2017-02-05 23:31:48 +00:00
Simon Pilgrim	380ce75687	[X86][SSE] Replace insert_vector_elt(vec, -1, idx) with shuffle Similar to what we already do for zero elt insertion, we can quickly rematerialize 'allbits' vectors so to avoid a unnecessary gpr value and insertion into a vector llvm-svn: 294162	2017-02-05 22:50:29 +00:00
Craig Topper	8eb1f315ac	[AVX-512] Add scalar masked max/min intrinsic instructions to the load folding tables. llvm-svn: 294153	2017-02-05 22:25:46 +00:00
Craig Topper	cb4bc8be5b	[AVX-512] Add scalar masked add/sub/mul/div intrinsic instructions to the load folding tables. llvm-svn: 294152	2017-02-05 22:25:42 +00:00
Craig Topper	59af67206d	[AVX-512] Add masked scalar FMA intrinsics to isNonFoldablePartialRegisterLoad to improve load folding of scalar loads. llvm-svn: 294151	2017-02-05 22:25:40 +00:00
Kamil Rytarowski	5d2bd8dd54	Revamp llvm::once_flag to be closer to std::once_flag Summary: Make this interface reusable similarly to std::call_once and std::once_flag interface. This makes porting LLDB to NetBSD easier as there was in the original approach a portable way to specify a non-static once_flag. With this change translating std::once_flag to llvm::once_flag is mechanical. Sponsored by <The NetBSD Foundation> Reviewers: mehdi_amini, labath, joerg Reviewed By: mehdi_amini Subscribers: emaste, clayborg Differential Revision: https://reviews.llvm.org/D29566 llvm-svn: 294143	2017-02-05 21:13:06 +00:00
Craig Topper	cac328f25e	[X86] Fix printing of sha256rnds2 to include the implicit %xmm0 argument. llvm-svn: 294132	2017-02-05 18:33:31 +00:00
Craig Topper	d7ae9ab1fa	[X86] Fix printing of blendvpd/blendvps/pblendvb to include the implicit %xmm0 argument. This makes codegen output more obvious about the %xmm0 usage. llvm-svn: 294131	2017-02-05 18:33:24 +00:00
Craig Topper	6a35a81fc5	[X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later. Similar was already done for several other shuffles in this function. The test changes are because the old code used explicity zeroing for elements that could have been undef. While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us. llvm-svn: 294130	2017-02-05 18:33:14 +00:00
Craig Topper	978fdb75a4	[X86] Add support for folding (insert_subvector vec1, (extract_subvector vec2, idx1), idx1) -> (blendi vec2, vec1). llvm-svn: 294112	2017-02-04 23:26:46 +00:00
Craig Topper	3d95228dbe	[X86] Simplify the code that turns INSERT_SUBVECTOR into BLENDI. NFCI llvm-svn: 294111	2017-02-04 23:26:42 +00:00
Simon Pilgrim	034c1bd32c	[X86][SSE] Add support for combining scalar_to_vector(extract_vector_elt) into a target shuffle. Correctly flagging upper elements as undef. llvm-svn: 294020	2017-02-03 17:59:58 +00:00
Craig Topper	bbb2b95ce5	[X86] Mark 256-bit and 512-bit INSERT_SUBVECTOR operations as legal and remove the custom lowering. llvm-svn: 293969	2017-02-03 00:24:49 +00:00
Eugene Zelenko	fbd13c5c12	[X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 293949	2017-02-02 22:55:55 +00:00
Reid Kleckner	3c467e225e	[X86] Avoid sorted order check in release builds Effectively reverts r290248 and fixes the unused function warning with ifndef NDEBUG. llvm-svn: 293945	2017-02-02 22:06:30 +00:00
Craig Topper	c45657375b	[X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine. On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI. llvm-svn: 293944	2017-02-02 22:02:57 +00:00
Michael Kuperstein	e6d59fdca5	[X86] Add costs for non-AVX512 single-source permutation integer shuffles Differential Revision: https://reviews.llvm.org/D29416 llvm-svn: 293932	2017-02-02 20:27:13 +00:00
Nirav Dave	e14300e270	[X86,ISEL] Fix X86 increment chain dependence calculation Merging Load-add-store pattern into a increment op previously dropped the load's chain from the instructions dependence if the store is chained to a TokenFactor. llvm-svn: 293892	2017-02-02 14:39:26 +00:00
Simon Pilgrim	20ab6b875a	[X86][SSE] Use MOVMSK for all_of/any_of reduction patterns This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain). So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc. Differential Revision: https://reviews.llvm.org/D28810 llvm-svn: 293880	2017-02-02 11:52:33 +00:00
Craig Topper	047a8be18a	[X86] Remove some unused DAGCombinerInfo parameters. NFC llvm-svn: 293873	2017-02-02 08:03:23 +00:00
Craig Topper	94ed54b49a	[X86] Move some INSERT_SUBVECTOR optimizations from legalize to DAG combine. This moves creation of SUBV_BROADCAST and merging of adjacent loads that are being inserted together. This is a step towards removing legalizing of INSERT_SUBVECTOR except for vXi1 cases. llvm-svn: 293872	2017-02-02 08:03:20 +00:00
Craig Topper	b81e6c48f8	[AVX-512] Fix the implicit defs for VZEROALL/VZEROUPPER to include YMM16-YMM31. llvm-svn: 293862	2017-02-02 04:17:18 +00:00
Peter Collingbourne	dc5e583687	X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128). Differential Revision: https://reviews.llvm.org/D28689 llvm-svn: 293844	2017-02-02 00:32:03 +00:00
Simon Pilgrim	ca931efc21	[X86][SSE] Remove unused argument. NFCI. llvm-svn: 293777	2017-02-01 16:34:50 +00:00
Simon Pilgrim	55a9c79bd1	[X86][SSE] Merge SSE2 PINSRW lowering with SSE41 PINSRB/PINSRW lowering. NFCI. These are identical apart from the extra SSE41 guard for PINSRB. llvm-svn: 293766	2017-02-01 13:32:19 +00:00
NAKAMURA Takumi	468487d71a	*MacroFusion.cpp: Suppress warnings to eliminate \param(s). [-Wdocumentation] llvm-svn: 293744	2017-02-01 07:30:46 +00:00
Craig Topper	0bcba19cdf	[X86] For AVX1/AVX2 isel, don't use FP move instructions for 128-bit loads/stores of integer types. For SSE we use fp because of the smaller encoding, but that doesn't apply to AVX. So just do the natural thing so we don't have to explain why we aren't. We can't do this for 256-bit loads/stores since integer loads and stores aren't available in AVX1 so we need fallback patterns since the integer types are legal. This doesn't affect any tests because execution domain fixing freely converts the instructions anyway. Honestly, we could probably rely on it for the SSE size optimization too. llvm-svn: 293743	2017-02-01 07:17:16 +00:00
Evandro Menezes	94edf02923	[CodeGen] Move MacroFusion to the target This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target. In AArch64, it also expands the fusion to all instructions pairs in a scheduling block, beyond just among the predecessors of the branch at the end. Differential revision: https://reviews.llvm.org/D28489 llvm-svn: 293737	2017-02-01 02:54:34 +00:00
Peter Collingbourne	d763c4cc85	MC: Introduce the ABS8 symbol modifier. @ABS8 can be applied to symbols which appear as immediate operands to instructions that have a 8-bit immediate form for that operand. It causes the assembler to use the 8-bit form and an 8-bit relocation (e.g. R_386_8 or R_X86_64_8) for the symbol. Differential Revision: https://reviews.llvm.org/D28688 llvm-svn: 293667	2017-01-31 18:28:44 +00:00
Nirav Dave	a7c041d147	[X86] Implement -mfentry Summary: Insert calls to __fentry__ at function entry. Reviewers: hfinkel, craig.topper Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D28000 llvm-svn: 293648	2017-01-31 17:00:27 +00:00
Simon Pilgrim	1b39d5db7b	[X86][SSE] Add support for combining PINSRB into a target shuffle. llvm-svn: 293637	2017-01-31 14:59:44 +00:00
Benjamin Kramer	94a833962c	[X86] Silence unused variable warning in Release builds. llvm-svn: 293631	2017-01-31 14:13:53 +00:00
Simon Pilgrim	4eab18f6b8	[X86][SSE] Detect unary PBLEND shuffles. These can appear during shuffle combining. llvm-svn: 293628	2017-01-31 13:58:01 +00:00
Simon Pilgrim	c29eab52e8	[X86][SSE] Add support for combining PINSRW into a target shuffle. Also add the ability to recognise PINSR(Vex, 0, Idx). Targets shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat. The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch. llvm-svn: 293627	2017-01-31 13:51:10 +00:00
Craig Topper	2cfa2071bd	[AVX-512] Don't both looking into the AVX512DQ execution domain fixing tables if AVX512DQ isn't supported since we can't do any conversion anyway. llvm-svn: 293608	2017-01-31 06:49:55 +00:00
Craig Topper	797e32dd98	[X86] Add AVX and SSE2 version of MOVSDmr to execution domain fixing table. AVX-512 already did this for the EVEX version. llvm-svn: 293607	2017-01-31 06:49:53 +00:00
Craig Topper	779e4c5bb4	[AVX-512] Fix copy and paste bug in execution domain fixing tables so that we can convert 256-bit movnt instructions. llvm-svn: 293606	2017-01-31 06:49:50 +00:00
Craig Topper	06e038c6de	[X86] Update the broadcast fallback patterns to use shuffle instructions from the appropriate execution domain. llvm-svn: 293603	2017-01-31 05:18:29 +00:00
Craig Topper	e9e84c8284	[AVX-512] Fix the ExeDomain for VMOVDDUP, VMOVSLDUP, and VMOVSHDUP. llvm-svn: 293601	2017-01-31 05:18:24 +00:00
Craig Topper	d064cc93b2	[X86] Remove patterns for X86VPermilpi with integer types. I don't think we've formed these since the shuffle lowering rewrite. llvm-svn: 293592	2017-01-31 02:09:53 +00:00
Craig Topper	85935f69fb	[X86] Remove duplicate patterns for X86VPermilpv that already exist in the instructions themselves. llvm-svn: 293591	2017-01-31 02:09:51 +00:00
Craig Topper	ced68315ce	[X86] Remove patterns for selecting PSHUFD with FP types. We don't seem to do this anymore and the AVX case definitely should be using VPERMILPS anyway. llvm-svn: 293590	2017-01-31 02:09:49 +00:00
Craig Topper	b76494e017	[X86] Remove 'else' after 'return'. NFC llvm-svn: 293589	2017-01-31 02:09:46 +00:00
Craig Topper	f9d901f0ea	[X86] Use integer broadcast instructions for integer broadcast patterns. I'm not sure why we were using an FP instruction before and had to have a comment calling attention to it, but not justifying it. llvm-svn: 293588	2017-01-31 02:09:43 +00:00
Simon Pilgrim	3905e03a47	[X86][SSE] Fix unsigned <= 0 warning in assert. NFCI. Thanks to @mkuper llvm-svn: 293561	2017-01-30 22:58:44 +00:00
Simon Pilgrim	a80a47afef	[X86][SSE] Generalize the number of decoded shuffle inputs. NFCI. combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs but everything before it now supports any number of shuffle inputs. This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns. llvm-svn: 293560	2017-01-30 22:48:49 +00:00
Eli Friedman	2345733246	Fix line endings. llvm-svn: 293554	2017-01-30 22:04:23 +00:00
Simon Pilgrim	098998aef0	[X86][SSE] Add support for combining PINSRW+ASSERTZEXT+PEXTRW patterns with target shuffles llvm-svn: 293500	2017-01-30 16:58:34 +00:00
Asaf Badouh	e11d2d73bf	[X86][MCU] Minor bug fix for r293469 + test case llvm-svn: 293478	2017-01-30 13:14:37 +00:00
Asaf Badouh	53713df0c2	[X86][MCU] replace select with bit manipulation instead of branches Differential Revision: https://reviews.llvm.org/D28354 llvm-svn: 293469	2017-01-30 08:16:59 +00:00
Craig Topper	f6df4a6978	[AVX-512] Remove duplicate CodeGenOnly patterns for scalar register broadcast. We can use COPY_TO_REGCLASS like AVX does. This causes stack spill slots be oversized sometimes, but the same should already be happening with AVX. llvm-svn: 293464	2017-01-30 06:59:06 +00:00
Craig Topper	0265a39472	[AVX-512] Remove KSET0B/KSET1B in favor of the patterns that select KSET0W/KSET1W for v8i1. llvm-svn: 293458	2017-01-30 05:37:47 +00:00
Craig Topper	3b7e823f92	[AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. VSHLI/VSHRI shift within elements while KSHIFT moves whole elements. llvm-svn: 293448	2017-01-30 00:06:01 +00:00
Chris Ray	30b3fafb94	[X86][Disassembler] Added SALC instruction Reviewers: joe.abbey, craig.topper Reviewed By: craig.topper Subscribers: majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D29201 llvm-svn: 293447	2017-01-29 23:02:47 +00:00
Craig Topper	db919caf1b	[AVX-512] Fix lowering for mask register concatenation with undef in the lower half. Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width. llvm-svn: 293446	2017-01-29 22:53:33 +00:00
Chris Ray	ba3741cb2b	[X86] Fixing flag usage for RCL and RCR Summary: The RCL and RCR instructions use the carry flag. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29237 llvm-svn: 293441	2017-01-29 20:05:30 +00:00
Simon Pilgrim	76073f8d22	[X86][SSE] Lower scalar_to_vector(0) to zero vector Replaces an xor+movd/movq with an xorps which will be shorter in codesize, avoid an int-fpu transfer, allow modern cores to fast path the result during decode and helps other combines recognise an all-zero vector. The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise the upper elts are undef but this doesn't seem to be a problem. Differential Revision: https://reviews.llvm.org/D29097 llvm-svn: 293438	2017-01-29 18:13:37 +00:00
Elena Demikhovsky	17fe27f1f2	[X86 Codegen] Fixed a bug in unsigned saturation PACKUSWB converts Signed word to Unsigned byte, (the same about DW) and it can't be used for umin+truncate pattern. AVX-512 VPMOVUS* instructions fit the pattern since they convert Unsigned to Unsigned. See https://llvm.org/bugs/show_bug.cgi?id=31773 Differential Revision: https://reviews.llvm.org/D29196 llvm-svn: 293431	2017-01-29 13:18:30 +00:00
Igor Breger	9ea154d4ad	[X86][GlobalISel] Add limited argument lowering support to the IRTranslator. Summary: Add limited (i8/i16/i32/i64) argument lowering support to the IRTranslator. Inspired by commit 289940. Reviewers: t.p.northover, qcolombet, ab, zvi, rovka Reviewed By: rovka Subscribers: dberris, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D28987 llvm-svn: 293427	2017-01-29 08:35:42 +00:00
Craig Topper	6533e40e9d	[X86] Fix vector ANDN matching to work correctly when both inputs to the AND are XORs. llvm-svn: 293403	2017-01-28 23:52:09 +00:00
Matthias Braun	8c209aa877	Cleanup dump() functions. We had various variants of defining dump() functions in LLVM. Normalize them (this should just consistently implement the things discussed in http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html For reference: - Public headers should just declare the dump() method but not use LLVM_DUMP_METHOD or #if !defined(NDEBUG) \|\| defined(LLVM_ENABLE_DUMP) - The definition of a dump method should look like this: #if !defined(NDEBUG) \|\| defined(LLVM_ENABLE_DUMP) LLVM_DUMP_METHOD void MyClass::dump() { // print stuff to dbgs()... } #endif llvm-svn: 293359	2017-01-28 02:02:38 +00:00
Chris Ray	535e7d1547	[X86] Adding FFREEP instruction. Summary: Small change to get the FREEP instruction to decode properly. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29193 llvm-svn: 293314	2017-01-27 18:02:53 +00:00
Simon Pilgrim	027bb453d9	[X86][SSE] Add support for combining ANDNP byte masks with target shuffles llvm-svn: 293178	2017-01-26 14:31:12 +00:00
Simon Pilgrim	3057fd53f9	[X86][SSE] Pull out target shuffle resolve code into helper. NFCI. Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit. llvm-svn: 293175	2017-01-26 13:06:02 +00:00
Craig Topper	bad53cce26	[AVX-512] Move the combine that runs combineBitcastForMaskedOp to the last DAG combine phase where I had originally meant to put it. llvm-svn: 293157	2017-01-26 07:17:58 +00:00
Craig Topper	f0bab7b739	[X86] When bitcasting INSERT_SUBVECTOR/EXTRACT_SUBVECTOR to match masked operations, use the correct type for the immediate operand. llvm-svn: 293156	2017-01-26 07:17:53 +00:00
Jonas Paulsson	8e2f948ef0	[TargetTransformInfo] Refactor and improve getScalarizationOverhead() Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and add extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start of improving on this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155	2017-01-26 07:03:25 +00:00
Mohammed Agabaria	20caee95e1	[X86] enable memory interleaving for X86\SLM arch. Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040	2017-01-25 09:14:48 +00:00
Coby Tayree	77807d93af	[X86]Enable the use of 'mov' with a 64bit GPR and a large immediate Enable the next form (intel style): "mov <reg64>, <largeImm>" which is should be available, where <largeImm> stands for immediates which exceed the range of a singed 32bit integer Differential Revision: https://reviews.llvm.org/D28988 llvm-svn: 293030	2017-01-25 07:09:42 +00:00
Simon Pilgrim	893d2119ee	[X86][AVX512] Remove unused argument from PMOVX tablegen patterns. NFCI. Seems to be a copy+paste legacy from the AVX2 patterns. llvm-svn: 292941	2017-01-24 16:16:29 +00:00
Martin Bohme	526299c81c	[X86][SSE] Add explicit braces to avoid -Wdangling-else warning. Reviewers: RKSimon Subscribers: llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D29076 llvm-svn: 292924	2017-01-24 12:31:30 +00:00
Simon Pilgrim	0c45338961	Fix unused variable warning llvm-svn: 292921	2017-01-24 11:54:27 +00:00
Simon Pilgrim	e1ec9072f6	[X86][SSE] Add support for constant folding vector arithmetic shift by immediates llvm-svn: 292919	2017-01-24 11:46:13 +00:00
Simon Pilgrim	6340e54861	[X86][SSE] Add support for constant folding vector logical shift by immediates llvm-svn: 292915	2017-01-24 11:21:57 +00:00
Craig Topper	fc8798fa1b	[X86] Remove unnecessary peakThroughBitcasts call that's already take care of by the ISD::isBuildVectorAllOnes check below. llvm-svn: 292894	2017-01-24 06:57:29 +00:00
Craig Topper	b0cbd5b5b0	[AVX-512] Simplify multiclasses for integer logic operations. There were several inputs that didn't vary. While there give them the same scheduling itinerary as the SSE/AVX versions. llvm-svn: 292892	2017-01-24 06:25:34 +00:00
Craig Topper	993edc9db1	[X86] Don't split v8i32 all ones values if only AVX1 is available. Keep it intact and split it at isel. This allows us to remove the check in ANDN combining that had to look through the extraction. llvm-svn: 292881	2017-01-24 04:33:03 +00:00
Craig Topper	eb440a14a5	[X86] Remove Undef handling from extractSubVector. This is now handled inside getNode. llvm-svn: 292877	2017-01-24 02:43:54 +00:00
Simon Pilgrim	0218ce1080	[X86][SSE] Add missing X86ISD::ANDNP combines. llvm-svn: 292767	2017-01-22 22:45:23 +00:00
Simon Pilgrim	7e1cc97513	[X86][SSE] Improve shuffle combining with zero insertions Add support for handling shuffles with scalar_to_vector(0) llvm-svn: 292766	2017-01-22 22:21:44 +00:00
Sanjay Patel	8f49aede82	[x86] avoid crashing with illegal vector type (PR31672) https://llvm.org/bugs/show_bug.cgi?id=31672 llvm-svn: 292758	2017-01-22 17:06:12 +00:00
Craig Topper	8e0724d332	[X86] Don't allow commuting to form phsub operations. Fixes PR31714. llvm-svn: 292713	2017-01-21 06:59:38 +00:00
Simon Pilgrim	3e5b525699	Remove trailing whitespace. NFCI. llvm-svn: 292613	2017-01-20 15:15:59 +00:00
Simon Pilgrim	0da4d2bc03	[CostModel][X86] Removed unused cost. NFCI. SHL v8i32 is already handled in the SSE41 cost table llvm-svn: 292612	2017-01-20 15:14:38 +00:00
Simon Pilgrim	db101e4d57	[X86][SSE] Improve comments describing combineTruncatedArithmetic. NFCI. llvm-svn: 292502	2017-01-19 18:18:32 +00:00
Simon Pilgrim	5f2f53b106	[X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further llvm-svn: 292493	2017-01-19 16:25:02 +00:00
Elena Demikhovsky	e01512cecf	Recommiting unsigned saturation with a bugfix. A test case that crached is added to avx512-trunc.ll. (PR31589) llvm-svn: 292479	2017-01-19 12:08:21 +00:00
Craig Topper	200ea31684	[AVX-512] Support ADD/SUB/MUL of mask vectors Summary: Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND. We already do this for scalar i1 operations so I just extended it to vectors of i1. Reviewers: zvi, delena Reviewed By: delena Subscribers: guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D28888 llvm-svn: 292474	2017-01-19 07:12:35 +00:00
Craig Topper	c227529105	[X86] Merge LowerADD and LowerSUB into a single LowerADD_SUB since they are identical. llvm-svn: 292469	2017-01-19 03:49:29 +00:00
Craig Topper	b561e66384	[AVX-512] Use VSHUF instructions instead of two inserts as fallback for subvector broadcasts that can't fold the load. llvm-svn: 292466	2017-01-19 02:34:29 +00:00
Michael Kuperstein	d3d2925933	Revert r291670 because it introduces a crash. r291670 doesn't crash on the original testcase from PR31589, but it crashes on a slightly more complex one. PR31589 has the new reproducer. llvm-svn: 292444	2017-01-18 23:05:58 +00:00
Kirill Bobyrev	6afbaf0944	Revert 292404 due to buildbot failures. llvm-svn: 292407	2017-01-18 16:34:25 +00:00
Kirill Bobyrev	9ad06dbe17	[X86] Minor code cleanup to fix several clang-tidy warnings. NFC llvm-svn: 292404	2017-01-18 16:15:47 +00:00
Michael Zuckerman	0c0240ce84	[X86] Improve mul combine for negative multiplayer (2^c - 1) This patch improves the mul instruction combine function (combineMul) by adding new layer of logic. In this patch, we are adding the ability to fold (mul x, -((1 << c) -1)) or (mul x, -((1 << c) +1)) into (neg(X << c) -x) or (neg((x << c) + x) respective. Differential Revision: https://reviews.llvm.org/D28232 llvm-svn: 292358	2017-01-18 09:31:13 +00:00
Marina Yatsina	197db00e3e	[X86] Fix for bugzilla 31576 - add support for "data32" instruction prefix This patch fixes bugzilla 31576 (https://llvm.org/bugs/show_bug.cgi?id=31576). "data32" instruction prefix was not defined in the llvm. An exception had to be added to the X86 tablegen and AsmPrinter because both "data16" and "data32" are encoded to 0x66 (but in different modes). Differential Revision: https://reviews.llvm.org/D28468 llvm-svn: 292352	2017-01-18 08:07:51 +00:00
Joerg Sonnenberger	270dd41f75	Remove an overeager assert from r288844. llvm-svn: 292244	2017-01-17 19:29:15 +00:00
Bob Wilson	f2d0b68b3b	Revert r291640 change to fold X86 comparison with atomic_load_add. Even with the fix from r291630, this still causes problems. I get widespread assertion failures in the Swift runtime's WeakRefCount::increment() function. I sent a reduced testcase in reply to the commit. llvm-svn: 292242	2017-01-17 19:18:57 +00:00
Craig Topper	729d30d0ae	[AVX-512] Add support for taking a bitcast between a SUBV_BROADCAST and VSELECT and moving it to the input of the SUBV_BROADCAST if it will help with using a masked operation. llvm-svn: 292201	2017-01-17 06:49:59 +00:00
Craig Topper	fba613e407	[X86] Merge the disassemblers handling of the different TYPE_RELs by getting the size information from the ENCODING field. NFCI llvm-svn: 292096	2017-01-16 06:49:09 +00:00
Craig Topper	ad944a1cac	[X86] Reduce the number of operand 'types' the disassembler needs to deal with. NFCI We were frequently checking for a list of types and the different types conveyed no real information. So lump them together explicitly. llvm-svn: 292095	2017-01-16 06:49:03 +00:00
Craig Topper	3173a1f8ff	[AVX-512] Teach the disassembler about all of the EVEX gather and scatter instructions. llvm-svn: 292094	2017-01-16 05:44:33 +00:00
Craig Topper	33ac064137	[AVX-512] Begin giving the disassembler a way to recognize that VSIB is a different encoding than regular addressing modes. This part first teaches it not to check error if EVEX.V2 is used by a VSIB instruction. llvm-svn: 292093	2017-01-16 05:44:25 +00:00
Craig Topper	7dfd583644	[AVX-512] Correct memory operand size for VPGATHERQPS and VPGATHERQD with ZMM index. Similar for SCATTER and the prefetch gather and scatter instructions. Fixes PR31618. llvm-svn: 292088	2017-01-16 00:55:58 +00:00
Craig Topper	8be6ebce2b	[AVX-512] Fix register class in one of the gather/scatter memory operands so that all 32 bit registers can be allowed. llvm-svn: 292087	2017-01-16 00:55:50 +00:00
Simon Pilgrim	6ed996cdf0	[CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types We already have patterns in place to support 128/256-bit shifts without AVX512VL llvm-svn: 292077	2017-01-15 20:44:00 +00:00
Michael Zuckerman	6baa3838e9	Fix blend mask by switch the side of the operand since Blend node uses opposite mask then Select NODE. llvm-svn: 292066	2017-01-15 16:43:14 +00:00
Craig Topper	f1388ef006	[AVX-512] Remove unnecessary duplicate broadcast patterns. NFC llvm-svn: 292053	2017-01-15 06:15:45 +00:00
Craig Topper	52317e8b6e	[AVX-512] Replicate some broadcast patterns to VLX and disable the AVX2 patterns when VLX is available. llvm-svn: 292051	2017-01-15 05:47:45 +00:00
Craig Topper	c294cff863	[X86] Remove untested MOVDDUP patterns. These all involve bitcasts around the memory operands. This isn't something we normally do for isel patterns. I suspect DAG combine should convert the load type making this unnecessary. llvm-svn: 292050	2017-01-15 05:21:29 +00:00
Simon Pilgrim	d419b73a42	[CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed llvm-svn: 292023	2017-01-14 19:24:23 +00:00
Simon Pilgrim	8e5ecf8ad1	[X86][XOP] Added support for VPMADCSWD 'extend+hadd' IFMA patterns VPMADCSWD act as VPADDD( VPMADDWD( x, y ), z ) - multiply+extend+hadd and add to v4i32 accumulator llvm-svn: 292021	2017-01-14 18:52:13 +00:00
Simon Pilgrim	b290805e94	[X86][XOP] Added support for VPMACSDQH/VPMACSDQL 'extension' IFMA patterns VPMACSDQH/VPMACSDQL act as VPADDQ( VPMULDQ( x, y ), z ) - multiply+extending either the odd/even 4i32 input elements and adding to v2i64 accumulator llvm-svn: 292020	2017-01-14 18:08:54 +00:00
Simon Pilgrim	a1631749f8	[X86][XOP] Added support for VPMACSWW/VPMACSDD 'lossy' IFMA patterns VPMACSWW/VPMACSDD act as add( mul( x, y ), z ) - ignoring any upper bits from both the multiply and add stages llvm-svn: 292019	2017-01-14 17:13:52 +00:00
Craig Topper	63e2cd6caa	[AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when its beneficial. Isel now selects masked move instructions for vselect instead of blendm. But sometimes it beneficial to register allocation to remove the tied register constraint by using blendm instructions. This also picks up cases where the masked move was created due to a masked load intrinsic. Differential Revision: https://reviews.llvm.org/D28454 llvm-svn: 292005	2017-01-14 07:50:52 +00:00
Craig Topper	09b7e0f01d	[AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible. We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX available. Or if its not, but register allocation has selected a non-extended register we will use VEX VXORPS. And if its an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD. This makes it possible for the register allocator to have all 32 registers available to work with. llvm-svn: 292004	2017-01-14 07:29:24 +00:00
Craig Topper	9cc685a56e	[X86] Simplify the code that calculates a scaled blend mask. We don't need a second loop. llvm-svn: 291996	2017-01-14 04:29:15 +00:00
Craig Topper	9850210d03	[AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. Also add 32-bit mode command lines to the test case that exercises this just to make sure we sanely handle the 64-bit immediate there. This fixes a undefined sanitizer failure from r291888. llvm-svn: 291994	2017-01-14 04:19:35 +00:00
Benjamin Kramer	061f4a5fe6	Apply clang-tidy's performance-unnecessary-value-param to LLVM. With some minor manual fixes for using function_ref instead of std::function. No functional change intended. llvm-svn: 291904	2017-01-13 14:39:03 +00:00
Simon Pilgrim	7f2a6d5e8c	[X86][AVX512] Add support for variable ASHR v2i64/v4i64 support without VLX Use v8i64 variable ASHR instructions if we don't have VLX. This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll. Differential Revision: https://reviews.llvm.org/D28604 llvm-svn: 291901	2017-01-13 13:16:19 +00:00
Diana Picus	116bbab4e4	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891	2017-01-13 09:58:52 +00:00
Michael Zuckerman	558a4d8419	[X86][AVX512] Adding missing shuffle lowering to blend mask instructions Some shuffles can be lowered to blend mask instruction (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ) . In this patch, I added new pattern match for this case. Reviewers: 1. craig.topper 2. guyblank 3. RKSimon 4. igorb Differential Revision: https://reviews.llvm.org/D28483 llvm-svn: 291888	2017-01-13 09:06:00 +00:00
Craig Topper	1ec84c2a18	[AVX-512] Remove unmasked BLENDM instructions from the wrong load folding table. The unmasked versions read memory from operand 2, but were in the operand 3 table. These aren't the most interesting set of blendm instructions as the unmasked version isn't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch. llvm-svn: 291885	2017-01-13 07:28:56 +00:00
Craig Topper	46b6ecf41e	[X86] Move some entries in the load folding tables to move appropriate grouping. NFC llvm-svn: 291884	2017-01-13 07:28:53 +00:00
Nikolai Bozhenov	f02ac0eeb2	[X86] Replace AND+IMM64 with SRL/SHL Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result is unused and the mask has only higher/lower bits set. For example, with this patch LLVM emits shrq $41, %rdi je instead of movabsq $0xFFFFFE0000000000, %rcx testq %rcx, %rdi je This reduces number of instructions, code size and register pressure. The transformation is applied only for cases where the mask cannot be encoded as an immediate value within TESTQ instruction. Differential Revision: https://reviews.llvm.org/D28198 llvm-svn: 291806	2017-01-12 19:54:27 +00:00
Nikolai Bozhenov	6bdf92cec7	[X86] Tune bypassing of slow division for Intel CPUs 64-bit integer division in Intel CPUs is extremely slow, much slower than 32-bit division. On the other hand, 8-bit and 16-bit divisions aren't any faster. The only important exception is Atom where DIV8 is fastest. Because of that, the patch 1) Enables bypassing of 64-bit division for Atom, Silvermont and all big cores. 2) Modifies 64-bit bypassing to use 32-bit division instead of 16-bit one. This doesn't make the shorter division slower but increases chances of taking it. Moreover, it's much more likely to prove at compile-time that a value fits 32 bits and doesn't require a run-time check (e.g. zext i32 to i64). Differential Revision: https://reviews.llvm.org/D28196 llvm-svn: 291800	2017-01-12 19:34:15 +00:00
Craig Topper	24c3a2395f	[AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support. llvm-svn: 291747	2017-01-12 06:49:12 +00:00
Craig Topper	69ab67b279	[AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq. llvm-svn: 291746	2017-01-12 06:49:08 +00:00
Elad Cohen	c5ba925ef2	[X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask r289653 added a case where `vselect <cond> <vector1> <all-zeros>` is transformed to: `vselect xor(cond, DAG.getConstant(1, DL, CondVT) <all-zeros> <vector1>` This was not aimed to catch cases where Cond is not a vXi1 mask but it does. Moreover, when Cond type is VxiN (N > 1) then xor(cond, DAG.getConstant(1, DL, CondVT) != NOT(cond). This patch changes the above to xor with allones, and avoids entering the case for non-mask Conds. llvm-svn: 291745	2017-01-12 06:49:03 +00:00
Peter Collingbourne	1b5f1cfdb4	X86: Remove dead code. NFC. llvm-svn: 291721	2017-01-11 23:00:28 +00:00
Simon Pilgrim	0c1faf432b	Remove trailing whitespace. NFCI. llvm-svn: 291680	2017-01-11 16:38:20 +00:00
Elena Demikhovsky	9d0e7c33d3	X86 CodeGen: Optimized pattern for truncate with unsigned saturation. DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512. And VPACKUS* instructions on SEE* targets. Differential Revision: https://reviews.llvm.org/D28216 llvm-svn: 291670	2017-01-11 12:59:32 +00:00
Simon Pilgrim	5a81fefad3	[X86][AVX512BW] Vectorize v64i8 vector shifts Differential Revision: https://reviews.llvm.org/D28447 llvm-svn: 291665	2017-01-11 10:36:51 +00:00
Elad Cohen	0c2601073e	[X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d} The code emiited by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes redundant (v)movss/(v)movsd instructions. This patch adds patterns for optimizing these sequences. Differential revision: https://reviews.llvm.org/D28455 llvm-svn: 291660	2017-01-11 09:11:48 +00:00
Mohammed Agabaria	2c96c43388	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Hans Wennborg	6573976f57	Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) This was reverted because it would miscompile code where the cmp had multiple uses. That was due to a deficiency in the existing code, which was fixed in r291630 (see the PR for details). This re-commit includes an extra test for the kind of code that got miscompiled: @test_sub_1_setcc_jcc. llvm-svn: 291640	2017-01-11 01:36:57 +00:00
Hans Wennborg	12de693747	[X86] Dont run combineSetCCAtomicArith() when the cmp has multiple uses We would miscompile the following: void g(int); int f(volatile long long *p) { bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0; g(b ? 12 : 34); return b ? 56 : 78; } into pushq %rax lock incq (%rdi) movl $12, %eax movl $34, %edi cmovlel %eax, %edi callq g(int) testq %rax, %rax <---- Bad. movl $56, %ecx movl $78, %eax cmovsl %ecx, %eax popq %rcx retq because the code failed to take into account that the cmp has multiple uses, replaced one of them, and left the other one comparing garbage. llvm-svn: 291630	2017-01-11 00:49:54 +00:00
Michael Zuckerman	bcd03e7f3b	[X86][AVX512]Improving shuffle lowering by using AVX-512 EXPAND* instructions This patch fix PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351 1. This patch adds new type of shuffle lowering 2. We can use the expand instruction, When the shuffle pattern is as following: { 0a[0]0a[1]...0*a[n] , n >=0 where a[] elements in a ascending order}. Reviewers: 1. igorb 2. guyblank 3. craig.topper 4. RKSimon Differential Revision: https://reviews.llvm.org/D28352 llvm-svn: 291584	2017-01-10 18:57:17 +00:00
Craig Topper	d55b83128b	AMD family 17h (znver1) enablement Summary: This patch enables the following 1. AMD family 17h architecture using "znver1" tune flag (-march, -mcpu). 2. ISAs that are enabled for "znver1" architecture. 3. Checks ADX isa from cpuid to identify "znver1" flag when -march=native is used. 4. ISAs FMA4, XOP are disabled as they are dropped from amdfam17. 5. For the time being, it uses the btver2 scheduler model. 6. Test file is updated to check this flag. This item is linked to clang review item https://reviews.llvm.org/D28018 Patch by Ganesh Gopalasubramanian Reviewers: RKSimon, craig.topper Subscribers: vprasad, RKSimon, ashutosh.nema, llvm-commits Differential Revision: https://reviews.llvm.org/D28017 llvm-svn: 291543	2017-01-10 06:01:16 +00:00
Craig Topper	2ed461e5c4	[X86] When lowering uniform shifts, use X86ISD::VZEXT instead of using a ZERO_EXTEND_VECTOR_INREG. If we emit the ZERO_EXTEND_VECTOR_INREG too late it doesn't get lowered properly and makes it through to isel and fails. Fixes PR31593. llvm-svn: 291535	2017-01-10 04:12:24 +00:00
Michael Kuperstein	1559e8863e	Revert r291092 because it introduces a crash. See PR31589 for details. llvm-svn: 291478	2017-01-09 21:04:46 +00:00
Vyacheslav Klochkov	d497d36083	X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB. Differential Revision: https://reviews.llvm.org/D28087 llvm-svn: 291473	2017-01-09 20:26:17 +00:00
Simon Pilgrim	0f23b2ba1a	[X86][AVX512] Enable v16i8/v32i8 vector shifts to use an extend+shift+truncate pattern. Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets. Cost model updates to follow. llvm-svn: 291451	2017-01-09 17:20:03 +00:00
Simon Pilgrim	d990cd371b	[X86][AVX512DQ] Enable v16i16 vector shifts to use an extend+shift+truncate pattern. Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will have already have lowered with vpsravw). Cost model updates to follow. llvm-svn: 291445	2017-01-09 15:15:45 +00:00
Craig Topper	96ab6fd2eb	[AVX-512] Change another pattern that was using BLENDM to use masked moves. A future patch will conver it back to BLENDM if its beneficial to register allocation. llvm-svn: 291419	2017-01-09 04:19:34 +00:00
Craig Topper	6393afce97	[AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros. Previously we emitted a VPTERNLOG and a separate masked move. llvm-svn: 291415	2017-01-09 02:44:34 +00:00
Craig Topper	f51ba1e3da	[AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX. llvm-svn: 291402	2017-01-08 21:32:30 +00:00
Simon Pilgrim	e6d948b857	Strip trailing whitespace. llvm-svn: 291395	2017-01-08 18:37:42 +00:00
Simon Pilgrim	b2a80950fe	Fix line endings and strip trailing whitespace. llvm-svn: 291393	2017-01-08 16:45:39 +00:00
Sanjay Patel	bf51c8a975	[x86] fix usage of stale operands when lowering select I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR. The debug output shows nodes like this: t4: i32 = xor t2, Constant:i32<-1> t21: i8 = setcc t4, Constant:i32<0>, setlt:ch t14: i32 = select t21, t4, Constant:i32<-1> And because the select is holding onto the t4 (xor) node while EmitTest creates a new x86-specific xor node, the lowering results in: t4: i32 = xor t2, Constant:i32<-1> t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1> t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1 Differential Revision: https://reviews.llvm.org/D28374 llvm-svn: 291392	2017-01-08 15:53:40 +00:00
Simon Pilgrim	9c58950eeb	[CostModel][X86] Fixed vXi8 uniform shift costs. The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation). Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled. Added missing AVX2/AVX512BW costs as well. llvm-svn: 291391	2017-01-08 14:14:36 +00:00
Simon Pilgrim	1fa5487c05	[CostModel][X86] Moved legal uniform shift costs earlier. XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts. llvm-svn: 291390	2017-01-08 13:12:03 +00:00
Craig Topper	5c46c7526e	[AVX-512] Remove redundant patterns that select unaligned moves with zero masking for patterns that already use the aligned form. NFC llvm-svn: 291383	2017-01-08 05:46:21 +00:00
Simon Pilgrim	9681c407b4	[CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq. llvm-svn: 291372	2017-01-07 22:27:43 +00:00
Craig Topper	a74e3088df	[AVX-512] Remove patterns from the other VBLENDM instructions. They are all redundant with masked move instructions. We should probably teach the two address instruction pass to turn masked moves into BLENDM when its beneficial to the register allocator. llvm-svn: 291371	2017-01-07 22:20:34 +00:00
Craig Topper	81f20aa336	[AVX-512] Remove patterns from masked broadcast versions of BLENDM instructions. All but (v2f64 broadcast f64) are handled with VBROADCAST instructions. The v2f64 version can be handled with VMOVDDUP. We may want to consider converting to BLENDM instructions in the two address instruction pass if its beneficial to register allocation. llvm-svn: 291369	2017-01-07 22:20:26 +00:00
Craig Topper	da84ff3ed4	[AVX-512] Add masked forms of the alternate MOVDDUP patterns. I'm not too sure how to get isel to select even all of the unmasked forms, but at least we have a consistent set now. llvm-svn: 291368	2017-01-07 22:20:23 +00:00
Simon Pilgrim	a470296367	[CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs. llvm-svn: 291366	2017-01-07 22:08:09 +00:00
Simon Pilgrim	82e3e05fe2	[CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above We were matching against general vector shift costs before the uniform splat costs llvm-svn: 291365	2017-01-07 21:47:10 +00:00
Simon Pilgrim	e70644dab7	[CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion. llvm-svn: 291364	2017-01-07 21:33:00 +00:00
Simon Pilgrim	725997154d	[CostModel][X86] Merge separate AVX1 cost LUTs. NFCI. llvm-svn: 291355	2017-01-07 18:19:25 +00:00
Simon Pilgrim	a4109d6433	[CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets. llvm-svn: 291354	2017-01-07 17:54:10 +00:00
Simon Pilgrim	df7de7a87e	[CostModel][X86] Added missing AVX2 arithmetic costs. Allows us to correctly fall through to the lower AVX1 costs if look up failed. llvm-svn: 291353	2017-01-07 17:27:39 +00:00
Simon Pilgrim	100eae1ee0	[CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI. llvm-svn: 291352	2017-01-07 17:03:51 +00:00
Simon Pilgrim	a1b8e2c725	[X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470) llvm-svn: 291347	2017-01-07 15:37:50 +00:00
Craig Topper	42b848a683	[X86] Disable load unfolding for 128-bit MOVDDUP instructions since the load size is smaller than the register size so unfolding would increase the load size. llvm-svn: 291338	2017-01-07 06:56:54 +00:00
Simon Pilgrim	3128d6b520	[X86][SSE] Pass float domain flag to shuffle combine match functions. NFCI. Early step towards ignoring domain above a certain shuffle depth. llvm-svn: 291248	2017-01-06 17:34:30 +00:00
Simon Pilgrim	bd3c6824d4	[X86][SSE] Simplify float domain requirement in unary shuffle matching. The AVX1-only limit is never actually required in matchUnaryVectorShuffle llvm-svn: 291244	2017-01-06 17:00:59 +00:00
Simon Pilgrim	a08d7b9913	Remove trailing whitespace. NFCI. llvm-svn: 291240	2017-01-06 15:31:52 +00:00
Simon Pilgrim	9b8c7caf4e	[X86] Add X86Subtarget argument. NFCI. All callers of getTargetVShiftNode have access to X86Subtarget already so pass it along instead of re-extracting it. llvm-svn: 291239	2017-01-06 15:29:17 +00:00
Simon Pilgrim	d8333372bc	[CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs. Set the costs on the lowest target that supports the type. llvm-svn: 291229	2017-01-06 11:12:53 +00:00
Craig Topper	e86fb932ea	[AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp. llvm-svn: 291214	2017-01-06 05:18:48 +00:00
Simon Pilgrim	aa186c632d	[CostModel][X86] Tidyup arithmetic costs code. NFCI. Remove unnecessary braces, remove one use variables and keep LUTs to similar naming convention. llvm-svn: 291187	2017-01-05 22:48:02 +00:00
Simon Pilgrim	4c050c2190	[CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI. llvm-svn: 291165	2017-01-05 19:42:43 +00:00
Simon Pilgrim	6f72eba606	Remove trailing whitespace. NFCI. llvm-svn: 291163	2017-01-05 19:24:25 +00:00
Simon Pilgrim	5b06e4d319	[CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI. llvm-svn: 291162	2017-01-05 19:19:39 +00:00
Simon Pilgrim	a8bf97569a	[CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI. Removes need for yet another LUT. llvm-svn: 291158	2017-01-05 19:01:50 +00:00
Simon Pilgrim	430d34fc14	[CostModel][X86] Strip unused 256-bit vector shift costs. NFCI. Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead. llvm-svn: 291152	2017-01-05 18:36:48 +00:00
Simon Pilgrim	b01e844241	[CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL Matches other MUL/ADD/SUB 256-bit case on AVX1 llvm-svn: 291149	2017-01-05 18:20:25 +00:00
Simon Pilgrim	f74700aa8c	[CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI. llvm-svn: 291146	2017-01-05 17:56:19 +00:00
Sanjay Patel	dea5a7bd53	less braces; NFC llvm-svn: 291126	2017-01-05 16:47:32 +00:00
Simon Pilgrim	bca02f9e20	[CostModel][X86] Add support for broadcast shuffle costs Currently only for broadcasts with input and output of the same width. Differential Revision: https://reviews.llvm.org/D27811 llvm-svn: 291122	2017-01-05 15:56:08 +00:00
Zvi Rackover	4b7d724d62	[X86] Optimize vector shifts with variable but uniform shift amounts Summary: For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register. The lower 64-bits of the register are evaluated to determine the shift amount. This patch improves the construction of the vector containing the shift amount. Reviewers: craig.topper, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28353 llvm-svn: 291120	2017-01-05 15:11:43 +00:00
Simon Pilgrim	a62395a4bd	[CostModel][X86] Pulled out common type legalization code llvm-svn: 291109	2017-01-05 14:33:32 +00:00
Mohammed Agabaria	23599ba794	Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling. This code seems to be target dependent which may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106	2017-01-05 14:03:41 +00:00
Mohammed Agabaria	189e2d29ba	[Test Commit] fixing some format issue in X86TTI to match clang-format output. llvm-svn: 291095	2017-01-05 09:51:02 +00:00
Elena Demikhovsky	143cbc425b	AVX-512: Optimized pattern for truncate with unsigned saturation. DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512. Differential revision: https://reviews.llvm.org/D28216 llvm-svn: 291092	2017-01-05 08:21:09 +00:00
Eric Christopher	568c113ac0	Remove dead and unused variable NumSentinelElements. Fixes PR31529. llvm-svn: 290998	2017-01-04 20:05:18 +00:00
Simon Pilgrim	bb895f3e9c	[CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs Actual codegen is much better than the extract+insert patterns that was assumed. llvm-svn: 290962	2017-01-04 14:01:33 +00:00
Simon Pilgrim	939b8cd708	[X86] Merged Reverse/Alternate shuffle cost tables. NFCI. As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode. llvm-svn: 290956	2017-01-04 12:08:41 +00:00
Florian Hahn	5815f6c53c	[framelowering] Skip dbg values when getting next/previous instruction. Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: aprantl, MatzeB, mkuper Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290955	2017-01-04 12:08:35 +00:00
Ayman Musa	02f9533823	[X86][AVX512] Passing the appropriate memory operand class to INT_{U}COMIS{S\|D} instructions Replacing the memory operand in the intrinsic versions of the comis/ucomis instrucions from f128mem to ssmem/sdmem accordingly. Differential Revision: https://reviews.llvm.org/D28138 llvm-svn: 290948	2017-01-04 08:21:54 +00:00
Simon Pilgrim	c76ea4b638	[X86] Attempt to pre-truncate arithmetic operations if useful In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types. This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold). Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs. I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need. Differential Revision: https://reviews.llvm.org/D28219 llvm-svn: 290947	2017-01-04 08:05:42 +00:00
Craig Topper	d0aa53b9ae	[AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources. These are best handled with a vinsert32x4 or vinsert64x2 instruction. llvm-svn: 290946	2017-01-04 07:32:03 +00:00
Craig Topper	83115a809f	[AVX-512] Simplify code for creating 512-bit SHUF128 operations. We don't need two loops and we can safely assume assume and hardcode the size of the widened mask. llvm-svn: 290942	2017-01-04 07:31:51 +00:00
Craig Topper	48d232d3e7	[X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle. llvm-svn: 290872	2017-01-03 07:36:41 +00:00
Craig Topper	785e58fdc9	[AVX-512] Simplify the code added in r290870 to recognized 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask. llvm-svn: 290871	2017-01-03 07:36:39 +00:00
Craig Topper	9496e3f916	[AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts. llvm-svn: 290870	2017-01-03 07:00:40 +00:00
Craig Topper	fa875a1d3d	[AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions. llvm-svn: 290869	2017-01-03 05:46:18 +00:00
Craig Topper	be9ef55152	[X86] Remove trailing whitespace and an unnecessary line wrap. NFC llvm-svn: 290867	2017-01-03 05:46:06 +00:00
Craig Topper	06bae884bd	[X86] Fix header comment. NFC llvm-svn: 290866	2017-01-03 05:46:05 +00:00
Craig Topper	c849172105	[AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation. llvm-svn: 290865	2017-01-03 05:46:02 +00:00
Craig Topper	0cda8bbf74	[AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors. There are some codegen problems here that I'll try to fix in future commits. llvm-svn: 290864	2017-01-03 05:45:57 +00:00
Craig Topper	4d47c6ae57	[AVX-512] Remove vextract intrinsics and autoupgrade to native shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal. Hopefully we can improve that in future patches. llvm-svn: 290863	2017-01-03 05:45:46 +00:00
Dean Michael Berris	f7e7b938ea	[XRay] Merge instrumentation point table emission code into AsmPrinter. Summary: No need to have this per-architecture. While there, unify 32-bit ARM's behaviour with what changed elsewhere and start function names lowercase as per the coding standards. Individual entry emission code goes to the entry's own class. Fully tested on amd64, cross-builds on both ARMs and PowerPC. Reviewers: dberris Subscribers: aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28209 llvm-svn: 290858	2017-01-03 04:30:21 +00:00
Elena Demikhovsky	d96200d60a	Fixed shuffle-reverse cost on AVX-512. (This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately). llvm-svn: 290812	2017-01-02 11:44:10 +00:00
Elena Demikhovsky	21706cbd24	AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns. X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426). * Shiffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810	2017-01-02 10:37:52 +00:00
Reid Kleckner	cd46c1df80	Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64" This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714	2016-12-29 17:07:10 +00:00
Reid Kleckner	c9e0a153cf	[COFF] Use 32-bit jump table entries in .rdata for Win64 Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694	2016-12-29 00:12:39 +00:00
Gadi Haber	19c4fc5e62	This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663	2016-12-28 10:12:48 +00:00
Craig Topper	e77e901130	[AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables. llvm-svn: 290591	2016-12-27 06:51:09 +00:00
Craig Topper	2da265b7bf	[AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select. llvm-svn: 290583	2016-12-27 05:30:14 +00:00
Craig Topper	89b3e0223f	[AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can add them to InstCombine with the 128 and 256 bit versions. The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded. llvm-svn: 290573	2016-12-27 03:46:05 +00:00
Craig Topper	83f2145c18	[AVX-512] Add isel patterns to turn native masked scalar add/sub/mul/div into masked instructions. llvm-svn: 290564	2016-12-27 01:56:24 +00:00
Craig Topper	5ef13ba18b	[AVX-512] Fix some patterns to use extended register classes. llvm-svn: 290536	2016-12-26 07:26:07 +00:00
Craig Topper	f56d985f77	[AVX-512] Don't assume that the rounding mode argument to intrinsics is a constant. While clang will guarantee this, nothing in the backend will. A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering. llvm-svn: 290532	2016-12-26 01:40:17 +00:00
Michael Zuckerman	86602e85dd	revert commit 290516 llvm-svn: 290517	2016-12-25 12:45:18 +00:00
Michael Zuckerman	45aa420640	Commit try added new empty line llvm-svn: 290516	2016-12-25 12:01:34 +00:00
Florian Hahn	898127fe36	Revert r290423 because it broke the sanitizer-x86_64-linux-autoconf buildbot. llvm-svn: 290425	2016-12-23 12:26:11 +00:00
Florian Hahn	1d6b1a7b79	[framelowering] Skip dbg values when getting next/previous instruction. Summary: In mergeSPUpdates, debug values need to be ignored when getting the previous element, otherwise debug data could have an impact on codegen. In eliminateCallFramePseudoInstr, debug values after the erased element could have an impact on codegen and should be skipped. Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319) Reviewers: mkuper, MatzeB, aprantl Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D27688 llvm-svn: 290423	2016-12-23 11:35:00 +00:00
Wei Mi	f3f01aba48	Change the interface of TLI.isMultiStoresCheaperThanBitsMerge. This is for splitMergedValStore in DAG Combine to share the target query interface with similar logic in CodeGenPrepare. Differential Revision: https://reviews.llvm.org/D24707 llvm-svn: 290363	2016-12-22 19:38:22 +00:00
Ayman Musa	9ff608cdc6	[X86][AVX2] Passing the appropriate memory operand class to VPMADDWD instruction. Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem. Differential Revision: https://reviews.llvm.org/D28024 llvm-svn: 290333	2016-12-22 08:42:46 +00:00
Simon Pilgrim	081abbb164	[X86][SSE] Improve lowering of vXi64 multiplies As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 llvm-svn: 290267	2016-12-21 20:00:10 +00:00
Michael Zuckerman	85e12d2851	revert first commit . removing empty line in X86.h llvm-svn: 290255	2016-12-21 12:48:01 +00:00
Michael Zuckerman	58838cf29d	First commit adding new line to X86.h llvm-svn: 290254	2016-12-21 12:44:47 +00:00
Elena Demikhovsky	7c7bf1b432	Added a template for building target specific memory node in DAG. I added API for creation a target specific memory node in DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp. There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern. Differential Revision: https://reviews.llvm.org/D27899 llvm-svn: 290250	2016-12-21 10:43:36 +00:00
Oren Ben Simhon	cb692157b7	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support Fixing a warning. llvm-svn: 290248	2016-12-21 09:47:31 +00:00
Oren Ben Simhon	de2eea7298	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support Fixing build issues. llvm-svn: 290244	2016-12-21 08:59:42 +00:00
Oren Ben Simhon	3b95157090	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible. vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above. The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it. This aubmit also includes additional lit tests to cover better HVAs corner cases. Differential Revision: https://reviews.llvm.org/D27392 llvm-svn: 290240	2016-12-21 08:31:45 +00:00
Simon Pilgrim	688114d888	[X86][SSE] Ensure we're only combining shuffles with legal mask types. I haven't managed to get this to fail yet but its technically possible for the AND -> shuffle decomposition to result in illegal types. llvm-svn: 290183	2016-12-20 17:09:52 +00:00
Michael LeMay	20565e2aa1	[TargetInstrInfo] replace redundant expression in getMemOpBaseRegImmOfs Summary: The expression for computing the return value of getMemOpBaseRegImmOfs has only one possible value. The other value would result in a return earlier in the function. This patch replaces the expression with its only possible value. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27437 llvm-svn: 290133	2016-12-19 21:02:41 +00:00
Craig Topper	1fd4196337	[X86] When recognizing vector loads or VZEXT_LOAD in selectScalarSSELoad make sure we pass the load's user rather than load itself to the second operand of IsLegalToFold. llvm-svn: 290089	2016-12-19 08:35:56 +00:00
Craig Topper	375aa90291	[X86] Remove all of the patterns that use X86ISD:FAND/FXOR/FOR/FANDN except for the ones needed for SSE1. Anything SSE2 or above uses the integer ISD opcode. This removes 11721 bytes from the DAG isel table or 2.2% llvm-svn: 290073	2016-12-19 00:42:28 +00:00
Daniel Jasper	373f9a6a0c	Revert r289955 and r289962. This is causing lots of ASAN failures for us. Not sure whether it causes and ASAN false positive or whether it actually leads to incorrect code or whether it even exposes bad code. Hans, I'll get you instructions to reproduce this. llvm-svn: 290066	2016-12-18 14:36:38 +00:00
Michael Zuckerman	4b88a770ef	[X86] [AVX512] Minor fix in encoding of scalar EVEX instructions. NFC. Commit on behalf of Gadi Haber Removed EVEX_V512 prefix from scalar EVEX instructions since HW ignores L'L bits anyway (LIG). 4 instructions are modified. The changed encodings are validated with XED. Rviewers: delena, igorb Differential revision: https://reviews.llvm.org/D27802 llvm-svn: 290065	2016-12-18 14:29:00 +00:00
Simon Pilgrim	e940daf532	[X86][SSE] Add support for combining target shuffles to SHUFPS. As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064	2016-12-18 14:26:02 +00:00
Craig Topper	7029db0eaa	[X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations if they are available. This will allow a bunch of patterns to be removed. These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine. For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode. For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now. llvm-svn: 290060	2016-12-18 07:54:23 +00:00
Craig Topper	add9cc697a	[AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when DQI and VLX instructions are available. This can give the register allocator more registers to use. llvm-svn: 290057	2016-12-18 06:23:14 +00:00
Craig Topper	2baef8f466	[AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops for scalars. I missed this in r290049. llvm-svn: 290055	2016-12-18 04:17:00 +00:00
Craig Topper	d3295c6a3a	[AVX-512] Use EVEX encoded logic operations for scalar types when they are available. This gives the register allocator more registers to work with. llvm-svn: 290049	2016-12-17 19:26:00 +00:00
Hans Wennborg	ef57755427	Fix -Wself-assign from r289955 llvm-svn: 289962	2016-12-16 17:16:46 +00:00
Hans Wennborg	35f21cba13	[X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) atomic_load_add returns the value before addition, but sets EFLAGS based on the result of the addition. That means it's setting the flags based on effectively subtracting C from the value at x, which is also what the outer cmp does. This targets a pattern that occurs frequently with reference counting pointers: void decrement(long volatile *ptr) { if (_InterlockedDecrement(ptr) == 0) release(); } Clang would previously compile it (for 32-bit at -Os) as: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: 31 c9 xor %ecx,%ecx 6: 49 dec %ecx 7: f0 0f c1 08 lock xadd %ecx,(%eax) b: 83 f9 01 cmp $0x1,%ecx e: 0f 84 00 00 00 00 je 14 <?decrement@@YAXPCJ@Z+0x14> 14: c3 ret and with this patch it becomes: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: f0 ff 08 lock decl (%eax) 7: 0f 84 00 00 00 00 je d <?decrement@@YAXPCJ@Z+0xd> d: c3 ret (Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add or pre-decrement operator generate the same code.) Differential Revision: https://reviews.llvm.org/D27781 llvm-svn: 289955	2016-12-16 16:34:59 +00:00
Simon Pilgrim	4b73c3de50	[X86][AVX] Call lowerVectorShuffleWithSHUFPS directly instead of calling DAG.getVectorShuffle (PR27885) We've already done the hardwork of ensuring the mask is safe for 'SHUFPS'. llvm-svn: 289950	2016-12-16 15:23:32 +00:00
Simon Pilgrim	9519bd9232	[X86][AVX512] use a single shufps for 512-bit vectors when it can save instructions This is the 512-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289946	2016-12-16 14:30:04 +00:00
Simon Pilgrim	f159a3414f	[X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain. We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead. llvm-svn: 289937	2016-12-16 11:48:51 +00:00
Ahmed Bougacha	5228603387	[GlobalISel] Drop workaround for Legalizer member/class sharing a name. NFC. MachineLegalizer used to be the name of both the class and the member, causing GCC errors. r276522 fixed that by renaming the member to just 'Legalizer'. The 'class' workaround isn't necessary anymore; drop it. llvm-svn: 289848	2016-12-15 18:45:30 +00:00
Sanjay Patel	a97358bc8e	[x86] use a single shufps for 256-bit vectors when it can save instructions This is the 256-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289846	2016-12-15 18:43:46 +00:00
Sanjay Patel	a0d8a278a7	[x86] use a single shufps when it can save instructions This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this: - vpshufd {{.#+}} xmm1 = xmm1[0,1,0,2] - vpshufd {{.#+}} xmm0 = xmm0[0,2,2,3] - vpblendw {{.#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7] + vshufps {{.#+}} xmm0 = xmm0[0,2],xmm1[0,2] And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 llvm-svn: 289837	2016-12-15 18:03:38 +00:00
Simon Pilgrim	7522f54feb	[X86][SSE] Fix domains for scalar store instructions As discussed on D27692 llvm-svn: 289834	2016-12-15 17:09:24 +00:00
Simon Pilgrim	ba46422694	[X86][AVX512] Moved instruction domain lookups to the right table. NFCI. Avoid duplicating instructions in the int32/int64 domains. llvm-svn: 289830	2016-12-15 16:38:51 +00:00
Simon Pilgrim	d7518896ff	[X86][SSE] Fix domains for VZEXT_LOAD type instructions Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825	2016-12-15 16:05:29 +00:00
Simon Pilgrim	2f7f0e7a48	[CostModel][X86] Updated reverse shuffle costs llvm-svn: 289819	2016-12-15 14:24:07 +00:00
Michael Zuckerman	1ce2a23a1e	Fix bug 30945- [AVX512] Failure to flip vector comparison to remove not mask instruction adding new optimization opportunity by adding new X86ISelLowering pattern. The test case was shown in https://llvm.org/bugs/show_bug.cgi?id=30945. Test explanation: Select gets three arguments mask, op and op2. In this case, the Mask is a result of ICMP. The ICMP instruction compares (with equal operand) the zero initializer vector and the result of the first ICMP. In general, The result of "cmp eq, op1, zero initializers" is "not(op1)" where op1 is a mask. By rearranging of the two arguments inside the Select instruction, we can get the same result. Without the necessary of the middle phase ("cmp eq, op1, zero initializers"). Missed optimization opportunity: vpcmpled %zmm0, %zmm1, %k0 knotw %k0, %k1 can be combine to vpcmpgtd %zmm0, %zmm2, %k1 Reviewers: 1. delena 2. igorb Commited after check all Differential Revision: https://reviews.llvm.org/D27160 llvm-svn: 289653	2016-12-14 14:57:10 +00:00
Stephan Bergmann	17c7f70362	Replace APFloatBase static fltSemantics data members with getter functions At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647	2016-12-14 11:57:17 +00:00
Philip Reames	1f1bbac8da	[peephole] Enhance folding logic to work for STATEPOINTs The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are: Support for folding multiple operands at once which reference the same load Support for folding multiple loads into a single instruction Walk all the operands of the instruction for varidic instructions (this is a bug fix!) Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here. Differential Revision: https://reviews.llvm.org/D24103 llvm-svn: 289510	2016-12-13 01:38:41 +00:00
Sanjay Patel	62104ee6d9	[x86] fix formatting; NFC llvm-svn: 289476	2016-12-12 22:31:01 +00:00
Simon Pilgrim	4cbe1834e4	Update inline argument comment. NFCI. combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time. llvm-svn: 289430	2016-12-12 13:43:15 +00:00
Simon Pilgrim	5ebd2b542b	[X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts. Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift. llvm-svn: 289429	2016-12-12 13:33:58 +00:00
Simon Pilgrim	369cd349b9	[X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ PMULDQ returns the 64-bit result of the signed multiplication of the lower 32-bits of vXi64 vector inputs, we can lower with this if the sign bits stretch that far. Differential Revision: https://reviews.llvm.org/D27657 llvm-svn: 289426	2016-12-12 10:49:15 +00:00
Craig Topper	36ecce9bed	[X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the X86ISD::VZEXT_LOAD opcode. Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics. llvm-svn: 289424	2016-12-12 07:57:24 +00:00
Craig Topper	f2c6f7abf3	[X86] Change CMPSS/CMPSD intrinsic instructions to use sse_load_f32/f64 as its memory pattern instead of full vector load. These intrinsics only load a single element. We should use sse_loadf32/f64 to give more options of what loads it can match. Currently these instructions are often only getting their load folded thanks to the load folding in the peephole pass. I plan to add more types of loads to sse_load_f32/64 so we can match without the peephole. llvm-svn: 289423	2016-12-12 07:57:21 +00:00
Craig Topper	081c0e2864	[X86] Remove some intrinsic instructions from hasPartialRegUpdate Summary: These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits. As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded. Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too. Reviewers: spatel, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27611 llvm-svn: 289419	2016-12-12 05:07:17 +00:00
Simon Pilgrim	831435cb14	[X86][SSE] Add support for combining target shuffles to SHUFPD. llvm-svn: 289407	2016-12-11 21:26:25 +00:00
Ayman Musa	7ec4ed55d3	[X86][AVX512] Add missing patterns for broadcast fallback in case load node has multiple uses (for v4i64 and v4f64). When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded. A fallback pattern is added to catch these cases and provide another solution. Differential Revision: https://reviews.llvm.org/D27661 llvm-svn: 289404	2016-12-11 20:11:17 +00:00
Oren Ben Simhon	9683ecbff6	[X86] Regcall - Adding support for mask types Regcall calling convention passes mask types arguments in x86 GPR registers. The review includes the changes required in order to support v32i1, v16i1 and v8i1. Differential Revision: https://reviews.llvm.org/D27148 llvm-svn: 289383	2016-12-11 14:10:52 +00:00
Craig Topper	e7166ce237	[X86] Fix a comment to say 'an FMA' instead of 'a FMA'. NFC llvm-svn: 289352	2016-12-11 01:28:08 +00:00
Craig Topper	1f1b441267	[X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for being able to constant fold them in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289350	2016-12-11 01:26:44 +00:00
Craig Topper	edab02b50b	[X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being able to constant fold it in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289344	2016-12-10 23:09:43 +00:00
Simon Pilgrim	a03e350e69	[X86][SSE] Ensure UNPCK inputs are a consistent value type in LowerHorizontalByteSum llvm-svn: 289341	2016-12-10 21:16:45 +00:00
Craig Topper	abe7c5b5e9	[AVX-512] Remove 128/256 masked vpermil instrinsics and autoupgrade to a select around the unmasked avx1 intrinsics. llvm-svn: 289340	2016-12-10 21:15:52 +00:00
Simon Pilgrim	fb58550d73	[X86][SSE] Move ZeroVector creation into the shuffle pattern case where its actually used. Also fix the ZeroVector's type - I've no idea how this hasn't caused problems........ llvm-svn: 289336	2016-12-10 19:49:55 +00:00
Craig Topper	18b57da491	[AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available. llvm-svn: 289335	2016-12-10 19:35:39 +00:00
Craig Topper	8e288e0b68	[X86] Clarify indentation. NFC llvm-svn: 289334	2016-12-10 19:35:36 +00:00
Craig Topper	85f0e57c33	[X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a single boolean flag passed to a helper function. Just check the opcode and create the flag. llvm-svn: 289333	2016-12-10 19:35:33 +00:00
Craig Topper	a39b650d72	[X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements. Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements. Similar things have already been done for other convert intrinsics. llvm-svn: 289316	2016-12-10 06:02:48 +00:00
Simon Pilgrim	017b7a71d8	[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED) Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type. llvm-svn: 289232	2016-12-09 17:53:11 +00:00
Craig Topper	38b1b5d44f	[X86] Modify patterns from memory form of RCP/RSQRT/SQRT intrinsics to only allow (scalar_to_vector (loadf32/load64)) instead of anything that sse_load_f32/f64 can match. sse_load_f32/f64 can also match loads that are zero extended to vectors. We shouldn't match that because we wouldn't be able to get the instruction to zero the upper bits like the intrinsic semantics would require for such a case. There is a test case that does depend on this behavior. llvm-svn: 289193	2016-12-09 07:57:21 +00:00
Craig Topper	a55b483bb5	[AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics Summary: Scalar intrinsics have specific semantics about the which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection. This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1(one of the multiply inputs) or operand 3(the additon/subtraction input) should pass thru its upper bits. We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version. We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that. This fixes PR30913. Reviewers: delena, zvi, v_klochkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27144 llvm-svn: 289190	2016-12-09 06:42:28 +00:00
Craig Topper	c4f2b0996d	[X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables. llvm-svn: 289186	2016-12-09 05:20:11 +00:00
Craig Topper	2aeb456425	[AVX-512] Add vpermilps/pd to load folding tables. llvm-svn: 289173	2016-12-09 02:18:11 +00:00
Peter Collingbourne	235c275b20	IR, X86: Understand !absolute_symbol metadata on global variables. Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087	2016-12-08 19:01:00 +00:00
Matthias Braun	0c989a893b	LivePhysReg: Use reference instead of pointer in init(); NFC llvm-svn: 289002	2016-12-08 00:15:51 +00:00
Michael Kuperstein	5842b20633	[X86] Skip over DEBUG_VALUE while looking for start of call sequence If we don't skip over DEBUG_VALUEs, we get differences between -g and non-g code. This fixes PR31242. Differential Revision: https://reviews.llvm.org/D27485 llvm-svn: 288965	2016-12-07 19:31:08 +00:00
Michael Kuperstein	18092cf2c3	[X86] Do not assume "ri" instructions always have an immediate operand The second operand of an "ri" instruction may be an immediate, but it may also be a globalvariable, so we should make any assumptions. This fixes PR31271. Differential Revision: https://reviews.llvm.org/D27481 llvm-svn: 288964	2016-12-07 19:29:18 +00:00
Simon Pilgrim	c3c6463ce0	[X86][SSE] Remove AND -> VZEXT combine This is now performed more generally by the target shuffle combine code. Already covered by tests that were originally added in D7666/rL229480 to support combineVectorZext (or VectorZextCombine as it was known then....). Differential Revision: https://reviews.llvm.org/D27510 llvm-svn: 288918	2016-12-07 17:02:41 +00:00
Simon Pilgrim	8893bd95f0	[X86][SSE] Consistently set MOVD/MOVQ load/store/move instructions to integer domain We are being inconsistent with these instructions (and all their variants.....) with a random mix of them using the default float domain. Differential Revision: https://reviews.llvm.org/D27419 llvm-svn: 288902	2016-12-07 12:10:49 +00:00
Simon Pilgrim	d5bc5c16b2	[X86][XOP] Fix VPERMIL2 non-constant pool shuffle decoding (PR31296) The non-constant pool version of DecodeVPERMIL2PMask was not offsetting correctly for the second input. I've updated the code to match the implementation in the constant-pool version. Annoyingly this bug was hidden for so long as it's tricky to combine to useful variable shuffle masks that don't become constant-pool entries. llvm-svn: 288898	2016-12-07 11:19:00 +00:00
Zvi Rackover	8bc7e4da51	[X86] Prefer reduced width multiplication over pmulld on Silvermont Summary: Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld. On Silvermont [source: Optimization Reference Manual]: PMULLD has a throughput of 1/11 [instruction/cycles]. PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles]. Fixes pr31202. Analysis of this issue was done by Fahana Aleen. Reviewers: wmi, delena, mkuper Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27203 llvm-svn: 288844	2016-12-06 19:35:20 +00:00
Ayman Musa	86c00b799f	[X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting. Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern. For example: "build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>" Differential Revision: https://reviews.llvm.org/D26802 llvm-svn: 288804	2016-12-06 12:24:14 +00:00
Florian Hahn	7582c669bd	[framelowering] Improve tracking of first CS pop instruction. Summary: This patch makes sure FirstCSPop and MBBI never point to DBG_VALUE instructions, which affected the code generated. Reviewers: mkuper, aprantl, MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27343 llvm-svn: 288794	2016-12-06 10:24:55 +00:00
Craig Topper	b34eef7b41	[X86] Remove another weird scalar sqrt/rcp/rsqrt pattern. This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended since that's what the operation asked for. Particularly in the case where the upper bits are 0, in that case we need calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper-bits. This implies we should be using the packed instruction still. The only test case for this pattern is one I just added so there was no coverage of this. llvm-svn: 288784	2016-12-06 08:08:12 +00:00
Craig Topper	683470bf1b	[X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic. The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to pass to give to the instruction since the memory form of the instruction only reads 32 or 64 bits. llvm-svn: 288781	2016-12-06 08:08:04 +00:00
Craig Topper	5fc7bc91f9	[X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined. The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands. I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded. llvm-svn: 288779	2016-12-06 08:07:58 +00:00
Craig Topper	6413f8a8f2	[X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead Summary: This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently. I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel. I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available. Reviewers: spatel, delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27401 llvm-svn: 288771	2016-12-06 04:58:39 +00:00
Michael Kuperstein	e3036abcf9	[X86] Fix non-intrinsic roundss/roundsd to not read the destination register This changes the scalar non-intrinsic non-avx roundss/sd instruction definitions not to read their destination register - allowing partial dependency breaking. This fixes PR31143. Differential Revision: https://reviews.llvm.org/D27323 llvm-svn: 288703	2016-12-05 20:57:37 +00:00
Sanjay Patel	f807f6a05f	[x86] fold fand (fxor X, -1) Y --> fandn X, Y I noticed this gap in the scalar FP-logic matching with: D26712 and: rL287171 Differential Revision: https://reviews.llvm.org/D27385 llvm-svn: 288675	2016-12-05 15:45:27 +00:00
Simon Pilgrim	5e922eb0a3	Use range based for loop. NFCI. llvm-svn: 288671	2016-12-05 14:25:04 +00:00
Simon Pilgrim	b08c98f125	[X86][SSE] Add support for combining target shuffles to UNPCKL/UNPCKH. llvm-svn: 288663	2016-12-05 11:25:13 +00:00
Simon Pilgrim	20b1409f35	[X86][SSE] Add helper function to create UNPCKL/UNPCKH shuffle masks. NFCI. llvm-svn: 288659	2016-12-05 11:00:25 +00:00
Craig Topper	088ba17f88	[X86] Remove unnecessary explicit uses of .SimpleTy just to do an equality comparison. MVT's operator== already takes care of this. NFCI llvm-svn: 288646	2016-12-05 06:09:55 +00:00
Craig Topper	db8467ae26	[AVX-512] Teach fast isel to handle 512-bit vector bitcasts. llvm-svn: 288641	2016-12-05 05:50:51 +00:00
Craig Topper	7ef6ea324a	[AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel. llvm-svn: 288636	2016-12-05 04:51:31 +00:00
Craig Topper	9d16bfa0f5	[AVX-512] Add many of the VPERM instructions to the load folding table. Move VPERMPDZri to the correct table. llvm-svn: 288591	2016-12-03 19:37:39 +00:00
Craig Topper	c210827b53	[AVX-512] Add EVEX VPMADDUBSW and VPMADDWD to the load folding tables. llvm-svn: 288587	2016-12-03 17:19:15 +00:00
Craig Topper	8e7498976a	[X86] Fix VEX encoded VPMADDUBSW to not be marked commutable. This was accidentallly broken in r285515 when we started lowering the intrinsic to an ISD node. Should fix PR31241. llvm-svn: 288578	2016-12-03 05:35:44 +00:00
Simon Pilgrim	9cb74267ac	Tidyup code with indentation and clang-format. NFCI. llvm-svn: 288505	2016-12-02 15:44:30 +00:00
Simon Pilgrim	cbf5f97018	[X86][SSE] Add support for extracting constant bit data from broadcasted constants llvm-svn: 288499	2016-12-02 13:16:08 +00:00
Simon Pilgrim	b3ae416839	[X86] Refactored getTargetConstantBitsFromNode to allow for expansion. NFCI. getTargetConstantBitsFromNode currently only extracts constant pool vector data, but it will need to be generalized to support broadcast and scalar constant pool data as well. Converted Constant bit extraction and Bitset splitting to helper lambda functions. llvm-svn: 288496	2016-12-02 11:58:05 +00:00
Craig Topper	4961fa9bba	[AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables. llvm-svn: 288484	2016-12-02 07:57:11 +00:00
Craig Topper	17ddb521ef	[AVX-512] Add EVEX PSHUFB instructions to load folding tables. llvm-svn: 288482	2016-12-02 07:06:30 +00:00
Craig Topper	f7866fad54	[AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables. llvm-svn: 288481	2016-12-02 06:24:38 +00:00
Peter Collingbourne	ab85225be4	IR: Change the gep_type_iterator API to avoid always exposing the "current" type. Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458	2016-12-02 02:24:42 +00:00
Benjamin Kramer	215b22e612	Fix unused variable warning in Release builds. NFC. llvm-svn: 288416	2016-12-01 20:49:34 +00:00
David L Kreitzer	0e3ae305b6	Refactored X86InterleavedAccess into a class. NFCI. Patch by Farhana Aleen Differential Revision: https://reviews.llvm.org/D25986 llvm-svn: 288410	2016-12-01 19:56:39 +00:00
Matthias Braun	d0ee66c2e9	Move most EH from MachineModuleInfo to MachineFunction Recommitting r288293 with some extra fixes for GlobalISel code. Most of the exception handling members in MachineModuleInfo is actually per function data (talks about the "current function") so it is better to keep it at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288405	2016-12-01 19:32:15 +00:00
Simon Pilgrim	17d5b6b493	[X86][SSE] Moved shuffle mask widening/narrowing helper functions earlier in the file. Will be necessary for a future patch. llvm-svn: 288395	2016-12-01 18:27:19 +00:00
Simon Pilgrim	5fe6236035	[X86][SSE] Classify AND bitmasks as variable shuffle masks They are loading the bitmasks from the constant pool so the cost is similar to loading a shuffle mask. llvm-svn: 288367	2016-12-01 16:00:14 +00:00
Simon Pilgrim	1e4d870999	[X86][SSE] Add support for combining AND bitmasks to shuffles. llvm-svn: 288365	2016-12-01 15:41:40 +00:00
Asaf Badouh	7f6968ed0a	[LMT] Restrict nop length to one not all lakemont MCU support long nop. we can't assume we can generate long nop by default for MCU. Differential Revision: https://reviews.llvm.org/D26895 llvm-svn: 288363	2016-12-01 15:19:10 +00:00
Daniel Jasper	19b9284f1d	Silence GCC's -Wenum-compare after r288335 in the same way it is done in X86FastISel.cpp. llvm-svn: 288337	2016-12-01 14:33:50 +00:00
Simon Pilgrim	55066e5622	[X86][SSE] Add support for combining target shuffles to AND bitmasks. llvm-svn: 288335	2016-12-01 13:47:02 +00:00
Simon Pilgrim	947650e99d	[X86][SSE] Add support for combining ISD::AND with shuffles. Attempts to convert an AND with a vector of 255 or 0 values into a shuffle (blend) mask. llvm-svn: 288333	2016-12-01 11:52:37 +00:00
Eric Christopher	e70b7c3dfb	Temporarily Revert "Move most EH from MachineModuleInfo to MachineFunction" This apprears to have broken the global isel bot: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel_build/5174/console This reverts commit r288293. llvm-svn: 288322	2016-12-01 07:50:12 +00:00
Matthias Braun	ed14cb0604	Move most EH from MachineModuleInfo to MachineFunction Most of the exception handling members in MachineModuleInfo is actually per function data (talks about the "current function") so it is better to keep it at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288293	2016-11-30 23:49:01 +00:00
Matthias Braun	f23ef437cc	Move FrameInstructions from MachineModuleInfo to MachineFunction This is per function data so it is better kept at the function instead of the module. This is a necessary step to have machine module passes work properly. Differential Revision: https://reviews.llvm.org/D27185 llvm-svn: 288291	2016-11-30 23:48:42 +00:00
Paul Robinson	78a695321e	[PS4] Tighten up a triple check. llvm-svn: 288286	2016-11-30 23:14:27 +00:00
Matthias Braun	c52fe2961c	Clarify rules for reserved regs, fix aarch64 ones. No test case necessary as the problematic condition is checked with the newly introduced assertAllSuperRegsMarked() function. Differential Revision: https://reviews.llvm.org/D26648 llvm-svn: 288277	2016-11-30 22:17:10 +00:00
Simon Pilgrim	288c088c17	[X86][SSE] Add support for target shuffle constant folding Initial support for target shuffle constant folding in cases where all shuffle inputs are constant. We may be able to relax this and merge shuffles with only some constant inputs in the future. I've added the helper function getTargetConstantBitsFromNode (based off a similar function in X86ShuffleDecodeConstantPool.cpp) that could be reused for other cases requiring constant vector extraction. Differential Revision: https://reviews.llvm.org/D27220 llvm-svn: 288250	2016-11-30 16:33:46 +00:00
Simon Pilgrim	edccc1254b	Avoid repeated calls to MVT getSizeInBits and getScalarSizeInBits(). NFCI. llvm-svn: 288170	2016-11-29 17:57:48 +00:00
Simon Pilgrim	001368abc8	[X86] Moved getTargetConstantFromNode function so a future patch is more understandable. NFCI. llvm-svn: 288147	2016-11-29 15:32:58 +00:00
Simon Pilgrim	35c47c494d	[X86][SSE] Add initial support for combining target shuffles to (V)PMOVZX. We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output. llvm-svn: 288140	2016-11-29 14:18:51 +00:00
Simon Pilgrim	923020a652	Avoid repeated calls to MVT::getScalarSizeInBits(). NFCI. llvm-svn: 288138	2016-11-29 13:43:08 +00:00
Matthias Braun	115efcd3d1	MachineScheduler: Export function to construct "default" scheduler. This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations. This also shrinks the list of default DAG mutations: {Load\|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores(). Differential Revision: https://reviews.llvm.org/D26986 llvm-svn: 288057	2016-11-28 20:11:54 +00:00
Simon Pilgrim	2228f70a85	[X86][SSE] Add initial support for combining (V)PMOVZX with shuffles. llvm-svn: 288049	2016-11-28 17:58:19 +00:00
Sanjay Patel	100bc01a72	[x86] fix formatting; NFC llvm-svn: 288045	2016-11-28 17:39:21 +00:00
Simon Pilgrim	3f10e66981	[X86][SSE] Added support for combining bit-shifts with shuffles. Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining. Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations. llvm-svn: 288040	2016-11-28 16:25:01 +00:00
Craig Topper	17786f77f0	[X86][FMA4] Remove isCommutable from FMA4 scalar intrinsics. They aren't commutable as operand 0 should pass its upper bits through to the output. llvm-svn: 288011	2016-11-27 21:37:04 +00:00
Craig Topper	13b27a2748	[X86][FMA] Add missing Predicates qualifier around scalar FMA intrinsic patterns. llvm-svn: 288010	2016-11-27 21:37:02 +00:00
Craig Topper	ff9d45875a	[X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions. llvm-svn: 288009	2016-11-27 21:37:00 +00:00
Craig Topper	3674f44e40	[X86] Add SHL by 1 to the load folding tables. I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships. llvm-svn: 288007	2016-11-27 21:36:54 +00:00
Simon Pilgrim	91d6f5fbc1	[X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts llvm-svn: 288006	2016-11-27 21:08:19 +00:00
Craig Topper	4fab487265	[AVX-512] Add integer and fp unpck instructions to load folding tables. llvm-svn: 288004	2016-11-27 19:51:41 +00:00
Simon Pilgrim	cdb2ce661d	[X86][SSE] Split lowerVectorShuffleAsShift ready for combines. NFCI. Moved most of matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit). llvm-svn: 288003	2016-11-27 19:28:39 +00:00
Craig Topper	7ad961cc70	[X86] Add TB_NO_REVERSE to entries in the load folding table where the instruction's load size is smaller than the register size. If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist. I probably missed some instructions, but this should be a large portion of them. llvm-svn: 288001	2016-11-27 18:51:13 +00:00
Craig Topper	c3b3926f8b	[AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables. llvm-svn: 287995	2016-11-27 08:55:31 +00:00
Craig Topper	fb64a25ba1	[X86] Remove alignment restrictions from load folding table for some instructions that don't have a restriction. Most of these are the SSE4.1 PMOVZX/PMOVSX instructions which all read less than 128-bits. The only other was PMOVUPD which by definition is an unaligned load. llvm-svn: 287991	2016-11-27 01:52:51 +00:00
Craig Topper	837ff25da1	[X86] Remove hasOneUse check that is redundant with the one in IsProfitableToFold. llvm-svn: 287987	2016-11-26 18:43:26 +00:00
Craig Topper	e266e126ff	[X86] Fix the zero extending load detection in X86DAGToDAGISel::selectScalarSSELoad to pass the load node to IsProfitableToFold and IsLegalToFold. Previously we were passing the SCALAR_TO_VECTOR node. llvm-svn: 287986	2016-11-26 18:43:24 +00:00
Craig Topper	d3ab1a3905	[X86] Simplify control flow. NFCI llvm-svn: 287985	2016-11-26 18:43:21 +00:00
Craig Topper	991d1ca3ba	[X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times. Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied. Reviewers: RKSimon, zvi, delena, spatel Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D26790 llvm-svn: 287983	2016-11-26 17:29:25 +00:00
Craig Topper	10d5eec1a1	[AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables. llvm-svn: 287975	2016-11-26 08:21:52 +00:00
Craig Topper	97169ea5f9	[AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables. llvm-svn: 287974	2016-11-26 08:21:48 +00:00
Craig Topper	53b33de1e3	[AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables. llvm-svn: 287972	2016-11-26 07:21:00 +00:00
Craig Topper	6677bb4e50	[AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change. llvm-svn: 287971	2016-11-26 07:20:57 +00:00
Craig Topper	39265bb1ce	[AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables. llvm-svn: 287970	2016-11-26 07:20:53 +00:00
Craig Topper	7f76c23781	[X86][XOP] Add a reversed reg/reg form for VPROT instructions. The W bit distinquishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions. llvm-svn: 287961	2016-11-26 02:14:00 +00:00
Craig Topper	516fd7abfe	[X86] Add SSE, AVX, and AVX2 version of MOVDQU to the load/store folding tables for consistency. Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete. llvm-svn: 287960	2016-11-26 02:13:58 +00:00
Craig Topper	a363d42973	[AVX-512] Put the AVX-512 sections of the load folding tables into mostly alphabetical order. This is consistent with the older sections of the table. NFC llvm-svn: 287956	2016-11-25 23:21:34 +00:00
Simon Pilgrim	8e8ae7219f	Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI llvm-svn: 287940	2016-11-25 17:19:53 +00:00
Craig Topper	88071b37ab	[AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when its feeding a vselect with 32-bit element size. Summary: Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations. This patch detects this and changes the element size to match the vselect. I don't handle changing integer to floating point or vice versa as its not clear if its better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now. Reviewers: delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27087 llvm-svn: 287939	2016-11-25 16:48:05 +00:00
Craig Topper	1e48829747	[AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables. llvm-svn: 287937	2016-11-25 16:33:53 +00:00
Craig Topper	d4091494d3	[X86] Invert an 'if' and early out to fix a weird indentation. NFCI llvm-svn: 287909	2016-11-25 02:29:24 +00:00
Craig Topper	a46936185a	[X86] Size a SmallVector to the worst case mask size for a 512-bit shuffle. NFCI llvm-svn: 287908	2016-11-25 02:29:21 +00:00
Simon Pilgrim	f1ee930db0	Fix unused variable warning llvm-svn: 287889	2016-11-24 15:24:47 +00:00
Benjamin Kramer	fc54e35d94	[X86] Don't round trip a unique_ptr through a raw pointer for assignment. No functional change. llvm-svn: 287888	2016-11-24 15:17:39 +00:00
Simon Pilgrim	9c71e07276	[X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64 Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit). The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32. Differential Revision: https://reviews.llvm.org/D26938 llvm-svn: 287886	2016-11-24 15:12:56 +00:00
Simon Pilgrim	841d7ca463	[X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882	2016-11-24 14:46:55 +00:00
Simon Pilgrim	7c26a6f9ef	[X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result llvm-svn: 287878	2016-11-24 14:02:30 +00:00
Simon Pilgrim	ab323ec411	[X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering llvm-svn: 287877	2016-11-24 13:38:59 +00:00
Nikolai Bozhenov	3a8d108b2b	[x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B The bug arises during register allocation on i686 for CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address, plus ESI is reserved as the base pointer. With such constraints the only way register allocator would do its job successfully is when the addressing mode of the instruction requires only one register. If that is not the case - we are emitting additional LEA instruction to compute the address. It fixes PR28755. Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D25088 llvm-svn: 287875	2016-11-24 13:23:35 +00:00
Nikolai Bozhenov	bb64aa14a3	[x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter Move the definitions of three variables out of the switch. Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D25192 llvm-svn: 287874	2016-11-24 13:15:49 +00:00
Nikolai Bozhenov	a2dabed3b6	[x86] Rewrite getAddressFromInstr helper function - It does not modify the input instruction - Second operand of any address is always an Index Register, make sure we actually check for that, instead of a check for an immediate value Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D24938 llvm-svn: 287873	2016-11-24 13:05:43 +00:00
Simon Pilgrim	a3af79678e	[X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions. This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types. Differential Revision: https://reviews.llvm.org/D27072 llvm-svn: 287870	2016-11-24 12:13:46 +00:00
Simon Pilgrim	3ce6a545c7	[X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq llvm-svn: 287835	2016-11-23 22:35:06 +00:00
Michael Kuperstein	47eb85a003	[X86] Allow folding of stack reloads when loading a subreg of the spilled reg We did not support subregs in InlineSpiller:foldMemoryOperand() because targets may not deal with them correctly. This adds a target hook to let the spiller know that a target can handle subregs, and actually enables it for x86 for the case of stack slot reloads. This fixes PR30832. Differential Revision: https://reviews.llvm.org/D26521 llvm-svn: 287792	2016-11-23 18:33:49 +00:00
Simon Pilgrim	4e9b9cbee9	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762	2016-11-23 14:01:18 +00:00
Simon Pilgrim	03cd8f887c	[CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs llvm-svn: 287760	2016-11-23 13:42:09 +00:00
Craig Topper	f57e17def0	[AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles. llvm-svn: 287744	2016-11-23 06:54:55 +00:00
Zvi Rackover	14aba43ea9	[X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VT's Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors. Reviewers: RKSimon, craig.topper, delena, andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26985 llvm-svn: 287743	2016-11-23 06:45:25 +00:00
Kuba Mracek	06995e866b	[xray] Add XRay support for Mach-O in CodeGen Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support. Differential Revision: https://reviews.llvm.org/D26983 llvm-svn: 287734	2016-11-23 02:07:04 +00:00
Simon Pilgrim	4aa876ca7c	[X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles. This occurs during UINT_TO_FP v2f64 lowering. We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle llvm-svn: 287676	2016-11-22 17:50:06 +00:00
Tim Northover	b64fb453ea	CodeGen: simplify TargetMachine::getSymbol interface. NFC. No-one actually had a mangler handy when calling this function, and getSymbol itself went most of the way towards getting its own mangler (with a local TLOF variable) so forcing all callers to supply one was just extra complication. llvm-svn: 287645	2016-11-22 16:17:20 +00:00
Zvi Rackover	9a355219d1	[X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC. llvm-svn: 287644	2016-11-22 15:33:28 +00:00
Zvi Rackover	0aa1c32d14	[X86] Remove dead code from LowerVectorBroadcast Summary: Splat vectors are canonicalized to BUILD_VECTOR's so the code can be simplified. NFC-ish. Reviewers: craig.topper, delena, RKSimon, andreadb Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D26678 llvm-svn: 287643	2016-11-22 15:17:52 +00:00
Coby Tayree	49b3733d57	[AVX512][inline-asm] Fix AVX512 inline assembly instruction resolution when the size qualifier of a memory operand is not specified explicitly. This commit handles cases where the size qualifier of an indirect memory reference operand in Intel syntax is missing (e.g. "vaddps xmm1, xmm2, [a]"). GCC will deduce the size qualifier for AVX512 vector and broadcast memory operands based on the possible matches: "vaddps xmm1, xmm2, [a]" matches only “XMMWORD PTR” qualifier. "vaddps xmm1, xmm2, [a]{1to4}" matches only “DWORD PTR” qualifier. This is different from the current behavior of LLVM, which deduces the size qualifier based on the size of the memory operand. For "vaddps xmm1, xmm2, [a]" "char a;" will imply "BYTE PTR" qualifier "short a;" will imply "WORD PTR" qualifier. This commit aligns LLVM to GCC’s behavior. This is the LLVM part of the review. The Clang part of the review: https://reviews.llvm.org/D26587 Differential Revision: https://reviews.llvm.org/D26586 llvm-svn: 287630	2016-11-22 09:30:29 +00:00
Craig Topper	3dcf45f08d	[X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to the use the i64mem version. I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX. There are two different reg/reg versions of movq, one from a GPR and one from the lower 64-bits of an XMM register. This changes the loading folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table. llvm-svn: 287622	2016-11-22 05:31:43 +00:00
Craig Topper	cada9f2275	[AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD). Summary: The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities. We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too. Reviewers: igorb, delena, Ayal, Farhana, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25652 llvm-svn: 287621	2016-11-22 04:57:34 +00:00
Craig Topper	da22267055	[AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type Summary: Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes help because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking. This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018. I plan to add support for more operations in future patches. Reviewers: RKSimon, zvi, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26902 llvm-svn: 287612	2016-11-22 03:51:53 +00:00
Coby Tayree	94ddbb4a04	small fixup which enables the issuing of the aforementioned instruction (w/o operands), on MS/Intel syntax. Differential Revision: https://reviews.llvm.org/D26913 llvm-svn: 287548	2016-11-21 15:50:56 +00:00
Simon Pilgrim	b7bbaa669b	[X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input). This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit. llvm-svn: 287535	2016-11-21 12:05:49 +00:00
Michael Zuckerman	8462faeaba	Fixing a small typo (A->U). This seem to fixes PR30992. - HasAVX512 ? X86::VMOVAPSZ128rm_NOVLX + HasAVX512 ? X86::VMOVUPSZ128rm_NOVLX llvm-svn: 287532	2016-11-21 11:52:11 +00:00
Craig Topper	9f2d632ee7	[AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX. llvm-svn: 287523	2016-11-21 07:51:31 +00:00
Craig Topper	0dfc09372f	[X86] Remove duplicate instructions for (v)movq and replace with patterns on other instructions. NFC llvm-svn: 287519	2016-11-21 04:07:56 +00:00
Coby Tayree	99a6639047	The 'vpmultishiftqb' instruction was implemented falsely, this patch amend it. More specifically - (MS dialect) broadcasting variants were implemented falsely. Differential Revision: https://reviews.llvm.org/D26257 llvm-svn: 287501	2016-11-20 17:19:55 +00:00
Coby Tayree	97e9cf62f4	Some instructions were missing, other implemented falsely. this patch aims at amending those issues. full list: vcvtps2pd vcvtudq2pd vcvtps2qq vcvttps2qq vcvtps2uqq vcvttps2uqq variants are: [Dst]XMM(zero-masked/merge-masked/unmasked) [Src]Mem64 Differential Revision: https://reviews.llvm.org/D26799 llvm-svn: 287500	2016-11-20 17:09:56 +00:00
Simon Pilgrim	5fadce4a3f	[X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible llvm-svn: 287497	2016-11-20 16:11:36 +00:00
Simon Pilgrim	5401bae523	[X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines llvm-svn: 287496	2016-11-20 15:24:38 +00:00
Simon Pilgrim	3f40412e0f	[X86][AVX512] Add support for VBMI VPERMV target shuffle combines llvm-svn: 287495	2016-11-20 15:05:45 +00:00
Simon Pilgrim	c17e1b74b8	[X86][AVX512VL] Removed duplicate operation action Basic AVX512F already declared uint_to_fp v4i32 as legal llvm-svn: 287493	2016-11-20 14:19:29 +00:00
Simon Pilgrim	3f10e9953d	Strip trailing whitespace llvm-svn: 287492	2016-11-20 14:05:23 +00:00
Simon Pilgrim	096b6d4f81	[X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64) llvm-svn: 287491	2016-11-20 14:03:23 +00:00
Oren Ben Simhon	c0f073b67f	[X86] RegCall - Handling long double arguments The change is part of RegCall calling convention support for LLVM. Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack). This review present the change and the corresponding tests. Differential Revision: https://reviews.llvm.org/D26151 llvm-svn: 287485	2016-11-20 11:06:07 +00:00
Coby Tayree	179ff0e541	[X86][InlineAsm]Test commit. Fixing a wrong comment on X86AsmParser.cpp::ParseZ: "true" --> "false" Differential Revision: https://reviews.llvm.org/D26797 llvm-svn: 287484	2016-11-20 09:31:11 +00:00
Simon Pilgrim	a14e0cb852	[X86][SSE] Improve PSHUFB lowering from either input Canonicalization may leave the zeroable vector in the first input. llvm-svn: 287461	2016-11-19 20:41:48 +00:00
Simon Pilgrim	623a7c57b5	[X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets llvm-svn: 287459	2016-11-19 20:12:34 +00:00
Craig Topper	893ea9fb2c	[X86] Simplify some code a little by removing a dulicate variable and combinining two if statements. NFCI llvm-svn: 287443	2016-11-19 17:33:17 +00:00
Simon Pilgrim	7938bd666e	Cleanup function with clang-format. NFCI. llvm-svn: 287340	2016-11-18 12:16:18 +00:00
Craig Topper	02b5a1b50f	[AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects. The same thing was done to 32-bit and 64-bit element sizes previously. This will allow us to support these shuffls in InstCombineCalls along with the other variable shift intrinsics. llvm-svn: 287312	2016-11-18 05:04:44 +00:00
Craig Topper	07f1c15995	[AVX-512] Support FCOPYSIGN for v16f32 and v8f64 Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads. Reviewers: delena, zvi, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298	2016-11-18 02:25:34 +00:00
Simon Pilgrim	9d15fb3c10	Fix spelling mistakes in X86 target comments. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287247	2016-11-17 19:03:05 +00:00
Simon Pilgrim	67ef3b984a	Wdocumentation fix llvm-svn: 287224	2016-11-17 12:21:45 +00:00
Simon Pilgrim	8eca5520dc	[X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves. If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer). This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases). Partial fix for PR30845 Differential Revision: https://reviews.llvm.org/D26590 llvm-svn: 287223	2016-11-17 12:14:49 +00:00
Oren Ben Simhon	489d6eff4f	[X86] RegCall - Handling v64i1 in 32/64 bit target Register Calling Convention defines a new behavior for v64i1 types. This type should be saved in GPR. However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit). Differential Revision: https://reviews.llvm.org/D26181 llvm-svn: 287217	2016-11-17 09:59:40 +00:00
Craig Topper	05b0fcd168	[X86] Fix formatting. NFC llvm-svn: 287211	2016-11-17 05:59:55 +00:00
Sanjay Patel	066139a3ec	[x86] allow FP-logic ops when one operand is FP and result is FP We save an inter-register file move this way. If there's any CPU where the FP logic is slower, we could transform this back to int-logic in MachineCombiner. This helps, but doesn't solve, PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 The 'andn' test shows that we're missing a pattern match to recognize the xor with -1 constant as a 'not' op. llvm-svn: 287171	2016-11-16 22:34:05 +00:00
Peter Collingbourne	7d0c869b86	X86: Simplify X86ISD::Wrapper operand checks. NFCI. We only ever create TargetConstantPool, TargetJumpTable, TargetExternalSymbol, TargetGlobalAddress, TargetGlobalTLSAddress, MCSymbol and TargetBlockAddress nodes as operands of X86ISD::Wrapper nodes, so we can remove one check and invert the other. Also update the documentation comment for X86ISD::Wrapper. Differential Revision: https://reviews.llvm.org/D26731 llvm-svn: 287160	2016-11-16 21:48:59 +00:00
Sanjay Patel	7f3d51f840	[x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of compilers, but logically equivalent int, float, and double variants of bitwise-logic instructions are reality in x86, and the float variant may be a shorter instruction depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all the time. This is a preliminary step towards solving PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 Differential Revision: https://reviews.llvm.org/D26712 llvm-svn: 287122	2016-11-16 17:42:40 +00:00
Simon Pilgrim	b57dd17142	[X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen. LLVM counterpart to D26686 Differential Revision: https://reviews.llvm.org/D26736 llvm-svn: 287108	2016-11-16 14:48:32 +00:00
Ayman Musa	4d60243bfd	[X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss\|sd} intrinsics. Differential Revision: https://reviews.llvm.org/D26128 llvm-svn: 287087	2016-11-16 09:00:28 +00:00
Craig Topper	6910fa0ef4	[X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file. Reviewers: zvi, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26660 llvm-svn: 287083	2016-11-16 05:24:10 +00:00
Sanjay Patel	73d1d35d21	fix formatting; NFC llvm-svn: 286989	2016-11-15 17:47:13 +00:00
Simon Pilgrim	ceffb43b1b	[X86][SSE] Improve SINT_TO_FP of boolean vector results (signum) This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic. This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases. Fix for PR13248 Differential Revision: https://reviews.llvm.org/D26583 llvm-svn: 286979	2016-11-15 16:24:40 +00:00
Zvi Rackover	6f76f46d2c	[X86][FastISel] Assert that we are dealing with arithmetic with overflow intrinsics. NFC llvm-svn: 286961	2016-11-15 13:50:35 +00:00
Zvi Rackover	f0b9b57bd3	[X86][FastISel] Fix lowering of overflow result on AVX512 targets Summary: Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class. We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction. Fixes pr30981. Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet Subscribers: qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D26620 llvm-svn: 286958	2016-11-15 13:29:23 +00:00
Zvi Rackover	76dbf26599	[X86][GlobalISel] Add minimal call lowering support to the IRTranslator Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573. Reviewers: ab, qcolombet, t.p.northover Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26593 llvm-svn: 286935	2016-11-15 06:34:33 +00:00
Simon Pilgrim	779da8e5ea	[CostModel][X86] Added mul costs for vXi8 vectors More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result llvm-svn: 286838	2016-11-14 15:54:24 +00:00

... 9 10 11 12 13 ...

15063 Commits