llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	75eb0af874	[X86] Correct the memory operand for the FLD emitted in FP_TO_INTHelper for 32-bit SSE targets. We were using DstTy, but that represents the integer type we are converting to which is i64 in this case. The FLD is part of an intermediate step to get from the SSE registers to the x87 registers. If the floating point type is f32, the memory operand should reflect a 4 byte access not an 8 byte access. The store we used to get from SSE to the stack is using the correct size. While there, consistently use TheVT in place of Op.getOperand(0).getValueType() throughout the function. llvm-svn: 353745	2019-02-11 20:38:10 +00:00
Sam McCall	e825ba9165	Revert "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" This reverts commit r353610. It causes a miscompile visible in macro expansion in a bootstrapped clang. http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190211/626590.html llvm-svn: 353699	2019-02-11 14:05:36 +00:00
Simon Pilgrim	f6e6c369c0	[X86] EltsFromConsecutiveLoads - replace SmallBitVector with APInt (NFC). Minor refactor to simplify some incoming patches to improve broadcast loads. llvm-svn: 353655	2019-02-10 22:45:48 +00:00
Sanjay Patel	833550fc74	[x86] narrow 256-bit horizontal ops via demanded elements 256-bit horizontal math ops are an x86 monstrosity (and thankfully have not been extended to 512-bit AFAIK). The two 128-bit halves operate on separate halves of the inputs. So if we don't demand anything in the upper half of the result, we can extract the low halves of the inputs, do the math, and then insert that result into a 256-bit output. All of the extract/insert is free (ymm<-->xmm), so we're left with a narrower (cheaper) version of the original op. In the affected tests based on: https://bugs.llvm.org/show_bug.cgi?id=33758 https://bugs.llvm.org/show_bug.cgi?id=38971 ...we see that the h-op narrowing can result in further narrowing of other math via existing generic transforms. I originally drafted this patch as an exact pattern match starting from extract_vector_elt, but I thought we might see diffs starting from extract_subvector too, so I changed it to a more general demanded elements solution. There are no extra existing regression test improvements from that switch though, so we could go back. Differential Revision: https://reviews.llvm.org/D57841 llvm-svn: 353641	2019-02-10 15:22:06 +00:00
Simon Pilgrim	6bf7b30b10	[X86] CombineOr - fold to generic funnel shifts As discussed on D57389, this is a first step towards moving the SHLD/SHRD matching code to DAGCombiner using FSHL/FSHR instead. There's a bit of work to do before I can do that, so this just folds to FSHL/FSHR in the existing code (handling the different SHRD/FSHR argument ordering), which fixes the issue we had with i16 shift amounts not being correctly masked. llvm-svn: 353626	2019-02-09 20:34:59 +00:00
Simon Pilgrim	690a2889d8	[X86][SSE] Generalize X86ISD::BLENDI support to more value types D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required. With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors. This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts. I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case). The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV. Differential Revision: https://reviews.llvm.org/D57888 llvm-svn: 353610	2019-02-09 13:13:59 +00:00
Sanjay Patel	e9cc26a56a	[x86] fix formatting; NFC (test commit #2 migrating to git) llvm-svn: 353533	2019-02-08 16:48:40 +00:00
Craig Topper	738180cc7f	Fix the lowering issue of intrinsics llvm.localaddress on X86 Patch by Yuanke Luo Reviewers: craig.topper, annita.zhang, smaslov, rnk, wxiao3 Reviewed By: rnk Subscribers: efriedma, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57501 llvm-svn: 353492	2019-02-08 01:14:12 +00:00
Sanjay Patel	81f859d169	[x86] fix formatting; NFC llvm-svn: 353477	2019-02-07 22:36:55 +00:00
Simon Pilgrim	fe3ac70b18	[DAGCombiner] (add (umax X, C), -C) --> (usubsat X, C) (PR40111) Move the (add (umax X, C), -C) --> (usubsat X, C) X86 combine into generic DAGCombiner First of a number of saturated arithmetic folds that can be moved out of X86-specific code for PR40111. Differential Revision: https://reviews.llvm.org/D57754 llvm-svn: 353457	2019-02-07 20:14:43 +00:00
Sanjay Patel	a5c4a5e958	[x86] split more 256/512-bit shuffles in lowering This is intentionally a small step because it's hard to know exactly where we might introduce a conflicting transform with the code that tries to form wider shuffles. But I think this is safe - if we have a wide shuffle with 2 operands, then we should do better with an extract + narrow shuffle. Differential Revision: https://reviews.llvm.org/D57867 llvm-svn: 353427	2019-02-07 17:10:49 +00:00
Nirav Dave	84e5bf0c95	[X86] Simplify casing. NFC. llvm-svn: 353417	2019-02-07 15:43:40 +00:00
Nirav Dave	c6bfa103a5	[X86][DAG] Avoid creating dangling bitcast. combineExtractWithShuffle may leave a dangling bitcast which may prevent further optimization in later passes. Avoid constructing it unless it is used. llvm-svn: 353333	2019-02-06 19:45:47 +00:00
Nirav Dave	e5c37958f9	[InlineAsm][X86] Add backend support for X86 flag output parameters. Allow custom handling of inline assembly output parameters and add X86 flag parameter support. llvm-svn: 353307	2019-02-06 15:26:29 +00:00
Sanjay Patel	e84fbb67a1	[x86] vectorize cast ops in lowering to avoid register file transfers The proposal in D56796 may cross the line because we're trying to avoid vectorization transforms in generic DAG combining. So this is an alternate, later, x86-specific translation of that patch. There are several potential follow-ups to enhance this: 1. Allow extraction from non-zero element index. 2. Peek through extends of smaller width integers. 3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P Differential Revision: https://reviews.llvm.org/D56864 llvm-svn: 353302	2019-02-06 14:59:39 +00:00
Simon Pilgrim	b0afc69435	[X86][SSE] Disable ZERO_EXTEND shuffle combining rL352997 enabled ZERO_EXTEND from non-shuffle-able value types. I've disabled it for now to fix a regression identified by @asbirlea until I can fix this properly. llvm-svn: 353198	2019-02-05 19:15:48 +00:00
Simon Pilgrim	822d2e35e7	[X86][AVX] Attempt to combine shuffles to subvector broadcast load llvm-svn: 353189	2019-02-05 17:02:49 +00:00
Simon Pilgrim	62af24cc93	[X86][SSE] Add SimplifyDemandedVectorElts support for X86ISD::BLENDV llvm-svn: 353165	2019-02-05 12:27:29 +00:00
Simon Pilgrim	9e595e3663	[X86][AVX] Attempt to share broadcasts of different widths (PR39454) If we have broadcasts of different vector widths, keep the longest vector width and extract subvectors for the shorter vectors (which should be free). Differential Revision: https://reviews.llvm.org/D57663 llvm-svn: 353154	2019-02-05 10:58:43 +00:00
Craig Topper	f86eb00f12	[X86] Connect the default fpsr and dirflag clobbers in inline assembly to the registers we have defined for them. Summary: We don't currently map these constraints to physical register numbers so they don't make it to the MachineIR representation of inline assembly. This could have problems for proper dependency tracking in the machine schedulers though I don't have a test case that shows that. Reviewers: rnk Reviewed By: rnk Subscribers: eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57641 llvm-svn: 353141	2019-02-05 06:13:06 +00:00
Simon Pilgrim	6e5350a367	[X86][SSE] SimplifyDemandedBitsForTargetNode - PCMPGT(0,X) sign mask For PCMPGT(0, X) patterns where we only demand the sign bit (e.g. BLENDV or MOVMSK) then we can use X directly. Differential Revision: https://reviews.llvm.org/D57667 llvm-svn: 353051	2019-02-04 15:43:36 +00:00
Simon Pilgrim	9899967464	Use auto for dyn_cast case to save a line. NFCI. llvm-svn: 353041	2019-02-04 12:32:39 +00:00
Simon Pilgrim	1fce5a8b75	[X86][AVX] Support shuffle combining for VBROADCAST with smaller vector sources getTargetShuffleMask can only do this safely if we're extracting the lowest subvector from a vector of the same result type. llvm-svn: 352999	2019-02-03 16:51:33 +00:00
Simon Pilgrim	18b73a655b	[X86][AVX] Support shuffle combining for VPMOVZX with smaller vector sources llvm-svn: 352997	2019-02-03 16:10:18 +00:00
Simon Pilgrim	a2a3e5b811	[X86][AVX] More aggressively simplify BROADCAST source operand Aim to use scalar source or lowest 128-bit vector directly. We're still missing some VZMOVL_LOAD combines. llvm-svn: 352994	2019-02-03 14:39:41 +00:00
Craig Topper	950ca192f6	[X86] Lower ISD::UADDO to use the Z flag instead of C flag when the RHS is a constant 1 to encourage INC formation. Summary: Add an additional combine to combineCarryThroughADD to reverse it back to the C flag to avoid regressions. I believe this catches the cases that D57547 got. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57637 llvm-svn: 352984	2019-02-03 07:25:06 +00:00
Simon Pilgrim	dbf302c9f1	[X86][AVX] Enable INSERT_SUBVECTOR(SRC0, SHUFFLE(SRC1)) shuffle combining Push the insert_subvector up through the shuffle operands to help find more cross-lane shuffles. This exposes a couple of minor issues that will be fixed shortly: Missed broadcast folds - we have a mixture of vzext_load lengths that need cleaning up; combine-sdiv.ll - AVX1 SimplifyDemandedVectorElts failure (hits max depth due to a couple of extra bitcasts). llvm-svn: 352963	2019-02-02 18:08:04 +00:00
Simon Pilgrim	bd42f97946	[SDAG] Add SDNode/SDValue getConstantOperandAPInt helper. NFCI. We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer and inserts a i128 -1 constant or something and the whole thing asserts....... I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary. llvm-svn: 352961	2019-02-02 17:35:06 +00:00
James Y Knight	14359ef1b6	[opaque pointer types] Pass value type to LoadInst creation. This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57172 llvm-svn: 352911	2019-02-01 20:44:24 +00:00
James Y Knight	7976eb5838	[opaque pointer types] Pass function types to CallInst creation. This cleans up all CallInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57170 llvm-svn: 352909	2019-02-01 20:43:25 +00:00
Simon Pilgrim	85184017e9	[X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle As suggested on PR40318, this patch uses PSLLDQ/PSRLDQ to lower shuffles to zero out the ends of a vector, leaving a sequential inner section. For pre-SSSE3 we do this for shuffles with zeros at either end (requiring up to 3 shifts), but once PSHUFB is available I've limited this to shuffles with a single zeroable end (2 shifts). Differential Revision: https://reviews.llvm.org/D56784 llvm-svn: 352883	2019-02-01 16:02:12 +00:00
Simon Pilgrim	1a529f58f9	[X86][AVX] Combine INSERT_SUBVECTOR(SRC0, BITCAST(SHUFFLE(EXTRACT_SUBVECTOR(SRC1))) Enable peeking through one use bitcasts to the subvector shuffle. This still depends on the subvector being the same scalar-size but D57514 has already helped with the more tricky patterns llvm-svn: 352879	2019-02-01 15:31:01 +00:00
James Y Knight	13680223b9	[opaque pointer types] Add a FunctionCallee wrapper type, and use it. Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc doesn't choke on it, hopefully. Original Message: The FunctionCallee type is effectively a {FunctionType,Value} pair, and is a useful convenience to enable code to continue passing the result of getOrInsertFunction() through to EmitCall, even once pointer types lose their pointee-type. Then: - update the CallInst/InvokeInst instruction creation functions to take a Callee, - modify getOrInsertFunction to return FunctionCallee, and - update all callers appropriately. One area of particular note is the change to the sanitizer code. Previously, they had been casting the result of `getOrInsertFunction` to a `Function*` via `checkSanitizerInterfaceFunction`, and storing that. That would report an error if someone had already inserted a function declaration with a mismatching signature. However, in general, LLVM allows for such mismatches, as `getOrInsertFunction` will automatically insert a bitcast if needed. As part of this cleanup, cause the sanitizer code to do the same. (It will call its functions using the expected signature, however they may have been declared.) Finally, in a small number of locations, callers of `getOrInsertFunction` actually were expecting/requiring that a brand new function was being created. In such cases, I've switched them to Function::Create instead. Differential Revision: https://reviews.llvm.org/D57315 llvm-svn: 352827	2019-02-01 02:28:03 +00:00
James Y Knight	fadf25068e	Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it." This reverts commit `f47d6b38c7` (r352791). Seems to run into compilation failures with GCC (but not clang, where I tested it). Reverting while I investigate. llvm-svn: 352800	2019-01-31 21:51:58 +00:00
James Y Knight	f47d6b38c7	[opaque pointer types] Add a FunctionCallee wrapper type, and use it. The FunctionCallee type is effectively a {FunctionType,Value} pair, and is a useful convenience to enable code to continue passing the result of getOrInsertFunction() through to EmitCall, even once pointer types lose their pointee-type. Then: - update the CallInst/InvokeInst instruction creation functions to take a Callee, - modify getOrInsertFunction to return FunctionCallee, and - update all callers appropriately. One area of particular note is the change to the sanitizer code. Previously, they had been casting the result of `getOrInsertFunction` to a `Function*` via `checkSanitizerInterfaceFunction`, and storing that. That would report an error if someone had already inserted a function declaration with a mismatching signature. However, in general, LLVM allows for such mismatches, as `getOrInsertFunction` will automatically insert a bitcast if needed. As part of this cleanup, cause the sanitizer code to do the same. (It will call its functions using the expected signature, however they may have been declared.) Finally, in a small number of locations, callers of `getOrInsertFunction` actually were expecting/requiring that a brand new function was being created. In such cases, I've switched them to Function::Create instead. Differential Revision: https://reviews.llvm.org/D57315 llvm-svn: 352791	2019-01-31 20:35:56 +00:00
Simon Pilgrim	00cefe1158	Trim trailing whitespace. NFCI. llvm-svn: 352775	2019-01-31 17:49:25 +00:00
Simon Pilgrim	eb6aef6db3	[X86][AVX] Fold concat(broadcast(x),broadcast(x)) -> broadcast(x) Differential Revision: https://reviews.llvm.org/D57514 llvm-svn: 352774	2019-01-31 17:48:35 +00:00
Simon Pilgrim	d04a2d2d5e	[X86][AVX] insert_subvector(bitcast(v), bitcast(s), c1) -> bitcast(insert_subvector(v,s,c2)) Similar to what we already do in DAGCombiner, but this version also handles bitcasts from types with different scalar sizes, which x86 is better at handling. Differential Revision: https://reviews.llvm.org/D57514 llvm-svn: 352773	2019-01-31 17:38:10 +00:00
Simon Pilgrim	63f3383ece	[X86][AVX] Fold broadcast(bitcast(src)) -> bitcast(broadcast(src)) llvm-svn: 352751	2019-01-31 14:04:07 +00:00
Simon Pilgrim	a001008a09	[X86] combineExtractWithShuffle - more aggressively peek through bitcasts Fixes regression introduced by rL352743 llvm-svn: 352745	2019-01-31 11:55:30 +00:00
Simon Pilgrim	b96a2c7fed	[X86][AVX] Enable AVX1 broadcasts in shuffle combining Enables 32/64-bit scalar load broadcasts on AVX1 targets The extractelement-load.ll regression will be fixed shortly in a followup commit. llvm-svn: 352743	2019-01-31 11:41:10 +00:00
Simon Pilgrim	51c2efc104	[X86][AVX] Fold vt1 concat_vectors(vt2 undef, vt2 broadcast(x)) --> vt1 broadcast(x) If we're not inserting the broadcast into the lowest subvector then we can avoid the insertion by just performing a larger broadcast. Avoids a regression when we enable AVX1 broadcasts in shuffle combining llvm-svn: 352742	2019-01-31 11:15:05 +00:00
Craig Topper	8bdc203d4b	[X86] Remove handling of ISD::INTRINSIC_WO_CHAIN in ReplaceNodeResults. I believe this was there to handle avx512bw intrinsics that returned i64 type in 32-bit mode. But all those intrinsics have since been changed to v64i1 results or replaced with generic IR. llvm-svn: 352698	2019-01-31 00:04:46 +00:00
Simon Pilgrim	317fad5921	[X86][AVX] Prefer to combine shuffle to broadcasts whenever possible This is the first step towards improving broadcast support on AVX1 targets. llvm-svn: 352634	2019-01-30 16:19:19 +00:00
Mikael Holmen	b792627ce9	Fix compiler warning when using clang 3.6.0 Without the fix we get the following (with -Werror): ../lib/Target/X86/X86ISelLowering.cpp:14181:58: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] SmallVector<std::array<int, 2>, 2> LaneSrcs(NumLanes, {-1, -1}); ^~~~~~ { } 1 error generated. llvm-svn: 352455	2019-01-29 06:51:28 +00:00
Craig Topper	390ac61b93	Recommit r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer" This did not cause the buildbot failure it was previously reverted for. Original commit message: I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits. This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inre On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all. llvm-svn: 352433	2019-01-28 21:38:47 +00:00
Simon Pilgrim	2c17512456	[X86][AVX] Remove lowerShuffleByMerging128BitLanes 2-lane restriction First step towards adding support for 64-bit unary "sublane" handling (a bit like lowerShuffleAsRepeatedMaskAndLanePermute). This allows us to add lowerV64I8Shuffle handling. llvm-svn: 352389	2019-01-28 17:02:35 +00:00
Sanjay Patel	94cca60b82	[x86] allow more shuffle splitting to avoid vpermps (PR40434) This is tricky to make optimal: sometimes we're better off using a single wider op, but other times it makes more sense to combine narrow ops to achieve the same result. This solves the case from: https://bugs.llvm.org/show_bug.cgi?id=40434 There's potentially a similar change for vectors with 64-bit elements, but it needs adjustments similar to rL352333 to avoid creating infinite loops. llvm-svn: 352380	2019-01-28 15:51:34 +00:00
Craig Topper	453150bc18	[X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument. Remove and autoupgrade the old intrinsics llvm-svn: 352343	2019-01-28 07:03:03 +00:00
Sanjay Patel	ebe6b43aec	[x86] add restriction for lowering to vpermps This transform was added with rL351346, and we had an escape for shufps, but we also want one for unpckps vs. vpermps because vpermps doesn't take an immediate shuffle index operand. llvm-svn: 352333	2019-01-27 21:53:33 +00:00
Simon Pilgrim	670a6971f8	[X86][SSE] Add UNDEF handling to combineSelect ISD::USUBSAT matching (PR40083) llvm-svn: 352330	2019-01-27 21:01:23 +00:00
Simon Pilgrim	f10b6623cc	[X86][SSE] Permit UNDEFs in combineAddToSUBUS matching (PR40083) llvm-svn: 352328	2019-01-27 20:36:37 +00:00
Sanjay Patel	5f1fdaa192	[x86] refactor logic in lowerShuffleWithUndefHalf Although this is longer code, this is no-functional-change-intended. The goal is to untangle the conditions under which we bail out, so that's easier to adjust. llvm-svn: 352320	2019-01-27 18:12:03 +00:00
Simon Pilgrim	a914fa4dd8	[X86] combineAddOrSubToADCOrSBB/combineCarryThroughADD - use oneuse for entire SDNode Fix issue noted in D57281 that only tested the one use for the SDValue (the result flag), not the entire SUB. I've added the getNode() to make it clearer what is intended than just the -> redirection. llvm-svn: 352291	2019-01-26 21:29:16 +00:00
Simon Pilgrim	37a8e65a60	[X86] combineCarryThroughADD - add support for X86::COND_A commutations (PR24545) As discussed on PR24545, we should try to commute X86::COND_A 'icmp ugt' cases to X86::COND_B 'icmp ult' to more optimally bind the carry flag output to a SBB instruction. Differential Revision: https://reviews.llvm.org/D57281 llvm-svn: 352289	2019-01-26 20:23:04 +00:00
Simon Pilgrim	b7a15acd38	[X86] Fold X86ISD::SBB(ISD::SUB(X,Y),0) -> X86ISD::SBB(X,Y) (PR25858) We often generate X86ISD::SBB(X, 0) for carry flag arithmetic. I had tried to create test cases for the ADC equivalent (which often uses the same pattern) but haven't managed to find anything yet. Differential Revision: https://reviews.llvm.org/D57169 llvm-svn: 352288	2019-01-26 20:13:44 +00:00
Simon Pilgrim	6162fba57c	[X86][SSE] Generalized unsigned compares to support nonsplat constant vectors (PR39859) llvm-svn: 352283	2019-01-26 16:40:03 +00:00
Sanjay Patel	a03c63b77f	[x86] add helper for creating a half-width shuffle; NFC This reduces a bit of duplication between the combining and lowering places that use it, but the primary motivation is to make it easier to rearrange the lowering logic and solve PR40434: https://bugs.llvm.org/show_bug.cgi?id=40434 llvm-svn: 352280	2019-01-26 16:20:22 +00:00
Craig Topper	58e6b37e62	Revert r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer" This might be breaking an lldb windows buildbot. llvm-svn: 352268	2019-01-26 02:44:58 +00:00
Craig Topper	7a8e74775c	[X86] Add DAG combine to merge vzext_movl with the various fp<->int conversion operations that only write the lower 64-bits of an xmm register and zero the rest. Summary: We have isel patterns for this, but we're missing some load patterns and all broadcast patterns. A DAG combine seems like a better fit for this. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56971 llvm-svn: 352260	2019-01-26 01:17:09 +00:00
Craig Topper	b1d3457c03	[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer Summary: I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits. This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization. On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57186 llvm-svn: 352255	2019-01-26 00:26:37 +00:00
Craig Topper	4cf28bad5b	[X86] Combine masked store and truncate into masked truncating stores. We also need to combine to masked truncating with saturation stores, but I'm leaving that for a future patch. This does regress some tests that used truncate with saturation followed by a masked store. Those now use a truncating store and use min/max to saturate. Differential Revision: https://reviews.llvm.org/D57218 llvm-svn: 352230	2019-01-25 18:37:36 +00:00
Sanjay Patel	0020f8bb23	[x86] simplify logic in lowerShuffleWithUndefHalf(); NFCI This seems unnecessarily complicated because we gave names to opposite polarity bools and have code comments that don't really line up with the logic. Step 1: remove UndefUpper and assert that it is the opposite of UndefLower after the initial early exit. llvm-svn: 352217	2019-01-25 17:00:41 +00:00
Simon Pilgrim	f56298f4b9	[X86] Simplify X86ISD::ADD/SUB if we don't use the result flag Simplify to the generic ISD::ADD/SUB if we don't make use of the result flag. This mainly helps with ADDCARRY/SUBBORROW intrinsics which get expanded to X86ISD::ADD/SUB but could be simplified further. Noticed in some of the test cases in PR31754 Differential Revision: https://reviews.llvm.org/D57234 llvm-svn: 352210	2019-01-25 15:58:28 +00:00
Sanjay Patel	21aa6ddc14	[x86] narrow a shuffle that doesn't use or set any high elements This isn't the final fix for our reduction/horizontal codegen, but it takes care of a lot of the problems. After we narrow the shuffle, existing combines for insert/extract and binops kick in, and we end up with cheaper 128-bit ops. The avg and mul reduction tests show an existing shuffle lowering hole for AVX2/AVX512. I think in its most minimal form this is: https://bugs.llvm.org/show_bug.cgi?id=40434 ...but we might need multiple fixes to get it right. Differential Revision: https://reviews.llvm.org/D57156 llvm-svn: 352209	2019-01-25 15:37:42 +00:00
Sanjay Patel	4c304b2923	[x86] move half-size shuffle mask creation to helper; NFC As noted in D57156, we want to check at least part of this pattern earlier (in combining), so this will allow the code to be shared instead of duplicated. llvm-svn: 352127	2019-01-24 23:12:36 +00:00
Sanjay Patel	e524639d72	[x86] rename VectorShuffle -> Shuffle; NFC This wasn't consistent within the file, so made it harder to search. Standardize on the shorter name to save some typing. llvm-svn: 352077	2019-01-24 18:52:12 +00:00
Sanjay Patel	e5a0bcf7b8	[x86] add low/high undef half shuffle mask helpers; NFC This is the most common usage for isUndefInRange, so make the code slightly less duplicated and more readable. llvm-svn: 352063	2019-01-24 17:05:02 +00:00
Matt Arsenault	a5840c3c39	Codegen support for atomicrmw fadd/fsub llvm-svn: 351851	2019-01-22 18:36:06 +00:00
Simon Pilgrim	933673d878	[X86][SSE] Canonicalize OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y)) For constant bit select patterns, replace one AND with a ANDNP, allowing us to reuse the constant mask. Only do this if the mask has multiple uses (to avoid losing load folding) or if we have XOP as its VPCMOV can handle most folding commutations. This also requires computeKnownBitsForTargetNode support for X86ISD::ANDNP and X86ISD::FOR to prevent regressions in fabs/fcopysign patterns. Differential Revision: https://reviews.llvm.org/D55935 llvm-svn: 351819	2019-01-22 13:44:49 +00:00
Craig Topper	bcbdf61078	[X86] Use X86ISD::VFPROUND instead of ISD::FP_ROUND for 256 and 512 bit cvtpd2ps intrinsics. Summary: Use X86ISD::VFPROUND in the instruction isel patterns. Add new patterns for ISD::FP_ROUND to maintain support for fptrunc in IR. In the process I found a couple duplicate isel patterns which I also deleted in this patch. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56991 llvm-svn: 351762	2019-01-21 20:14:09 +00:00
Craig Topper	c2087d8f3f	[X86] Change avx512 COMPRESS and EXPAND lowering to use a single masked node instead of expand/compress+select. Summary: For compress, a select node doesn't semantically reflect the behavior of the instruction. The mask would have holes in it, but the resulting write is to contiguous elements at the bottom of the vector. Furthermore, as far as the compressing and expanding is concerned the behavior is dependent on the mask. You can't just have an expand/compress node that only reads the input vector. That node would have no meaning by itself. This all only works because we pattern match the compress/expand+select back to the instruction. But conceivably an optimization of the select could break the pattern and leave something meaningless. This patch modifies the expand and compress node to take the mask and passthru as additional inputs and gets rid of the select altogether. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57002 llvm-svn: 351761	2019-01-21 20:02:28 +00:00
Craig Topper	4aa74fff1f	[X86] Add masked MCVTSI2P/MCVTUI2P ISD opcodes to model the cvtqq2ps cvtuqq2ps nodes that produce less than 128-bits of results. These nodes zero the upper half of the result and can't be represented with vselect. llvm-svn: 351666	2019-01-19 21:26:20 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Reid Kleckner	38f9900aa5	[X86] Deduplicate static calling convention helpers for code size, NFC Summary: Right now we include ${TGT}GenCallingConv.inc once per each instruction selection method implemented by ${TGT}: - ${TGT}ISelLowering.cpp - ${TGT}CallLowering.cpp - ${TGT}FastISel.cpp Instead, add a mechanism to tablegen for marking a particular convention as "External", which causes tablegen to emit into the ::llvm namespace, instead of as a static helper. This allows us to provide a header to forward declare it, so we can simply call the function from all the places it is referenced. Typically the calling convention analyzer is called indirectly, so it doesn't benefit from inlining. This saves a bit of final binary size, but mostly just saves object file size: before after diff artifact 12852K 12492K -360K X86ISelLowering.cpp.obj 4640K 4280K -360K X86FastISel.cpp.obj 1704K 2092K +388K X86CallingConv.cpp.obj 52448K 52336K -112K llc.exe I didn't collect before numbers for X86CallLowering.cpp.obj, which is for GlobalISel, but we should save 360K there as well. This patch applies the strategy to the X86 backend, but there is no reason it couldn't be applied to the other backends that implement multiple ISel strategies, like AArch64. Reviewers: craig.topper, hfinkel, efriedma Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56883 llvm-svn: 351616	2019-01-19 00:33:02 +00:00
Craig Topper	08d3d32ead	[X86] Lower avx512f scatter intrinsics to X86MaskedScatterSDNode instead of going directly to MachineSDNode. This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field. llvm-svn: 351583	2019-01-18 20:14:46 +00:00
Craig Topper	b9d4461f9f	[X86] Lower avx2/avx512f gather intrinsics to X86MaskedGatherSDNode instead of going directly to MachineSDNode. This sends these intrinsics through isel in a much more normal way. This should allow addressing mode matching in isel to make better use of the displacement field. Differential Revision: https://reviews.llvm.org/D56827 llvm-svn: 351570	2019-01-18 18:22:26 +00:00
Sanjay Patel	b6c91a1a59	[x86] simplify code for SDValue.getOperand(); NFC llvm-svn: 351557	2019-01-18 15:55:21 +00:00
Craig Topper	59abdf5f3f	[X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out of range shift amount as undefined behavior and will constant fold to undef, while the intrinsics are defined to return 0 for out of range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits. This was previously believed safe because undefs frequently get turned into 0 either from the constant pool or a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, it's ignored in detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in bounds shift amounts are the same, we might fold it to a single element broadcast from the constant pool. This would not put 0s in the elements with out of bounds shift amounts. We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change. Patch by zhutianyang and Craig Topper Differential Revision: https://reviews.llvm.org/D56695 llvm-svn: 351381	2019-01-16 21:46:32 +00:00
Craig Topper	5ea3120718	[X86] Use X86ISD::BLENDV for blendv intrinsics. Replace vselect with blendv just before isel table lookup. Remove vselect isel patterns. This cleans up the duplication we have with both intrinsic isel patterns and vselect isel patterns. This should also allow the intrinsics to get SimplifyDemandedBits support for the condition. I've switched the canonical pattern in isel to use the X86ISD::BLENDV node instead of VSELECT, since it always seemed weird to move from BLENDV with its relaxed rules on condition bits to VSELECT which has strict rules about all bits of the condition element being the same. It's more correct to go from VSELECT to BLENDV. Differential Revision: https://reviews.llvm.org/D56771 llvm-svn: 351380	2019-01-16 21:46:28 +00:00
Craig Topper	e5b7cc8aa0	[X86] Add a one use check to the setcc inversion code in combineVSelectWithAllOnesOrZeros If we're going to generate a new inverted setcc, we should make sure we will be able to remove the old setcc. Differential Revision: https://reviews.llvm.org/D56765 llvm-svn: 351378	2019-01-16 21:29:29 +00:00
Simon Pilgrim	5a2bbe267a	[X86] getFauxShuffleMask - bail for non-byte aligned shuffle types Remove the existing assertion and just return false for unexpected shuffle value types (<X x i1> mainly....). Found while updating combineX86ShufflesRecursively to run within SimplifyDemandedVectorElts/SimplifyDemandedBits. llvm-svn: 351365	2019-01-16 18:15:31 +00:00
Simon Pilgrim	524daea429	[X86] Add combineX86ShufflesRecursively helper. NFCI. combineX86ShufflesRecursively is pretty cumbersome with a lot of arguments that only matter later in recursion. This commit adds a wrapper version that only takes the initial root Op to simplify calls that don't need to worry about these. An early, cleanup step towards merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts/SimplifyDemandedBits. llvm-svn: 351352	2019-01-16 16:01:42 +00:00
Sanjay Patel	0dbecd05ed	[x86] lower shuffle of extracts to AVX2 vperm instructions I was trying to prevent shuffle regressions while matching more horizontal ops and ended up here: shuf (extract X, 0), (extract X, 4), Mask --> extract (shuf X, undef, Mask'), 0 The affected tests were added for: https://bugs.llvm.org/show_bug.cgi?id=34380 This patch won't change the examples in the bug report itself, but we should be able to extend this to catch more types. Differential Revision: https://reviews.llvm.org/D56756 llvm-svn: 351346	2019-01-16 14:15:18 +00:00
Mandeep Singh Grang	436735c3fe	[EH] Rename llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp Summary: Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc. Refer D53541 for the context. Clang counterpart D56748. Reviewers: rnk, efriedma Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56747 llvm-svn: 351281	2019-01-16 00:37:13 +00:00
Craig Topper	0e420e6a62	[X86] Rename SHRUNKBLEND ISD node to BLENDV. That's really what it is. If we didn't use intrinsics for BLENDVPS/BLENDVPD/PBLENDVB all the way to isel, this is the node we would use. llvm-svn: 351278	2019-01-16 00:20:30 +00:00
Craig Topper	34ac509ac8	[X86] Add avx512 scatter intrinsics that use a vXi1 mask instead of a scalar integer. We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value like an icmp. The bitcasts can be subject to code motion and interfere with basic block at a time isel in bad ways. llvm-svn: 351275	2019-01-15 23:36:25 +00:00
Craig Topper	82015b633b	[X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen. I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics. I'll work on scatter and gatherpf/scatterpf next. Differential Revision: https://reviews.llvm.org/D56527 llvm-svn: 351234	2019-01-15 20:12:33 +00:00
Nirav Dave	dbc41bac0c	[X86] Fix register class for assembly constraints to ST(7). NFCI. Modify getRegForInlineAsmConstraint to return special singleton register class when a constraint references ST(7) not RFP80 for which ST(7) is not a member. llvm-svn: 351206	2019-01-15 17:09:14 +00:00
Simon Pilgrim	b8f08c8d7b	[X86] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306) If we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements. We were already doing this for SSSE3 targets as we have PSHUFB, but it's worth doing for all targets. llvm-svn: 351203	2019-01-15 16:56:55 +00:00
Benjamin Kramer	2fc8ede082	[X86] Fix unused variable warning in Release builds. NFC. llvm-svn: 351136	2019-01-14 23:29:54 +00:00
Craig Topper	9906f77f82	[X86] Silence a -Wparentheses warning on gcc. NFC llvm-svn: 351111	2019-01-14 19:44:02 +00:00
Simon Pilgrim	bfe2ee453a	[X86][SSSE3] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306) If we have PSHUFB and we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements. llvm-svn: 351103	2019-01-14 19:07:26 +00:00
Sanjay Patel	b23ff7a0e2	[x86] lower extracted add/sub to horizontal vector math add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0 This is the integer sibling to D56011. There's an additional restriction to only do this transform in the case where we don't have extra extracts from the source vector. Without that, we can fail to match larger horizontal patterns that are more beneficial than this minimal case. An improvement to the more general h-op lowering may allow us to remove the restriction here in a follow-up. llvm-svn: 351093	2019-01-14 18:44:02 +00:00
Craig Topper	c8cd85588b	[X86] Remove unused intrinsic handlers. NFC llvm-svn: 351032	2019-01-14 01:56:59 +00:00
Craig Topper	075fcc1151	[X86] Remove FPCLASS intrinsic handler. Use INTR_TYPE_2OP instead. NFC llvm-svn: 351031	2019-01-14 01:44:09 +00:00
Craig Topper	3f3b8ef442	[X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector. The input mask can be represented with an AND in IR. Fixes PR40258 llvm-svn: 351028	2019-01-14 00:03:50 +00:00
Craig Topper	31156bbdb9	[X86] Add more ISD nodes to handle masked versions of VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128 which only produce 2 result elements and zeroes the upper elements. We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877 llvm-svn: 351018	2019-01-13 02:59:59 +00:00
Craig Topper	4561edbec0	[X86] Add X86ISD::VMFPROUND to handle the masked case of VCVTPD2PSZ128 which only produces 2 result elements and zeroes the upper elements. We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877. llvm-svn: 351017	2019-01-13 02:59:57 +00:00
Simon Pilgrim	a0069ba0db	[X86] More aggressive shuffle mask widening in combineExtractWithShuffle Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through. llvm-svn: 351013	2019-01-12 16:38:56 +00:00
Simon Pilgrim	a21e2bd682	[X86] Improve vXi64 ISD::ABS codegen with SSE41+ Make use of vblendvpd to select on the signbit Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350999	2019-01-12 10:28:12 +00:00
Simon Pilgrim	ca0de0363b	[X86][AARCH64] Improve ISD::ABS support This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types. Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350998	2019-01-12 09:59:32 +00:00
Craig Topper	90fe6edcba	[X86] Remove X86ISD::SELECT as it's no longer used by any of our intrinsic lowering. llvm-svn: 350995	2019-01-12 08:15:54 +00:00
Craig Topper	33b2cf50e3	[X86] Add ISD node for masked version of CVTPS2PH. The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do. This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there. Fixes another instruction for PR34877. llvm-svn: 350994	2019-01-12 08:05:12 +00:00
Craig Topper	a69d903204	[X86] Remove unnecessary code from getMaskNode. We no longer need to extend mask scalars before bitcasting them to vXi1. This was only needed for the truncate intrinsics. And was really a bug in our lowering of them. llvm-svn: 350991	2019-01-12 06:13:44 +00:00
Craig Topper	bf61525e8c	[X86] When lowering v1i1/v2i1/v4i1/v8i1 load/store with avx512f, but not avx512dq, use v16i1 as the intermediate mask type instead of v8i1. We still use i8 for the load/store type. So we need to convert to/from i16 around the mask type. By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned. llvm-svn: 350989	2019-01-12 02:22:10 +00:00
Craig Topper	abe6ef8d09	[X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits. We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask. This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that. This fixes some of PR34877 llvm-svn: 350985	2019-01-12 00:55:27 +00:00
Sanjay Patel	40cd4b77e9	[x86] allow insert/extract when matching horizontal ops Previously, we limited this transform to cases where the extraction into the build vector happens from vectors of the same type as the build vector, but that's not required. There's a slight potential regression seen in the AVX512 result for phadd -- we're using the 256-bit flavor of the instruction now even though the 128-bit subset is sufficient. The same problem could already be seen in the AVX2 result. Follow-up patches will attempt to narrow that back down. llvm-svn: 350928	2019-01-11 14:27:59 +00:00
Craig Topper	b97885cc2e	[X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector. We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions. Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector. This fixes some of the regressions from D350800. llvm-svn: 350918	2019-01-11 05:44:56 +00:00
Craig Topper	844f989608	[X86] Call SimplifyDemandedBits on conditions of X86ISD::SHRUNKBLEND This extends combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDS that have already been created. This should help some of the regressions from D56387 Differential Revision: https://reviews.llvm.org/D56421 llvm-svn: 350875	2019-01-10 19:05:34 +00:00
Craig Topper	350e6e9d7c	[X86] Simplify the BRCOND handling for FCMP_UNE. Despite what the comment says, FCMP_UNE would be an OR not an AND. In the lowering code the first branch created still goes to the original destination. The second branch was exchanged to go to where the subsequent unconditional branch went. This is different than what we do for FCMP_OEQ where both branches that we create go to the original unconditional branch. As far as I can tell, I think this means we don't need to exchange the branch target with the unconditional branch for FCMP_UNE at all. Differential Revision: https://reviews.llvm.org/D56309 llvm-svn: 350873	2019-01-10 19:02:14 +00:00
Sanjay Patel	87ae1460f7	[x86] fix remaining miscompile bug in horizontal binop matching (PR40243) When we use the partial-matching function on a 128-bit chunk, we must account for the possibility that we've matched undef halves of the original source vectors, so the outputs may need to be reset. This should allow closing PR40243: https://bugs.llvm.org/show_bug.cgi?id=40243 llvm-svn: 350830	2019-01-10 15:27:23 +00:00
Sanjay Patel	ed5cfc6792	[x86] fix horizontal binop matching for 256-bit vectors (PR40243) This is a partial fix for: https://bugs.llvm.org/show_bug.cgi?id=40243 ...as seen in the integer test, we still need to correct the result when using the existing (old) horizontal op matching function because it does not model the way x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op). A potential follow-up change for that is discussed in the bug report - see also D56490. This generally duplicates a lot of the existing matching code, but we can't just remove that without introducing regressions, so the existing code is renamed and used less often. Follow-ups may try to reduce that overlap. Differential Revision: https://reviews.llvm.org/D56450 llvm-svn: 350826	2019-01-10 15:04:52 +00:00
Craig Topper	c38c9c120f	[X86] After turning VSELECT into SHRUNKBLEND, make sure we push the VSELECT into the worklist so it can be deleted. Found while trying to figure out why my second version of D56421 worked better than the first version. We weren't deleting the vselect in a timely fashion and that caused SimplifyDemandedBits to see an additional user. The new version doesn't have this problem so this fix isn't needed there, but seemed like the right thing to do. llvm-svn: 350781	2019-01-10 00:14:27 +00:00
Simon Pilgrim	5a7132ff0f	[X86] Enable combining shuffles to PACKSS/PACKUS for 256/512-bit vectors llvm-svn: 350716	2019-01-09 13:23:28 +00:00
Craig Topper	2fa8e2d8a8	[X86] Correct the MaskVT for avx512 gather/scatter intrinsics to use the min of the number of index and data elements. When the result type is v2i64/v2f64 and the index element size is i32, the index vector has two unused elements making the type v4i32. The mask VT should match the number of memory accesses that will be made. This is consistent with the isel patterns used for the target independent gather/scatter intrinsic. llvm-svn: 350687	2019-01-09 04:21:12 +00:00
Craig Topper	6ffeeb705f	[X86] Add support for matching vector funnel shift to AVX512VBMI2 instructions. Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56361 llvm-svn: 350498	2019-01-06 18:10:18 +00:00
Craig Topper	d0ba531a0c	[X86] Use two pmovmskbs in combineBitcastvxi1 for (i64 (bitcast (v64i1 (truncate (v64i8)))) on KNL. llvm-svn: 350481	2019-01-05 22:42:58 +00:00
Craig Topper	46f8b4a11e	[X86] Allow combinevxi1Bitcast to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8. This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both. llvm-svn: 350480	2019-01-05 21:40:07 +00:00
Craig Topper	3f48dbf72e	[X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width-256 is in effect and BWI is not available. llvm-svn: 350473	2019-01-05 18:48:11 +00:00
Nikita Popov	c35b4a37ba	[X86] Fix warning; NFC llvm-svn: 350437	2019-01-04 21:41:35 +00:00
Sanjay Patel	6153565511	[x86] lower extracted fadd/fsub to horizontal vector math; 2nd try The 1st try for this was at rL350369, but it caused IR-level diffs because our cost models differentiate custom vs. legal/promote lowering. So that was reverted at rL350373. The cost models were fixed independently at rL350403, so this is effectively the same patch as last time. Original commit message: This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350421	2019-01-04 17:48:13 +00:00
Simon Pilgrim	9f4dea8c06	[X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine Repeat of the generic SimplifyDemandedBits shift combine llvm-svn: 350399	2019-01-04 15:43:43 +00:00
Sanjay Patel	26ce9c38a7	revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math There are non-codegen tests that need to be updated with this code change. llvm-svn: 350373	2019-01-04 00:02:02 +00:00
Sanjay Patel	ef4afca2ad	[x86] lower extracted fadd/fsub to horizontal vector math This would show up if we fix horizontal reductions to narrow as they go along, but it's an improvement for size and/or Jaguar (fast-hops) independent of that. We need to do this late to not interfere with other pattern matching of larger horizontal sequences. We can extend this to integer ops in a follow-up patch. Differential Revision: https://reviews.llvm.org/D56011 llvm-svn: 350369	2019-01-03 23:16:19 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
Craig Topper	9d4860ec4e	[X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag. This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used. I had to avoid selecting DEC if the result isn't used. This will become a SUB immediate which will be turned into a CMP later by optimizeCompareInstr. This led to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag. This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes. Differential Revision: https://reviews.llvm.org/D55975 llvm-svn: 350245	2019-01-02 19:01:05 +00:00
Simon Pilgrim	d8125726d5	[X86] Support SHLD/SHRD masked shift-counts (PR34641) Peek through shift modulo masks while matching double shift patterns. I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now. Differential Revision: https://reviews.llvm.org/D56199 llvm-svn: 350222	2019-01-02 17:05:37 +00:00
Craig Topper	f7cc7e3201	[X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL. All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering. llvm-svn: 350206	2019-01-02 06:40:11 +00:00
Craig Topper	d4db122483	[X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO. These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code. Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll llvm-svn: 350205	2019-01-02 05:46:03 +00:00
Craig Topper	00b390a000	[X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code. This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions. This is inspired by the helper functions used by ARM and AArch64 for the same purpose. The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO. llvm-svn: 350198	2019-01-01 19:34:11 +00:00
Sanjay Patel	738a863648	[x86] move/rename helper for horizontal op codegen; NFC Preliminary commit as suggested in D56011. llvm-svn: 350193	2019-01-01 16:08:36 +00:00
Craig Topper	bb0873cf46	[X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode. Differential Revision: https://reviews.llvm.org/D56169 llvm-svn: 350178	2018-12-31 19:09:27 +00:00
Craig Topper	a32e353afa	[X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1. This seems to be getting in the way more than it's helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better. Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it. llvm-svn: 350159	2018-12-30 03:05:07 +00:00
Craig Topper	f237ce159e	[X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from v16i16/v32i8 to v4i64 when v4i64 needs splitting. This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh. We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization. llvm-svn: 350158	2018-12-30 02:30:34 +00:00
Craig Topper	0a6cec6f9f	[X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization. This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better. llvm-svn: 350141	2018-12-29 01:17:11 +00:00
Craig Topper	f814d28eb3	[X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply. Previously we emitted a multiply and some masking that was supposed to be matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly. Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll. Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs. I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization llvm-svn: 350134	2018-12-28 19:19:39 +00:00
Craig Topper	787ad92bf6	[X86] Remove check that avoids creating PMULDQ with illegal types. Rely on SplitOpsAndApply to legalize it. Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2. This seems to give some improvements in our ability to use SimplifyDemandedBits. llvm-svn: 350084	2018-12-27 03:37:04 +00:00
Craig Topper	a8f07e51f9	[X86] Factor the core code out of LowerSETCC into a helper that can create CMP/BT/PTEST/KORTEST etc. without making an X86ISD::SETCC node. NFCI Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself. Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC which creates an X86ISD::SETCC node we need to look through. llvm-svn: 350082	2018-12-27 01:50:40 +00:00
Craig Topper	4f1ef9fc0f	[X86] Merge getBitTestCondition into LowerAndToBT. Don't create X86ISD::SETCC node in the merged function. NFCI Only one of the 3 callers of LowerAndToBT need the SETCC node. Two of them have to look through it to find the operands they really need. Instead create it after the one call that needs it. LowerAndToBT now returns both the BT node and the X86 specific condition code separately. llvm-svn: 350081	2018-12-27 01:50:38 +00:00
Craig Topper	0229da8f07	[X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ. This is an alternative to what I attempted in D56057. GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users. I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits. Based on a patch that Simon Pilgrim sent me in email. Fixes PR40142. llvm-svn: 350059	2018-12-24 19:40:20 +00:00
Craig Topper	0adc3fe9e7	[X86] Remove unused variables left after r350041. NFC llvm-svn: 350043	2018-12-24 05:45:45 +00:00
Craig Topper	d8217b23ff	[X86] Move the optimization that turns 'CMP (AND+IMM64), 0' into SRL/SHL+TEST to X86ISelDAGToDAG. This cleans more code out of EmitTest. llvm-svn: 350041	2018-12-24 05:27:13 +00:00
Craig Topper	e8c50fc6af	[X86] Remove the ANDN check from EmitTest. Remove the TESTmr isel patterns and add another postprocessing combine for TESTrr+ANDrm->TESTmr. We already have a postprocessing combine for TESTrr+ANDrr->TESTrr. With this we can give ANDN a chance to match first. And clean it up during post processing if we ended up with just a regular AND. This is another step towards my plan to gut EmitTest and do more flag handling during isel matching or by using optimizeCompare. llvm-svn: 350038	2018-12-24 01:10:13 +00:00
Craig Topper	006bac6880	[X86] Return false from hasAndNotCompare if the comparison value is a constant. We won't end up using an ANDN instruction in this case so we should generate the same code we do for pre-BMI targets. llvm-svn: 350018	2018-12-23 05:52:55 +00:00
Craig Topper	3cc92a28ce	[X86] Fix an old FIXME about folding the zero constant into the OR instruction we use for sequentially consistent fence in 32-bit mode without SSE2. llvm-svn: 350013	2018-12-23 01:54:43 +00:00
Sanjay Patel	80187b8a17	[x86] add movddup specialization for build vector lowering (PR37502) This is admittedly a narrow fix for the problem: https://bugs.llvm.org/show_bug.cgi?id=37502 ...but as the XOP restriction shows, it's a maze to get this right. In the motivating example, note that we have movddup before SSE4.1 and again with AVX2. That's because insertps isn't available pre-SSE41 and vbroadcast is (more generally) available with AVX2 (and the splat is reduced to movddup via isel pattern). Differential Revision: https://reviews.llvm.org/D55898 llvm-svn: 349937	2018-12-21 18:48:32 +00:00
Simon Pilgrim	57733507fe	[X86] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the old KnownBits output parameter version. llvm-svn: 349902	2018-12-21 14:25:14 +00:00
Simon Pilgrim	09c081176a	[X86][AVX512] Don't custom lower v16i8 rotations. As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI. llvm-svn: 349763	2018-12-20 14:38:35 +00:00
Craig Topper	9ca2f5605e	[X86] Disable custom widening of signed/unsigned add/sub saturation intrinsics under -x86-experimental-vector-widening-legalization. Generic legalization should take care of this. llvm-svn: 349714	2018-12-20 01:32:06 +00:00
Craig Topper	217b3b20d8	[X86] Remove TLI variable from ReplaceNodeResults. NFC We're already in X86TargetLowering which is a derived class of TargetLowering. We can just call methods directly. llvm-svn: 349695	2018-12-19 23:13:03 +00:00
Craig Topper	d16da2b479	[X86] Remove a bunch of 'else' after returns in reduceVMULWidth. NFC This reduces indentation and makes it obvious this function always returns something. llvm-svn: 349671	2018-12-19 19:39:34 +00:00
Craig Topper	8434ef7d1e	[X86] Don't use SplitOpsAndApply to create ISD::UADDSAT/ISD::USUBSAT nodes. Let type legalization and op legalization deal with it. Now that we've switched to target independent nodes we can rely on generic infrastructure to do the legalization for us. llvm-svn: 349526	2018-12-18 19:29:08 +00:00
Nikita Popov	f6058ff140	[X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics. This is a followup to D55787 and part of PR40056. Differential Revision: https://reviews.llvm.org/D55833 llvm-svn: 349520	2018-12-18 18:28:22 +00:00
Craig Topper	20a6db5a84	[X86] Create PSUBUS from (add (umax X, C), -C) InstCombine seems to canonicalize our PSUB pattern into a max with the constant and an add with an inverse of the constant. This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling. Fixes some of PR40053 Differential Revision: https://reviews.llvm.org/D55780 llvm-svn: 349519	2018-12-18 18:26:25 +00:00
Simon Pilgrim	1411917431	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349510	2018-12-18 17:31:11 +00:00
Simon Pilgrim	e9effe9744	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349500	2018-12-18 16:02:23 +00:00
Nikita Popov	665ab08178	[X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT and USUBSAT. As a side-effect, this also makes codegen for the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable. This only replaces use in the X86 backend, and does not move any of the ADDUS/SUBUS X86 specific combines into generic codegen. Differential Revision: https://reviews.llvm.org/D55787 llvm-svn: 349481	2018-12-18 13:23:03 +00:00
Simon Pilgrim	8488a44c34	[X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 This works better as part of SimplifyDemandedBits than part of the general combine. llvm-svn: 349462	2018-12-18 09:11:34 +00:00
Simon Pilgrim	26c630f416	[X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold. This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (it's guaranteed to stay the same). Test change is merely a rescheduling. llvm-svn: 349459	2018-12-18 08:55:47 +00:00
Simon Pilgrim	7e2975a44c	[X86][SSE] Improve immediate vector shift known bits handling. Convert VSRAI to VSRLI if the sign bit is known zero and improve KnownBits output for all shift instructions. Fixes the poor codegen comments in D55768. llvm-svn: 349407	2018-12-17 22:09:47 +00:00
Simon Pilgrim	6b5e0b7b2b	[X86][SSE] Split SimplifyDemandedBitsForTargetNode X86ISD::VSRLI/VSRAI handling. First step towards adding more capable combines to fix comments in D55768. llvm-svn: 349400	2018-12-17 21:36:17 +00:00
Craig Topper	728cbc0378	Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used. This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift. This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users. This appears to fix the case in PR39968. llvm-svn: 349385	2018-12-17 20:02:16 +00:00
Simon Pilgrim	9274f17a5e	[TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen. I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well. Differential Revision: https://reviews.llvm.org/D55768 llvm-svn: 349374	2018-12-17 18:43:43 +00:00
Craig Topper	fa4907d671	[X86] Fix bad operand lookup for cmov introduced in r349315 The CC is operand 2 not operand 3. llvm-svn: 349330	2018-12-17 06:40:35 +00:00
Simon Pilgrim	d0c9e43b1c	[X86] Pull out constant splat rotation detection. We had 3 different approaches - consistently use getTargetConstantBitsFromNode and allow undef elts. llvm-svn: 349319	2018-12-16 19:46:04 +00:00
Craig Topper	10f8892837	[X86] Remove truncation handling from EmitTest. Replace it with a DAG combine. I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that. The test-shrink-bug.ll change is an improvement because we are no longer interfering with test shrink handling in isel. The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly. llvm-svn: 349315	2018-12-16 18:35:55 +00:00
Sanjay Patel	13ac2f15b0	[x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859) This is part of fixing PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859 We have a crippled vector ISA, so we have to invert a typical fold and create min/max here. As discussed in the bug report, we can probably do better by using saturating subtract when it's available, but we should have this improvement for the min/max patterns regardless. Alive proofs: https://rise4fun.com/Alive/zsf https://rise4fun.com/Alive/Qrl Differential Revision: https://reviews.llvm.org/D55515 llvm-svn: 349304	2018-12-16 15:05:48 +00:00
Simon Pilgrim	52c982406e	[X86] Begin cleaning up combineOr -> SHLD/SHRD. NFCI. In preparation for converting to funnel shifts. llvm-svn: 349286	2018-12-15 21:11:49 +00:00
Simon Pilgrim	ef7b5949e5	[X86] Lower to SHLD/SHRD on slow machines for optsize Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard...... llvm-svn: 349285	2018-12-15 19:43:44 +00:00
Craig Topper	257ce3871e	[DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137	2018-12-14 08:28:24 +00:00
Craig Topper	178abc59ac	[X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp. This requires the two callers to manifest a 0 to make EmitCmp call EmitTest. I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely. llvm-svn: 349094	2018-12-13 23:55:30 +00:00
Simon Pilgrim	b5aaa673c6	[X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode llvm-svn: 349057	2018-12-13 16:39:29 +00:00
Simon Pilgrim	b0b2f1503a	[X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243) There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches. llvm-svn: 349052	2018-12-13 15:50:31 +00:00
Simon Pilgrim	ba91ff4a86	[X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243) llvm-svn: 349047	2018-12-13 15:23:09 +00:00
Simon Pilgrim	7c84f7ae3a	[X86][SSE] Merge the vXi16/vXi32 vector rotation expansion cases. NFCI. Merged the repeated code into a single if(). llvm-svn: 349040	2018-12-13 14:51:28 +00:00
Simon Pilgrim	320fd7383f	[X86][BWI] Don't custom lower vXi8 rotations. We always expand to shifts anyhow - test changes are just different scheduling only. llvm-svn: 349034	2018-12-13 13:44:33 +00:00
Simon Pilgrim	ab973a45b9	[DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028	2018-12-13 12:23:32 +00:00
Simon Pilgrim	77fc551d1a	[TargetLowering] Add ISD::ROTL/ROTR vector expansion Move existing rotation expansion code into TargetLowering and set it up for vectors as well. Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment. Begun removing x86 vector rotate custom lowering to use the expansion. llvm-svn: 349025	2018-12-13 11:20:48 +00:00
Craig Topper	a048d58de7	[X86] Remove assert leftover from when i1 was a legal type. Add more accurate assert. NFC llvm-svn: 349007	2018-12-13 06:14:25 +00:00
Craig Topper	4937adf75f	[X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input. I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that. I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful. Differential Revision: https://reviews.llvm.org/D55414 llvm-svn: 348959	2018-12-12 19:20:21 +00:00
Simon Pilgrim	eb508f8ccb	[SelectionDAG] Add a generic isSplatValue function This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns. It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller. A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS). I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection. Differential Revision: https://reviews.llvm.org/D55426 llvm-svn: 348953	2018-12-12 18:32:29 +00:00
Craig Topper	1fe466689b	[X86] Combine vpmovdw+vpacksswb into vpmovdb. This is similar to the combine we already have for vpmovdw+vpackuswb. llvm-svn: 348910	2018-12-12 05:56:01 +00:00
Sanjay Patel	134f56e702	[x86] fix formatting; NFC This should really be generalized to allow increment and/or we should replace it by using ISD::matchUnaryPredicate(). See D55515 for context. llvm-svn: 348776	2018-12-10 17:23:44 +00:00
Craig Topper	2b09d17d93	[X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB. Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother. This should go a long way towards fixing PR24545. llvm-svn: 348727	2018-12-09 18:02:37 +00:00
Craig Topper	2c7a9476e0	[X86] Directly create ADC/SBB nodes instead of using ADD/SUB with (and SETCC_CARRY, 1) This addresses a FIXME and avoids depending on an isel pattern match I think. I've removed the isel patterns too since we have no lit tests left that cover them. Hopefully that really means they are unused. I'm trying to decide if we need SETCC_CARRY. This removes one of its usages. Differential Revision: https://reviews.llvm.org/D55355 llvm-svn: 348536	2018-12-06 22:26:59 +00:00
Simon Pilgrim	bb650daeaf	[X86] Refactored IsSplatVector to use switch. NFCI. Initial step towards making the function more generic (and probably move into SelectionDAG). This is necessary to avoid massive codegen bloat for PR38243 (Add modulo rotate support to LowerRotate). llvm-svn: 348498	2018-12-06 16:29:14 +00:00
Craig Topper	6a6d77b851	[X86] Remove some leftover code for handling an i1 setcc type. NFC We should only need to handle i8 now. llvm-svn: 348460	2018-12-06 07:00:02 +00:00
Simon Pilgrim	32483668d7	[X86][SSE] Begun adding modulo rotate support to LowerRotate Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions). I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions. llvm-svn: 348366	2018-12-05 14:46:37 +00:00
Simon Pilgrim	180639afe5	[SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353	2018-12-05 11:12:12 +00:00
Nirav Dave	ce26c27b2a	[SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI. llvm-svn: 348288	2018-12-04 17:59:43 +00:00
Simon Pilgrim	07843640d5	[X86][SSE] Add SimplifyDemandedBitsForTargetNode handling for MOVMSK Moves existing SimplifyDemandedBits call out of combineMOVMSK and add SimplifyDemandedVectorElts call based on the sign bits we need. llvm-svn: 348282	2018-12-04 16:52:32 +00:00
Simon Pilgrim	b1d6db7693	[X86] Remove unnecessary peekThroughEXTRACT_SUBVECTORs call. The GetSplatValue/IsSplatVector call will call this anyhow and the later code is just for a v2i64 type so doesn't need it. llvm-svn: 348253	2018-12-04 12:21:43 +00:00
Simon Pilgrim	0add090e24	[TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generate much nicer code with it.... Differential Revision: https://reviews.llvm.org/D53794 llvm-svn: 348251	2018-12-04 11:21:30 +00:00
Craig Topper	35585aff34	[X86] Remove custom DAG combine for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG. We only needed this because it provided really aggressive constant folding even through constant pool entries created from build_vectors. The main case was for vXi8 MULH legalization which was happening as part of legalize DAG instead of as part of legalize vector ops. Now it's part of vector op legalization and we've added special handling for build vectors of all constants there. This has removed the need for this code on the lit tests we have. llvm-svn: 348237	2018-12-04 04:51:07 +00:00
Sanjay Patel	d24f63477d	[DAGCombiner] narrow truncated vector binops when legal This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195	2018-12-03 21:57:35 +00:00
Craig Topper	5440b63fa8	[X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS. Summary: We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly. After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations. This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55165 llvm-svn: 348159	2018-12-03 18:26:27 +00:00
Craig Topper	e35b01f8ea	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158	2018-12-03 18:26:24 +00:00
Craig Topper	959b415e2f	[X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar. llvm-svn: 348104	2018-12-02 19:47:14 +00:00
Craig Topper	6f54ff57fd	[X86] Fix bad comment. NFC llvm-svn: 348103	2018-12-02 19:47:13 +00:00

6149 Commits