We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.
Fixes another case from PR34877.
llvm-svn: 351018
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.
Fixes another case from PR34877.
llvm-svn: 351017
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
The 128-bit input produces 64 bits of output and fills the upper 64 bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.
This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.
Fixes another instruction for PR34877.
llvm-svn: 350994
We no longer need to extend mask scalars before bitcasting them to vXi1. This was only needed for the truncate intrinsics, and was really a bug in our lowering of them.
llvm-svn: 350991
We still use i8 for the load/store type, so we need to convert to/from i16 around the mask type.
By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.
llvm-svn: 350989
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
This fixes some of PR34877.
llvm-svn: 350985
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.
There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.
llvm-svn: 350928
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.
Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.
This fixes some of the regressions from r350800.
llvm-svn: 350918
This extends combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDs that have already been created.
This should help some of the regressions from D56387.
Differential Revision: https://reviews.llvm.org/D56421
llvm-svn: 350875
Despite what the comment says, FCMP_UNE would be an OR, not an AND. In the lowering code the first branch created still goes to the original destination. The second branch was exchanged to go to where the subsequent unconditional branch went. This is different from what we do for FCMP_OEQ, where both branches that we create go to the original unconditional branch.
As far as I can tell, I think this means we don't need to exchange the branch target with the unconditional branch for FCMP_UNE at all.
Differential Revision: https://reviews.llvm.org/D56309
llvm-svn: 350873
When we use the partial-matching function on a 128-bit chunk, we must
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.
This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243
llvm-svn: 350830
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the
existing (old) horizontal op matching function because it does not model the way
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op).
A potential follow-up change for that is discussed in the bug report - see also D56490.
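To make that per-lane behavior concrete, here is a small standalone AVX2 intrinsics sketch (purely illustrative, not part of this patch) showing that vphaddd only pairs elements within each 128-bit half:

    #include <immintrin.h>
    #include <cstdio>

    int main() {
      __m256i a = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
      __m256i b = _mm256_setr_epi32(10, 11, 12, 13, 14, 15, 16, 17);
      // vphaddd: each 128-bit half is its own horizontal op.
      __m256i h = _mm256_hadd_epi32(a, b);
      int r[8];
      _mm256_storeu_si256((__m256i *)r, h);
      // Low lane:  a0+a1 a2+a3 b0+b1 b2+b3 -> 1 5 21 25
      // High lane: a4+a5 a6+a7 b4+b5 b6+b7 -> 9 13 29 33
      for (int i = 0; i < 8; ++i)
        printf("%d ", r[i]);
      return 0;
    }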
This generally duplicates a lot of the existing matching code, but we can't just remove
that without introducing regressions, so the existing code is renamed and used less often.
Follow-ups may try to reduce that overlap.
Differential Revision: https://reviews.llvm.org/D56450
llvm-svn: 350826
Found while trying to figure out why my second version of D56421 worked better than the first version. We weren't deleting the vselect in a timely fashion and that caused SimplifyDemandedBits to see an additional user.
The new version doesn't have this problem so this fix isn't needed there, but seemed like the right thing to do.
llvm-svn: 350781
When the result type is v2i64/v2f64 and the index element size is i32, the index vector has two unused elements making the type v4i32. The mask VT should match the number of memory accesses that will be made.
This is consistent with the isel patterns used for the target independent gather/scatter intrinsic.
llvm-svn: 350687
Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56361
llvm-svn: 350498
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.
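As a rough illustration (a standalone intrinsics sketch, not the lowering code itself), pmovmskb already performs both the v16i8-to-mask conversion and the move to a GPR:

    #include <emmintrin.h>
    #include <cstdio>

    int main() {
      // e.g. the result of a byte compare: all-ones or all-zeros per element.
      __m128i v = _mm_setr_epi8(-1, 0, -1, 0, 0, 0, 0, -1,
                                0, 0, 0, 0, -1, 0, 0, 0);
      // pmovmskb gathers the sign bit of each byte into bits 0..15 of a GPR.
      int mask = _mm_movemask_epi8(v);
      printf("0x%04x\n", mask); // 0x1085
      return 0;
    }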
llvm-svn: 350480
The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.
Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350421
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350369
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
We want to have this in the DAG too because as we can see in some of the test diffs (reductions),
the pattern may not be visible in IR.
Given that this is already an IR canonicalization, any backend that would prefer a vector op over
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
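A tiny intrinsics sketch (illustrative only, assuming SSE4.1 for the extract) of the equivalence the fold relies on:

    #include <smmintrin.h>
    #include <cassert>

    int main() {
      __m128i x = _mm_setr_epi32(1, 2, 3, 4);
      __m128i y = _mm_setr_epi32(10, 20, 30, 40);
      // extelt (add X, Y), 2 == add (extelt X, 2), (extelt Y, 2)
      int vecThenExtract = _mm_extract_epi32(_mm_add_epi32(x, y), 2);
      int extractThenScalar = _mm_extract_epi32(x, 2) + _mm_extract_epi32(y, 2);
      assert(vecThenExtract == extractThenScalar); // both are 33
      return 0;
    }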
Differential Revision: https://reviews.llvm.org/D55722
llvm-svn: 350354
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.
This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.
I had to avoid selecting DEC if the result isn't used. This will become a SUB immediate which will be turned into a CMP later by optimizeCompareInstr. This led to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.
This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.
Differential Revision: https://reviews.llvm.org/D55975
llvm-svn: 350245
Peek through shift modulo masks while matching double shift patterns.
I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now.
Differential Revision: https://reviews.llvm.org/D56199
llvm-svn: 350222
All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.
llvm-svn: 350206
These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code.
Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll
llvm-svn: 350205
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.
This is inspired by the helper functions used by ARM and AArch64 for the same purpose.
The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.
llvm-svn: 350198
This seems to be getting in the way more than it's helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.
Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it.
llvm-svn: 350159
This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.
We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.
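Roughly, the trick looks like the following standalone SSE2 sketch (illustrative, not the actual lowering): the pcmpgt against zero materializes the sign mask once, and the unpacks interleave it with the low and high halves.

    #include <emmintrin.h>
    #include <cstdio>

    int main() {
      __m128i v = _mm_setr_epi32(-1, 2, -3, 4);
      // 0 > v gives all-ones where v is negative: the per-element sign mask.
      __m128i sign = _mm_cmpgt_epi32(_mm_setzero_si128(), v);
      // Interleave value and sign mask to form sign-extended 64-bit elements.
      __m128i lo = _mm_unpacklo_epi32(v, sign);
      __m128i hi = _mm_unpackhi_epi32(v, sign);
      long long r[4];
      _mm_storeu_si128((__m128i *)&r[0], lo);
      _mm_storeu_si128((__m128i *)&r[2], hi);
      printf("%lld %lld %lld %lld\n", r[0], r[1], r[2], r[3]); // -1 2 -3 4
      return 0;
    }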
llvm-svn: 350158
This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.
llvm-svn: 350141
Previously we emitted a multiply and some masking that was supposed to be matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.
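For context, a standalone intrinsics sketch (illustrative only) of what PMULUDQ itself computes - the low 32 bits of each 64-bit element multiplied as unsigned values into a full 64-bit product:

    #include <emmintrin.h>
    #include <cstdio>

    int main() {
      // Only elements 0 and 2 (the low dword of each qword) participate.
      __m128i a = _mm_setr_epi32(-1, 123, 7, 456); // element 0 is 0xFFFFFFFF
      __m128i b = _mm_setr_epi32(2, 789, 3, 1011);
      __m128i p = _mm_mul_epu32(a, b); // pmuludq
      unsigned long long r[2];
      _mm_storeu_si128((__m128i *)r, p);
      printf("%llu %llu\n", r[0], r[1]); // 8589934590 21
      return 0;
    }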
Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll
Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs.
I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization
llvm-svn: 350134
Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2.
This seems to give some improvements in our ability to use SimplifyDemandedBits.
llvm-svn: 350084
Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself.
Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC which creates an X86ISD::SETCC node we need to look through.
llvm-svn: 350082
Only one of the 3 callers of LowerAndToBT need the SETCC node. Two of them have to look through it to find the operands they really need. Instead create it after the one call that needs it.
LowerAndToBT now returns both the BT node and the X86 specific condition code separately.
llvm-svn: 350081
This is an alternative to what I attempted in D56057.
GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.
I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.
Based on a patch that Simon Pilgrim sent me in email.
Fixes PR40142.
llvm-svn: 350059
Remove the TESTmr isel patterns and add another postprocessing combine for TESTrr+ANDrm->TESTmr. We already have a postprocessing combine for TESTrr+ANDrr->TESTrr. With this we can give ANDN a chance to match first. And clean it up during post processing if we ended up with just a regular AND.
This is another step towards my plan to gut EmitTest and do more flag handling during isel matching or by using optimizeCompare.
llvm-svn: 350038
This is admittedly a narrow fix for the problem:
https://bugs.llvm.org/show_bug.cgi?id=37502
...but as the XOP restriction shows, it's a maze to get this right.
In the motivating example, note that we have movddup before SSE4.1 and
again with AVX2. That's because insertps isn't available pre-SSE41 and
vbroadcast is (more generally) available with AVX2 (and the splat is
reduced to movddup via isel pattern).
Differential Revision: https://reviews.llvm.org/D55898
llvm-svn: 349937
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.
This is a followup to D55787 and part of PR40056.
Differential Revision: https://reviews.llvm.org/D55833
llvm-svn: 349520
InstCombine seems to canonicalize our PSUB pattern into a max with the constant and an add with an inverse of the constant.
This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling.
Fixes some of PR40053.
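A scalar sketch (illustrative, assuming ordinary wrap-around uint8_t arithmetic) of why the canonicalized max+add form is the same operation PSUBUS performs:

    #include <cstdint>
    #include <algorithm>
    #include <cassert>

    // The per-element operation PSUBUSB performs: unsigned saturating subtract.
    static uint8_t usubSat(uint8_t x, uint8_t c) { return x > c ? x - c : 0; }

    // The canonicalized form: max with the constant, then add its negation.
    static uint8_t maxThenAdd(uint8_t x, uint8_t c) {
      return (uint8_t)(std::max(x, c) + (uint8_t)-c);
    }

    int main() {
      const uint8_t c = 25;
      for (int x = 0; x < 256; ++x)
        assert(usubSat((uint8_t)x, c) == maxThenAdd((uint8_t)x, c));
      return 0;
    }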
Differential Revision: https://reviews.llvm.org/D55780
llvm-svn: 349519
Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.
This only replaces use in the X86 backend, and does not move any of
the ADDUS/SUBUS X86 specific combines into generic codegen.
Differential Revision: https://reviews.llvm.org/D55787
llvm-svn: 349481
(VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1
This works better as part of SimplifyDemandedBits than part of the general combine.
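A scalar illustration (assuming the usual two's-complement arithmetic right shift) of why the fold is safe when X has more sign bits than the shift amount:

    #include <cstdint>
    #include <cassert>

    int main() {
      int32_t x = -12345;     // 0xFFFFCFC7: 18 sign bits
      const int c = 16;       // NumSignBits(x) > c
      // The left shift only discards redundant copies of the sign bit,
      // so the arithmetic right shift restores x exactly.
      int32_t shifted = (int32_t)((uint32_t)x << c); // VSHLI
      assert((shifted >> c) == x);                   // VSRAI undoes it
      return 0;
    }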
llvm-svn: 349462
This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (it's guaranteed to stay the same).
Test change is merely a rescheduling.
llvm-svn: 349459
Convert VSRAI to VSRLI if the sign bit is known zero and improve KnownBits output for all shift instructions.
Fixes the poor codegen comments in D55768.
llvm-svn: 349407
This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift.
This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users.
This appears to fix the case in PR39968.
llvm-svn: 349385
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.
I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.
Differential Revision: https://reviews.llvm.org/D55768
llvm-svn: 349374
I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that.
The test-shrink-bug.ll change is an improvement because we are no longer interfering with test shrink handling in isel.
The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly.
llvm-svn: 349315
Use consistent rules for when to lower to SHLD/SHRD for slow machines - this fixes a weird issue where the funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines it back to SHLD/SHRD, now with the modulo amount guard.
llvm-svn: 349285
Summary:
If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes the VsetCC to be CSEd to N0 and the getSExtOrTrunc to be CSEd to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node.
To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term.
Reviewers: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55459
llvm-svn: 349137
This requires the two callers to manifest a 0 to make EmitCmp call EmitTest.
I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely.
llvm-svn: 349094
There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches.
llvm-svn: 349052
Move existing rotation expansion code into TargetLowering and set it up for vectors as well.
Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment.
Begin removing x86 vector rotate custom lowering to use the expansion.
llvm-svn: 349025
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.
I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.
Differential Revision: https://reviews.llvm.org/D55414
llvm-svn: 348959
This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns.
It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller.
A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS).
I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection.
Differential Revision: https://reviews.llvm.org/D55426
llvm-svn: 348953
This should really be generalized to allow increment and/or
we should replace it by using ISD::matchUnaryPredicate().
See D55515 for context.
llvm-svn: 348776
Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother.
This should go a long way towards fixing PR24545.
llvm-svn: 348727
This addresses a FIXME and avoids depending on an isel pattern match, I think. I've removed the isel patterns too since we have no lit tests left that cover them. Hopefully that really means they are unused.
I'm trying to decide if we need SETCC_CARRY. This removes one of its usages.
Differential Revision: https://reviews.llvm.org/D55355
llvm-svn: 348536
Initial step towards making the function more generic (and probably move into SelectionDAG).
This is necessary to avoid massive codegen bloat for PR38243 (Add modulo rotate support to LowerRotate).
llvm-svn: 348498
Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions).
I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions.
llvm-svn: 348366
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.
Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
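For reference, a scalar model of the i32 fshl semantics that ties the new node to SHLD (the helper name fshl32 here is just for illustration):

    #include <cstdint>
    #include <cstdio>

    // Concatenate hi:lo and return the top 32 bits after shifting left by amt
    // (amt taken modulo the bit width); shld computes the same thing.
    static uint32_t fshl32(uint32_t hi, uint32_t lo, uint32_t amt) {
      amt &= 31;
      if (amt == 0)
        return hi;                      // avoid the undefined lo >> 32
      return (hi << amt) | (lo >> (32 - amt));
    }

    int main() {
      printf("0x%08x\n", fshl32(0x12345678, 0x9ABCDEF0, 8)); // 0x3456789a
      return 0;
    }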
Differential Revision: https://reviews.llvm.org/D54698
llvm-svn: 348353
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.
The existing code converts both the in-range and out-of-range cases using FP_TO_SINT and then selects the result; this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.
The X87 cases don't need the strict flag but generate much nicer code with it.
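A scalar model of the expansion being discussed (written with a branch here for clarity; the DAG version selects between the two forms, and the strict fix pre-selects the input and offset so the out-of-range signed conversion never fires):

    #include <cstdint>
    #include <cstdio>

    // Expanding fptoui f64 -> i64 when only a signed conversion is available:
    // values below 2^63 convert directly, larger values are rebased by 2^63
    // before the signed conversion and the bit is added back afterwards.
    static uint64_t fptouiViaSint(double x) {
      const double twoPow63 = 9223372036854775808.0; // 2^63
      if (x < twoPow63)
        return (uint64_t)(int64_t)x;
      return (uint64_t)(int64_t)(x - twoPow63) + (1ULL << 63);
    }

    int main() {
      printf("%llu\n", (unsigned long long)fptouiViaSint(42.0));
      printf("%llu\n", (unsigned long long)fptouiViaSint(1.5e19));
      return 0;
    }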
Differential Revision: https://reviews.llvm.org/D53794
llvm-svn: 348251