llvm-project

Commit Graph

Author	SHA1	Message	Date
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Craig Topper	089082378f	[X86] Add DAG combine to remove sext i32->i64 from gather/scatter instructions. Only do this pre-legalize in case we're using the sign extend to legalize for KNL. This recovers all of the tests that changed when I stopped SelectionDAGBuilder from deleting sign extends. There's more work that could be done here, particularly to fix the i8->i64 test case that experienced a split. llvm-svn: 318468	2017-11-16 23:09:06 +00:00
Craig Topper	e85ff4f732	[X86] Pre-truncate gather/scatter indices that have element sizes larger than 64-bits before Legalize. The wider element type will normally cause legalize to try to split and scalarize the gather/scatter, but we can't handle that. Instead, truncate the index early so the gather/scatter node is insulated from the legalization. This really shouldn't happen in practice since InstCombine will normalize index types to the same size as pointers. llvm-svn: 318452	2017-11-16 20:23:22 +00:00
Craig Topper	04be793cec	[X86] DAGCombinerInfo is in TargetLowering not X86TargetLowering. llvm-svn: 318451	2017-11-16 20:23:17 +00:00
Craig Topper	e6601fd30e	[X86] Custom type legalize v2f32 masked gathers instead of trying to cleanup after type legalization. llvm-svn: 318368	2017-11-16 02:07:45 +00:00
Craig Topper	54b57b0dd8	[X86] Add a return to the end of a switch to prevent an accidental fallthrough in the future. llvm-svn: 318330	2017-11-15 20:42:47 +00:00
Craig Topper	16a91cee6c	[X86] Redefine the 128-bit version of VPGATHERQD and VGATHERQPS to use a VK2 mask instead of a VK4 mask. This allows us to remove extra extend creation during lowering and more accurately reflects the semantics of the instruction. While there add an extra output VT to X86 masked gather node to better match the isel pattern predicate. Currently we're exploiting the fact that the isel table doesn't count how many output results a node actually has if the result type of any can be inferred from the first result and the type constraints defined in tablegen. I think we might ultimately want to lower all MGATHER/MSCATTER to an X86ISD node with the extra mask result and stop relying on this hole in the isel checking. llvm-svn: 318278	2017-11-15 07:46:43 +00:00
Craig Topper	23493f3777	[X86] Attempt to fix signed and unsigned comparison warning. llvm-svn: 318010	2017-11-13 02:19:13 +00:00
Craig Topper	63157c4784	[X86] Use EVEX encoded VRNDSCALE instructions to implement the legacy round intrinsics. The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4-bits of the immediate are 0. This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space. We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used. I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches. llvm-svn: 318008	2017-11-13 02:03:00 +00:00
Craig Topper	0af48f1ad4	[X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with and without the rounding operand. NFCI I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations. llvm-svn: 318007	2017-11-13 02:02:58 +00:00
Craig Topper	b42a23ff8f	[X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics. This fixes a bug where we selected packed instructions for scalar intrinsics. llvm-svn: 317999	2017-11-12 18:51:09 +00:00
Craig Topper	1382932c12	[X86] Remove some no longer needed intrinsic lowering code. llvm-svn: 317997	2017-11-12 18:51:06 +00:00
Simon Pilgrim	294b87b432	[X86] Attempt to match multiple binary reduction ops at once. NFCI matchBinOpReduction currently matches against a single opcode, but we already have a case where we repeat calls to try to match against AND/OR and I'll be shortly adding another case for SMAX/SMIN/UMAX/UMIN (D39729). This NFCI patch alters matchBinOpReduction to try and pattern match against any of the provided list of candidate bin ops at once to save time. Differential Revision: https://reviews.llvm.org/D39726 llvm-svn: 317985	2017-11-11 18:16:55 +00:00
Craig Topper	1a0da2db5f	[X86] Add support for combining FMADDSUB(A, B, FNEG(C))->FMSUBADD(A, B, C) Support the opposite direction as well. Also add a TODO for not being able to combine FMSUB/FNMADD/FNMSUB with FNEG. llvm-svn: 317878	2017-11-10 08:22:37 +00:00
Craig Topper	93e27d2ecc	[X86] Make sure we don't read too many operands from X86ISD::FMADDS1/FMADDS3 nodes when doing FNEG combine. r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes. llvm-svn: 317748	2017-11-09 01:06:47 +00:00
Craig Topper	cf8e6d0a76	[X86] Add support for using EVEX instructions for the legacy vcvtph2ps intrinsics. Looks like there's some missed load folding opportunities for i64 loads. llvm-svn: 317544	2017-11-07 07:13:03 +00:00
Craig Topper	428a4e6374	[X86] Make FeatureAVX512 imply FeatureF16C. The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available. Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns. All known CPUs with AVX512 have F16C, so this should be safe for now. llvm-svn: 317521	2017-11-06 22:49:04 +00:00
Simon Pilgrim	ad9b9720e8	[X86][SSE] Merge combineExtractVectorElt_SSE into combineExtractVectorElt. NFCI. We still early-out for X86ISD::PEXTRW/X86ISD::PEXTRB so no actual change in behaviour, but it'll make it easier to add support in a future patch. llvm-svn: 317485	2017-11-06 15:28:25 +00:00
Simon Pilgrim	14450720e6	[X86][SSE] Combine EXTRACT_VECTOR_ELT with combineExtractWithShuffle before XFormVExtractWithShuffleIntoLoad combineExtractWithShuffle can handle more complex shuffles/bitcasts than we can with the equivalent code in XFormVExtractWithShuffleIntoLoad. Mainly a compile time improvement now (combineExtractWithShuffle combines will have always failed late on inside XFormVExtractWithShuffleIntoLoad), and will let us merge combineExtractVectorElt_SSE in a future commit. llvm-svn: 317481	2017-11-06 14:34:19 +00:00
Uriel Korach	bb86686a8b	[X86][AVX512] Improve lowering of AVX512 test intrinsics Added TESTM and TESTNM to the list of instructions that already zero the unused upper bits and do not need the redundant shift-left and shift-right instructions afterwards. Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq, and(X,Y), 0) folds into TESTM and icmp(eq, and(X,Y), 0) folds into TESTNM. This commit is a preparation for lowering the test and testn X86 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38732 llvm-svn: 317465	2017-11-06 09:22:38 +00:00
Zvi Rackover	3122698040	X86 ISel: Basic support for variable-index vector permutations Summary: Try to lower a BUILD_VECTOR composed of extract-extract chains that can be reasoned to be a permutation of a vector by indices in a non-constant vector. We saw this pattern created by ISPC, which resorts to creating it due to the requirement that shufflevector's mask operand be a constant vector. I didn't check this, but we could possibly use this pattern for lowering the X86 permute C intrinsics instead of llvm.x86 intrinsics. This change can be followed by more improvements: 1. Handle vectors with undef elements. 2. Utilize pshufb and zero-mask-blending to support more efficient construction of vectors with constant-0 elements. 3. Use smaller-element vectors of the same width, and "interpolate" the indices, when no native operation is available. Reviewers: RKSimon, craig.topper Reviewed By: RKSimon Subscribers: chandlerc, DavidKreitzer Differential Revision: https://reviews.llvm.org/D39126 llvm-svn: 317463	2017-11-06 08:25:46 +00:00
Jina Nahias	3844f1ad5c	Revert "adding a pattern for broadcastm" This reverts commit r317457. Change-Id: If07f1fca1e3453d16c1dac906e87768661384e91 llvm-svn: 317462	2017-11-06 07:48:58 +00:00
Jina Nahias	7b705f1f91	[x86][AVX512] Lowering Broadcastm intrinsics to LLVM IR This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38684 Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540 llvm-svn: 317458	2017-11-06 07:09:24 +00:00
Jina Nahias	9c6561b648	adding a pattern for broadcastm Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5 llvm-svn: 317457	2017-11-06 07:09:09 +00:00
Craig Topper	07dac55d95	[X86] Add scalar FMA ISD nodes without rounding mode. NFC Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers. llvm-svn: 317453	2017-11-06 05:48:25 +00:00
Craig Topper	692c8efe30	[X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled. Summary: AVX512 added RCP14 and RSQRT14 instructions which improve accuracy over the legacy RCP and RSQRT instructions, but not enough accuracy to remove the need for a Newton-Raphson refinement. Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use them for fast math optimization of division and reciprocal sqrt. I think switching the legacy intrinsics may be surprising to the user, since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed. As far as the reciprocal estimation goes, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc on godbolt suggests they don't change which instructions they use here. This patch adds new X86ISD nodes for RCP14/RSQRT14 and uses those for the new intrinsics, leaving the old intrinsics to use the old instructions. Going forward I think our focus should be on: supporting 512-bit vectors, which will have to use RCP14/RSQRT14; using RSQRT28/RCP28 to remove the Newton-Raphson step on processors with AVX512ER; supporting double precision. Reviewers: zvi, DavidKreitzer, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39583 llvm-svn: 317413	2017-11-04 18:26:41 +00:00
Craig Topper	a96d62b360	[X86] Teach shuffle lowering to use 256-bit SHUF128 when possible. This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary. As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out. llvm-svn: 317403	2017-11-04 06:44:47 +00:00
Craig Topper	d21a53f246	[X86] Give unary PERMI priority over SHUF128 in lowerV8I64VectorShuffle to make it possible to fold a load. llvm-svn: 317382	2017-11-03 22:48:13 +00:00
Simon Pilgrim	ae1f013495	[X86][SSE] Add PACKUS support to combineVectorTruncation Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW llvm-svn: 317315	2017-11-03 11:33:48 +00:00
Craig Topper	333897ec31	[X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible. llvm-svn: 317299	2017-11-03 06:48:02 +00:00
Simon Pilgrim	e152c2c447	[X86][SSE] Add PACKUS support to LowerTruncate Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW llvm-svn: 317128	2017-11-01 21:52:29 +00:00
Simon Pilgrim	778810eb42	[X86][SSE] Began generalizing truncateVectorWithPACKSS to work with PACKSS/PACKUS functions. Renamed to truncateVectorWithPACK llvm-svn: 317098	2017-11-01 15:31:51 +00:00
Simon Pilgrim	f657ba0cb6	[X86][SSE] Truncate with PACKSS any input with sufficient sign-bits So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.). In fact, we can safely use it for any case as long as the number of sign bits reaches down to the last 16 bits (or 8 bits if we're truncating to bytes). The next steps after this are to add the equivalent support for PACKUS and to support packing to sub-128-bit vectors for truncating stores etc. Differential Revision: https://reviews.llvm.org/D39476 llvm-svn: 317086	2017-11-01 11:47:44 +00:00
Simon Pilgrim	f3c33ca83e	[X86][SSE] Add VSRLI/VSRAI/VSLLI demanded elts support to computeKnownBits/ComputeNumSignBits Mainly a perf improvement, as most combines will have occurred before we lower to these instructions llvm-svn: 317005	2017-10-31 16:06:21 +00:00
Jina Nahias	5bf6620b15	[X86][AVX512] Adding a pattern for broadcastm intrinsic. Differential Revision: https://reviews.llvm.org/D38312 Change-Id: I71c8605a8e4c98013ef25289694afc5cfd46bb0b llvm-svn: 316921	2017-10-30 16:37:28 +00:00
Craig Topper	4e13d4de52	[X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used. Summary: INC/DEC don't update the carry flag so we need to make sure we don't try to use it. This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested. The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860. This should fully fix PR35068 finishing the fix started in r316860. Reviewers: RKSimon, zvi, spatel Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39411 llvm-svn: 316913	2017-10-30 14:51:37 +00:00
Jina Nahias	e63db55c67	Revert "[X86][AVX512] Adding a pattern for broadcastm intrinsic." This reverts commit r316890. Change-Id: I683cceee9848ef309b452293086b1f26a941950d llvm-svn: 316894	2017-10-30 10:35:53 +00:00
Jina Nahias	70280f9a0d	[X86][AVX512] Adding a pattern for broadcastm intrinsic. Differential Revision: https://reviews.llvm.org/D38312 Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5 llvm-svn: 316890	2017-10-30 09:59:52 +00:00
Craig Topper	495a1bc893	[X86] Remove combine that turns X86ISD::LSUB into X86ISD::LADD. Update patterns that depended on this. If the carry flag is being used, this transformation isn't safe. This does prevent some test cases from using DEC now, but I'll try to look into that separately. Fixes PR35068. llvm-svn: 316860	2017-10-29 06:51:04 +00:00
Craig Topper	7a60e29185	[X86] Fix typo in comment. NFC llvm-svn: 316859	2017-10-29 06:51:02 +00:00
Craig Topper	0692ca4bd2	[X86] Remove invalid code from LowerVSELECT. This code attempted to say that v8i16/v16i16 VSELECT is legal if BWI and VLX are enabled, but the only way we could reach this point is if the condition was not a vXi1 type. Which means it really wasn't legal. We don't have any tests that exercise this code. So I'm hoping it wasn't really reachable. llvm-svn: 316851	2017-10-28 23:10:13 +00:00
Simon Pilgrim	294f88dfa0	[X86][SSE] Combine 128-bit target shuffles to PACKSS/PACKUS. llvm-svn: 316845	2017-10-28 20:51:27 +00:00
Simon Pilgrim	bd3852aa5e	[X86][SSE] Split off matchVectorShuffleWithPACK. NFCI. Split matchVectorShuffleWithPACK from lowerVectorShuffleWithPACK so that we can reuse it for target shuffle combines llvm-svn: 316844	2017-10-28 20:27:22 +00:00
Simon Pilgrim	25808c303f	[X86][SSE] Rename truncateVectorCompareWithPACKSS to truncateVectorWithPACKSS. NFC. We no longer rely on the vector source being a comparison result, just have sufficient sign bits. llvm-svn: 316834	2017-10-28 17:59:56 +00:00
Craig Topper	b8d7d4d683	[X86] Improve handling of UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG to support 64-bit extensions. If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/UDIVREM8_SEXT_HREG operation. This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case. This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG. Differential Revision: https://reviews.llvm.org/D38275 llvm-svn: 316702	2017-10-26 21:12:03 +00:00
Sanjay Patel	ac50f3e907	[x86] use an insert op to put one variable element into a vector of constants Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it. Differential Revision: https://reviews.llvm.org/D38756 llvm-svn: 316685	2017-10-26 18:27:55 +00:00
Simon Pilgrim	5e8c3f328f	[X86][AVX] ComputeNumSignBitsForTargetNode - add support for X86ISD::VTRUNC llvm-svn: 316462	2017-10-24 17:04:57 +00:00
Simon Pilgrim	0a12c239b6	[X86] truncateVectorCompareWithPACKSS - use PACKSSDW/PACKSSWB instead of just PACKSSWB. By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits. llvm-svn: 316448	2017-10-24 15:38:16 +00:00
Simon Pilgrim	c36dd6ae9c	[X86] truncateVectorCompareWithPACKSS - remove duplicate variables. NFCI. llvm-svn: 316440	2017-10-24 14:18:32 +00:00
Simon Pilgrim	321e54f72d	[X86][SSE] combineBitcastvxi1 - use PACKSSWB directly to pack v8i16 to v16i8 Avoid difficulties determining the number of sign bits later on in shuffle lowering to lower to PACKSS llvm-svn: 316383	2017-10-23 22:05:02 +00:00
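
Several of the PACK commits above (r317086, r317128, r317315) depend on one condition. A saturating pack produces the same bits as a plain truncation once the input's sign bits (PACKSS) or leading zero bits (PACKUS) reach down into the packed width. Below is a minimal Python sketch of that condition for a single 32-bit to 16-bit lane. It assumes the usual PACKSSDW/PACKUSDW lane semantics: signed saturation to int16 and unsigned saturation to uint16. The helper names (`packss_lane`, `num_sign_bits`, `packss_is_truncate`, and so on) are hypothetical illustrations, not LLVM's API. In LLVM the real logic operates on SelectionDAG nodes, through functions such as truncateVectorWithPACK and ComputeNumSignBits.

```python
# Scalar model of one 32-bit -> 16-bit lane of PACKSSDW / PACKUSDW.
# Assumption: hypothetical helpers for illustration only, not LLVM code.

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
UINT16_MAX = (1 << 16) - 1


def packss_lane(x: int) -> int:
    """PACKSSDW lane: signed saturation of an int32 value to int16."""
    return max(INT16_MIN, min(INT16_MAX, x))


def packus_lane(x: int) -> int:
    """PACKUSDW lane: unsigned saturation of a signed int32 value to uint16."""
    return max(0, min(UINT16_MAX, x))


def num_sign_bits(x: int, width: int = 32) -> int:
    """Number of leading bits equal to the sign bit (cf. ComputeNumSignBits)."""
    u = x & ((1 << width) - 1)
    sign = u >> (width - 1)
    n = 1
    for i in range(width - 2, -1, -1):
        if ((u >> i) & 1) != sign:
            break
        n += 1
    return n


def leading_zeros(x: int, width: int = 32) -> int:
    """Number of leading zero bits in the width-bit two's complement value."""
    return width - (x & ((1 << width) - 1)).bit_length()


def truncate16(x: int) -> int:
    """Plain truncation: keep the low 16 bits, viewed as unsigned."""
    return x & UINT16_MAX


def packss_is_truncate(x: int) -> bool:
    # More than 16 sign bits means the value already fits in an int16.
    return num_sign_bits(x) > 16


def packus_is_truncate(x: int) -> bool:
    # At least 16 leading zeros means the value already fits in a uint16.
    return leading_zeros(x) >= 16


# Whenever the condition holds, the saturating pack equals truncation.
for x in range(-70000, 70001, 7):
    if packss_is_truncate(x):
        assert truncate16(packss_lane(x)) == truncate16(x)
    if packus_is_truncate(x):
        assert packus_lane(x) == truncate16(x)
```

For example, 32767 has 17 sign bits, so PACKSSDW and truncation agree. 32768 has only 16, so PACKSSDW saturates it to 0x7FFF, while truncation gives 0x8000. The sketch covers only the i32 to i16 lane. The byte case mentioned in the commit message follows the same idea with a narrower target width.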

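The carry-flag fixes (r316860, r316913) are easier to follow with concrete flag values. Below is a small Python model of 32-bit x86 SUB, ADD and DEC flag results. It assumes standard x86 semantics: SUB sets CF on a borrow, ADD sets CF on a carry out of bit 31, and DEC leaves CF unchanged. The model shows that rewriting `sub x, 1` as `add x, -1` gives the same result and ZF but inverts CF. As a result, an unsigned comparison such as ugt gives the wrong answer when it reads those flags. On x86, ugt is the "above" condition, CF == 0 and ZF == 0. The function names are hypothetical, and the model ignores OF/SF/AF/PF and the LOCK prefix.

```python
# Minimal model of 32-bit x86 flag results (CF and ZF only).
# Assumption: hypothetical helpers for illustration; OF/SF/AF/PF and the
# LOCK prefix are not modelled.

MASK32 = (1 << 32) - 1


def sub_flags(x: int, y: int) -> tuple[int, int, int]:
    """SUB x, y -> (result, CF, ZF). CF is set when the subtraction borrows."""
    x &= MASK32
    y &= MASK32
    res = (x - y) & MASK32
    return res, int(x < y), int(res == 0)


def add_flags(x: int, y: int) -> tuple[int, int, int]:
    """ADD x, y -> (result, CF, ZF). CF is set on carry out of bit 31."""
    x &= MASK32
    y &= MASK32
    total = x + y
    res = total & MASK32
    return res, int(total > MASK32), int(res == 0)


def dec_flags(x: int, old_cf: int) -> tuple[int, int, int]:
    """DEC x -> (result, CF, ZF). DEC does not modify CF."""
    res = (x - 1) & MASK32
    return res, old_cf, int(res == 0)


def above(cf: int, zf: int) -> bool:
    """x86 'A' condition (unsigned greater-than): CF == 0 and ZF == 0."""
    return cf == 0 and zf == 0


for old in (0, 1, 2, 7):
    _, cf_sub, zf_sub = sub_flags(old, 1)
    _, cf_add, zf_add = add_flags(old, -1)
    # The flags of 'old - 1' answer "old >u 1" correctly...
    assert above(cf_sub, zf_sub) == (old > 1)
    # ...but ADD of -1 yields the same ZF with CF inverted.
    assert zf_add == zf_sub and cf_add != cf_sub
```

For instance, with old = 2 the SUB flags report "above" (2 >u 1), but the ADD flags have CF = 1 and report "not above". DEC fails differently. It keeps whatever CF the previous instruction left, so a locked DEC cannot stand in for the SUB when the carry flag is consumed.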

4926 Commits