llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	e268598dd3	[X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling for prefetch instructions. Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well. The prfchw flag now implies that at least some form of prefetch without the write hint is available, either the sse or the 3dnow version. This is true even if 3dnow and sse are explicitly disabled. Similarly, the prefetchwt1 feature implies availability of prefetchw and the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And it's assumed that if we have levels for the write hint we would have levels for the non-write hint, which is why we enable the sse prefetch instructions. I believe this behavior is consistent with gcc. I've updated prefetch.ll to test all of these combinations. llvm-svn: 321335	2017-12-22 02:30:30 +00:00
Craig Topper	9befe89367	[X86] Use SIGN_EXTEND to implement ANY_EXTEND from vXi1. llvm-svn: 321334	2017-12-22 02:30:26 +00:00
Craig Topper	8772228963	[X86] Use SIGN_EXTEND rather than ZERO_EXTEND for lowering extract_vector_elt from vXi1 with a non-const index. We have a better range of instructions we can use if we can fill with the i1 value rather than zeroing. llvm-svn: 321315	2017-12-21 22:08:23 +00:00
Craig Topper	742ac98d01	[X86] When lowering truncates to vXi1, don't sign extend i16/i8 types to 512-bit if we have VLX. This should only affect what we do for v8i16. Previously we went to v8i64, but if we have VLX we only need v8i32. This prevents an unnecessary zmm usage. llvm-svn: 321303	2017-12-21 20:45:13 +00:00
Craig Topper	410a289b79	[X86] Promote v8i1 shuffles to v8i32 instead of v8i64 if we have VLX. We should have equally good shuffle options for v8i32 with VLX. This was spotted during my attempts to remove 512-bit vectors from SKX. We still use 512-bits for v16i1, v32i1, and v64i1. I'm less sure we can handle those well with narrower vectors. i32 and i64 element sizes get the best shuffle support. llvm-svn: 321291	2017-12-21 18:44:06 +00:00
Simon Pilgrim	4de5bb093c	[X86][SSE] Split large PAVGB/PAVGW vectors to legal widths Patch to allow detectAVGPattern to handle vectors larger than the legal size (128 SSE2, 256 AVX2, 512 AVX512BW), splitting the vectors accordingly. Differential Revision: https://reviews.llvm.org/D41440 llvm-svn: 321288	2017-12-21 18:12:31 +00:00
Craig Topper	72c22f4366	[X86] Use PSHUFB for v32i16 shuffles before falling back to VPERMW/VPERMI2W. PSHUFB has the ability to implicitly zero elements, which VPERMI2W can't do, so give it a chance to be used first. llvm-svn: 321251	2017-12-21 08:22:51 +00:00
Craig Topper	38af615b4c	[X86] Use VPERMI2B for v16i8 shuffles if we have VBMI+VLX and would have otherwise used two PSHUFBs ORed together. llvm-svn: 321249	2017-12-21 07:31:30 +00:00
Craig Topper	03b2bc4838	[X86] Use VPERMB/VPERMI2B for v32i8 shuffle lowering if VBMI and VLX are supported. llvm-svn: 321248	2017-12-21 05:58:31 +00:00
Craig Topper	07820f2fe4	[X86] Remove zext from vXi32 to vXi64 on indices of gather/scatter instructions if we can prove the pre-extended value is positive. Gather/scatter can implicitly sign extend from i32->i64 on indices. So if we know the sign bit of the input to a zext is 0 we can use the implicit extension. llvm-svn: 321209	2017-12-20 19:25:33 +00:00
Craig Topper	bc92e00f2e	[X86] Implement the fusing of MUL+SUBADD to FMSUBADD This patch turns shuffles of fadd/fsub with fmul into fmsubadd. Patch by Dmitry Venikov Differential Revision: https://reviews.llvm.org/D40335 llvm-svn: 321200	2017-12-20 18:05:15 +00:00
Craig Topper	abed821c36	[X86] Optimize sign extends on index operand to gather/scatter to not sign extend past i32. The gather instruction will implicitly sign extend to the pointer width, so we don't need to further extend it. This can prevent unnecessary splitting in some cases. There's still an issue that lowering on non-VLX can introduce another sign extend that doesn't get combined with shifts from a lowered sign_extend_inreg. llvm-svn: 321152	2017-12-20 07:36:59 +00:00
Craig Topper	158d54d954	[X86] Add a missing return to combineGatherScatter after successful combine. Not sure how to test this because I think the worst that happens is that we don't revisit the node a second time to look for additional combines. We used UpdateNodeOperands, so the work of updating the DAG was already done. llvm-svn: 321148	2017-12-20 06:44:50 +00:00
Craig Topper	aee3acb9a8	[X86] Remove code from combineSext that looks for MVT::i1 after operation legalization which can never happen. Type legalization guarantees this to be impossible since MVT::i1 isn't a legal type. llvm-svn: 321132	2017-12-20 01:00:01 +00:00
Craig Topper	fbdb236a8a	[X86] Add an assert to indicate that there is only one specific VT allowed at a certain point in LowerMULH. Helps with code readability a little. llvm-svn: 321118	2017-12-19 22:38:09 +00:00
Simon Pilgrim	d873b6f6ba	[X86][AVX512] Attempt target shuffle combining to different types instead of early-out We try to prevent shuffle combining to value types that would stop the folding of masked operations, but by just returning early, we were failing to try different shuffle types. The TODOs are all still relevant here to improve codegen but we're lacking test examples. llvm-svn: 321085	2017-12-19 16:54:07 +00:00
Simon Pilgrim	fd5df639a3	[X86][SSE] Add cpu feature for aggressive combining to variable shuffles As mentioned in D38318 and D40865, modern Intel processors prefer to combine multiple shuffles to a variable shuffle mask (PSHUFB/VPERMPS etc.) instead of having multiple stage 'fixed' shuffles which put more pressure on Port 5 (at the expense of extra shuffle mask loads). This patch provides a FeatureFastVariableShuffle target flag for Haswell+ CPUs that prefers combining 2 or more fixed shuffles to a single variable shuffle (default is 3 shuffles). The long term aim is to drive more of this from schedule data (probably via the MC) but we're not close to being ready for that yet. Differential Revision: https://reviews.llvm.org/D41323 llvm-svn: 321074	2017-12-19 13:16:43 +00:00
Simon Pilgrim	f6d4ab6daf	[X86][SSE] Use (V)PHMINPOSUW for vXi8 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841) Extension to D39729, which performed this for vXi16: vXi8 UMIN horizontal reductions can now be performed, with the same bit flipping to handle the SMAX/SMIN/UMAX cases. This makes use of the fact that by performing a pair-wise i8 SHUFFLE/UMIN before PHMINPOSUW, we both get the UMIN of each pair and zero-extend the upper bits ready for v8i16. Differential Revision: https://reviews.llvm.org/D41294 llvm-svn: 321070	2017-12-19 12:02:40 +00:00
Craig Topper	13142b10d5	[X86] Don't extend v16i8 non-uniform shifts to v16i32 if we have BWI. Use v16i16 instead. BWI supports shifting by word amounts. Even if VLX isn't supported we can still widen to v32i16 and extract the lower half. For SKX it's preferable to not use 512-bit vectors if we can. llvm-svn: 321059	2017-12-19 06:59:10 +00:00
Craig Topper	6e3091c265	[X86] Use a specific list of MVTs in combineShiftRightArithmetic instead of iterating over every integer VT and checking their size. Previously, we were checking for MVTs with sizes between 8 and 64, which only includes i8, i16, i32, and i64 today. But I don't think we should assume that and should list the types that are legal for x86. I also don't think we need i64 since type legalization is guaranteed to split those up. llvm-svn: 321058	2017-12-19 06:29:00 +00:00
Craig Topper	eb13a418e1	[X86] Remove unnecessary check for integer VT from combineShiftRightArithmetic. I doubt there's any way to create an ashr for an FP type. llvm-svn: 321057	2017-12-19 06:28:58 +00:00
Craig Topper	da853a9c2f	[X86] Remove dead code for turning vector shifts by large amounts into a zero vector. Pretty sure these are handled by a target independent DAG combine that turns them into undef these days. llvm-svn: 321056	2017-12-19 05:21:50 +00:00
Craig Topper	ad3a554889	[X86] Use ZERO_EXTEND instead of ANY_EXTEND when extending the shift amount for a non-uniform shift. My reading of the SDM says that all bits of the shift amount are used. If the value of the element is larger than the number of bits, the shift result is zero. So I think we need to zero_extend here to avoid garbage in the upper bits. In reality we lower any_extend as zero_extend, so in most cases it would be hard to hit this. llvm-svn: 321055	2017-12-19 04:52:04 +00:00
Matthias Braun	a4852d2c19	X86/AArch64/ARM: Factor out common sincos_stret logic; NFCI Note: - X86ISelLowering: setLibcallName(SINCOS) was superfluous as InitLibcalls() already does it. - ARMISelLowering: Setting libcallnames for sincos/sincosf seemed superfluous as in the darwin case it wouldn't be used while for all other cases InitLibcalls already does it. llvm-svn: 321036	2017-12-18 23:19:42 +00:00
Craig Topper	8e2837cc6e	[X86] Fix mistake that I made when splitting up the setOperationAction calls recently. The block I moved things that need BWI and 512-bit or VLX into is incorrectly qualified with just hasBWI || hasVLX. Here I've qualified it with hasBWI && (hasAVX512 || hasVLX), where the hasAVX512 will be replaced with allowing 512-bit vectors in an upcoming patch. llvm-svn: 320957	2017-12-18 04:50:05 +00:00
Craig Topper	fd8d040820	[X86] Make the code that creates fmaddsub from build_vector of extracts and inserts functional and add tests. Summary: We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2. The missing coverage was noticed during the review of D40335. Reviewers: RKSimon, zvi, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41133 llvm-svn: 320950	2017-12-17 18:23:45 +00:00
Craig Topper	ee1e71e576	[X86] Use extract_vector_elt instead of X86ISD::VEXTRACT for isel of vXi1 extractions. llvm-svn: 320937	2017-12-17 01:35:48 +00:00
Craig Topper	c0c2d19e08	[X86] Canonicalize extract_vector_elt from vXi1 to always return MVT::i32. This allows us to remove some isel patterns that allowed MVT::i8 result type. llvm-svn: 320936	2017-12-17 01:35:47 +00:00
Craig Topper	c609dc8f55	[X86] Don't create X86ISD::VEXTRACT nodes directly. Use EXTRACT_VECTOR_ELT and allow that to be legalized to VEXTRACT. I think we can remove the VEXTRACT node completely and use a canonicalized EXTRACT_VECTOR_ELT instead. This is a first step. llvm-svn: 320935	2017-12-17 01:35:44 +00:00
Simon Pilgrim	5c0c93ed4c	Fix unused variable warning. llvm-svn: 320934	2017-12-16 23:37:51 +00:00
Simon Pilgrim	4c9e8215e9	[X86][AVX] lowerVectorShuffleAsBroadcast - aggressively peek through BITCASTs Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, then peek through BITCASTs when looking for the broadcast source. Fixes PR32007 llvm-svn: 320933	2017-12-16 23:32:18 +00:00
Simon Pilgrim	88c10bc969	[X86][AVX] Use extract128BitVector helper. NFCI. llvm-svn: 320932	2017-12-16 23:09:57 +00:00
Simon Pilgrim	f3b6da00f5	[X86][AVX] Fix failed broadcast fold Strip excess BITCASTs from EXTRACT_SUBVECTOR input llvm-svn: 320930	2017-12-16 22:57:17 +00:00
Craig Topper	849b717c86	[X86] Don't pass a zero input to the passthru operand of getVectorMaskingNode/getScalarMaskingNode when it's going to emit an ISD::OR/ISD::AND. NFCI In those cases, the passthru operand of the methods isn't used. The calls to the scalar version were passing an MVT::i1 zero, which is an illegal type at the stage this code runs. llvm-svn: 320928	2017-12-16 21:12:24 +00:00
Craig Topper	93253e189c	[X86] Have getVectorMaskingNode return an ISD::AND for X86ISD::VPSHUFBITQMB instead of creating a select with one input being 0. llvm-svn: 320927	2017-12-16 21:12:23 +00:00
Craig Topper	1260a4e826	[X86] When using vpopcntdq for ctpop of v8i16 vectors, only promote to v8i32. Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half. llvm-svn: 320926	2017-12-16 19:31:36 +00:00
Craig Topper	1c7d07c601	[X86] Remove unneeded code for handling the old kunpck intrinsics. llvm-svn: 320917	2017-12-16 06:58:30 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Craig Topper	422ed23298	[X86] In LowerVectorCTPOP use ISD::ZERO_EXTEND/ISD::TRUNCATE instead of the target specific nodes. The target independent nodes will get legalized to the target specific nodes by their own legalization process. Someday I'd like to stop using a target specific node for zero extends and truncates of legal types, so the fewer places we reference the target specific opcode the better. llvm-svn: 320863	2017-12-15 21:18:05 +00:00
Craig Topper	f08ab74ae3	[X86] Remove unnecessary TODO. When I wrote it I thought we were missing a potential optimization for KNL. But investigating further shows that for KNL we still do the optimal thing by widening to v4f32 and then using special isel patterns to widen again to a zmm register. llvm-svn: 320862	2017-12-15 20:57:18 +00:00
Craig Topper	3fb8386685	[SelectionDAG][X86] Fix insert_vector_elt lowering for v32i1/v64i1 with non-constant index Summary: Currently we don't handle v32i1/v64i1 insert_vector_elt correctly, as we fail to look at the number of elements closely and assume it can only be v16i1 or v8i1. We also can't type legalize v64i1 insert_vector_elt correctly on KNL because the type is not byte addressable, as the legalize-through-memory path requires. For the first issue, the patch now tries to pick a 512-bit register with the correct number of elements and promotes to that. For the second issue, we now extend the vector to a byte addressable type, do the stores to memory, load the two halves, and then truncate the halves back to the original type. Technically since we changed the type, we may not need two loads, but actually checking that is more work and for the v64i1 case we do need them. Reviewers: RKSimon, delena, spatel, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40942 llvm-svn: 320849	2017-12-15 19:35:22 +00:00
Craig Topper	ad9221d684	[X86] Widen (v2i32 (fp_to_uint v2f64)) to (v8i32 (fp_to_uint v8f64)) during legalization if we have AVX512F, but not VLX. NFC Previously we widened it using isel patterns. llvm-svn: 320824	2017-12-15 16:22:20 +00:00
Craig Topper	7cfacbf6ea	[X86] Fix a couple bugs in my recent changes to vXi1 insert_subvector lowering. A couple places didn't use the same SDValue variables to connect everything all the way through. I don't have a test case for a bug in insert into the lower bits of a non-zero, non-undef vector. Not sure the best way to create that. We don't create the case when lowering concat_vectors which is the main way to get insert_subvectors. llvm-svn: 320790	2017-12-15 07:16:41 +00:00
Craig Topper	1a1e6d6cf6	[X86] Add a TODO about v8i1 CONCAT_VECTORS. llvm-svn: 320784	2017-12-15 01:03:46 +00:00
Craig Topper	5ebf3ac9c2	[X86] Further rearrange the setOperationAction calls to separate the ones that require 512-bit registers OR VLX into separate sections. NFCI We have several instructions that were introduced in AVX512F that are only available in 512-bit form on KNL. We still make use of them for 128/256 by artificially widening and extracting during isel. This commit separates these operations from the true 512-bit operations. This way we can qualify the normal 512-bit operations with needing 512-bit register support. And these special operations will get qualified with needing 512-bit registers OR VLX. The 512-bit register qualification will be introduced in a future patch; this just gets everything grouped to minimize deltas on that patch. llvm-svn: 320782	2017-12-15 01:03:43 +00:00
Craig Topper	07a28f777e	[X86] Group setOperationActions related to vXi1 masks together. NFCI Previously they were sort of interleaved in with XMM/YMM/ZMM action related code. Trying to separate things so it's easier to split 512-bit vectors later. llvm-svn: 320781	2017-12-15 01:03:42 +00:00
Craig Topper	b89bc20a64	[X86] Make ISD::INSERT_SUBVECTOR v8i1 legal with AVX512F because we should be custom lowering inserting v1i1 into v8i1 under this. I don't have a test case at the moment. Just noticed while auditing things. llvm-svn: 320780	2017-12-15 01:03:40 +00:00
Craig Topper	212070486d	[X86] Move some of the hasVLX qualified code out of the main hasAVX512 block in the X86ISelLowering constructor. NFCI Move it into the separate hasVLX block later in the constructor. I'm trying to separate 128/256 and 512-bit related code so we can eventually qualify the hasAVX512 block with support for 512-bit vectors required by the prefer-vector-width feature support being talked about in D41096. llvm-svn: 320779	2017-12-15 01:03:38 +00:00
Craig Topper	4341a7b08c	[X86] Remove an unnecessary SmallVector that was collecting chains for two SDNodes we're still holding SDValues for. NFCI We can just get the chains from those SDValues to create the TokenFactor. llvm-svn: 320757	2017-12-14 22:50:10 +00:00
Matt Arsenault	7d7adf4f2e	TLI: Allow using PSV for intrinsic mem operands llvm-svn: 320756	2017-12-14 22:34:10 +00:00
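
The gather/scatter index commits (llvm-svn 321209 and 321152) rest on one arithmetic fact: a 32-bit index whose sign bit is known to be 0 has the same value whether it is zero- or sign-extended to 64 bits, so the gather's implicit sign extension can stand in for an explicit zext. The following is a minimal scalar model under that reading, not the DAG combine itself; both function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical scalar model of the address a gather forms from a 32-bit
// index: the index is sign-extended to 64 bits, scaled, and added to the base.
uint64_t gatherAddressSext(uint64_t base, int32_t index, uint64_t scale) {
    return base + static_cast<uint64_t>(static_cast<int64_t>(index)) * scale;
}

// Hypothetical model of the address formed after an explicit zext i32 -> i64
// of the same index bits.
uint64_t gatherAddressZext(uint64_t base, uint32_t index, uint64_t scale) {
    return base + static_cast<uint64_t>(index) * scale;
}
```

For any index in [0, 2^31) the two addresses agree, so the zext is redundant; once the sign bit is set they diverge, which is why the combine first has to prove the pre-extended value is non-negative.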
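
Commit llvm-svn 321070 builds vXi8 horizontal min/max reductions on PHMINPOSUW, which only computes an unsigned 16-bit minimum. Below is a scalar sketch of the scheme as that message describes it, with two assumptions labeled here and in the comments: lanes are little-endian, and the pair-wise "SHUFFLE/UMIN" step is modeled as a 16-bit logical shift right by 8 followed by a byte-wise unsigned min. XOR masks map UMAX/SMIN/SMAX onto UMIN, the pair-wise step reduces each byte pair and zero-extends it to 16 bits, and the PHMINPOSUW model finishes the reduction. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

enum class Reduction { UMin, UMax, SMin, SMax };

// Scalar model of PHMINPOSUW's value result: the minimum of eight unsigned
// 16-bit lanes. (The instruction also reports the index; not modeled here.)
static uint16_t phminposuwModel(const std::array<uint16_t, 8>& lanes) {
    uint16_t m = lanes[0];
    for (uint16_t x : lanes)
        if (x < m) m = x;
    return m;
}

// XOR mask that turns the requested i8 reduction into an unsigned min.
static uint8_t flipMask(Reduction r) {
    switch (r) {
    case Reduction::UMin: return 0x00;
    case Reduction::UMax: return 0xFF; // complement reverses unsigned order
    case Reduction::SMin: return 0x80; // sign flip maps signed order to unsigned order
    case Reduction::SMax: return 0x7F; // sign flip followed by complement
    }
    return 0x00;
}

uint8_t reduceV16i8(const std::array<uint8_t, 16>& in, Reduction r) {
    const uint8_t mask = flipMask(r);
    std::array<uint8_t, 16> b{};
    for (int i = 0; i < 16; ++i) b[i] = static_cast<uint8_t>(in[i] ^ mask);

    // Assumed pair-wise step: byte-wise UMIN of b with (b >> 8 per i16 lane).
    // The low byte of lane i becomes min(b[2i], b[2i+1]); the high byte becomes
    // min(b[2i+1], 0) == 0, so each lane is already zero-extended to 16 bits.
    std::array<uint16_t, 8> lanes{};
    for (int i = 0; i < 8; ++i)
        lanes[i] = b[2 * i] < b[2 * i + 1] ? b[2 * i] : b[2 * i + 1];

    // Undo the flip on the scalar result.
    return static_cast<uint8_t>(phminposuwModel(lanes) ^ mask);
}
```

The masks work because XOR with 0xFF reverses unsigned order and XOR with 0x80 makes signed order line up with unsigned order; 0x7F is the composition of the two.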
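
Commit llvm-svn 321055 argues that the widened shift amount of a non-uniform shift must be zero-extended, because the per-element shift instructions examine every bit of the count. A scalar sketch of one 32-bit lane, assuming VPSLLVD-style semantics where any count above 31 produces zero. The helper names and the "garbage" parameter standing in for ANY_EXTEND's unspecified upper bits are hypothetical.

```cpp
#include <cstdint>

// One lane of a per-element logical left shift (VPSLLVD-style, assumed):
// the whole 32-bit count is examined and any count >= 32 yields zero.
uint32_t psllvdLane(uint32_t value, uint32_t count) {
    return count >= 32 ? 0u : value << count;
}

// Widening an i8 shift amount to i32. ZERO_EXTEND guarantees zero upper bits.
uint32_t zeroExtendAmount(uint8_t amt) { return amt; }

// ANY_EXTEND leaves the upper bits unspecified; modeled here by whatever
// caller-supplied bits happen to be present above the low byte.
uint32_t anyExtendAmount(uint8_t amt, uint32_t garbageHighBits) {
    return (garbageHighBits & ~0xFFu) | amt;
}
```

With a zero-extended count a shift by 3 behaves as expected, while the same count with stray upper bits exceeds 31 and silently zeroes the lane.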


5065 Commits