llvm-project

Author	SHA1	Message	Date
Zachary Turner	260fe3eca6	Fix many -Wsign-compare and -Wtautological-constant-compare warnings. Most of the -Wsign-compare warnings are due to the fact that enums are signed by default in the MS ABI, while the tautological comparison warnings trigger on x86 builds where sizeof(size_t) is 4 bytes, so N > numeric_limits<unsigned>::max() is always false. Differential Revision: https://reviews.llvm.org/D41256 llvm-svn: 320750	2017-12-14 22:07:03 +00:00
Matt Arsenault	1117133687	DAG: Expose all MMO flags in getTgtMemIntrinsic Rather than adding more bits to express every MMO flag you could want, just directly use the MMO flags. Also fixes using a bunch of bool arguments to getMemIntrinsicNode. On AMDGPU, buffer and image intrinsics should always have MODereferencable set, but currently there is no way to do that directly during the initial intrinsic lowering. llvm-svn: 320746	2017-12-14 21:39:51 +00:00
Craig Topper	600f1ba333	[X86] Don't zero the upper bits of the k-register before extracting a single bit from a vXi1. This doesn't match the semantics of the extract_vector_elt operation. Nothing downstream knows the bits were zeroed so they still get masked or sign extended after the extract anyway. llvm-svn: 320723	2017-12-14 18:35:25 +00:00
Michael Zuckerman	19fd217eaa	[AVX512] Adding support for load truncate store of I1 store operation on a truncated memory (load) of vXi1 is poorly supported by LLVM and most of the time end with an assertion. This patch fixes this issue. Differential Revision: https://reviews.llvm.org/D39547 Change-Id: Ida5523dd09c1ad384acc0a27e9e59273d28cbdc9 llvm-svn: 320691	2017-12-14 11:55:50 +00:00
Craig Topper	8cdf7c0e68	[X86] Make ANY_EXTEND from vXi1 Custom for more types. We should be able to support ANY_EXTEND for any types we support ZERO_EXTEND for. llvm-svn: 320675	2017-12-14 08:26:00 +00:00
Craig Topper	271a5c72a0	[X86] Remove redundant setOperationAction calls. These calls already exist earlier under AVX2 feature. llvm-svn: 320673	2017-12-14 08:25:53 +00:00
Simon Pilgrim	f51f4d3623	[X86][SSE] MOVMSK only uses the sign bit from each vector element Pass the input vector through SimplifyDemandedBits as we only need the sign bit from each vector element of MOVMSK We'd probably get more hits if SimplifyDemandedBits was better at handling vectors... Differential Revision: https://reviews.llvm.org/D41119 llvm-svn: 320570	2017-12-13 11:43:14 +00:00
Craig Topper	712a209db9	[X86] Add a couple TODOs about missing coverage/features motivated by D40335 D40335 was wanting to add FMSUBADD support, but it discovered that there are two pieces of code to make FMADDSUB and only one of those is tested. So I've asked that review to implement the one path until we get tests that test the existing code. llvm-svn: 320507	2017-12-12 18:39:04 +00:00
Nirav Dave	674d053d18	[X86] Cleanup type conversion of 64-bit load-store pairs. Summary: Simplify and generalize chain handling and search for 64-bit load-store pairs. Nontemporal test now converts 64-bit integer load-store into f64 which it realizes directly instead of splitting into two i32 pairs. Reviewers: craig.topper, spatel Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40918 llvm-svn: 320505	2017-12-12 18:25:48 +00:00
Ayman Musa	c2eed926b0	[X86] Recognize constant arrays with special values and replace loads from it with subtract and shift instructions, which then will be replaced by X86 BZHI machine instruction. Recognize constant arrays with the following values: 0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, .... , 2^(size - 1) -1 where //size// is the size of the array. the result of a load with index //idx// from this array is equivalent to the result of the following: (0xFFFFFFFF >> (sub 32, idx)) (assuming the array of type 32-bit integer). And the result of an 'AND' operation on the returned value of such a load and another input, is exactly equivalent to the X86 BZHI instruction behavior. See test cases in the LIT test for better understanding. Differential Revision: https://reviews.llvm.org/D34141 llvm-svn: 320481	2017-12-12 14:13:51 +00:00
Craig Topper	5ac75d5628	[X86] Improve lowering of vXi1 insert_subvectors to better utilize (insert_subvector zero, vec, 0) for zeroing upper bits. This can be better recognized during isel when the producer already zeroed the upper bits. llvm-svn: 320267	2017-12-09 22:44:42 +00:00
Craig Topper	504534514c	[X86] Don't use getTargetConstant for all 0s and all 1s mask vector. llvm-svn: 320260	2017-12-09 19:18:30 +00:00
Craig Topper	6504a8f888	[X86] When inserting into the upper bits of a vXi1 vector, make sure we shift enough bits if we widened the vector. We may need to widen the vector to make the shifts legal, but if we do that we need to make sure we shift left/right after accounting for the new size. If not we can't guarantee we are shifting in zeros. The test cases affected actually show cases where we should move the shifts all together, but that's another problem. llvm-svn: 320248	2017-12-09 08:19:07 +00:00
Craig Topper	b3e14ce90c	[X86] Improve lowering of concats of mask vectors to better optimize zero vector inputs. We were previously using kunpck with zero inputs unnecessarily. And we had cases where we would insert into a zero vector and then insert into larger zero vector incurring two sets of shifts. llvm-svn: 320244	2017-12-09 07:02:19 +00:00
Craig Topper	7f0d456ef8	[X86] Teach lowering to only let through (insert_subvector (vXi1 zeros), subvec, 0) for vector sizes that have native KSHIFT support. For narrow sizes we'll widen the zero vector and widen the insert. Then do an extract_subvector to get back down to correct size. This allows us to remove some patterns from the isel table that had to COPY_TO_REGCLASS to an oversized register, do the shift and then COPY_TO_REGCLASS back to the narrow register. Now this is represented explicitly in the DAG. This seems to have perturbed the register allocation in one of the tests, but the number of instructions didn't change. llvm-svn: 320190	2017-12-08 20:10:33 +00:00
Sanjay Patel	d4468912b0	[x86] use hasAVX2() rather than hasInt256(); NFC These are aliases, but the thing we're checking here is that the target has vpsllv*, not that the data type is 256-bit. Those instructions exist for 128-bit vectors too...but sadly, not for all element sizes. llvm-svn: 320170	2017-12-08 18:35:51 +00:00
Craig Topper	037115c29f	[X86] Always consider inserting a vXi1 vector into the lsbs of a zero vector to be legal during lowering. Add isel patterns to emit shifts. Previously we only allowed these through if the subvector came from a compare or test instruction which we would again check for during isel. With this change we only check for the compare and test instructions during isel and have fallback patterns that emit the shifts if needed. I noticed that in a lot of cases we don't actually see the compare during lowering and rely on an odd legalization of concat_vectors with a zero vector as the second argument. This keeps the concat_vectors around long enough for a later dag combine to expose the compare then we re-legalize the concat_vectors and catch the compare. llvm-svn: 320134	2017-12-08 08:10:58 +00:00
Craig Topper	323ba39f10	[X86] Handle all versions of vXi1 insert_vector_elt with a constant index without falling back to shuffles. We previously only supported inserting to the LSB or MSB where it was easy to zero to perform an OR to insert. This change effectively extracts the old value and the new value, xors them together and then xors that single bit with the correct location in the original vector. This will cancel out the old value in the first xor leaving the new value in the position. The way I've implemented this uses 3 shifts and two xors and uses an additional register. We can avoid the additional register at the cost of another shift. llvm-svn: 320120	2017-12-08 00:16:09 +00:00
Craig Topper	fd86b3cf22	[X86] Fix indentation. NFC llvm-svn: 320119	2017-12-08 00:15:57 +00:00
Craig Topper	dfc79c7c33	[X86] Fix InsertBitToMaskVector to only issue KSHIFTS of native size so that upper bits are properly zeroed. There's no v2i1 or v4i1 kshift, and v8i1 is only supported with AVXDQ. Isel has fake patterns to extend these types to native shifts, but makes no guarantees about the value of any bits shifted in when shifting right. This patch promotes the vector to a type that supports a native shift first and only allows inserting into the msb of a native sized shift. I've constructed this in a way that doesn't do the promotion if we're going to fall back to using a xmm/ymm/zmm shuffle. I think I have a plan to remove the shuffle fallback entirely, in which case this can be simplified, but I wanted to fix the correctness issue first. llvm-svn: 320081	2017-12-07 20:10:04 +00:00
Craig Topper	7b8fa5f782	[X86] Fix typo in variable name. NFC llvm-svn: 320080	2017-12-07 20:10:01 +00:00
Craig Topper	b67e5da89b	[X86] Make a couple helper lowering methods static. llvm-svn: 320079	2017-12-07 20:09:55 +00:00
Benjamin Kramer	1e9bf765a1	[X86] Avoid unused variable warning in Release builds. NFCI. llvm-svn: 319891	2017-12-06 13:32:36 +00:00
Craig Topper	3275eb7a68	[X86] Split 512-bit vector extends from types other than vXi1 out of LowerZERO_EXTEND_AVX512/LowerSIGN_EXTEND_AVX512. NFCI Most of the code in these routines is for handling extends from vXi1 types. The 512-bit handling for other extends is very much like the AVX2 code. So make the special routines just do vXi1 types and move the other 512-bit handling to the place that handles AVX2. llvm-svn: 319878	2017-12-06 07:37:20 +00:00
Craig Topper	647e4f590f	[X86] Update to getSetCCResultType to be more robust to EVT types. Attempt to determine what the type will be legalized to and then analyze that to see if we will be able to use a vXi1 compare. llvm-svn: 319861	2017-12-06 00:15:17 +00:00
Hans Wennborg	5df9f0878b	Re-commit r319490 "XOR the frame pointer with the stack cookie when protecting the stack" The patch originally broke Chromium (crbug.com/791714) due to its failing to specify that the new pseudo instructions clobber EFLAGS. This commit fixes that. > Summary: This strengthens the guard and matches MSVC. > > Reviewers: hans, etienneb > > Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits > > Differential Revision: https://reviews.llvm.org/D40622 llvm-svn: 319824	2017-12-05 20:22:20 +00:00
Jina Nahias	51c1a627c2	[x86][AVX512] Lowering kunpack intrinsics to LLVM IR This patch, together with a matching clang patch (https://reviews.llvm.org/D39719), implements the lowering of X86 kunpack intrinsics to IR. Differential Revision: https://reviews.llvm.org/D39720 Change-Id: I4088d9428478f9457f6afddc90bd3d66b3daf0a1 llvm-svn: 319778	2017-12-05 15:42:56 +00:00
Craig Topper	a404ce955a	[X86] Use vector widening to support sign extend from i1 when the dest type is not 512-bits and vlx is not enabled. Previously we used a wider element type and truncated. But it's more efficient to keep the element type and drop unused elements. If BWI isn't supported and we have an i16 or i8 type, we'll extend it to be i32 and still use a truncate. llvm-svn: 319740	2017-12-05 06:37:21 +00:00
Craig Topper	e1ba2450c2	[X86] Fix a crash if avx512bw and xop are both enabled when the IR contains a v32i8 bitreverse. llvm-svn: 319737	2017-12-05 04:47:12 +00:00
Craig Topper	276c770e57	[X86] Use vector widening to support zero extend from i1 when the dest type is not 512-bits and vlx is not enabled. Previously we used a wider element type and truncated. But it's more efficient to keep the element type and drop unused elements. If BWI isn't supported and we have an i16 or i8 type, we'll extend it to be i32 and still use a truncate. llvm-svn: 319728	2017-12-05 01:45:46 +00:00
Craig Topper	913b42b0e1	[X86] Don't use kunpck for vXi1 concat_vectors if the upper bits are undef. This can be efficiently selected by a COPY_TO_REGCLASS without the need for an extra instruction. llvm-svn: 319726	2017-12-05 01:28:06 +00:00
Craig Topper	6302012442	[X86] Use getZeroVector and remove an unnecessary creation of an APInt before calling getConstant. NFCI The getConstant function can take care of creating the APInt internally. getZeroVector will take care of using the correct type for the build vector to avoid re-lowering. The test change here is because execution domain constraints apparently pass through undef inputs of a zeroing xor. So the different ordering of register allocation here caused the dependency to change. llvm-svn: 319725	2017-12-05 01:28:04 +00:00
Craig Topper	adadaae586	[X86] Rearrange some of the code around AVX512 sign/zero extends. NFCI Move the AVX512 code out of LowerAVXExtend. LowerAVXExtend has two callers but one of them pre-checks for AVX-512 so the code is only live from the other caller. So move the AVX-512 checks up to that caller for symmetry. Move all of the i1 input type code in Lower_AVX512ZeroExend together. llvm-svn: 319724	2017-12-05 01:28:00 +00:00
Hans Wennborg	361d4392cf	Revert r319490 "XOR the frame pointer with the stack cookie when protecting the stack" This broke the Chromium build (crbug.com/791714). Reverting while investigating. > Summary: This strengthens the guard and matches MSVC. > > Reviewers: hans, etienneb > > Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits > > Differential Revision: https://reviews.llvm.org/D40622 > > git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319490 91177308-0d34-0410-b5e6-96231b3b80d8 llvm-svn: 319706	2017-12-04 22:21:15 +00:00
Craig Topper	4520d4f8ad	[X86] Allow VPMAXUQ/VPMAXSQ/VPMINUQ/VPMINSQ to be used with 128/256 bit vectors when AVX512 is enabled. These instructions can be used by widening to 512-bits and extracting back to 128/256. We do similar to several other instructions already. llvm-svn: 319641	2017-12-04 07:21:01 +00:00
Craig Topper	1151facf76	[X86] Don't turn UINT_TO_FP into SINT_TO_FP during lowering. We already do this as a DAG combine. The version during lowering can only trigger if known bits changes something that improves known bits analysis. But this means we should be improving known bits analysis to work on the unlowered form instead. llvm-svn: 319640	2017-12-04 05:38:44 +00:00
Craig Topper	f8470a6399	[X86] Custom legalize v2i32 gathers via widening rather than promoting. The default legalization for v2i32 is promotion to v2i64. This results in a gather that reads 64-bit elements rather than 32. If one of the elements is near a page boundary this can cause an illegal access that can fault. We also miscalculate the scale for the gather which is an even worse problem, but we probably could have found a separate way to fix that. llvm-svn: 319521	2017-12-01 06:02:02 +00:00
Craig Topper	11f733df9b	[X86] Add a DAG combine to simplify masks for AVX2 gather instructions. AVX2 gathers only use the upper bit of the mask allowing us to simplify sign_extend_inreg to a shift left. llvm-svn: 319514	2017-12-01 02:49:07 +00:00
Reid Kleckner	ba4014e9dc	XOR the frame pointer with the stack cookie when protecting the stack Summary: This strengthens the guard and matches MSVC. Reviewers: hans, etienneb Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits Differential Revision: https://reviews.llvm.org/D40622 llvm-svn: 319490	2017-11-30 22:41:21 +00:00
Craig Topper	d4257565cf	[X86] Promote i8 CTPOP to i32 instead of i16 when we have the POPCNT instruction. The 32-bit version is shorter to encode and the zext we emit for the promotion is likely going to be a 32-bit zero extend anyway. llvm-svn: 319468	2017-11-30 20:15:31 +00:00
Francis Visoiu Mistrih	93ef145862	[CodeGen] Print "%vreg0" as "%0" in both MIR and debug output As part of the unification of the debug format and the MIR format, avoid printing "vreg" for virtual registers (which is one of the current MIR possibilities). Basically: * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g" * grep -nr '%vreg' . and fix if needed * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g" * grep -nr 'vreg[0-9]\+' . and fix if needed Differential Revision: https://reviews.llvm.org/D40420 llvm-svn: 319427	2017-11-30 12:12:19 +00:00
Craig Topper	a495744d2c	[X86] Optimize avx2 vgatherqps for v2f32 with v2i64 index type. Normal type legalization will widen everything. This requires forcing 0s into the mask register. We can instead choose the form that only reads 2 elements without zeroing the mask. llvm-svn: 319406	2017-11-30 07:01:40 +00:00
Craig Topper	321a8b9b63	[X86] Make sure we don't remove sign extends of masks with AVX2 masked gathers. We don't use k-registers and instead use the MSB so we need to make sure we sign extend the mask to the msb. llvm-svn: 319405	2017-11-30 06:31:31 +00:00
Craig Topper	56a41d4b3a	[X86] Remove some questionable looking code that seems to be looking through a VZEXT to create a larger VSEXT. If the input to the vzext was signed this would do the wrong thing. Not sure how to test this. llvm-svn: 319382	2017-11-29 23:08:25 +00:00
Craig Topper	e3515001b9	[X86] Remove setOperationAction Promote for ISD::SINT_TO_FP MVT::v8i16/v16i8/v16i16. A DAG combine ensures these ops are always promoted to vXi32. llvm-svn: 319298	2017-11-29 08:19:36 +00:00
Craig Topper	fbf7b3bf3e	[X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization. llvm-svn: 319266	2017-11-29 00:32:09 +00:00
Craig Topper	88ffb5d4d5	[X86] Mark ISD::FP_TO_UINT v16i8/v16i16 as Promote under AVX512 instead of legal. Fix infinite loop in op legalization when promotion requires 2 steps. Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel. The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value. llvm-svn: 319259	2017-11-28 23:56:02 +00:00
Craig Topper	ab9bfc904b	[X86] Remove unused variable. llvm-svn: 319239	2017-11-28 22:28:23 +00:00
Craig Topper	a27f1e675a	[X86] Remove code from combineUIntToFP that tried to favor UINT_TO_FP if legal when zero extending from vXi8/vXi16. The UINT_TO_FP is immediately converted to SINT_TO_FP when the node is re-evaluated because we'll detect that the sign bit is zero. llvm-svn: 319234	2017-11-28 22:08:51 +00:00
Craig Topper	3aaa71f222	[X86] Remove custom lowering for uint_to_fp from vXi8/vXi16. We have a DAG combine that uses a zero extend that should prevent this from ever occurring now. llvm-svn: 319233	2017-11-28 22:08:48 +00:00
Craig Topper	dd4295626b	[X86] In lowerVectorShuffleAsElementInsertion, if we're able to find a scalar i8 or i16 and need to zero extend it, make sure we use a vXi32 type of the full vector width. Previously, this was hardcoded to v4i32, but if the input type is 256 bits we need to use v8i32. Fixes PR35443 llvm-svn: 319208	2017-11-28 19:25:45 +00:00
Craig Topper	ddbc340c20	[X86] Make zero extend from v16i1/v8i1 to v16i8/v8i16/v16i16 not scalarize under AVX512. llvm-svn: 319136	2017-11-28 01:36:33 +00:00
Craig Topper	8b9cd03824	[X86] Remove unnecessary fp<->int setOperationAction lines from a hasVLX block. NFCI These lines all exist identically either under SSE2, AVX2 or AVX512. Given that VLX implies all of those, these aren't providing anything new. llvm-svn: 319124	2017-11-28 00:41:12 +00:00
Craig Topper	ce732e7c30	[X86] Remove duplicate calls to setOperationAction. NFCI These same calls exist a few lines down. llvm-svn: 319122	2017-11-28 00:16:42 +00:00
Craig Topper	256cc48df6	[X86] Teach getSetCCResultType to handle more than just SimpleVTs when looking at larger than 512-bit vectors. Which VTs are considered simple is determined by the superset of the legal types of all targets in LLVM. If we're looking at VTs that are going to be split down to 512-bits we should allow any VT not just simple ones since the simple list changes over time as new targets are added. llvm-svn: 319110	2017-11-27 22:56:10 +00:00
Craig Topper	4aa519507d	[X86] Remove lines that set v8f32 FP_ROUND/FP_EXTEND to Legal under AVX512. NFCI We don't do this for narrow vectors under AVX or SSE features. We also don't set them to Expand like we do for many vector ops. Nor does TargetLoweringBase.cpp. This leads me to believe these default to Legal. llvm-svn: 319103	2017-11-27 22:01:17 +00:00
Craig Topper	a4120fc42c	[X86] Teach combineX86ShuffleChain that AllowIntDomain requires at least SSE2. I don't have a good test case for this at the moment. I was playing around with a change in legalizing and triggered this code to produce a PSHUFD with sse1 only. llvm-svn: 319066	2017-11-27 18:15:14 +00:00
Craig Topper	62189f7ab3	[X86] Make getSetCCResultType return vXi1 for any vXi32/vXi64 vector over 512 bits long when AVX512 is enabled. Similar for vXi16/vXi8 with BWI. Any vector larger than 512 bits will be split to 512 bits during legalization. But without this we will fold sexts with them before that making it difficult to recover leading to scalarization. llvm-svn: 319059	2017-11-27 17:51:55 +00:00
Craig Topper	074003c8e2	[X86] Fix an assert that was incorrectly checking for BMI instead of AVX512VBMI. The check is actually unnecessary since AVX512VBMI implies AVX512BW which is the other part of the assert. llvm-svn: 319006	2017-11-26 21:14:48 +00:00
Coby Tayree	d8b17bedfa	[x86][icelake]GFNI galois field arithmetic (GF(2^8)) insns: gf2p8affineinvqb gf2p8affineqb gf2p8mulb Differential Revision: https://reviews.llvm.org/D40373 llvm-svn: 318993	2017-11-26 09:36:41 +00:00
Craig Topper	e485631cd1	[X86] Add separate intrinsics for scalar FMA4 instructions. Summary: These instructions zero the non-scalar part of the lower 128-bits which makes them different than the FMA3 instructions which pass through the non-scalar part of the lower 128-bits. I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512. I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before. I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests split the RUN lines and changed out the scalar intrinsics. fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly though there are a couple TODOs in it. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39851 llvm-svn: 318984	2017-11-25 18:32:43 +00:00
Craig Topper	a456f13af2	[X86] Simplify some code in combineSetCC. NFCI Make the condition for doing a std::swap simpler so we don't have to repeat the full checks. llvm-svn: 318970	2017-11-25 07:20:24 +00:00
Craig Topper	696bfc08d8	[X86] Qualify some vector specific code with VT.isVector(). NFCI Other checks inside require a build_vector, but this lets us stop earlier and makes the code clearer. llvm-svn: 318969	2017-11-25 07:20:23 +00:00
Craig Topper	c1b3269171	[X86] Support folding to andnps with SSE1 only. With SSE1 only, we emit FAND and FXOR nodes for v4f32. llvm-svn: 318968	2017-11-25 07:20:22 +00:00
Craig Topper	5b85df8605	[X86] Add some early DAG combines to turn v4i32 AND/OR/XOR into FAND/FOR/FXOR when only SSE1 is available. v4i32 isn't a legal type with sse1 only and would end up getting scalarized otherwise. This isn't completely ideal as it doesn't handle cases like v8i32 that would get split to v4i32. But it at least helps with code written using the clang intrinsic header. llvm-svn: 318967	2017-11-25 07:20:21 +00:00
Craig Topper	13ed01e635	[X86] Prevent using X * rsqrt(X) to approximate sqrt when only sse1 is enabled. This optimization can occur after type legalization and emit a vselect with v4i32 type. But that type is not legal with sse1. This ultimately gets scalarized by the second type legalization that runs after vector op legalization, but that's really intended to handle the scalar types that might be introduced by legalizing vector ops. For now just stop this from happening by disabling the optimization with sse1. llvm-svn: 318965	2017-11-24 19:57:48 +00:00
Craig Topper	f31b0b850b	[X86] Teach isel that X86ISD::CMPM_RND zeros the upper bits of the mask register. llvm-svn: 318933	2017-11-23 18:41:21 +00:00
Craig Topper	94b994972c	[X86] Remove some unneeded opcodes from getVectorMaskingNode. NFC We never reach here with these opcodes. llvm-svn: 318932	2017-11-23 18:41:20 +00:00
Craig Topper	b663adddb0	[X86] Add X86ISD::CMPM_RND to getVectorMaskingNode to select ISD::AND instead of ISD::VSELECT A later DAG combine will turn the VSELECT into an AND, but we have the other mask compare opcodes here so add this one too. llvm-svn: 318931	2017-11-23 18:41:19 +00:00
Craig Topper	27d182b7d4	[X86] Remove some dead code leftover from when i1 was a legal type. NFCI llvm-svn: 318930	2017-11-23 18:41:18 +00:00
Craig Topper	be9bf65d76	[X86] Remove some dead code. NFC AVX512 code never reaches here so we don't need to handle X86ISD::CMPM as an opcode. llvm-svn: 318929	2017-11-23 18:41:17 +00:00
Simon Pilgrim	90accbc5d9	[X86][SSE] Use (V)PHMINPOSUW for vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841) (V)PHMINPOSUW determines the UMIN element in an v8i16 input, with suitable bit flipping it can also be used for SMAX/SMIN/UMAX cases as well. This patch matches vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions and reduces the input down to a v8i16 vector before calling (V)PHMINPOSUW. A later patch will use this for v16i8 reductions as well (PR32841). Differential Revision: https://reviews.llvm.org/D39729 llvm-svn: 318917	2017-11-23 13:50:27 +00:00
Coby Tayree	e8bdd383e9	[x86][icelake]BITALG 2/3 vpshufbitqmb encoding 3/3 vpshufbitqmb intrinsics Differential Revision: https://reviews.llvm.org/D40222 llvm-svn: 318904	2017-11-23 11:15:50 +00:00
Craig Topper	a7864ed64a	[X86] Turn an if condition that should always be true into an assert. NFCI If Values.size() == 0, we should have returned 0 or undef earlier. If it was 1, it's a splat and we already handled that too. llvm-svn: 318894	2017-11-23 03:24:01 +00:00
Craig Topper	6a0177bcf1	[X86] Remove unnecessary check for is128BitVector. NFC 256 and 512 bit vectors were picked off earlier in the function. Lots of code between there and here already assumed 128-bit vectors. llvm-svn: 318893	2017-11-23 03:24:00 +00:00
Craig Topper	2a38887f28	[X86] Simplify some bitmasking and use llvm_unreachable to mark an impossible case. NFC llvm-svn: 318892	2017-11-23 03:23:59 +00:00
Craig Topper	ac4b0b1a2a	[X86] Remove a ternary operator that can only ever be false. NFC We are checking for AVX512 in an SSE1 only block. llvm-svn: 318891	2017-11-23 03:23:58 +00:00
Craig Topper	726968d6a2	[X86] Support v32i16/v64i8 CTLZ using lookup table. Had to tweak the setcc's used by the code to use a vXi1 result type with a sign extend back to vector size. llvm-svn: 318871	2017-11-22 20:05:57 +00:00
Craig Topper	8ad818656a	[X86] Move the BITALG setOperationAction code into the hasBWI section to match what is done for VPOPCNTDQ in the AVX512F block. NFC llvm-svn: 318870	2017-11-22 20:05:54 +00:00
Craig Topper	e15cc16873	[X86] Sink the MGATHER setOperationActions for AVX2 into the AVX block where most of the rest of the AVX2 legalization lives. llvm-svn: 318869	2017-11-22 20:05:51 +00:00
Craig Topper	ee74044f93	[X86] Add an X86ISD::MSCATTER node for consistency with the X86ISD::MGATHER. This makes the fact that X86 needs an explicit mask output not part of the type constraint for the ISD::MSCATTER. This also gives the X86ISD::MGATHER/MSCATTER nodes a common base class simplifying the address selection code in X86ISelDAGToDAG.cpp llvm-svn: 318823	2017-11-22 08:10:54 +00:00
Craig Topper	c1e7b3f6ca	[X86] Lower all ISD::MGATHER nodes to X86ISD::MGATHER. Now we consistently represent the mask result without relying on isel ignoring it. We now have a more general SDNode and type constraints to represent these nodes in isel patterns. This allows us to present both vXi1 and XMM/YMM mask types with a single set of constraints. llvm-svn: 318821	2017-11-22 07:11:03 +00:00
Coby Tayree	5c7fe5df53	[x86][icelake]BITALG vpopcnt{b,w} Differential Revision: https://reviews.llvm.org/D40213 llvm-svn: 318748	2017-11-21 10:32:42 +00:00
Coby Tayree	3880f2a363	[x86][icelake]VNNI Introducing Vector Neural Network Instructions, consisting of: vpdpbusd{s} vpdpwssd{s} Differential Revision: https://reviews.llvm.org/D40208 llvm-svn: 318746	2017-11-21 10:04:28 +00:00
Coby Tayree	71e37cc9ff	[x86][icelake]vbmi2 introducing vbmi2, consisting of vpcompress{b,w} vpexpand{b,w} vpsh{l,r}d{w,d,q} vpsh{l,r}dv{w,d,q} Differential Revision: https://reviews.llvm.org/D40206 llvm-svn: 318745	2017-11-21 09:48:44 +00:00
Mohammed Agabaria	115f68ea3e	[LV][X86] Support of AVX2 Gathers code generation and update the LV with this This patch depends on: https://reviews.llvm.org/D35348 Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen) Update LoopVectorize to generate gathers for AVX2 processors. Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb Reviewed By: delena, RKSimon Differential Revision: https://reviews.llvm.org/D35772 llvm-svn: 318641	2017-11-20 08:18:12 +00:00
Craig Topper	410bbcdcf1	[X86] Qualify a few places with ExperimentalVectorWideningLegalization. I'm playing around with this flag and these places cause errors if not qualified. llvm-svn: 318595	2017-11-18 18:49:16 +00:00
Simon Pilgrim	c9bc55a08d	[X86] Add todo comment for TRUNC(SUB(X,C)) -> SUB(TRUNC(X),C') As discussed on PR35295, but it causes regressions in combineSubToSubus which need to be addressed first llvm-svn: 318594	2017-11-18 18:33:07 +00:00
Craig Topper	3a431cfb13	[X86] Fix typo in variable name. NFC llvm-svn: 318590	2017-11-18 05:09:55 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Craig Topper	089082378f	[X86] Add DAG combine to remove sext i32->i64 from gather/scatter instructions. Only do this pre-legalize in case we're using the sign extend to legalize for KNL. This recovers all of the tests that changed when I stopped SelectionDAGBuilder from deleting sign extends. There's more work that could be done here particularly to fix the i8->i64 test case that experienced split. llvm-svn: 318468	2017-11-16 23:09:06 +00:00
Craig Topper	e85ff4f732	[X86] Pre-truncate gather/scatter indices that have element sizes larger than 64-bits before Legalize. The wider element type will normally cause legalize to try to split and scalarize the gather/scatter, but we can't handle that. Instead, truncate the index early so the gather/scatter node is insulated from the legalization. This really shouldn't happen in practice since InstCombine will normalize index types to the same size as pointers. llvm-svn: 318452	2017-11-16 20:23:22 +00:00
Craig Topper	04be793cec	[X86] DAGCombinerInfo is in TargetLowering not X86TargetLowering. llvm-svn: 318451	2017-11-16 20:23:17 +00:00
Craig Topper	e6601fd30e	[X86] Custom type legalize v2f32 masked gathers instead of trying to cleanup after type legalization. llvm-svn: 318368	2017-11-16 02:07:45 +00:00
Craig Topper	54b57b0dd8	[X86] Add a return to the end of a switch to prevent an accidental fallthrough in the future. llvm-svn: 318330	2017-11-15 20:42:47 +00:00
Craig Topper	16a91cee6c	[X86] Redefine the 128-bit version of VPGATHERQD and VGATHERQPS to use a VK2 mask instead of a VK4 mask. This allows us to remove extra extend creation during lowering and more accurately reflects the semantics of the instruction. While there add an extra output VT to X86 masked gather node to better match the isel pattern predicate. Currently we're exploiting the fact that the isel table doesn't count how many output results a node actually has if the result type of any can be inferred from the first result and the type constraints defined in tablegen. I think we might ultimately want to lower all MGATHER/MSCATTER to an X86ISD node with the extra mask result and stop relying on this hole in the isel checking. llvm-svn: 318278	2017-11-15 07:46:43 +00:00
Craig Topper	23493f3777	[X86] Attempt to fix signed and unsigned comparison warning. llvm-svn: 318010	2017-11-13 02:19:13 +00:00
Craig Topper	63157c4784	[X86] Use EVEX encoded VRNDSCALE instructions to implement the legacy round intrinsics. The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4-bits of the immediate are 0. This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space. We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used. I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches. llvm-svn: 318008	2017-11-13 02:03:00 +00:00
Craig Topper	0af48f1ad4	[X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with and without the rounding operand. NFCI I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations. llvm-svn: 318007	2017-11-13 02:02:58 +00:00
Craig Topper	b42a23ff8f	[X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics. This fixes a bug where we selected packed instructions for scalar intrinsics. llvm-svn: 317999	2017-11-12 18:51:09 +00:00
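
Note on c2eed926b0 (constant arrays replaced by BZHI): the equivalence claimed in that message can be checked with a small sketch. This is not LLVM code. The helper names `mask_table`, `shifted_mask`, and `bzhi32` are hypothetical, Python integers stand in for 32-bit registers, and the BZHI model only covers indices 0 through 31.

```python
MASK32 = 0xFFFFFFFF  # all ones in a 32-bit register


def mask_table(size):
    """Constant array 0x0, 0x1, 0x3, 0x7, ..., 2**(size - 1) - 1."""
    return [(1 << i) - 1 for i in range(size)]


def shifted_mask(idx):
    """The replacement form from the message: 0xFFFFFFFF >> (32 - idx).

    Python integers make the idx == 0 case (a shift by 32) well defined;
    in C or LLVM IR a 32-bit shift by 32 is not, so this is only a model.
    """
    return MASK32 >> (32 - idx)


def bzhi32(src, idx):
    """Simplified BZHI model for idx in 0..31: clear bit idx and above."""
    return src & ((1 << idx) - 1) & MASK32


table = mask_table(32)
for i in range(32):
    # A load from the table equals the subtract-and-shift form ...
    assert table[i] == shifted_mask(i)
    # ... and ANDing it with another value behaves like BZHI.
    assert (0xDEADBEEF & table[i]) == bzhi32(0xDEADBEEF, i)
```

The loop covers every index of a 32-entry table, which is the case the message describes for an array of 32-bit integers.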

5065 Commits
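
Note on 323ba39f10 (vXi1 insert_vector_elt with a constant index): the xor-based insertion described in that message can be sketched on a plain integer bitmask. This is an illustration under assumptions, not the lowering code. `insert_bit` is a hypothetical name, and the sketch isolates the old bit with `& 1` rather than the shift pair a k-register sequence would need, so it does not reproduce the shift count quoted in the message.

```python
def insert_bit(mask, idx, new_bit):
    """Set bit idx of mask to new_bit using only shifts and xors."""
    old_bit = (mask >> idx) & 1        # extract the old value
    diff = old_bit ^ (new_bit & 1)     # 1 only if the bit has to change
    # Xor the difference back in at position idx: old ^ (old ^ new) == new,
    # and every other bit is xored with 0 and left unchanged.
    return mask ^ (diff << idx)


assert insert_bit(0b1010, 0, 1) == 0b1011  # clear bit becomes set
assert insert_bit(0b1010, 1, 0) == 0b1000  # set bit becomes clear
assert insert_bit(0b1010, 3, 1) == 0b1010  # bit already has the new value
```

Because the old value cancels itself in the first xor, the same sequence works for any index, which is why the LSB and MSB no longer need special handling.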