llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	59d8d13c7b	[X86] getTargetShuffleInputs - check that the source inputs are all the right size. I'm hoping to begin improving shuffle combining across different vector sizes, but before that we must ensure that all existing getTargetShuffleInputs calls bail if the inputs aren't the same size.	2020-02-24 16:26:10 +00:00
Craig Topper	7a7146cf72	[X86] When creating X86ISD::MGATHER nodes from AVX2 gather intrinsics, cast the mask to integer type. The gather intrinsics use a floating point mask when the result type is FP. But we call DemandedBits on the mask assuming it's an integer type. We also use integer types when we create it from generic IR. So add a bitcast to the intrinsic path to guarantee the integer type.	2020-02-23 23:00:41 -08:00
Craig Topper	5a70518660	[X86] Remove most X86 specific subclasses of MemSDNode. Just use a MemIntrinsicSDNode as we usually do. Leave the gather/scatter subclasses, but make them inherit from MemIntrinsicSDNode and delete their constructor and destructor. This way we can still have the getIndex, getMask, etc. convenience functions.	2020-02-23 15:13:32 -08:00
Craig Topper	15b6aa7448	[X86] Enable the use of movlps for i64 atomic load on 32-bit targets with sse1. Still a little room for improvement by using movlps to store to the stack temporary needed to move data out of the xmm register after the load.	2020-02-23 15:11:38 -08:00
Craig Topper	2a10f8019d	[X86] Use FIST for i64 atomic stores on 32-bit targets without SSE.	2020-02-23 15:11:38 -08:00
Craig Topper	84cd968f75	[X86] Add AddToWorklist(N) after calls to SimplifyDemandedBits/SimplifyDemandedVectorElts that are called on an operand of N. If a simplification occurs the operand will be added to the worklist. But since the demanded mask was based on N, we need to make sure we revisit N in case there are more simplifications to be done. Returning SDValue(N, 0) as we do only tells DAG combine that something changed, but that won't make it add anything to the worklist. Found while playing around with using VEXTRACT_STORE in more cases. But I guess this doesn't affect any of our existing tests.	2020-02-22 21:42:59 -08:00
Craig Topper	bdb1729c83	[X86] Teach EltsFromConsecutiveLoads that it's ok to form a v4f32 VZEXT_LOAD with a 64 bit memory size on SSE1 targets. We can use MOVLPS which will load 64 bits, but we need a v4f32 result type. We already have isel patterns for this. The code here is a little hacky. We can probably improve it with more isel patterns.	2020-02-22 18:50:52 -08:00
Craig Topper	e7a184fc7c	[X86] Use movlps for i64 atomic stores on 32-bit targets with sse1. This is similar to using movd which we do for sse2 targets. I've added a DAG combine for VEXTRACT_STORE to use SimplifyDemandedVectorElts to clean up some artifacts from type legalization.	2020-02-22 18:22:47 -08:00
Craig Topper	228a2bc9b7	[X86] Teach combineCVTPH2PS to shrink v8i16 loads when the output type is v4f32. Remove extra isel patterns. Similar to what we do for other operations that use a subset of bits. This allows us to remove a pattern that shrinks a load, which was incorrect if the load was volatile.	2020-02-21 18:11:07 -08:00
Nikita Popov	c90ea87cfd	[X86] Fix SDLoc initialization. Fixes -Wparentheses warning, in this case indicating a genuine bug.	2020-02-21 18:26:05 +01:00
Craig Topper	97f11600e0	[X86] Don't bother avoiding illegal FCMOVs if we don't have the cmov subtarget feature. We'll be forced to emit branches so we might as well use the most direct condition.	2020-02-21 00:34:15 -08:00
Craig Topper	263bef2bbc	[X86] Make combineCMov not create unsupported FCMOVs when f32/f64 are using X87. This makes the behavior consistent with what's in LowerSELECT.	2020-02-21 00:34:15 -08:00
Craig Topper	4576606831	[X86] Remove unnecessary isNullConstant in LowerSelect. NFC At this point in the code we know that Op1 or Op2 is all ones. Y points to the other operand. In the case that Op2 is zero, Op1 must be all ones and Y is Op2. The OR ORs Y into Res. But if Y is 0 the OR will be folded away by getNode so we don't need to check for it.	2020-02-20 21:41:13 -08:00
Craig Topper	78be618717	[X86] Add CMOV_VR64 pseudo instruction for MMX. Remove mmx handling from combineSelect. The combineSelect code was casting to i64 without any check that i64 was legal. This can break after type legalization. It also required splitting the mmx register on 32-bit targets. It's not clear that this makes sense. Instead switch to using a cmov pseudo like we do for XMM/YMM/ZMM.	2020-02-20 20:30:56 -08:00
Craig Topper	e5782377f3	[X86] Add CMOV_VK1 pseudo so we don't crash on v1i1 ISD::SELECT	2020-02-20 15:13:48 -08:00
Craig Topper	7e92769862	[X86] Expand vselect of v1i1 under avx512. We already do this for v2i1, v4i1, etc.	2020-02-20 15:13:47 -08:00
Craig Topper	b00ef8951b	[X86] Custom legalize v1i1 UADDSAT/USUBSAT/SADDSAT/SSUBSAT to match v2i1/v4i1/v8i1 etc.	2020-02-20 15:13:46 -08:00
Craig Topper	d95a10a7f9	[X86] Custom legalize v1i1 add/sub/mul to xor/xor/and with avx512. We already did this for v2i1, v4i1, v8i1, etc.	2020-02-20 15:13:44 -08:00
Craig Topper	c7b54a196e	Recommit "[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT". With the correct author this time.	2020-02-20 12:28:54 -08:00
Craig Topper	1d8860f90b	Revert `714265dabb` "[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT" I accidentally messed up the author on the previous commit somehow.	2020-02-20 12:28:33 -08:00
Quentin Colombet	714265dabb	[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT The type here isn't guaranteed to be a simple type. Fixes PR44976	2020-02-20 12:25:37 -08:00
Sanjay Patel	064cd2ecdb	[x86] allow peeking through an extract_subvector to find a splatted operand The motivating case is seen in "splat4_v8f32_load_store" and based on code in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 (I haven't stepped through the v8i32 sibling test yet to see why that diverged.) There are other potential improvements visible like allowing scalarization or vector narrowing. Differential Revision: https://reviews.llvm.org/D74909	2020-02-20 13:59:59 -05:00
Craig Topper	0ed7a61543	[X86] Fix a -Wparentheses warning. NFC	2020-02-20 09:32:03 -08:00
Craig Topper	3543ac9ab5	[X86] Rewrite LowerBRCOND to remove dead code and handle ISD::SETCC and overflow ops directly. There's a lot of old leftover code in LowerBRCOND, especially the detection of AND or OR of X86ISD::SETCC nodes. Those were needed before LegalizeDAG was changed to visit nodes before their operands. It also relied on reversing the output of LowerSETCC to find the flags producing node to use for the X86ISD::BRCOND node. Rather than using LowerSETCC this patch uses emitFlagsForSetcc to handle the integer ISD::SETCC case. This gives the flag producer and the comparison code to use directly. I've removed the addTest flag and just produce a X86ISD::BRCOND and return immediately. Floating point ISD::SETCC case is just an X86ISD::FCMP with special care for OEQ and UNE derived from the previous code. I've left f128 out so it will emit a test. And LowerSETCC will be called later to produce a libcall and X86ISD::SETCC. We have combines that can merge the test and X86ISD::SETCC. We need to handle two cases for overflow ops. Either they are used directly or they have a seteq 0 or setne 1 to invert the overflow. The old code did not handle the setne 1 case, but I think some other combines were making up for it. If we fail to find a condition, we'll wrap an AND with 1 on the original condition and tell emitFlagsForSetcc to emit a compare with 0. This will pick up the LowerAndToBT and/or the EmitTest case. I kept the isTruncWithZeroHighBitsInput call, but we might be able to fold that into emitFlagsForSetcc. Differential Revision: https://reviews.llvm.org/D74750	2020-02-20 08:50:18 -08:00
Craig Topper	12cc105f80	[X86] Add DAG combines to form CVTPH2PS/CVTPS2PH from vXf16->vXf32/vXf64 fp_extends and vXf32->vXf16 fp_round. Only handle power of 2 element count for simplicity. Not sure what to do with vXf64->vXf16 fp_round to avoid double rounding. Differential Revision: https://reviews.llvm.org/D74886	2020-02-20 08:26:17 -08:00
Djordje Todorovic	2f215cf36a	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGfaff707db82d. A failure found on an ARM 2-stage buildbot. The investigation is needed.	2020-02-20 14:41:39 +01:00
Craig Topper	f559cecc3e	[X86] Add DCI.isBeforeLegalize() check to the v64i1 constant splitting code in combineStore. We only need to split after type legalization. If we're before we can just use a wide store and type legalization will split it. Add a v128i1 test to exercise it post type legalization.	2020-02-19 09:18:16 -08:00
Florian Hahn	216afd3301	[TargetLower] Update shouldFormOverflowOp check if math is used. On some targets, like SPARC, forming overflow ops is only profitable if the math result is used: https://godbolt.org/z/DxSmdB This patch adds a new MathUsed parameter to allow the targets to make the decision and defaults to only allowing it if the math result is used. That is the conservative choice. This patch also updates AArch64ISelLowering, X86ISelLowering, ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow ops if the math result is not used. On those targets using the overflow intrinsic for the overflow check only generates better code. Reviewers: nikic, RKSimon, lebedev.ri, spatel Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D74722	2020-02-19 11:28:33 +01:00
Djordje Todorovic	faff707db8	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-02-19 11:12:26 +01:00
Craig Topper	f69a29da5a	[X86] Remove vXi1 select optimization from LowerSELECT. Move it to DAG combine.	2020-02-19 00:00:55 -08:00
Craig Topper	0dbc4658d8	[X86] Handle splats in LowerBUILD_VECTORvXi1 by directly emitting scalar selects instead of deferring that to LowerSELECT. LowerSELECT will detect the constant inputs and convert to scalar selects, but we can do it directly here. I might remove some of the code from LowerSELECT and move it to DAG combine so doing this explicitly will make us less dependent on it happening in lowering.	2020-02-18 22:39:30 -08:00
Simon Pilgrim	d6eef0614f	[TargetLowering] Add SimplifyMultipleUseDemandedBits 'all elements' helper wrapper. NFC.	2020-02-18 19:53:50 +00:00
Craig Topper	89ab5c69c8	[X86] Add a helper function to pull some repeated code out of combineGatherScatter. NFC	2020-02-18 11:10:40 -08:00
Djordje Todorovic	2bf44d11cb	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGa82d3e8a6e67.	2020-02-18 16:38:11 +01:00
Djordje Todorovic	a82d3e8a6e	Reland "[DebugInfo] Enable the debug entry values feature by default" This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-18 14:41:08 +01:00
Craig Topper	e90dc7c48b	[X86] Move avx512 code that forces zeros to the false side of vselects above a check for legal types. This helps this transform occur earlier so we can fold the not with setcc. If we delay it until after type legalization we might have introduced instructions to widen the mask if the vselect was widened. This can prevent the not from making it to the setcc. We could of course add more DAG combines to handle that, but moving this earlier is easier.	2020-02-17 22:24:21 -08:00
Craig Topper	b0840934a7	[X86] Use isScalarFPTypeInSSEReg to simplify code in LowerSELECT. NFC	2020-02-17 19:43:57 -08:00
Craig Topper	3f4490d384	[X86] Add one use check to '0-x == y --> x+y == 0' in EmitCmp. I failed to copy it when I moved this in `b62de210cf`	2020-02-17 18:16:42 -08:00
Craig Topper	43e948c4b7	[X86] Change how the alignment for the stack object is created in LowerFLT_ROUNDS_. We don't need FrameInfo's concept of the stack alignment. We just need to tell it the desired alignment, which in this case is 2.	2020-02-17 11:27:34 -08:00
Craig Topper	b62de210cf	[X86] Move '0-x == y --> x+y == 0' and similar combines to EmitCmp. AArch64 handles this pattern in their lowering code by emitting CMN. ARM handles it as an isel pattern.	2020-02-17 11:27:34 -08:00
Nikita Popov	80397d2d12	[IRBuilder] Delete copy constructor D73835 will make IRBuilder no longer trivially copyable. This patch deletes the copy constructor in advance, to separate out the breakage. Currently, the IRBuilder copy constructor is usually used by accident, not by intention. In rG7c362b25d7a9 I've fixed a number of cases where functions accepted IRBuilder rather than IRBuilder &, thus performing an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an IRBuilder was copied, while an InsertPointGuard should have been used instead. The only non-trivial use of the copy constructor is the getIRBForDbgInsertion() helper, for which I separated construction and setting of the insertion point in this patch. Differential Revision: https://reviews.llvm.org/D74693	2020-02-17 18:14:48 +01:00
Craig Topper	464729cf7c	[X86] Remove unnecessary check for null SDValue. NFC	2020-02-16 20:25:24 -08:00
Craig Topper	272d35aef5	[X86] Separate floating point handling out of EmitCmp and emitFlagsForSetcc. Both of those functions only have a single caller starting at LowerSETCC. Just handle floating point directly in LowerSETCC. This removes the need to pass Chain and IsSignaling all the way down.	2020-02-16 10:51:05 -08:00
Craig Topper	d26f11108b	[X86] Split X86ISD::CMP into an integer and FP opcode.	2020-02-16 10:10:19 -08:00
Simon Pilgrim	b85df2e185	[X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to PALIGNR	2020-02-16 16:13:26 +00:00
Simon Pilgrim	c9c1c2b335	[X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to bit shifts	2020-02-16 16:13:25 +00:00
Sanjay Patel	e48b536be6	[x86] form broadcast of scalar memop even with >1 use The unseen logic diff occurs because MayFoldLoad() is defined like this: static bool MayFoldLoad(SDValue Op) { return Op.hasOneUse() && ISD::isNormalLoad(Op.getNode()); } The test diffs here all seem ok to me on screen/paper, but it's hard to know if that will lead to universally better perf for all targets. For example, if a target implements broadcast from mem as multiple uops, we would have to weigh the potential reduction of instructions and register pressure vs. possible increase in number of uops. I don't know if we can make a truly informed decision on this at compile-time. The motivating case that I'm looking at in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 ...resembles the diff in extract-concat.ll, but we're not going to change the larger example there without at least 1 other fix. Differential Revision: https://reviews.llvm.org/D74088	2020-02-16 10:32:56 -05:00
Simon Pilgrim	34a054ce71	[X86] combineX86ShuffleChain - add support for combining to X86ISD::ROTLI Refactors matchShuffleAsBitRotate to allow use by both lowerShuffleAsBitRotate and matchUnaryPermuteShuffle.	2020-02-15 20:04:54 +00:00
Simon Pilgrim	2492075add	[X86][SSE] lowerShuffleAsBitRotate - lower vXi8 shuffles to ROTL on pre-SSSE3 targets. Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option. REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.	2020-02-14 11:55:18 +00:00
Craig Topper	c2e8a421ac	[X86] Don't widen 128/256-bit strict compares with vXi1 result to 512-bits on KNL. If we widen the compare we might trigger a spurious exception from the garbage data. We have two choices here. Explicitly force the upper bits to zero. Or use a legacy VEX vcmpps/pd instruction and convert the XMM/YMM result to mask register. I've chosen to go with the second option. I'm not sure which is really best. In some cases we could get rid of the zeroing since the producing instruction probably already zeroed it. But we lose the ability to fold a load. So which is best is dependent on surrounding code. Differential Revision: https://reviews.llvm.org/D74522	2020-02-13 13:26:40 -08:00
Amy Huang	de1d90299b	Revert "[X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets" This reverts commit `11c16e7159` because it causes a crash in chromium code. See https://reviews.llvm.org/rG11c16e71598d51f15b4cfd0f719c4dabcc0bebf7.	2020-02-12 17:00:37 -08:00
Jay Foad	32aac25637	[KnownBits] Introduce anyext instead of passing a flag into zext Summary: This was a very odd API, where you had to pass a flag into a zext function to say whether the extended bits really were zero or not. All callers passed in a literal true or false. I think it's much clearer to make the function name reflect the operation being performed on the value we're tracking (rather than on the KnownBits Zero and One fields), so zext means the value is being zero extended and new function anyext means the value is being extended with unknown bits. NFC. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74482	2020-02-12 19:06:53 +00:00
Simon Pilgrim	ff307c8120	[X86] combineFneg - generalize FMA negations with isNegatibleForFree/getNegatedExpression This has a really interesting side effect in that it improves some UMAX/UMIN reduction code which had redundant XOR(SHUFFLE(XOR(X,SIGNMASK)),SIGNMASK) patterns - the getNegatibleCost recognises it as FNEG(SHUFFLE(FNEG(X))).... We have a lot of FNEG patterns bitcasted to the integer domain for XOR signbit twiddling which is similar to what we do to allow UMAX/UMIN to be lowered using SMAX/SMIN. Differential Revision: https://reviews.llvm.org/D74231	2020-02-12 16:07:27 +00:00
Simon Pilgrim	9eb426c88c	[TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes. The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not. This patch replaces the char return value with a NegatibleCost enum to more clearly demonstrate what is implied. It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on. Differential Revision: https://reviews.llvm.org/D74221	2020-02-12 11:51:42 +00:00
Djordje Todorovic	97ed706a96	Revert "[DebugInfo] Enable the debug entry values feature by default" This reverts commit rG9f6ff07f8a39. Found a test failure on clang-with-thin-lto-ubuntu buildbot.	2020-02-12 11:59:04 +01:00
Djordje Todorovic	9f6ff07f8a	[DebugInfo] Enable the debug entry values feature by default This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-12 10:25:14 +01:00
Craig Topper	0daf9b8e41	[X86][LegalizeTypes] Add SoftPromoteHalf support for STRICT_FP_EXTEND and STRICT_FP_ROUND. This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses them to implement soft promotion for the half type. This is enough to provide basic support for __fp16 with strictfp. Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C is enabled.	2020-02-11 22:30:04 -08:00
Craig Topper	846d0ac43e	[X86] Don't disable code in combineHorizontalPredicateResult just because we have avx512 We aren't doing a good job of optimizing AVX512 outside of this code. So remove the bail out for AVX512 and replace with a FIXME. This at least gets us the AVX2 codegen. Differential Revision: https://reviews.llvm.org/D74431	2020-02-11 14:36:29 -08:00
Simon Pilgrim	fa620fc8e2	[X86] combineConcatVectorOps - reuse IsSplat and remove duplicate code. NFC.	2020-02-11 13:37:57 +00:00
Simon Pilgrim	11c16e7159	[X86][SSE] lowerShuffleAsBitRotate - lower vXi8 shuffles to ROTL on pre-SSSE3 targets. Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.	2020-02-11 12:21:03 +00:00
Craig Topper	798305d29b	[X86] Custom lower ISD::FP16_TO_FP and ISD::FP_TO_FP16 on f16c targets instead of using isel patterns. We need to use vector instructions for these operations. Previously we handled this with isel patterns that used extra instructions and copies to handle the conversions. Now we use custom lowering to emit the conversions. This allows them to be pattern matched and optimized on their own. For example we can now emit vpextrw to store the result if it's going directly to memory. I've forced the upper elements to VCVTPH2PS to zero to keep some code similar. Zeroes will be needed for strictfp. I've added a DAG combine for (fp16_to_fp (fp_to_fp16 X)) to avoid extra instructions in between to be closer to the previous codegen. This is a step towards strictfp support for f16 conversions.	2020-02-10 22:01:48 -08:00
Simon Pilgrim	f319074824	[X86] combineConcatVectorOps - combine X86ISD::PACKSS ops	2020-02-10 17:48:02 +00:00
Simon Pilgrim	74c0f98cf5	[X86] combineConcatVectorOps - combine X86ISD::VPERMI ops	2020-02-10 17:48:01 +00:00
Simon Pilgrim	2463b8c97d	[X86] combineConcatVectorOps - combine VSHLI/VSRAI/VSRLI ops Non-AVX512BW targets failed to concatenate 256-bit shifts back to 512-bits (split during 512-bit shuffle lowering as they don't have v32i16/v64i8 types).	2020-02-10 16:59:09 +00:00
Simon Pilgrim	06617c4522	[X86] Add lowerShuffleAsBitRotate (PR44379). As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets. This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway. There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch. REAPPLIED rGe82e17d4d4ca after reversion at rG39eade73a567 - fixed offset matching in matchShuffleAsBitRotate.	2020-02-10 16:16:56 +00:00
Simon Pilgrim	39eade73a5	Revert rGe82e17d4d4cac8b2df00094e80d5e1cb22795664 - [X86] Add lowerShuffleAsBitRotate (PR44379). As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets. This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway. There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch. Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types). --- Internal shuffle tests indicate there's a bug somewhere that I haven't been able to track down yet.	2020-02-10 12:14:26 +00:00
Craig Topper	06ba969c9d	[X86] Make (insert_vector_elt (v8i16 zerovec), i16 %x, 0) generate the same code as (v8i16 (build_vector %x, 0, 0, 0, 0, 0, 0, 0)). Instead of using a pinsrw to element 0, use movzx and movd. Same for v16i8.	2020-02-09 21:52:11 -08:00
Simon Pilgrim	29e646fe65	[X86] combineConcatVectorOps - combine VROTLI/VROTRI ops Fix issue mentioned on rGe82e17d4d4ca - non-AVX512BW targets failed to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).	2020-02-09 21:50:10 +00:00
Simon Pilgrim	e82e17d4d4	[X86] Add lowerShuffleAsBitRotate (PR44379). As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets. This patch lowers to uniform ISD::ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway. There might be cases where targets without ISD::ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch. Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).	2020-02-09 21:15:03 +00:00
Simon Pilgrim	29621b2534	[X86] Rename matchShuffleAsRotate - matchShuffleAsByteRotate. NFCI. A matchShuffleAsBitRotate variant will be added soon and we need to make the difference more obvious.	2020-02-09 18:35:50 +00:00
Simon Pilgrim	3ec6de07e9	Fix signed/unsigned warning.	2020-02-09 13:35:03 +00:00
Simon Pilgrim	644d56b432	[X86] Recognise ROTLI/ROTRI rotations as faux shuffles Allows us to combine rotations with shuffles. One of many things necessary to fix PR44379 (lowering shuffles to rotations)	2020-02-09 12:25:49 +00:00
serge_sans_paille	e67cbac812	Support -fstack-clash-protection for x86. Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This is a recommit of `39f50da2a3` with proper LiveIn declaration, better option handling and more portable testing. Differential Revision: https://reviews.llvm.org/D68720	2020-02-09 10:42:45 +01:00
serge-sans-paille	4546211600	Revert "Support -fstack-clash-protection for x86" This reverts commit `0fd51a4554`. Failures: http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/4354	2020-02-09 10:06:31 +01:00
serge_sans_paille	0fd51a4554	Support -fstack-clash-protection for x86. Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This is a recommit of `39f50da2a3` with proper LiveIn declaration, better option handling and more portable testing. Differential Revision: https://reviews.llvm.org/D68720	2020-02-09 09:35:42 +01:00
Craig Topper	eeb63944e4	[LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results. Remove code from LegalizeTypes that allowed this to work. We were already using BUILD_PAIR for this in some places so this standardizes on a single way to do this.	2020-02-08 09:52:31 -08:00
serge-sans-paille	658495e6ec	Revert "Support -fstack-clash-protection for x86" This reverts commit `e229017732`. Failures: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/2604 http://lab.llvm.org:8011/builders/llvm-clang-win-x-aarch64/builds/4308	2020-02-08 14:26:22 +01:00
serge_sans_paille	e229017732	Support -fstack-clash-protection for x86. Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This is a recommit of `39f50da2a3` with better option handling and more portable testing. Differential Revision: https://reviews.llvm.org/D68720	2020-02-08 13:31:52 +01:00
Simon Pilgrim	7f5b3fa73c	[X86][SSE] Add X86ISD::FRCP handling to isNegatibleForFree Peek through X86ISD::FRCP nodes to see if there is a negatible input.	2020-02-08 10:56:27 +00:00
Simon Pilgrim	4229f12a22	[TargetLowering] Remove isDesirableToCombineBuildVectorToShuffleTruncate target hook. NFC. This hasn't been used for years, its original implementation, D35700, had bugs that caused the reversion of most of the code, and since then x86 shuffle lowering/combining has handled most cases and can deal with the rest as well.	2020-02-08 08:55:51 +00:00
Nico Weber	b03c3d8c62	Revert "Support -fstack-clash-protection for x86" This reverts commit `4a1a0690ad`. Breaks tests on mac and win, see https://reviews.llvm.org/D68720	2020-02-07 14:49:38 -05:00
serge_sans_paille	4a1a0690ad	Support -fstack-clash-protection for x86. Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This is a recommit of `39f50da2a3` with correct option flags set. Differential Revision: https://reviews.llvm.org/D68720	2020-02-07 19:54:39 +01:00
Sanjay Patel	de6f7eb47e	[x86] don't create an unused constant vector Noticed while scanning through debug spew. Creating unused nodes is inefficient and makes following the debug output harder.	2020-02-07 12:05:02 -05:00
Simon Pilgrim	c96001035d	[X86] isNegatibleForFree - allow pre-legalized FMA negation As long as the FMA operation is legal (which we can proxy for the FMA3/FMA4 variants as well), we don't have to wait for the LegalOperations stage.	2020-02-07 17:04:17 +00:00
serge-sans-paille	f6d98429fc	Revert "Support -fstack-clash-protection for x86" This reverts commit `39f50da2a3`. The -fstack-clash-protection is being passed to the linker too, which is not intended. Reverting and fixing that in a later commit.	2020-02-07 11:36:53 +01:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
serge_sans_paille	39f50da2a3	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html Differential Revision: https://reviews.llvm.org/D68720	2020-02-07 10:56:15 +01:00
Craig Topper	3f62028f2f	[X86] Use SelectionDAG::getAllOnesConstant to simplify some code. NFC	2020-02-06 21:32:53 -08:00
Craig Topper	ec9a94af4d	[X86] Use MVT::i8 instead of MVT::i64 for shift amount in BuildSDIVPow2 X86 uses i8 for shift amounts. This code can fail on a 32-bit target if it runs after type legalization. This code was copied from AArch64 and modified for X86, but the shift amount wasn't changed to the correct type for X86. Fixes PR44812	2020-02-06 13:32:13 -08:00
Craig Topper	4175d7e22e	[X86] Custom isel floating point X86ISD::CMP on pre-CMOV targets. Eliminate ConvertCmpIfNecessary If we don't have cmov, X87 compares write to FPSW and we need to move the bits to EFLAGS to use as JCC/SETCC/CMOV conditions. Previously this was done by calling ConvertCmpIfNecessary in multiple places which would emit the extra code for the FNSTSW, a shift, a truncate, and a SAHF instruction. Isel would then select trunc+X86ISD::CMP to a FUCOM instruction that produces FPSW. This patch centralizes all of the handling into a single custom isel handler. This allows us to remove ConvertCmpIfNecessary and a couple target specific ISD opcodes. Differential Revision: https://reviews.llvm.org/D73863	2020-02-06 10:43:06 -08:00
Sanjay Patel	0a389c81cd	[x86] use getSplatIndex() in lowerShuffleAsBroadcast() The old code was doing an N^2 search for splat index. Differential Revision: https://reviews.llvm.org/D74064	2020-02-05 14:55:02 -05:00
Craig Topper	a3d489e87e	[X86] Add a DAG combine for (i32 (sext (i8 (x86isd::setcc_carry)))) -> (i32 (x86isd::setcc_carry)) and remove isel patterns. Same for any_extend though we don't have coverage for that. The test changes are because isel didn't check one use of the setcc_carry. So in isel we would end up with two different sized setcc_carry instructions. And since it clobbers the flags we would need to recreate the flags for the second instruction. This code handles additional uses by truncating the new wide setcc_carry back to the original size for those uses.	2020-02-04 22:40:36 -08:00
Craig Topper	016f42e3dc	[X86] Add custom lowering for lrint/llrint to either cvtss2si/cvtsd2si or fist. lrint/llrint are defined as rounding using the current rounding mode. Numbers that can't be converted raise FE_INVALID and an implementation defined value is returned. They may also write to errno. I believe this means we can use cvtss2si/cvtsd2si or fist to convert as long as -fno-math-errno is passed on the command line. Clang will leave them as libcalls if errno is enabled so they won't become ISD::LRINT/LLRINT in SelectionDAG. For 64-bit results on a 32-bit target we can't use cvtss2si/cvtsd2si but we can use fist since it can write to a 64-bit memory location. Though maybe we could consider using vcvtps2qq/vcvtpd2qq on avx512dq targets? gcc also does this optimization. I think we might be able to do this with STRICT_LRINT/LLRINT as well, but I've left that for future work. Differential Revision: https://reviews.llvm.org/D73859	2020-02-04 16:15:40 -08:00
Reid Kleckner	2d89e0a098	[SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr The CATCHPAD node mostly existed to be selected into the EH_RESTORE instruction, which sets the frame back up when 32-bit Windows exceptions return to the parent function. However, creating this MachineInstr early increases the risk that other passes will come along and insert instructions that use the stack before ESP and EBP are restored. That happened in PR44697. Instead of representing these in the instruction stream early, delay it until PEI. Mark the blocks where this needs to happen as EHPads, but not funclet entry blocks. Passes after PEI have to be careful not to hoist instructions that can use stack across frame setup instructions, so this should be relatively reliable. Fixes PR44697 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D73752	2020-02-04 15:13:12 -08:00
Craig Topper	e195ff98f6	Recommit "[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places." This time with correct types for the data result from the SUB. Original commit message: Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable CSE unless the RHS is 0. optimizeCompareInstr called by the peephole pass can turn subs with unused results into cmps to clean this up. This commit makes other places that create X86ISD::CMP have the same behavior.	2020-02-04 12:19:34 -08:00
Kadir Cetinkaya	d2b6ac6ccd	Revert "[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places." This reverts commit `8413116bf1`. this seems to be causing crashes while compiling ncurses. ``` $ ./bin/llc bugpoint-reduced-simplified.ll LLVM ERROR: Cannot emit physreg copy instruction ``` Here are the crashers: https://gist.github.com/kadircet/918f5bb97a2afe048cb875490edba46e executing with an llc compiled at `904d54de9b` works fine.	2020-02-04 11:22:53 +01:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Craig Topper	cd14b4a62b	[X86] Remove unneeded code that looks for (and (i8 (X86setcc_c)) I don't believe we use this construct anymore so I don't think we need to look for it.	2020-02-03 23:18:11 -08:00
Craig Topper	4581d97416	[X86] Remove some uncovered and possibly broken code from combineZext. This code matches (zext (trunc (setcc_carry))) -> (and (setcc_carry), 1) but the code never checks what type we're truncating to. An and mask of 1 would only make sense if the trunc was to MVT::i1, but we didn't check for that. I believe this code is a leftover from when i1 was a legal type.	2020-02-03 22:59:39 -08:00
Craig Topper	8413116bf1	[X86] Use X86ISD::SUB instead of X86ISD::CMP in some places. Our normal lowering for ISD::SETCC uses X86ISD::SUB to enable CSE unless the RHS is 0. optimizeCompareInstr called by the peephole pass can turn subs with unused results into cmps to clean this up. This commit makes other places that create X86ISD::CMP have the same behavior.	2020-02-03 21:01:11 -08:00

6990 Commits