llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	05c0d34918	[X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0) If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implicit zero extension of PEXTRB. The biggest perf diff is on SLM targets where MOVD (uops=1, lat=3 tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4). This matches what we already do for PEXTRW. Differential Revision: https://reviews.llvm.org/D76138	2020-03-13 18:43:04 +00:00
Simon Pilgrim	846c614f54	[X86] combineExtractWithShuffle - pull out repeated getSizeInBits() call. NFC.	2020-03-13 15:36:04 +00:00
Simon Pilgrim	fe047fbccc	[X86] LowerEXTRACT_VECTOR_ELT - pull out repeated getOperand() calls. NFC. Also, cleanup LowerEXTRACT_VECTOR_ELT_SSE4 comments which had references to non-constant extraction indices.	2020-03-13 15:36:02 +00:00
Simon Pilgrim	4689eae820	[X86] combineOrShiftToFunnelShift - remove shift by immediate handling. Now that D75114 has landed, DAGCombiner handles this case so the code is redundant.	2020-03-12 11:46:51 +00:00
Simon Pilgrim	b3b4727a3e	[X86] Replace (most) X86ISD::SHLD/SHRD usage with ISD::FSHL/FSHR generic opcodes (PR39467) For i32 and i64 cases, X86ISD::SHLD/SHRD are close enough to ISD::FSHL/FSHR that we can use them directly, we just need to account for the operand commutation for SHRD. The i16 SHLD/SHRD case is annoying as the shift amount is modulo-32 (vs funnel shift modulo-16), so I've added X86ISD::FSHL/FSHR equivalents, which matches the generic implementation in all other terms. Something I'm slightly concerned with is that ISD::FSHL/FSHR legality is controlled by the Subtarget.isSHLDSlow() feature flag - we don't normally use non-ISA features for this but it allows the DAG combines to continue to operate after legalization in a lot more cases. The X86 *bits.ll changes are all affected by the same issue - we now have a "FSHR(-1,-1,amt) -> ROTR(-1,amt) -> (-1)" simplification that reduces the dependencies enough for the branch fall through code to mess up. Differential Revision: https://reviews.llvm.org/D75748	2020-03-11 11:17:49 +00:00
Simon Pilgrim	c8ede5e485	[X86][SSE] getFauxShuffleMask - add support for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT) shuffle pattern We already do this for PINSRB/PINSRW and SCALAR_TO_VECTOR.	2020-03-10 15:42:37 +00:00
Simon Pilgrim	e6a7e3b5e3	[X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles. This causes one minor test change but is mainly necessary for an upcoming patch.	2020-03-10 15:42:36 +00:00
Simon Pilgrim	18c19441d1	[X86][AVX] combineX86ShuffleChain - combine binary shuffles to X86ISD::VPERM2X128 For pre-AVX512 targets, combine binary shuffles to X86ISD::VPERM2X128 if possible. This mainly helps optimize the blend(extract_subvector(x,1),y) pattern. At some point soon we're going to have to make a decision about when to combine AVX512 shuffles more aggressively - we bail out if there is any change in element size (to protect predicate mask merging) which means we miss out on a lot of optimizations.	2020-03-10 10:44:28 +00:00
Craig Topper	ef4f939d38	[X86] Remove isel patterns for (X86VBroadcast (i16 (trunc (i32 (load))))). Replace with a DAG combine to form VBROADCAST_LOAD. isTypeDesirableForOp prevents loads from being shrunk to i16 by DAG combine. Because of this we can't just match the broadcast and a scalar load. So look for broadcast+truncate+load and form a vbroadcast_load during DAG combine. This replaces what was previously done as an isel pattern and I think fixes it so we won't change the size of a volatile load. But my main motivation is just to clean up our isel patterns.	2020-03-10 00:07:07 -07:00
Simon Pilgrim	4b130b883d	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - reduce vector width of X86ISD::BLENDI If we don't need the upper subvector elements of the BLENDI node then use a smaller vector size. This causes a couple of minor regressions in insertelement-ones.ll which are more examples of PR26018; given how cheap allones generation is I don't consider that a showstopper, just an annoyance (and there's plenty of other poor codegen cases in that file).	2020-03-09 18:29:28 +00:00
Craig Topper	3dcc0db15e	[X86] Teach combineToExtendBoolVectorInReg to create opportunities for using broadcast load instructions. If we're inserting a scalar that is smaller than the element size of the final VT, the value of the extra bits doesn't matter. Previously we any_extended in the scalar domain before inserting. This patch changes this to use a broadcast of the original scalar type and then a bitcast to the final type. This might enable the use of a broadcast load. This recovers regressions from `07d68c24aa` and `9fcd212e2f` without relying on alignment of the load. Differential Revision: https://reviews.llvm.org/D75835	2020-03-09 11:26:12 -07:00
Djordje Todorovic	c15c68abdc	[CallSiteInfo] Enable the call site info only for -g + optimizations Emit call site info only in the case of '-g' + 'O>0' level. Differential Revision: https://reviews.llvm.org/D75175	2020-03-09 12:12:44 +01:00
Craig Topper	70e4fb8a53	[X86] Add DAG combine to turn (vzext_movl (vbroadcast_load)) -> vzext_load. If we're zeroing the other elements then we don't need the broadcast.	2020-03-08 00:35:40 -08:00
Craig Topper	d81d451442	[X86] Add DAG combine to replace vXi64 vzext_movl+scalar_to_vector with vYi32 vzext_movl+scalar_to_vector if the upper 32 bits of the scalar are zero. We can just use a 32-bit copy and zero in the SSE domain when we zero the upper bits. Remove an isel pattern that becomes dead with this.	2020-03-07 16:14:26 -08:00
Craig Topper	d41ea65ee8	[X86] Add DAG combines to enable removing of movddup/vbroadcast + simple_load isel patterns.	2020-03-07 15:22:02 -08:00
Craig Topper	bc65b68661	[X86] Add a DAG combine to turn vbroadcast(vzload X) -> vbroadcast_load Remove now unneeded isel patterns.	2020-03-07 15:22:02 -08:00
Craig Topper	ec1d1f6ae7	[X86] Use MVT instead of EVT in a couple shuffle lowering functions.	2020-03-07 09:50:53 -08:00
Reid Kleckner	65b21282c7	Avoid emitting unreachable SP adjustments after `throw` In `172eee9c`, we tried to avoid these by modelling the callee as internally resetting the stack pointer. However, for the majority of functions with reserved stack frames, this would lead LLVM to emit extra SP adjustments to undo the callee's internal adjustment. This led us to fix the problem further on down the pipeline in eliminateCallFramePseudoInstr. In `5b79e603d3`, I added a heuristic to try to detect when the adjustment would be unreachable. This heuristic is imperfect, and when exception handling is involved, it fails to fire. The new test is an example of this. Simply throwing an exception with an active cleanup emits dead SP adjustments after the throw. Not only are they dead, but if they were executed, they would be incorrect, so they are confusing. This change essentially reverts `172eee9c` and makes the `5b79e603d3` heuristic responsible for preventing unreachable stack adjustments. This means we may emit unreachable stack adjustments for functions using EH with unreserved call frames, but that is not very many these days. Back in 2016 when this change was added, we were focused on 32-bit, which we observed to have fewer reserved frames. Fixes PR45064 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D75712	2020-03-06 13:33:45 -08:00
Craig Topper	4c7c87f245	[X86] Simplify the code at the end of lowerShuffleAsBroadcast. The original code could create a bitcast from f64 to i64 and back on 32-bit targets. This was only working because getBitcast was able to fold the casts away to avoid leaving the illegal i64 type. Now we handle the scalar case directly by broadcasting using the scalar type as the element type. Then bitcasting to the final VT. This works since we ensure the scalar type is the same size as the final VT element type. No more casts to i64. For the vector case, we cast to VT or subvector of VT. And then do the broadcast. I think this all matches what we generated before, just in a more readable way.	2020-03-04 20:45:02 -08:00
Craig Topper	eadea7868f	[X86] Convert vXi1 vectors to xmm/ymm/zmm types via getRegisterTypeForCallingConv rather than using CCPromoteToType in the td file Previously we tried to promote these to xmm/ymm/zmm by promoting in the X86CallingConv.td file. But this breaks when we run out of xmm/ymm/zmm registers and need to fall back to memory. We end up trying to create a nonsensical scalar to vector. This led to an assertion. The new tests in avx512-calling-conv.ll all trigger this assertion. Since we really want to treat these types like we do on avx2, it seems better to promote them before the calling convention code gets involved. Except when the calling convention is one that passes the vXi1 type in a k register. The changes in avx512-regcall-Mask.ll are because we indicated that xmm/ymm/zmm types should be passed indirectly for the Win64 ABI before we go to the common lines that promoted the vXi1 types. This caused the promoted types to be picked up by the default calling convention code. Now we promote them earlier so they get passed indirectly as though they were xmm/ymm/zmm. Differential Revision: https://reviews.llvm.org/D75154	2020-03-04 15:02:32 -08:00
Craig Topper	06de426426	[X86] Directly form VBROADCAST_LOAD in lowerShuffleAsBroadcast on AVX targets. If we would emit a VBROADCAST node, we can instead directly emit a VBROADCAST_LOAD. This allows us to get rid of the special case to use an f64 load on 32-bit targets for vXi64. I believe there is more cleanup we can do later in this function, but I'll do that in follow ups.	2020-03-04 09:11:57 -08:00
Craig Topper	9284abd004	[X86] Directly form VBROADCAST_LOAD for BUILD_VECTOR of splat loads in lowerBuildVectorAsBroadcast.	2020-03-03 22:27:34 -08:00
Craig Topper	3c4e635593	[X86] Always emit an integer vbroadcast_load from lowerBuildVectorAsBroadcast regardless of AVX vs AVX2 If we go with D75412, we no longer depend on the scalar type directly. So we don't need to avoid using i64. We already have AVX1 fallback patterns with i32 and i64 scalar types so we don't need to avoid using integer types on AVX1. Differential Revision: https://reviews.llvm.org/D75413	2020-03-03 10:39:11 -08:00
Craig Topper	56cd3bc209	[X86] Directly emit VBROADCAST_LOAD from constant pool in lowerBuildVectorAsBroadcast Also add a DAG combine to combine different sized broadcasts from constant pool to avoid a regression. Differential Revision: https://reviews.llvm.org/D75412	2020-03-03 10:39:10 -08:00
Craig Topper	68aeaab888	[X86] Don't count the chain uses when forming broadcast loads in lowerBuildVectorAsBroadcast. The build_vector needs to be the only user of the data, but the chain will likely have another use. So we can't make sure the build_vector is the only user of the node.	2020-03-03 08:41:31 -08:00
Craig Topper	2f4f8fcf64	[X86] Don't add DELETED_NODES to DAG combine worklist after calling SimplifyDemandedBits/SimplifyDemandedVectorElts. These AddToWorklist calls were added in `84cd968f75`. It's possible the SimplifyDemandedBits/SimplifyDemandedVectorElts triggered CSE that deleted N. Detect that and avoid adding N to the worklist. Fixes PR45067.	2020-03-01 00:06:32 -08:00
Craig Topper	f2d45e5097	[X86] Canonicalize (bitcast (vbroadcast_load)) so that the cast and vbroadcast_load are both integer or fp. Helps a little with some isel pattern matching. Especially on 32-bit targets where we sometimes use f64 loads.	2020-02-28 15:07:49 -08:00
Craig Topper	b68eeff05c	[X86] Cleanup a comment around bitcasting X86ISD::VBROADCAST_LOAD and add an assert to make sure memory VT size doesn't change.	2020-02-28 15:07:49 -08:00
Craig Topper	c0d0e6b198	[X86] Recognize CVTPH2PS from STRICT_FP_EXTEND This should avoid scalarizing the cvtph2ps intrinsics with D75162 Differential Revision: https://reviews.llvm.org/D75304	2020-02-28 10:19:57 -08:00
Simon Pilgrim	f90cc633de	Fix cppcheck definition/declaration arg mismatch warnings. NFCI.	2020-02-27 14:35:20 +00:00
Simon Pilgrim	fe6bcfaf3b	[X86] Use Subtarget.useSoftFloat() in X86TargetLowering constructor Avoid use of X86TargetLowering::useSoftFloat() in the constructor as it's a virtual function	2020-02-27 14:35:20 +00:00
Simon Pilgrim	e61e7f0794	Fix shadow variable warning. NFC.	2020-02-27 14:23:05 +00:00
Simon Pilgrim	dc7ac563ac	Fix shadow variable warnings. NFC.	2020-02-27 14:21:30 +00:00
Simon Pilgrim	efe2f59ec4	[X86] LowerMSCATTER/MGATHER - reduce scope of MaskVT. NFCI. Fixes cppcheck warning.	2020-02-27 14:20:44 +00:00
Simon Pilgrim	fabe52a741	Fix uninitialized variable warning. NFC.	2020-02-27 14:20:43 +00:00
Simon Pilgrim	6bdd63dc28	[X86] createVariablePermute - handle case where recursive createVariablePermute call fails Account for the case where a recursive createVariablePermute call with a wider vector type fails. Original test case from @craig.topper (Craig Topper)	2020-02-27 13:52:31 +00:00
Craig Topper	82a21c1655	[X86] Add proper MachinePointerInfo to stack store created in LowerWin64_i128OP.	2020-02-26 16:55:24 -08:00
Craig Topper	870363a22d	[X86] Explicitly pass Destination VT and debug location to BuildFILD. NFC We'd already passed most everything else. Might as well pass these two things and stop passing Op.	2020-02-26 16:26:46 -08:00
Craig Topper	15e2831fcd	[X86] Explicitly pass Pointer, MachinePointerInfo and Alignment to BuildFILD. Previously this code was called into two ways, either a FrameIndexSDNode was passed in StackSlot. Or a load node was passed in the argument called StackSlot. This was determined by a dyn_cast to FrameIndexSDNode. In the case of a load, we had to go find the real pointer from operand 0 and cast the node to MemSDNode to find the pointer info. For the stack slot case, the code assumed that the stack slot was perfectly aligned despite not being the creator of the slot. This commit modifies the interface to make the caller responsible for passing all of the required information to avoid all the guess work and reverse engineering. I'm not aware of any issues with the original code after an earlier commit to fix the alignment of one of the stack objects. This is just clean up to make the code less surprising.	2020-02-26 16:26:26 -08:00
Craig Topper	77d9b7b2cd	[X86] Query constant pool object alignment instead of hardcoding.	2020-02-26 14:45:39 -08:00
Craig Topper	9c1a707ba3	[X86] Use proper alignment for stack temporary and correct MachinePointerInfo for stack accesses in LowerUINT_TO_FP.	2020-02-26 14:45:38 -08:00
Craig Topper	a8186935ae	[X86] Use correct MachineMemOperand for stack load in LowerFLT_ROUNDS_	2020-02-26 14:45:38 -08:00
Craig Topper	735d27dc40	[SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output to ISD::FLT_ROUNDS_ This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order. This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've updated all in-tree targets to connect their chain through their lowering code. Differential Revision: https://reviews.llvm.org/D75132	2020-02-25 16:58:23 -08:00
Craig Topper	9238dfb4d8	[X86] Remove mask output from X86 gather/scatter ISD opcodes. Instead add it when we make the machine nodes during instruction selections. This makes this ISD node closer to ISD::MGATHER. Trying to see if we remove the X86 specific ones.	2020-02-24 23:56:28 -08:00
Simon Pilgrim	daac8dba77	[X86] combineX86ShuffleChain - select X86ISD::FAND/ISD::AND based on MaskVT Noticed by inspection, we shouldn't use FloatDomain directly, we've already bitcast both inputs to MaskVT so select the opcode using that.	2020-02-24 18:24:44 +00:00
Simon Pilgrim	59d8d13c7b	[X86] getTargetShuffleInputs - check that the source inputs are all the right size. I'm hoping to begin improving shuffle combining across different vector sizes, but before that we must ensure that all existing getTargetShuffleInputs calls must bail if the inputs aren't the same size.	2020-02-24 16:26:10 +00:00
Craig Topper	7a7146cf72	[X86] When creating X86ISD::MGATHER nodes from AVX2 gather intrinsics, cast the mask to integer type. The gather intrinsics use a floating point mask when the result type is FP. But we call DemandedBits on the mask assuming it's an integer type. We also use integer types when we create it from generic IR. So add a bitcast to the intrinsic path to guarantee the integer type.	2020-02-23 23:00:41 -08:00
Craig Topper	5a70518660	[X86] Remove most X86 specific subclasses of MemSDNode. Just use a MemIntrinsicSDNode as we usually do. Leave the gather/scatter subclasses, but make them inherit from MemIntrinsicSDNode and delete their constructor and destructor. This way we can still have the getIndex, getMask, etc. convenience functions.	2020-02-23 15:13:32 -08:00
Craig Topper	15b6aa7448	[X86] Enable the use of movlps for i64 atomic load on 32-bit targets with sse1. Still a little room for improvement by using movlps to store to the stack temporary needed to move data out of the xmm register after the load.	2020-02-23 15:11:38 -08:00
Craig Topper	2a10f8019d	[X86] Use FIST for i64 atomic stores on 32-bit targets without SSE.	2020-02-23 15:11:38 -08:00
Craig Topper	84cd968f75	[X86] Add AddToWorklist(N) after calls to SimplifyDemandedBits/SimplifyDemandedVectorElts that are called on an operand of N. If a simplification occurs the operand will be added to the worklist. But since the demanded mask was based on N, we need to make sure we revisit N in case there are more simplifications to be done. Returning SDValue(N, 0) as we do, only tells DAG combine that something changed, but that won't make it add anything to the worklist. Found while playing around with using VEXTRACT_STORE in more cases. But I guess this doesn't affect any of our existing tests.	2020-02-22 21:42:59 -08:00
Craig Topper	bdb1729c83	[X86] Teach EltsFromConsecutiveLoads that it's ok to form a v4f32 VZEXT_LOAD with a 64 bit memory size on SSE1 targets. We can use MOVLPS which will load 64 bits, but we need a v4f32 result type. We already have isel patterns for this. The code here is a little hacky. We can probably improve it with more isel patterns.	2020-02-22 18:50:52 -08:00
Craig Topper	e7a184fc7c	[X86] Use movlps for i64 atomic stores on 32-bit targets with sse1. This is similar to using movd which we do for sse2 targets. I've added a DAG combine for VEXTRACT_STORE to use SimplifyDemandedVectorElts to clean up some artifacts from type legalization.	2020-02-22 18:22:47 -08:00
Craig Topper	228a2bc9b7	[X86] Teach combineCVTPH2PS to shrink v8i16 loads when the output type is v4f32. Remove extra isel patterns. Similar to what we do for other operations that use a subset of bits. Allows us to remove a pattern that shrinks a load. Which was incorrect if the load was volatile.	2020-02-21 18:11:07 -08:00
Nikita Popov	c90ea87cfd	[X86] Fix SDLoc initialization Fixes -Wparentheses warning, in this case indicating a genuine bug.	2020-02-21 18:26:05 +01:00
Craig Topper	97f11600e0	[X86] Don't bother avoiding illegal FCMOVs if we don't have the cmov subtarget feature. We'll be forced to emit branches so we might as well use the most direct condition.	2020-02-21 00:34:15 -08:00
Craig Topper	263bef2bbc	[X86] Make combineCMov not create unsupported FCMOVs when f32/f64 are using X87. This makes the behavior consistent with what's in LowerSELECT.	2020-02-21 00:34:15 -08:00
Craig Topper	4576606831	[X86] Remove unnecessary isNullConstant in LowerSelect. NFC At this point in the code we know that Op1 or Op2 is all ones. Y points to the other operand. In the case that Op2 is zero, Op1 must be all ones and Y is Op2. The OR ORs Y into Res. But if Y is 0 the OR will be folded away by getNode so we don't need to check for it.	2020-02-20 21:41:13 -08:00
Craig Topper	78be618717	[X86] Add CMOV_VR64 pseudo instruction for MMX. Remove mmx handling from combineSelect. The combineSelect code was casting to i64 without any check that i64 was legal. This can break after type legalization. It also required splitting the mmx register on 32-bit targets. It's not clear that this makes sense. Instead switch to using a cmov pseudo like we do for XMM/YMM/ZMM.	2020-02-20 20:30:56 -08:00
Craig Topper	e5782377f3	[X86] Add CMOV_VK1 pseudo so we don't crash on v1i1 ISD::SELECT	2020-02-20 15:13:48 -08:00
Craig Topper	7e92769862	[X86] Expand vselect of v1i1 under avx512. We already do this for v2i1, v4i1, etc.	2020-02-20 15:13:47 -08:00
Craig Topper	b00ef8951b	[X86] Custom legalize v1i1 UADDSAT/USUBSAT/SADDSAT/UADDSAT to match v2i1/v4i1/v8i1 etc.	2020-02-20 15:13:46 -08:00
Craig Topper	d95a10a7f9	[X86] Custom legalize v1i1 add/sub/mul to xor/xor/and with avx512. We already did this for v2i1, v4i1, v8i1, etc.	2020-02-20 15:13:44 -08:00
Craig Topper	c7b54a196e	Recommit "[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT" With the correct author this time	2020-02-20 12:28:54 -08:00
Craig Topper	1d8860f90b	Revert `714265dabb` "[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT" I accidentally messed up the author on the previous commit somehow.	2020-02-20 12:28:33 -08:00
Quentin Colombet	714265dabb	[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT The type here isn't guaranteed to be a simple type. Fixes PR44976	2020-02-20 12:25:37 -08:00
Sanjay Patel	064cd2ecdb	[x86] allow peeking through an extract_subvector to find a splatted operand The motivating case is seen in "splat4_v8f32_load_store" and based on code in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 (I haven't stepped through the v8i32 sibling test yet to see why that diverged.) There are other potential improvements visible like allowing scalarization or vector narrowing. Differential Revision: https://reviews.llvm.org/D74909	2020-02-20 13:59:59 -05:00
Craig Topper	0ed7a61543	[X86] Fix a -Wparentheses warning. NFC	2020-02-20 09:32:03 -08:00
Craig Topper	3543ac9ab5	[X86] Rewrite LowerBRCOND to remove dead code and handle ISD::SETCC and overflow ops directly. There's a lot of old leftover code in LowerBRCOND. Especially the detection of AND or OR of X86ISD::SETCC nodes. Those were needed before LegalizeDAG was changed to visit nodes before their operands. It also relied on reversing the output of LowerSETCC to find the flags producing node to use for the X86ISD::BRCOND node. Rather than using LowerSETCC this patch uses emitFlagsForSetcc to handle the integer ISD::SETCC case. This gives the flag producer and the comparison code to use directly. I've removed the addTest flag and just produce a X86ISD::BRCOND and return immediately. Floating point ISD::SETCC case is just an X86ISD::FCMP with special care for OEQ and UNE derived from the previous code. I've left f128 out so it will emit a test. And LowerSETCC will be called later to produce a libcall and X86ISD::SETCC. We have combines that can merge the test and X86ISD::SETCC. We need to handle two cases for overflow ops. Either they are used directly or they have a seteq 0 or setne 1 to invert the overflow. The old code did not handle the setne 1 case, but I think some other combines were making up for it. If we fail to find a condition, we'll wrap an AND with 1 on the original condition and tell emitFlagsForSetcc to emit a compare with 0. This will pick up the LowerAndToBT and/or the EmitTest case. I kept the isTruncWithZeroHighBitsInput call, but we might be able to fold that into emitFlagsForSetcc. Differential Revision: https://reviews.llvm.org/D74750	2020-02-20 08:50:18 -08:00
Craig Topper	12cc105f80	[X86] Add DAG combines to form CVTPH2PS/CVTPS2PH from vXf16->vXf32/vXf64 fp_extends and vXf32->vXf16 fp_round. Only handle power of 2 element count for simplicity. Not sure what to do with vXf64->vXf16 fp_round to avoid double rounding Differential Revision: https://reviews.llvm.org/D74886	2020-02-20 08:26:17 -08:00
Djordje Todorovic	2f215cf36a	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGfaff707db82d. A failure found on an ARM 2-stage buildbot. The investigation is needed.	2020-02-20 14:41:39 +01:00
Craig Topper	f559cecc3e	[X86] Add DCI.isBeforeLegalize() check to the v64i1 constant splitting code in combineStore. We only need to split after type legalization. If we're before we can just use a wide store and type legalization will split it. Add a v128i1 test to exercise it post type legalization.	2020-02-19 09:18:16 -08:00
Florian Hahn	216afd3301	[TargetLower] Update shouldFormOverflowOp check if math is used. On some targets, like SPARC, forming overflow ops is only profitable if the math result is used: https://godbolt.org/z/DxSmdB This patch adds a new MathUsed parameter to allow the targets to make the decision and defaults to only allowing it if the math result is used. That is the conservative choice. This patch also updates AArch64ISelLowering, X86ISelLowering, ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow ops if the math result is not used. On those targets using the overflow intrinsic for the overflow check only generates better code. Reviewers: nikic, RKSimon, lebedev.ri, spatel Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D74722	2020-02-19 11:28:33 +01:00
Djordje Todorovic	faff707db8	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-02-19 11:12:26 +01:00
Craig Topper	f69a29da5a	[X86] Remove vXi1 select optimization from LowerSELECT. Move it to DAG combine.	2020-02-19 00:00:55 -08:00
Craig Topper	0dbc4658d8	[X86] Handle splats in LowerBUILD_VECTORvXi1 by directly emitting scalar selects instead of deferring that to LowerSELECT. LowerSELECT will detect the constant inputs and convert to scalar selects, but we can do it directly here. I might remove some of the code from LowerSELECT and move it to DAG combine so doing this explicitly will make us less dependent on it happening in lowering.	2020-02-18 22:39:30 -08:00
Simon Pilgrim	d6eef0614f	[TargetLowering] Add SimplifyMultipleUseDemandedBits 'all elements' helper wrapper. NFC.	2020-02-18 19:53:50 +00:00
Craig Topper	89ab5c69c8	[X86] Add a helper function to pull some repeated code out of combineGatherScatter. NFC	2020-02-18 11:10:40 -08:00
Djordje Todorovic	2bf44d11cb	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGa82d3e8a6e67.	2020-02-18 16:38:11 +01:00
Djordje Todorovic	a82d3e8a6e	Reland "[DebugInfo] Enable the debug entry values feature by default" This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-18 14:41:08 +01:00
Craig Topper	e90dc7c48b	[X86] Move avx512 code that forces zeros to the false side of vselects above a check for legal types. This helps this transform occur earlier so we can fold the not with setcc. If we delay it until after type legalization we might have introduced instructions to widen the mask if the vselect was widened. This can prevent the not from making it to the setcc. We could of course add more DAG combines to handle that, but moving this earlier is easier.	2020-02-17 22:24:21 -08:00
Craig Topper	b0840934a7	[X86] Use isScalarFPTypeInSSEReg to simplify code in LowerSELECT. NFC	2020-02-17 19:43:57 -08:00
Craig Topper	3f4490d384	[X86] Add one use check to '0-x == y --> x+y == 0' in EmitCmp. I failed to copy it when I moved this in `b62de210cf`	2020-02-17 18:16:42 -08:00
Craig Topper	43e948c4b7	[X86] Change how the alignment for the stack object is created in LowerFLT_ROUNDS_. We don't need FrameInfo's concept of the stack alignment. We just need to tell it the desired alignment. Which in this case is 2.	2020-02-17 11:27:34 -08:00
Craig Topper	b62de210cf	[X86] Move '0-x == y --> x+y == 0' and similar combines to EmitCmp. AArch64 handles this pattern in their lowering code. By emitting CMN. ARM handles it as an isel pattern.	2020-02-17 11:27:34 -08:00
Nikita Popov	80397d2d12	[IRBuilder] Delete copy constructor D73835 will make IRBuilder no longer trivially copyable. This patch deletes the copy constructor in advance, to separate out the breakage. Currently, the IRBuilder copy constructor is usually used by accident, not by intention. In rG7c362b25d7a9 I've fixed a number of cases where functions accepted IRBuilder rather than IRBuilder &, thus performing an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an IRBuilder was copied, while an InsertPointGuard should have been used instead. The only non-trivial use of the copy constructor is the getIRBForDbgInsertion() helper, for which I separated construction and setting of the insertion point in this patch. Differential Revision: https://reviews.llvm.org/D74693	2020-02-17 18:14:48 +01:00
Craig Topper	464729cf7c	[X86] Remove unnecessary check for null SDValue. NFC	2020-02-16 20:25:24 -08:00
Craig Topper	272d35aef5	[X86] Separate floating point handling out of EmitCmp and emitFlagsForSetcc. Both of those functions only have a single caller starting at LowerSETCC. Just handle floating point directly in LowerSETCC. This removes the need to pass Chain and IsSignaling all the way down.	2020-02-16 10:51:05 -08:00
Craig Topper	d26f11108b	[X86] Split X86ISD::CMP into an integer and FP opcode.	2020-02-16 10:10:19 -08:00
Simon Pilgrim	b85df2e185	[X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to PALIGNR	2020-02-16 16:13:26 +00:00
Simon Pilgrim	c9c1c2b335	[X86] combineX86ShuffleChain - add support for combining 512-bit shuffles to bit shifts	2020-02-16 16:13:25 +00:00
Sanjay Patel	e48b536be6	[x86] form broadcast of scalar memop even with >1 use The unseen logic diff occurs because MayFoldLoad() is defined like this: static bool MayFoldLoad(SDValue Op) { return Op.hasOneUse() && ISD::isNormalLoad(Op.getNode()); } The test diffs here all seem ok to me on screen/paper, but it's hard to know if that will lead to universally better perf for all targets. For example, if a target implements broadcast from mem as multiple uops, we would have to weigh the potential reduction of instructions and register pressure vs. possible increase in number of uops. I don't know if we can make a truly informed decision on this at compile-time. The motivating case that I'm looking at in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 ...resembles the diff in extract-concat.ll, but we're not going to change the larger example there without at least 1 other fix. Differential Revision: https://reviews.llvm.org/D74088	2020-02-16 10:32:56 -05:00
Simon Pilgrim	34a054ce71	[X86] combineX86ShuffleChain - add support for combining to X86ISD::ROTLI Refactors matchShuffleAsBitRotate to allow use by both lowerShuffleAsBitRotate and matchUnaryPermuteShuffle.	2020-02-15 20:04:54 +00:00
Simon Pilgrim	2492075add	[X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option. REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.	2020-02-14 11:55:18 +00:00
Craig Topper	c2e8a421ac	[X86] Don't widen 128/256-bit strict compares with vXi1 result to 512-bits on KNL. If we widen the compare we might trigger a spurious exception from the garbage data. We have two choices here. Explicitly force the upper bits to zero. Or use a legacy VEX vcmpps/pd instruction and convert the XMM/YMM result to mask register. I've chosen to go with the second option. I'm not sure which is really best. In some cases we could get rid of the zeroing since the producing instruction probably already zeroed it. But we lose the ability to fold a load. So which is best is dependent on surrounding code. Differential Revision: https://reviews.llvm.org/D74522	2020-02-13 13:26:40 -08:00
Amy Huang	de1d90299b	Revert "[X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets" This reverts commit `11c16e7159` because it causes a crash in chromium code. See https://reviews.llvm.org/rG11c16e71598d51f15b4cfd0f719c4dabcc0bebf7.	2020-02-12 17:00:37 -08:00
Jay Foad	32aac25637	[KnownBits] Introduce anyext instead of passing a flag into zext Summary: This was a very odd API, where you had to pass a flag into a zext function to say whether the extended bits really were zero or not. All callers passed in a literal true or false. I think it's much clearer to make the function name reflect the operation being performed on the value we're tracking (rather than on the KnownBits Zero and One fields), so zext means the value is being zero extended and new function anyext means the value is being extended with unknown bits. NFC. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74482	2020-02-12 19:06:53 +00:00
Simon Pilgrim	ff307c8120	[X86] combineFneg - generalize FMA negations with isNegatibleForFree/getNegatedExpression This has a really interesting side effect in that it improves some UMAX/UMIN reduction code which had redundant XOR(SHUFFLE(XOR(X,SIGNMASK)),SIGNMASK) patterns - the getNegatibleCost recognises it as FNEG(SHUFFLE(FNEG(X))).... We have a lot of FNEG patterns bitcasted to the integer domain for XOR signbit twiddling which is similar to what we do to allow UMAX/UMIN to be lowered using SMAX/SMIN. Differential Revision: https://reviews.llvm.org/D74231	2020-02-12 16:07:27 +00:00
Simon Pilgrim	9eb426c88c	[TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not. This patch replaces the char return value with a NegatibleCost enum to more clearly demonstrate what is implied. It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on. Differential Revision: https://reviews.llvm.org/D74221	2020-02-12 11:51:42 +00:00
Djordje Todorovic	97ed706a96	Revert "[DebugInfo] Enable the debug entry values feature by default" This reverts commit rG9f6ff07f8a39. Found a test failure on clang-with-thin-lto-ubuntu buildbot.	2020-02-12 11:59:04 +01:00
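
Commit b3b4727a3e in the table above says that generic funnel shifts take the shift amount modulo the bit width (modulo-16 for i16), and that FSHR(-1,-1,amt) simplifies to ROTR(-1,amt) and then to -1. The following is a minimal Python sketch of those generic semantics, not LLVM code: the helper names fshl, fshr and rotr and the explicit bits parameter are assumptions made for illustration, and the x86 SHLD/SHRD modulo-32 behaviour is deliberately not modelled.

```python
def fshl(hi, lo, amt, bits):
    """Funnel shift left: shift the concatenation hi:lo left by amt % bits
    and return the upper `bits` bits of the result."""
    mask = (1 << bits) - 1
    amt %= bits  # generic funnel shifts reduce the amount modulo the width
    wide = ((hi & mask) << bits) | (lo & mask)
    return ((wide << amt) >> bits) & mask


def fshr(hi, lo, amt, bits):
    """Funnel shift right: shift the concatenation hi:lo right by amt % bits
    and return the lower `bits` bits of the result."""
    mask = (1 << bits) - 1
    amt %= bits
    wide = ((hi & mask) << bits) | (lo & mask)
    return (wide >> amt) & mask


def rotr(x, amt, bits):
    """Rotate x right by amt % bits within a `bits`-wide value."""
    mask = (1 << bits) - 1
    amt %= bits
    x &= mask
    return ((x >> amt) | (x << (bits - amt))) & mask


# A funnel shift with both inputs equal is a rotate, and rotating an
# all-ones value leaves it unchanged, which gives the fold quoted above.
ALL_ONES_16 = 0xFFFF
fold_holds = all(
    fshr(ALL_ONES_16, ALL_ONES_16, amt, 16)
    == rotr(ALL_ONES_16, amt, 16)
    == ALL_ONES_16
    for amt in range(64)
)
```

With 16-bit operands, an amount of 20 behaves like an amount of 4 in this model, which is the modulo-16 behaviour the commit message contrasts with the i16 SHLD/SHRD instructions.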

7035 Commits
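
Two of the commits listed above rest on small integer identities: b62de210cf rewrites '0-x == y' as 'x+y == 0', and d95a10a7f9 legalizes v1i1 add/sub/mul as xor/xor/and. The sketch below checks both exhaustively in Python under wrapping arithmetic. It is an illustration only: the 8-bit width and the function names are assumptions, not anything taken from the LLVM sources.

```python
def neg_equals(x, y, bits):
    """Evaluate (0 - x) == y with wrapping `bits`-wide arithmetic."""
    mask = (1 << bits) - 1
    return ((0 - x) & mask) == (y & mask)


def sum_is_zero(x, y, bits):
    """Evaluate (x + y) == 0 with wrapping `bits`-wide arithmetic."""
    mask = (1 << bits) - 1
    return ((x + y) & mask) == 0


def compare_identity_holds(bits=8):
    """Check '0-x == y' against 'x+y == 0' for every pair of values."""
    values = range(1 << bits)
    return all(
        neg_equals(x, y, bits) == sum_is_zero(x, y, bits)
        for x in values
        for y in values
    )


def i1_identities_hold():
    """Check that 1-bit add, sub and mul match xor, xor and and."""
    return all(
        ((a + b) & 1) == (a ^ b)
        and ((a - b) & 1) == (a ^ b)
        and ((a * b) & 1) == (a & b)
        for a in (0, 1)
        for b in (0, 1)
    )
```

Calling compare_identity_holds() and i1_identities_hold() runs the exhaustive checks. They cover only the arithmetic, not the one-use condition that commit 3f4490d384 adds to the compare rewrite.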