llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	7a4a98a9c4	[X86] Move canLowerByDroppingEvenElements earlier to be with matchShuffleWithPACK. NFCI. Make sure it's defined earlier so more shuffle lowering methods can use it.	2020-03-31 10:56:35 +01:00
Guillaume Chatelet	bdf77209b9	[Alignment][NFC] Use Align version of getMachineMemOperand Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77059	2020-03-30 15:46:27 +00:00
Simon Pilgrim	e95d04f4f1	[X86][AVX] lowerV4X128Shuffle - attempt to widen to 2x256 to simplify shuffles If we are lowering to X86ISD::SHUF128 we are going to lose track of individual 128-bit lanes that are UNDEF, so if we can widen these to guarantee that they are sequential with their neighbour we should. This helps with later shuffle combines.	2020-03-30 12:22:26 +01:00
Simon Pilgrim	9c8ec99c80	[X86][AVX] Combine 128/256-bit lane shuffles with zeroable upper subvectors to EXTRACT_SUBVECTOR (PR40720) As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128/SHUF128, and we can use the implicit zeroing of the uppers.	2020-03-29 19:51:38 +01:00
Simon Pilgrim	8206c50cde	[X86] Add isAnyZero shuffle mask helper	2020-03-29 19:51:37 +01:00
Simon Pilgrim	7734e4b3a3	[X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720) As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128, and we can use the implicit zeroing of the upper half. I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.	2020-03-29 16:41:59 +01:00
Simon Pilgrim	da4c7db793	[X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC. This was an inner helper function for the real matchShuffleAsByteRotate function, but it is more generic and is used directly for VALIGN lowering which doesn't work at the byte level.	2020-03-29 16:41:58 +01:00
Simon Pilgrim	10439f9e32	[X86][AVX] Add X86ISD::VALIGN target shuffle decode support Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.	2020-03-29 16:41:58 +01:00
Craig Topper	9f7d4150b9	[X86] Move combineLoopMAddPattern and combineLoopSADPattern to an IR pass before SelectionDAG. These transforms rely on a vector reduction flag on the SDNode set by SelectionDAGBuilder. This flag exists because SelectionDAG can't see across basic blocks so SelectionDAGBuilder is looking across and saving the info. X86 is the only target that uses this flag currently. By removing the X86 code we can remove the flag and the SelectionDAGBuilder code. This patch adds a dedicated IR pass for X86 that looks across the blocks and transforms the IR into a form that the X86 SelectionDAG can finish. An advantage of this new approach is that we can enhance it to shrink the phi nodes and final reduction tree based on the zeroes that we need to concatenate to bring the partially reduced reduction back up to the original width. Differential Revision: https://reviews.llvm.org/D76649	2020-03-26 14:10:20 -07:00
Simon Pilgrim	ad36491ebb	[X86] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on all targets Extends rG9d1721ce3926 to support AVX2+ targets.	2020-03-26 20:46:24 +00:00
Simon Pilgrim	39a52a19ed	[X86] lowerV16I8Shuffle - create v8i16 mask for PACKUS(AND(),AND()) patterns. We can improve computeKnownBits results by avoiding excess bitcasts. For this pattern we were doing: (v16i8 PACKUS(v8i16 BITCAST(v16i8 AND(V1, MASK)), v8i16 BITCAST(v16i8 AND(V2, MASK)))) By performing the MASK/AND with a v8i16 type and bitcasting V1/V2 directly we can help computeKnownBits see that the mask is clearing the upper bits and allows shuffle combining to peek through later on. This will be necessary to extend rG9d1721ce3926 to AVX2+ targets in a future patch.	2020-03-26 19:59:57 +00:00
Simon Pilgrim	9d1721ce39	[X86][SSE] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on pre-AVX2 targets As discussed on PR31443, we should be trying to use PACKUS for binary truncation patterns to reduce the number of shuffles. The plan is to support AVX2+ targets once we've worked around PR45315 - we fail to peek through a VBROADCAST_LOAD mask to recognise zero upper bits in a PACKUS pattern. We should also be able to add support for v8i16 and possibly 256/512-bit vectors as well.	2020-03-26 15:47:43 +00:00
Simon Pilgrim	e30d29ebc1	[X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT()) As long as we extract from a source vector with smaller elements and we zero-extend the element in the final shuffle mask then we can safely peek through truncations and any/zero-extensions to find the source extraction.	2020-03-26 11:57:45 +00:00
Simon Pilgrim	c6e5531f9b	[X86][AVX] Combine shuffles to TRUNCATE/VTRUNC patterns Add support for combining shuffles to AVX512 truncate instructions - another step toward fixing D56387/D66004. It also fixes SKX code on PR31443. We could probably extend this further to handle non-VLX truncation cases.	2020-03-25 17:41:51 +00:00
Reid Kleckner	597718aae0	Re-land "Avoid emitting unreachable SP adjustments after `throw`" This reverts commit `4e0fe038f4`. Re-lands `65b21282c7`. After landing `5ff5ddd0ad` to add int3 into trailing unreachable blocks, we can now remove these extra stack adjustments without confusing the Win64 unwinder. See https://llvm.org/45064#c4 or X86AvoidTrailingCall.cpp for a full explanation. Fixes PR45064.	2020-03-24 12:04:43 -07:00
Simon Pilgrim	714402147d	[X86][SSE1] Add support for logic+movmsk patterns (PR42870) rL368506 handled the basic case, but we need to account for boolean logic patterns as well.	2020-03-24 14:28:40 +00:00
Guillaume Chatelet	3ba550a05a	[Alignment][NFC] Use TFL::getStackAlign() Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76551	2020-03-23 13:48:29 +01:00
Craig Topper	e2cb121374	[X86] Remove maximum vector length limit from combineBasicSADPattern. createPSADBW uses SplitsOpsAndApply so should be able to handle any size. Restrict the extract result type to i32 or i64 since that's what we have coverage for today and probably matches what the isSimple() check gave us before. Differential Revision: https://reviews.llvm.org/D76560	2020-03-22 15:02:05 -07:00
Craig Topper	b89ae50795	[X86] Remove maximum vector width restriction from combineLoopSADPattern. SplitsOpsAndApply will take care of any needed splitting correctly. All that we need to check is that the vector element count is a power of 2. Differential Revision: https://reviews.llvm.org/D76558	2020-03-22 11:09:14 -07:00
Simon Pilgrim	25eb9056d7	[X86] getTargetShuffleAndZeroables - add insert_subvector(undef, sub, c) handling. We often widen xmm/ymm vectors to ymm/zmm by insertion into an undef base vector. By letting getTargetShuffleAndZeroables track the undef elts we can help avoid a lot of unnecessary cross-lane shuffles. Fixes PR44694	2020-03-21 19:11:42 +00:00
Simon Pilgrim	4ceade0428	[X86] Combine concat(shufps,shufps) -> shufps(concat,concat) Now that rG18c19441d105 has improved VPERM2X128 handling, we can perform this to improve x64->x32 truncation without poor cross-lane issues. Someday combineX86ShufflesRecursively will handle this, but we're still really bad at dealing with different vector widths.	2020-03-21 12:44:10 +00:00
Simon Pilgrim	f424d51c3e	Revert rGe6a7e3b5e3e7 "[X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles." This reverts commit `e6a7e3b5e3`. Avoids register pressure regression reported at PR45263	2020-03-21 12:14:19 +00:00
Craig Topper	32fbea1548	[X86] Prevent (bitcast (broadcast_load)) combine from producing vXf16 broadcast instructions. The combine tries to put the broadcast in either the integer or fp domain to match the bitcast domain. But we can only do this if the broadcast size is 32 or larger.	2020-03-20 09:15:07 -07:00
Nico Weber	4e0fe038f4	Revert "Avoid emitting unreachable SP adjustments after `throw`" This reverts commit `65b21282c7`. Breaks sanitizer bots (https://reviews.llvm.org/D75712#1927668) and causes https://crbug.com/1062021 (which may or may not be a compiler bug, not clear yet).	2020-03-17 20:49:22 -04:00
Simon Pilgrim	c9656a3b31	[DAGCombiner] matchRotateSub - handle shift amount truncation Under certain circumstances we'll end up in the position where the negated shift amount will get truncated to the type specified getScalarShiftAmountTy(), so we need to test for a truncated version of the shift amount as well. This allows us to remove half of the remaining patterns tested for by X86ISelLowering's combineOrShiftToFunnelShift.	2020-03-17 16:01:23 +00:00
Simon Pilgrim	ebb181cf40	[X86] matchScalarReduction - add support for partial reductions Add optional support for opt-in partial reduction cases by providing an optional partial mask to indicate which elements have been extracted for the scalar reduction.	2020-03-16 18:01:02 +00:00
Simon Pilgrim	e43a085781	[X86] X86::isConstantSplat - enable partial undef bit handling by default. We currently only ever use this for lowering constant uniform values (shift/rotate by immediate) so we can safely enable it by default (it treats the undef bits as zero when extracting constants). This is necessary for an upcoming patch that will use SimplifyDemandedBits more aggressively on funnel shift amounts and causes regressions in vXi64 constant without it.	2020-03-16 12:56:24 +00:00
Simon Pilgrim	ac4609cb1d	[X86] LowerRotate - use X86::isConstantSplat to detect constant splat rotation amounts. Avoid code duplication and matches what we do for the similar LowerFunnelShift and LowerScalarImmediateShift methods.	2020-03-16 12:56:23 +00:00
Simon Pilgrim	ee862adf60	Fix signed/unsigned comparison warning.	2020-03-14 18:42:27 +00:00
Simon Pilgrim	0cb2f089c1	[X86] getFauxShuffleMask - pull out repeated byte sizes variables. NFC.	2020-03-14 17:36:17 +00:00
Simon Pilgrim	f47f4c137b	[X86] getFauxShuffleMask - merge insertelement paths Merge the INSERT_VECTOR_ELT/SCALAR_TO_VECTOR and PINSRW/PINSRB shuffle mask paths - they both do the same thing (find source vector + handle implicit zero extension). The PINSRW/PINSRB path also handled the insertion of zero case, which needed to be added to the general case as well.	2020-03-14 13:11:03 +00:00
Craig Topper	755e00876c	[X86] Remove isel patterns for X86VBroadcast+trunc+extload. Replace with DAG combines. This is a little more complicated than I'd like it to be. We have to manually match a trunc+srl+load pattern that generic DAG combine won't do for us due to isTypeDesirableForOp.	2020-03-13 18:12:16 -07:00
Simon Pilgrim	05c0d34918	[X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0) If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implicit zero extension of PEXTRB. The biggest perf diff is on SLM targets where MOVD (uops=1, lat=3 tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4). This matches what we already do for PEXTRW. Differential Revision: https://reviews.llvm.org/D76138	2020-03-13 18:43:04 +00:00
Simon Pilgrim	846c614f54	[X86] combineExtractWithShuffle - pull out repeated getSizeInBits() call. NFC.	2020-03-13 15:36:04 +00:00
Simon Pilgrim	fe047fbccc	[X86] LowerEXTRACT_VECTOR_ELT - pull out repeated getOperand() calls. NFC. Also, cleanup LowerEXTRACT_VECTOR_ELT_SSE4 comments which had references to non-constant extraction indices.	2020-03-13 15:36:02 +00:00
Simon Pilgrim	4689eae820	[X86] combineOrShiftToFunnelShift - remove shift by immediate handling. Now that D75114 has landed, DAGCombiner handles this case so the code is redundant.	2020-03-12 11:46:51 +00:00
Simon Pilgrim	b3b4727a3e	[X86] Replace (most) X86ISD::SHLD/SHRD usage with ISD::FSHL/FSHR generic opcodes (PR39467) For i32 and i64 cases, X86ISD::SHLD/SHRD are close enough to ISD::FSHL/FSHR that we can use them directly, we just need to account for the operand commutation for SHRD. The i16 SHLD/SHRD case is annoying as the shift amount is modulo-32 (vs funnel shift modulo-16), so I've added X86ISD::FSHL/FSHR equivalents, which matches the generic implementation in all other terms. Something I'm slightly concerned with is that ISD::FSHL/FSHR legality is controlled by the Subtarget.isSHLDSlow() feature flag - we don't normally use non-ISA features for this but it allows the DAG combines to continue to operate after legalization in a lot more cases. The X86 *bits.ll changes are all affected by the same issue - we now have a "FSHR(-1,-1,amt) -> ROTR(-1,amt) -> (-1)" simplification that reduces the dependencies enough for the branch fall through code to mess up. Differential Revision: https://reviews.llvm.org/D75748	2020-03-11 11:17:49 +00:00
Simon Pilgrim	c8ede5e485	[X86][SSE] getFauxShuffleMask - add support for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT) shuffle pattern We already do this for PINSRB/PINSRW and SCALAR_TO_VECTOR.	2020-03-10 15:42:37 +00:00
Simon Pilgrim	e6a7e3b5e3	[X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles. This causes one minor test change but is mainly necessary for an upcoming patch.	2020-03-10 15:42:36 +00:00
Simon Pilgrim	18c19441d1	[X86][AVX] combineX86ShuffleChain - combine binary shuffles to X86ISD::VPERM2X128 For pre-AVX512 targets, combine binary shuffles to X86ISD::VPERM2X128 if possible. This mainly helps optimize the blend(extract_subvector(x,1),y) pattern. At some point soon we're going to have to make a decision about when to combine AVX512 shuffles more aggressively - we bail out if there is any change in element size (to protect predicate mask merging) which means we miss out on a lot of optimizations.	2020-03-10 10:44:28 +00:00
Craig Topper	ef4f939d38	[X86] Remove isel patterns for (X86VBroadcast (i16 (trunc (i32 (load))))). Replace with a DAG combine to form VBROADCAST_LOAD. isTypeDesirableForOp prevents loads from being shrunk to i16 by DAG combine. Because of this we can't just match the broadcast and a scalar load. So look for broadcast+truncate+load and form a vbroadcast_load during DAG combine. This replaces what was previously done as an isel pattern and I think fixes it so we won't change the size of a volatile load. But my main motivation is just to clean up our isel patterns.	2020-03-10 00:07:07 -07:00
Simon Pilgrim	4b130b883d	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - reduce vector width of X86ISD::BLENDI If we don't need the upper subvector elements of the BLENDI node then use a smaller vector size. This causes a couple of minor regressions in insertelement-ones.ll which are more examples of PR26018; given how cheap allones generation is I don't consider that a showstopper, just an annoyance (and there's plenty of other poor codegen cases in that file).	2020-03-09 18:29:28 +00:00
Craig Topper	3dcc0db15e	[X86] Teach combineToExtendBoolVectorInReg to create opportunities for using broadcast load instructions. If we're inserting a scalar that is smaller than the element size of the final VT, the value of the extra bits doesn't matter. Previously we any_extended in the scalar domain before inserting. This patch changes this to use a broadcast of the original scalar type and then a bitcast to the final type. This might enable the use of a broadcast load. This recovers regressions from `07d68c24aa` and `9fcd212e2f` without relying on alignment of the load. Differential Revision: https://reviews.llvm.org/D75835	2020-03-09 11:26:12 -07:00
Djordje Todorovic	c15c68abdc	[CallSiteInfo] Enable the call site info only for -g + optimizations Emit call site info only in the case of '-g' + 'O>0' level. Differential Revision: https://reviews.llvm.org/D75175	2020-03-09 12:12:44 +01:00
Craig Topper	70e4fb8a53	[X86] Add DAG combine to turn (vzext_movl (vbroadcast_load)) -> vzext_load. If we're zeroing the other elements then we don't need the broadcast.	2020-03-08 00:35:40 -08:00
Craig Topper	d81d451442	[X86] Add DAG combine to replace vXi64 vzext_movl+scalar_to_vector with vYi32 vzext_movl+scalar_to_vector if the upper 32 bits of the scalar are zero. We can just use a 32-bit copy and zero in the SSE domain when we zero the upper bits. Remove an isel pattern that becomes dead with this.	2020-03-07 16:14:26 -08:00
Craig Topper	d41ea65ee8	[X86] Add DAG combines to enable removing of movddup/vbroadcast + simple_load isel patterns.	2020-03-07 15:22:02 -08:00
Craig Topper	bc65b68661	[X86] Add a DAG combine to turn vbroadcast(vzload X) -> vbroadcast_load Remove now unneeded isel patterns.	2020-03-07 15:22:02 -08:00
Craig Topper	ec1d1f6ae7	[X86] Use MVT instead of EVT in a couple shuffle lowering functions.	2020-03-07 09:50:53 -08:00
Reid Kleckner	65b21282c7	Avoid emitting unreachable SP adjustments after `throw` In `172eee9c`, we tried to avoid these by modelling the callee as internally resetting the stack pointer. However, for the majority of functions with reserved stack frames, this would lead LLVM to emit extra SP adjustments to undo the callee's internal adjustment. This led us to fix the problem further on down the pipeline in eliminateCallFramePseudoInstr. In `5b79e603d3`, I added a heuristic to try to detect when the adjustment would be unreachable. This heuristic is imperfect, and when exception handling is involved, it fails to fire. The new test is an example of this. Simply throwing an exception with an active cleanup emits dead SP adjustments after the throw. Not only are they dead, but if they were executed, they would be incorrect, so they are confusing. This change essentially reverts `172eee9c` and makes the `5b79e603d3` heuristic responsible for preventing unreachable stack adjustments. This means we may emit unreachable stack adjustments for functions using EH with unreserved call frames, but that is not very many these days. Back in 2016 when this change was added, we were focused on 32-bit, which we observed to have fewer reserved frames. Fixes PR45064 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D75712	2020-03-06 13:33:45 -08:00
Craig Topper	4c7c87f245	[X86] Simplify the code at the end of lowerShuffleAsBroadcast. The original code could create a bitcast from f64 to i64 and back on 32-bit targets. This was only working because getBitcast was able to fold the casts away to avoid leaving the illegal i64 type. Now we handle the scalar case directly by broadcasting using the scalar type as the element type. Then bitcasting to the final VT. This works since we ensure the scalar type is the same size as the final VT element type. No more casts to i64. For the vector case, we cast to VT or subvector of VT. And then do the broadcast. I think this all matches what we generated before, just in a more readable way.	2020-03-04 20:45:02 -08:00
Craig Topper	eadea7868f	[X86] Convert vXi1 vectors to xmm/ymm/zmm types via getRegisterTypeForCallingConv rather than using CCPromoteToType in the td file Previously we tried to promote these to xmm/ymm/zmm by promoting in the X86CallingConv.td file. But this breaks when we run out of xmm/ymm/zmm registers and need to fall back to memory. We end up trying to create a nonsensical scalar to vector. This led to an assertion. The new tests in avx512-calling-conv.ll all trigger this assertion. Since we really want to treat these types like we do on avx2, it seems better to promote them before the calling convention code gets involved. Except when the calling convention is one that passes the vXi1 type in a k register. The changes in avx512-regcall-Mask.ll are because we indicated that xmm/ymm/zmm types should be passed indirectly for the Win64 ABI before we go to the common lines that promoted the vXi1 types. This caused the promoted types to be picked up by the default calling convention code. Now we promote them earlier so they get passed indirectly as though they were xmm/ymm/zmm. Differential Revision: https://reviews.llvm.org/D75154	2020-03-04 15:02:32 -08:00
Craig Topper	06de426426	[X86] Directly form VBROADCAST_LOAD in lowerShuffleAsBroadcast on AVX targets. If we would emit a VBROADCAST node, we can instead directly emit a VBROADCAST_LOAD. This allows us to get rid of the special case to use an f64 load on 32-bit targets for vXi64. I believe there is more cleanup we can do later in this function, but I'll do that in follow ups.	2020-03-04 09:11:57 -08:00
Craig Topper	9284abd004	[X86] Directly form VBROADCAST_LOAD for BUILD_VECTOR of splat loads in lowerBuildVectorAsBroadcast.	2020-03-03 22:27:34 -08:00
Craig Topper	3c4e635593	[X86] Always emit an integer vbroadcast_load from lowerBuildVectorAsBroadcast regardless of AVX vs AVX2 If we go with D75412, we no longer depend on the scalar type directly. So we don't need to avoid using i64. We already have AVX1 fallback patterns with i32 and i64 scalar types so we don't need to avoid using integer types on AVX1. Differential Revision: https://reviews.llvm.org/D75413	2020-03-03 10:39:11 -08:00
Craig Topper	56cd3bc209	[X86] Directly emit VBROADCAST_LOAD from constant pool in lowerBuildVectorAsBroadcast Also add a DAG combine to combine different sized broadcasts from constant pool to avoid a regression. Differential Revision: https://reviews.llvm.org/D75412	2020-03-03 10:39:10 -08:00
Craig Topper	68aeaab888	[X86] Don't count the chain uses when forming broadcast loads in lowerBuildVectorAsBroadcast. The build_vector needs to be the only user of the data, but the chain will likely have another use. So we can't make sure the build_vector is the only user of the node.	2020-03-03 08:41:31 -08:00
Craig Topper	2f4f8fcf64	[X86] Don't add DELETED_NODES to DAG combine worklist after calling SimplifyDemandedBits/SimplifyDemandedVectorElts. These AddToWorklist calls were added in `84cd968f75`. It's possible the SimplifyDemandedBits/SimplifyDemandedVectorElts triggered CSE that deleted N. Detect that and avoid adding N to the worklist. Fixes PR45067.	2020-03-01 00:06:32 -08:00
Craig Topper	f2d45e5097	[X86] Canonicalize (bitcast (vbroadcast_load)) so that the cast and vbroadcast_load are both integer or fp. Helps a little with some isel pattern matching. Especially on 32-bit targets where we sometimes use f64 loads.	2020-02-28 15:07:49 -08:00
Craig Topper	b68eeff05c	[X86] Cleanup a comment around bitcasting X86ISD::VBROADCAST_LOAD and add an assert to make sure memory VT size doesn't change.	2020-02-28 15:07:49 -08:00
Craig Topper	c0d0e6b198	[X86] Recognize CVTPH2PS from STRICT_FP_EXTEND This should avoid scalarizing the cvtph2ps intrinsics with D75162 Differential Revision: https://reviews.llvm.org/D75304	2020-02-28 10:19:57 -08:00
Simon Pilgrim	f90cc633de	Fix cppcheck definition/declaration arg mismatch warnings. NFCI.	2020-02-27 14:35:20 +00:00
Simon Pilgrim	fe6bcfaf3b	[X86] Use Subtarget.useSoftFloat() in X86TargetLowering constructor Avoid use of X86TargetLowering::useSoftFloat() in the constructor as it's a virtual function	2020-02-27 14:35:20 +00:00
Simon Pilgrim	e61e7f0794	Fix shadow variable warning. NFC.	2020-02-27 14:23:05 +00:00
Simon Pilgrim	dc7ac563ac	Fix shadow variable warnings. NFC.	2020-02-27 14:21:30 +00:00
Simon Pilgrim	efe2f59ec4	[X86] LowerMSCATTER/MGATHER - reduce scope of MaskVT. NFCI. Fixes cppcheck warning.	2020-02-27 14:20:44 +00:00
Simon Pilgrim	fabe52a741	Fix uninitialized variable warning. NFC.	2020-02-27 14:20:43 +00:00
Simon Pilgrim	6bdd63dc28	[X86] createVariablePermute - handle case where recursive createVariablePermute call fails Account for the case where a recursive createVariablePermute call with a wider vector type fails. Original test case from @craig.topper (Craig Topper)	2020-02-27 13:52:31 +00:00
Craig Topper	82a21c1655	[X86] Add proper MachinePointerInfo to stack store created in LowerWin64_i128OP.	2020-02-26 16:55:24 -08:00
Craig Topper	870363a22d	[X86] Explicitly pass Destination VT and debug location to BuildFILD. NFC We'd already passed most everything else. Might as well pass these two things and stop passing Op.	2020-02-26 16:26:46 -08:00
Craig Topper	15e2831fcd	[X86] Explicitly pass Pointer, MachinePointerInfo and Alignment to BuildFILD. Previously this code was called into two ways, either a FrameIndexSDNode was passed in StackSlot. Or a load node was passed in the argument called StackSlot. This was determined by a dyn_cast to FrameIndexSDNode. In the case of a load, we had to go find the real pointer from operand 0 and cast the node to MemSDNode to find the pointer info. For the stack slot case, the code assumed that the stack slot was perfectly aligned despite not being the creator of the slot. This commit modifies the interface to make the caller responsible for passing all of the required information to avoid all the guess work and reverse engineering. I'm not aware of any issues with the original code after an earlier commit to fix the alignment of one of the stack objects. This is just clean up to make the code less surprising.	2020-02-26 16:26:26 -08:00
Craig Topper	77d9b7b2cd	[X86] Query constant pool object alignment instead of hardcoding.	2020-02-26 14:45:39 -08:00
Craig Topper	9c1a707ba3	[X86] Use proper alignment for stack temporary and correct MachinePointerInfo for stack accesses in LowerUINT_TO_FP.	2020-02-26 14:45:38 -08:00
Craig Topper	a8186935ae	[X86] Use correct MachineMemOperand for stack load in LowerFLT_ROUNDS_	2020-02-26 14:45:38 -08:00
Craig Topper	735d27dc40	[SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output to ISD::FLT_ROUNDS_ This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order. This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've updated all in-tree targets to connect their chain through their lowering code. Differential Revision: https://reviews.llvm.org/D75132	2020-02-25 16:58:23 -08:00
Craig Topper	9238dfb4d8	[X86] Remove mask output from X86 gather/scatter ISD opcodes. Instead add it when we make the machine nodes during instruction selections. This makes this ISD node closer to ISD::MGATHER. Trying to see if we remove the X86 specific ones.	2020-02-24 23:56:28 -08:00
Simon Pilgrim	daac8dba77	[X86] combineX86ShuffleChain - select X86ISD::FAND/ISD::AND based on MaskVT Noticed by inspection, we shouldn't use FloatDomain directly, we've already bitcast both inputs to MaskVT so select the opcode using that.	2020-02-24 18:24:44 +00:00
Simon Pilgrim	59d8d13c7b	[X86] getTargetShuffleInputs - check that the source inputs are all the right size. I'm hoping to begin improving shuffle combining across different vector sizes, but before that we must ensure that all existing getTargetShuffleInputs calls must bail if the inputs aren't the same size.	2020-02-24 16:26:10 +00:00
Craig Topper	7a7146cf72	[X86] When creating X86ISD::MGATHER nodes from AVX2 gather intrinsics, cast the mask to integer type. The gather intrinsics use a floating point mask when the result type is FP. But we call DemandedBits on the mask assuming it's an integer type. We also use integer types when we create it from generic IR. So add a bitcast to the intrinsic path to guarantee the integer type.	2020-02-23 23:00:41 -08:00
Craig Topper	5a70518660	[X86] Remove most X86 specific subclasses of MemSDNode. Just use a MemIntrinsicSDNode as we usually do. Leave the gather/scatter subclasses, but make them inherit from MemIntrinsicSDNode and delete their constructor and destructor. This way we can still have the getIndex, getMask, etc. convenience functions.	2020-02-23 15:13:32 -08:00
Craig Topper	15b6aa7448	[X86] Enable the use of movlps for i64 atomic load on 32-bit targets with sse1. Still a little room for improvement by using movlps to store to the stack temporary needed to move data out of the xmm register after the load.	2020-02-23 15:11:38 -08:00
Craig Topper	2a10f8019d	[X86] Use FIST for i64 atomic stores on 32-bit targets without SSE.	2020-02-23 15:11:38 -08:00
Craig Topper	84cd968f75	[X86] Add AddToWorklist(N) after calls to SimplifyDemandedBits/SimplifyDemandedVectorElts that are called on an operand of N. If a simplification occurs the operand will be added to the worklist. But since the demanded mask was based on N, we need to make sure we revisit N in case there are more simplifications to be done. Returning SDValue(N, 0) as we do, only tells DAG combine that something changed, but that won't make it add anything to the worklist. Found while playing around with using VEXTRACT_STORE in more cases. But I guess this doesn't affect any of our existing tests.	2020-02-22 21:42:59 -08:00
Craig Topper	bdb1729c83	[X86] Teach EltsFromConsecutiveLoads that it's ok to form a v4f32 VZEXT_LOAD with a 64 bit memory size on SSE1 targets. We can use MOVLPS which will load 64 bits, but we need a v4f32 result type. We already have isel patterns for this. The code here is a little hacky. We can probably improve it with more isel patterns.	2020-02-22 18:50:52 -08:00
Craig Topper	e7a184fc7c	[X86] Use movlps for i64 atomic stores on 32-bit targets with sse1. This is similar to using movd which we do for sse2 targets. I've added a DAG combine for VEXTRACT_STORE to use SimplifyDemandedVectorElts to clean up some artifacts from type legalization.	2020-02-22 18:22:47 -08:00
Craig Topper	228a2bc9b7	[X86] Teach combineCVTPH2PS to shrink v8i16 loads when the output type is v4f32. Remove extra isel patterns. Similar to what we do for other operations that use a subset of bits. Allows us to remove a pattern that shrinks a load, which was incorrect if the load was volatile.	2020-02-21 18:11:07 -08:00
Nikita Popov	c90ea87cfd	[X86] Fix SDLoc initialization Fixes -Wparentheses warning, in this case indicating a genuine bug.	2020-02-21 18:26:05 +01:00
Craig Topper	97f11600e0	[X86] Don't bother avoiding illegal FCMOVs if we don't have the cmov subtarget feature. We'll be forced to emit branches so we might as well use the most direct condition.	2020-02-21 00:34:15 -08:00
Craig Topper	263bef2bbc	[X86] Make combineCMov not create unsupported FCMOVs when f32/f64 are using X87. This makes the behavior consistent with what's in LowerSELECT.	2020-02-21 00:34:15 -08:00
Craig Topper	4576606831	[X86] Remove unnecessary isNullConstant in LowerSelect. NFC At this point in the code we know that Op1 or Op2 is all ones. Y points to the other operand. In the case that Op2 is zero, Op1 must be all ones and Y is Op2. The OR ORs Y into Res. But if Y is 0 the OR will be folded away by getNode so we don't need to check for it.	2020-02-20 21:41:13 -08:00
Craig Topper	78be618717	[X86] Add CMOV_VR64 pseudo instruction for MMX. Remove mmx handling from combineSelect. The combineSelect code was casting to i64 without any check that i64 was legal. This can break after type legalization. It also required splitting the mmx register on 32-bit targets. It's not clear that this makes sense. Instead switch to using a cmov pseudo like we do for XMM/YMM/ZMM.	2020-02-20 20:30:56 -08:00
Craig Topper	e5782377f3	[X86] Add CMOV_VK1 pseudo so we don't crash on v1i1 ISD::SELECT	2020-02-20 15:13:48 -08:00
Craig Topper	7e92769862	[X86] Expand vselect of v1i1 under avx512. We already do this for v2i1, v4i1, etc.	2020-02-20 15:13:47 -08:00
Craig Topper	b00ef8951b	[X86] Custom legalize v1i1 UADDSAT/USUBSAT/SADDSAT/UADDSAT to match v2i1/v4i1/v8i1 etc.	2020-02-20 15:13:46 -08:00
Craig Topper	d95a10a7f9	[X86] Custom legalize v1i1 add/sub/mul to xor/xor/and with avx512. We already did this for v2i1, v4i1, v8i1, etc.	2020-02-20 15:13:44 -08:00
Craig Topper	c7b54a196e	Recommit "[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT" With the correct author this time	2020-02-20 12:28:54 -08:00
Craig Topper	1d8860f90b	Revert `714265dabb` "[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT" I accidentally messed up the author on the previous commit somehow.	2020-02-20 12:28:33 -08:00
Quentin Colombet	714265dabb	[X86] Replace a bad use of MVT::getVectorVT with EVT::getVectorVT The type here isn't guaranteed to be a simple type. Fixes PR44976	2020-02-20 12:25:37 -08:00
Sanjay Patel	064cd2ecdb	[x86] allow peeking through an extract_subvector to find a splatted operand The motivating case is seen in "splat4_v8f32_load_store" and based on code in PR42024: https://bugs.llvm.org/show_bug.cgi?id=42024 (I haven't stepped through the v8i32 sibling test yet to see why that diverged.) There are other potential improvements visible like allowing scalarization or vector narrowing. Differential Revision: https://reviews.llvm.org/D74909	2020-02-20 13:59:59 -05:00
Craig Topper	0ed7a61543	[X86] Fix a -Wparentheses warning. NFC	2020-02-20 09:32:03 -08:00

7067 Commits