llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	1c8460d6e1	[X86] Remove dead code from combineStore. Leftovers from before we switched to widening legalization. Fixes PR43919.	2019-11-06 22:24:47 -08:00
Craig Topper	641d2e5232	[X86] Clamp large constant shift amounts for MMX shift intrinsics to 8-bits. The MMX intrinsics for shift by immediate take a 32-bit shift amount but the hardware for shifting by immediate only encodes 8-bits. For the intrinsic we don't require the shift amount to fit in 8-bits in the frontend because we don't check that it's an immediate in the frontend. If it is not an immediate we move it to an MMX register and use the shift by register. But if it is an immediate we'll use the shift by immediate instruction. But we need to change the shift amount to 8-bits. We were previously doing this accidentally by masking it in the encoder. But this can make a large shift amount into a small in bounds shift amount. Instead we should clamp larger shift amounts to 255 so that they don't become in bounds. Fixes PR43922	2019-11-06 13:03:18 -08:00
Dávid Bolvanský	ca7f5becf9	[X86ISelLowering] Fixed typo in assert. NFCI.	2019-11-06 20:04:15 +01:00
Sanjay Patel	8e34dd941c	[x86] avoid crashing when splitting AVX stores with non-simple type (PR43916) The store splitting transform was assuming a simple type (MVT), but that's not necessarily the case as shown in the test.	2019-11-06 09:28:41 -05:00
Simon Pilgrim	37cdac6344	[X86] LowerAVXExtend - fix dodgy self-comparison assert. PVS Studio noticed that we were asserting "VT.getVectorNumElements() == VT.getVectorNumElements()" instead of "VT.getVectorNumElements() == InVT.getVectorNumElements()".	2019-11-06 12:50:29 +00:00
Benjamin Kramer	5f158d8e21	[X86] Gate select->fmin/fmax transform on NoSignedZeros instead of UnsafeFPMath	2019-11-05 21:28:41 +01:00
Philip Reames	027aa27d95	[X86/Atomics] (Semantically) revert G246098, switch back to the old atomic example When writing an email for a follow up proposal, I realized one of the diffs in the committed change was incorrect. Digging into it revealed that the fix is complicated enough to require some thought, so reverting in the meantime. The problem is visible in this diff (from the revert): ; X64-SSE-LABEL: store_fp128: ; X64-SSE: # %bb.0: -; X64-SSE-NEXT: movaps %xmm0, (%rdi) +; X64-SSE-NEXT: subq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 32 +; X64-SSE-NEXT: movaps %xmm0, (%rsp) +; X64-SSE-NEXT: movq (%rsp), %rsi +; X64-SSE-NEXT: movq {{[0-9]+}}(%rsp), %rdx +; X64-SSE-NEXT: callq __sync_lock_test_and_set_16 +; X64-SSE-NEXT: addq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 8 ; X64-SSE-NEXT: retq store atomic fp128 %v, fp128* %fptr unordered, align 16 ret void The problem here is threefold: 1) x86-64 doesn't guarantee atomicity of anything larger than 8 bytes. Some platforms observably break this guarantee, others don't, but the codegen isn't considering this, so it's wrong on at least some platforms. 2) When I started to track down the problem, I discovered that DAGCombiner had stripped the atomicity off the store entirely. This comes down to idiomatic usage of DAG.getStore passing all MMO components separately as opposed to just passing the MMO. 3) On x86 (not -64), there are cases where 8 byte atomicity is supported, but only for floating point operations. This would seem to imply that operation typing matters for correctness, and DAGCombine happily folds away bitcasts. I'm not 100% sure there's a problem here, but I'm not entirely sure there isn't either. I plan on returning to each issue in turn; sorry for the churn here.	2019-11-05 11:24:27 -08:00
Benjamin Kramer	00e53d912d	[X86] Specifically limit fmin/fmax commutativity to NoNaNs + NoSignedZeros The backend UnsafeFPMath flag is not a superset of all the others, so limit it to the exact bits needed.	2019-11-05 19:34:06 +01:00
Simon Pilgrim	9ad9d1531b	[X86] Convert ShrinkMode to scoped enum class. NFCI.	2019-11-04 15:35:20 +00:00
Simon Pilgrim	31ed36d044	[X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (REAPPLIED) If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts (see D66004). This reapplies rL368307 (reverted at rL369167) after the fix for the infinite loop reported at PR43024 was applied at rG3f087e38a2e7b87a5adaaac1c1b61e51220e7ff3	2019-11-04 11:37:57 +00:00
Simon Pilgrim	3f087e38a2	[X86][SSE] combineX86ShufflesRecursively - at Depth==0, only resolve KnownZero if it removes an input. This stops infinite loops where KnownUndef elements are converted to Zeroable, resulting in KnownZero elements which are then simplified (via SimplifyDemandedElts etc.) back to KnownUndef elements. Prep fix for PR43024 which will allow rL368307 to be re-applied.	2019-11-03 21:10:47 +00:00
Simon Pilgrim	8f29e4407c	[X86][SSE] combineX86ShufflesRecursively - don't bother merging shuffles with empty roots. NFCI. This doesn't affect actual codegen, but is a minor refactor toward fixing PR43024 where we need to avoid excess changes (folding zeroables etc.) to the shuffle mask at Depth == 0.	2019-11-03 17:46:00 +00:00
Simon Pilgrim	297d96bb60	Fix uninitialized variable warning. NFCI.	2019-11-03 11:15:55 +00:00
Simon Pilgrim	254b8461ac	[X86] Move computeZeroableShuffleElements before getTargetShuffleAndZeroables. NFCI. Prep work toward merging some of the functionality.	2019-11-02 13:38:35 +00:00
Craig Topper	eeeb18cd07	[X86] Change the behavior of canWidenShuffleElements used by lowerV2X128Shuffle to match the behavior in lowerVectorShuffle with regards to zeroable elements. Previously we marked zeroable elements in a way that prevented the widening check from recognizing that it could widen. Now we only mark them zeroable if V2 is an all zeros vector. This matches what we do for widening elements in lowerVectorShuffle. Fixes PR43866.	2019-11-01 13:06:03 -07:00
Simon Pilgrim	9b0dfdf5e1	[X86][AVX] Add support for and/or scalar bool reduction with AVX512 mask registers combineBitcastvxi1 only handles bitcast->MOVMSK combines, with mask registers we use BITCAST directly.	2019-11-01 17:55:31 +00:00
Simon Pilgrim	ea27d82814	[X86] isFNEG - use switch() instead of if-else tree. NFCI. In a future patch this will avoid some checks which don't need to be done for some opcodes.	2019-11-01 17:09:04 +00:00
Simon Pilgrim	a780b94cd1	[X86][SSE] Convert computeZeroableShuffleElements to emit KnownUndef and KnownZero	2019-10-31 11:21:39 +00:00
Simon Pilgrim	f25f3d39df	[X86] Add FIXME comment to merge more of computeZeroableShuffleElements and getTargetShuffleAndZeroables	2019-10-30 18:30:01 +00:00
Simon Pilgrim	94a4a2c97f	[X86][SSE] combineX86ShuffleChain - use resolveZeroablesFromTargetShuffle helper. NFCI.	2019-10-30 18:30:01 +00:00
Simon Pilgrim	81399002ae	[X86] combineOrShiftToFunnelShift - use isOperationLegalOrCustom to check FSHL/FSHR support Remove hard wired legality check.	2019-10-30 11:52:22 +00:00
Simon Pilgrim	26655376fe	[X86] combineOrShiftToFunnelShift - use getShiftAmountTy instead of hardwiring to MVT::i8	2019-10-30 11:52:22 +00:00
Guillaume Chatelet	119b436da1	[Alignment] Use Align for TFI.getStackAlignment() in X86ISelLowering Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, craig.topper, rnk Reviewed By: rnk Subscribers: rnk, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69034	2019-10-30 10:35:13 +01:00
David Zarzycki	f68925d450	[X86] Make memcmp vector lowering handle arbitrary expansions Teach combineVectorSizedSetCCEquality() to handle arbitrary memcmp expansions but do not change any default policy for now. This also fixes a bug in the memcmp expansion itself when large displacements are needed. https://reviews.llvm.org/D69507	2019-10-30 09:12:57 +02:00
Philip Reames	2460989eab	[SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and stores w/StoreSDNode) by default Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default. As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/an atomic marker on the MMO. The latter parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other. This has been fairly heavily fuzzed, and I examined diffs in a reasonably large corpus of assembly by hand, so I'm reasonably sure this is correct for the common case. Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations. This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics. If there are problems with this patch, the most likely area will be legalization. Differential Revision: https://reviews.llvm.org/D69219	2019-10-29 12:46:24 -07:00
Craig Topper	772533d921	[X86] Narrow i64 compares with constant to i32 when the upper 32-bits are known zero. This catches some cases. There are probably ways to improve this. I tried doing it as a combine on the setcc, but that broke some cases involving flag reuse in place of test. I renamed the isX86CCUnsigned to isX86CCSigned and flipped its polarity to make it consistent with the similar functions for ISD::SETCC. This avoids calling EQ/NE as being signed or unsigned. Fixes PR43823. Differential Revision: https://reviews.llvm.org/D69499	2019-10-29 11:38:15 -07:00
Simon Pilgrim	501cf25839	[X86] Pull out combineOrShiftToFunnelShift helper. NFCI.	2019-10-29 15:29:51 +00:00
Craig Topper	3da269a248	[X86] Add a DAG combine to turn (and (bitcast (vXi1 (concat_vectors (vYi1 setcc), undef,))), C) into (bitcast (vXi1 (concat_vectors (vYi1 setcc), zero,))) The legalization of v2i1->i2 or v4i1->i4 bitcasts followed by a setcc can create an and after the bitcast. If we're lucky enough that the input to the bitcast is a concat_vectors where the first operand is a setcc that can natively 0 all the upper bits of a k-register, then we should replace the other operands of the concat_vectors with zero in order to remove the AND. With the AND removed we might be able to use a kortest on the result. Differential Revision: https://reviews.llvm.org/D69205	2019-10-28 11:27:01 -07:00
David Zarzycki	657e4240b1	[X86] Fix 48/96 byte memcmp code gen Detect scalar ISD::ZERO_EXTEND generated by memcmp lowering and convert it to ISD::INSERT_SUBVECTOR. https://reviews.llvm.org/D69464	2019-10-28 08:41:45 +02:00
David Zarzycki	11c920207a	[X86] Prefer KORTEST on Knights Landing or later for memcmp() PTEST and especially the MOVMSK instructions are slow on Knights Landing or later. As a bonus, this patch increases instruction parallelism by emitting: KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0 Instead of: KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0 https://reviews.llvm.org/D69157	2019-10-26 21:14:57 +03:00
Craig Topper	3dd0a896b6	[X86] Add a check for SSE2 to the top of combineReductionToHorizontal. Without this, we can create a PSADBW node that isn't legal.	2019-10-25 11:11:32 -07:00
Simon Pilgrim	a4d55a2c36	[X86] combineX86ShufflesRecursively - assert the root mask is legal. NFCI.	2019-10-23 07:33:29 -07:00
Simon Pilgrim	b446356bf3	[X86][SSE] Add OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1))) -> MOVMSK+CMP reduction combine llvm-svn: 375463	2019-10-21 22:36:31 +00:00
Simon Pilgrim	7c15c4fb17	[X86] Rename matchBitOpReduction to matchScalarReduction. NFCI. This doesn't need to be just for bitops, but the ops do need to be fully associative. llvm-svn: 375445	2019-10-21 19:19:50 +00:00
Craig Topper	e78414622d	[X86] Check Subtarget.hasSSE3() before calling shouldUseHorizontalOp and emitting X86ISD::FHADD in LowerUINT_TO_FP_i64. This was a regression from r375341. Fixes PR43729. llvm-svn: 375381	2019-10-20 23:54:19 +00:00
Simon Pilgrim	10213b9073	[X86] Pulled out helper to decode target shuffle element sentinel values to 'Zeroable' known undef/zero bits. NFCI. Renamed 'resolveTargetShuffleAndZeroables' to 'resolveTargetShuffleFromZeroables' to match. llvm-svn: 375348	2019-10-19 16:58:24 +00:00
Simon Pilgrim	b5088aa944	[X86][SSE] lowerV16I8Shuffle - tryToWidenViaDuplication - undef unpack args tryToWidenViaDuplication lowers using the shuffle_v8i16(unpack_v16i8(shuffle_v8i16(x),shuffle_v8i16(x))) pattern, but the unpack only needs the even/odd 16i8 args if the original v16i8 shuffle mask references the even/odd elements - which isn't true for many extension style shuffles. llvm-svn: 375342	2019-10-19 13:18:02 +00:00
Simon Pilgrim	6ada70d1b5	[X86][SSE] LowerUINT_TO_FP_i64 - only use HADDPD for size/fast-hops We were always generating a single source HADDPD, but really we should only do this if shouldUseHorizontalOp says it's a good idea. Differential Revision: https://reviews.llvm.org/D69175 llvm-svn: 375341	2019-10-19 11:53:48 +00:00
Simon Pilgrim	696794b66e	[X86] combineX86ShufflesRecursively - pull out isTargetShuffleVariableMask. NFCI. llvm-svn: 375253	2019-10-18 16:39:01 +00:00
David Zarzycki	7b9fd37fa1	[X86] Emit KTEST when possible https://reviews.llvm.org/D69111 llvm-svn: 375197	2019-10-18 03:45:52 +00:00
Sam Parker	39af8a3a3b	[DAGCombine][ARM] Enable extending masked loads Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085	2019-10-17 07:55:55 +00:00
Simon Pilgrim	50dc09dd16	[X86] combineX86ShufflesRecursively - split the getTargetShuffleInputs call from the resolveTargetShuffleAndZeroables call. Exposes an issue in getFauxShuffleMask where the OR(SHUFFLE,SHUFFLE) decode should always resolve zero/undef elements. Part of the fix for PR43024 where ideally we shouldn't call resolveTargetShuffleAndZeroables for Depth == 0 llvm-svn: 374928	2019-10-15 17:59:13 +00:00
David Zarzycki	59390efef2	[X86] Make memcmp() use PTEST if possible and also enable AVX1 llvm-svn: 374922	2019-10-15 17:40:12 +00:00
Simon Pilgrim	70778444c7	[X86] Resolve KnownUndef/KnownZero bits into target shuffle masks in helper. NFCI. llvm-svn: 374878	2019-10-15 11:13:51 +00:00
Craig Topper	b2661a2d15	[X86] Don't check for VBROADCAST_LOAD being a user of the source of a VBROADCAST when trying to share broadcasts. The only things VBROADCAST_LOAD uses are an address and a chain node. It has no vector inputs. So if it's a user of the source of another broadcast that could only mean one of two things. The other broadcast is broadcasting the address of the broadcast_load. Or the source is a load and the use we're seeing is the chain result from that load. Neither of these cases makes sense to combine here. This issue was reported post-commit r373871. Test case has not been reduced yet. llvm-svn: 374862	2019-10-15 06:10:11 +00:00
Craig Topper	f4d03213f3	[X86] Teach EmitTest to handle ISD::SSUBO/USUBO in order to use the Z flag from the subtract directly during isel. This prevents isel from emitting a TEST instruction that optimizeCompareInstr will need to remove later. In some of the modified tests, the SUB gets duplicated due to the flags being needed in two places and being clobbered in between. optimizeCompareInstr was able to optimize away the TEST that was using the result of one of them, but optimizeCompareInstr doesn't know to turn SUB into CMP after removing the TEST. It only knows how to turn SUB into CMP if the result was already dead. With this change the TEST never exists, so optimizeCompareInstr doesn't have to remove it. Then it can just turn the SUB into CMP immediately. Fixes PR43649. llvm-svn: 374755	2019-10-14 06:47:56 +00:00
Simon Pilgrim	11495e5acb	[X86] getTargetShuffleInputs - Control KnownUndef mask element resolution as well as KnownZero. We were already controlling whether the KnownZero elements were being written to the target mask, this extends it to the KnownUndef elements as well so we can prevent the target shuffle mask being manipulated at all. llvm-svn: 374732	2019-10-13 19:35:35 +00:00
Craig Topper	25eb219959	[X86] Enable use of avx512 saturating truncate instructions in more cases. This enables use of the saturating truncate instructions when the result type is less than 128 bits. It also enables the use of saturating truncate instructions on KNL when the input is less than 512 bits. We can do this by widening the input and then extracting the result. llvm-svn: 374731	2019-10-13 19:07:28 +00:00
Simon Pilgrim	3efafd6c38	[X86] SimplifyMultipleUseDemandedBitsForTargetNode - use getTargetShuffleInputs with KnownUndef/Zero results. llvm-svn: 374725	2019-10-13 17:03:11 +00:00
Simon Pilgrim	e4c58db8bc	[X86] getTargetShuffleInputs - add KnownUndef/Zero output support Adjust SimplifyDemandedVectorEltsForTargetNode to use the known elts masks instead of recomputing it locally. llvm-svn: 374724	2019-10-13 17:03:02 +00:00
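
Note on 641d2e5232 (MMX shift clamp): the difference between masking and clamping an out-of-range immediate can be illustrated with a small model. This is a sketch only, not LLVM code; the function names are hypothetical, and the lane model assumes a 16-bit logical left shift in which any count of 16 or more produces zero.

```python
def mask_to_8_bits(shift_amount: int) -> int:
    """Hypothetical model of the old, accidental behavior: keep the low 8 bits."""
    return shift_amount & 0xFF


def clamp_to_8_bits(shift_amount: int) -> int:
    """Hypothetical model of the fix: saturate unsigned amounts at 255."""
    return min(shift_amount, 255)


def shift_left_16(value: int, amount: int) -> int:
    """Assumed lane model: 16-bit logical left shift, counts >= 16 yield 0."""
    if amount >= 16:
        return 0
    return (value << amount) & 0xFFFF


requested = 257  # out of range for a 16-bit lane, so the result should be 0

# Masking wraps 257 to 1, turning the shift into an in-bounds shift by one.
masked_result = shift_left_16(1, mask_to_8_bits(requested))

# Clamping keeps the amount out of bounds, so the lane is still zeroed.
clamped_result = shift_left_16(1, clamp_to_8_bits(requested))

print(masked_result, clamped_result)
```

With masking, the lane holds a shifted value where a zero was expected; with clamping, the out-of-range request stays out of range after narrowing to 8 bits.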


6684 Commits