llvm-project

Commit Graph

Author	SHA1	Message	Date
Liu, Chen3	bbf7860b93	add support for strict operation fpextend/fpround/fsqrt on X86 backend Differential Revision: https://reviews.llvm.org/D71184	2019-12-10 09:04:28 +08:00
Amara Emerson	84fdd9d7a5	[X86] Fix prolog/epilog mismatch for stack protectors on win32-macho. The xor'ing behaviour is only used for msvc/crt environments, when we're targeting macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing when targeting win32-macho to be consistent. Differential Revision: https://reviews.llvm.org/D71095	2019-12-06 14:44:56 -08:00
Craig Topper	28b573d249	[TargetLowering] Fix another potential FPE in expandFP_TO_UINT D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence: // Sel = Src < 0x8000000000000000 // Val = select Sel, Src, Src - 0x8000000000000000 // Ofs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val) ^ Ofs The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.) Instead, I'd suggest to use the following sequence: // Sel = Src < 0x8000000000000000 // FltOfs = select Sel, 0, 0x8000000000000000 // IntOfs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Val - FltOfs) ^ IntOfs In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway). In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit. There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.) Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper Differential Revision: https://reviews.llvm.org/D67105	2019-12-06 14:11:04 -08:00
Reid Kleckner	c089f02898	[X86] Don't setup and teardown memory for a musttail call Summary: musttail calls should not require allocating extra stack for arguments. Updates to arguments passed in memory should happen in place before the epilogue. This bug was mostly a missed optimization, unless inalloca was used and store to push conversion fired. If a reserved call frame was used for an inalloca musttail call, the call setup and teardown instructions would be deleted, and SP adjustments would be inserted in the prologue and epilogue. You can see these are removed from several test cases in this change. In the case where the stack frame was not reserved, i.e. call frame optimization fires and turns argument stores into pushes, then the imbalanced call frame setup instructions created for inalloca calls become a problem. They remain in the instruction stream, resulting in a call setup that allocates zero bytes (expected for inalloca), and a call teardown that deallocates the inalloca pack. This deallocation was unbalanced, leading to subsequent crashes. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71097	2019-12-06 12:58:54 -08:00
Craig Topper	8267be2995	[X86] Make X86TargetLowering::BuildFILD return a std::pair of SDValues so we explicitly return the chain instead of calling getValue on the single SDValue. We shouldn't assume that the returned result can be used to get the other result. This is prep-work for strict FP where we will also need to pass the chain result along in more cases.	2019-12-05 17:54:21 -08:00
Liu, Chen3	3041434450	Add strict fp support for instructions fadd/fsub/fmul/fdiv Differential Revision: https://reviews.llvm.org/D68757	2019-12-06 09:44:33 +08:00
Craig Topper	3d43c73f26	[X86] Remove override of shouldUseStrictFP_TO_INT for fp80. NFC I suspect this became unnecessary after r354161. Prior to that we may have been going through the default expansion of FP_TO_UINT on 64-bit targets and then ending up back in Custom X86 handling to handle the FP_TO_SINT for it. Now we just Custom handle the FP_TO_UINT directly. We already need to handle it for 32-bit mode during type legalization so we wouldn't save any code by using the default expansion on 64-bit.	2019-12-04 17:58:10 -08:00
Amy Huang	9e978bb01c	Add support for lowering 32-bit/64-bit pointers Summary: This follows a previous patch that changes the X86 datalayout to represent mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces (https://reviews.llvm.org/D64931) This patch implements the address space cast lowering to the corresponding sign extension, zero extension, or truncate instructions. Related to https://bugs.llvm.org/show_bug.cgi?id=42359 Reviewers: rnk, craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69639	2019-12-04 11:39:03 -08:00
Craig Topper	8f73a93b2d	[X86] Add support for STRICT_FP_TO_UINT/SINT from fp128.	2019-11-27 18:38:32 -08:00
Craig Topper	cfce8f2cfb	[X86] Add strict fp support for operations of X87 instructions This is the following patch of D68854. This patch adds basic operations of X87 instructions, including +, -, *, / , fp extensions and fp truncations. Patch by Chen Liu(LiuChen3) Differential Revision: https://reviews.llvm.org/D68857	2019-11-26 10:59:41 -08:00
David Green	b5315ae8ff	[Codegen][ARM] Add addressing modes from masked loads and stores MVE has a basic symmetry between it's normal loads/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do. To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8bits to 16bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted to hopefully be no functional changes. Differential Revision: https://reviews.llvm.org/D70176	2019-11-26 16:21:01 +00:00
Craig Topper	1b20908334	[X86] Return Op instead of SDValue() for lowering flags_read/write intrinsics Returning SDValue() means we didn't handle it and the common code should try to expand it. But its a target intrinsic so expanding won't do anything and just leave the node alone. But it will print confusing debug messages. By returning Op we tell the common code that the node is legal and shouldn't receive any further processing.	2019-11-25 23:13:30 -08:00
Craig Topper	c43b8ec735	[X86] Add support for STRICT_FP_ROUND/STRICT_FP_EXTEND from/to fp128 to/from f32/f64/f80 in 64-bit mode. These need to emit a libcall like we do for the non-strict version. 32-bit mode needs to SoftenFloat support to be implemented for strict FP nodes. Differential Revision: https://reviews.llvm.org/D70504	2019-11-25 18:18:39 -08:00
Simon Pilgrim	5d9a259ad5	[X86][SSE] Split off generic isLaneCrossingShuffleMask helper. NFC. Avoid MVT dependency which will be needed in a future patch.	2019-11-23 12:41:03 +00:00
Craig Topper	b29e5cdb7c	[X86] Add test cases for most of the constrained fp libcalls with fp128. Add explicit setOperation actions for some to match their none strict counterparts. This isn't required, but makes the code self documenting that we didn't forget about strict fp. I've used LibCall instead of Expand since that's more explicitly what we want. Only lrint/llrint/lround/llround are missing now.	2019-11-21 18:17:59 -08:00
Craig Topper	fc4020dbbe	[X86] Mark fp128 FMA as LibCall instead of Expand. Add STRICT_FMA as well. The Expand code would fall back to LibCall, but this makes it more explicit.	2019-11-21 18:17:57 -08:00
Craig Topper	7696b99258	[LegalizeDAG][X86] Add support for turning STRICT_FADD/SUB/MUL/DIV into libcalls. Use it for fp128 on x86-64. This requires a minor hack for f32/f64 strict fadd/fsub to avoid turning those into libcalls.	2019-11-21 16:19:25 -08:00
Craig Topper	95f44cf44a	[X86] Mark vector STRICT_FADD/STRICT_FSUB as Legal and add mutation to X86ISelDAGToDAG The prevents LegalizeVectorOps from scalarizing them. We'll need to remove the X86 mutation code when we add isel patterns.	2019-11-21 16:19:18 -08:00
Hiroshi Yamauchi	52e377497d	[PGO][PGSO] DAG.shouldOptForSize part. Summary: (Split of off D67120) SelectionDAG::shouldOptForSize changes for profile guided size optimization. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70095	2019-11-21 14:16:00 -08:00
Craig Topper	1439059cc7	[X86] Change legalization action for f128 fadd/fsub/fmul/fdiv from Custom to LibCall. The custom code just emits a libcall, but we can do the same with generic code. The only difference is that the generic code can form tail calls where the custom code couldn't. This is responsible for the test changes. This avoids needing to modify the Custom handling for strict fp.	2019-11-21 11:44:29 -08:00
Craig Topper	27da569a7a	[X86] Fix i16->f128 sitofp to promote the i16 to i32 before trying to form a libcall. Previously one of the test cases added here gave an error.	2019-11-20 17:09:32 -08:00
Craig Topper	5f3bf5967b	[X86] Fix f128->i16 fptosi to promote the i16 to i32 before trying to form a libcall. Previously one of the test cases added here gave an error.	2019-11-20 17:09:31 -08:00
Craig Topper	7488c0a6f5	[X86] Mark vector STRICT_FP_ROUND as Legal instead of Custom. The Custom handler doesn't do anything for these nodes anyway. SelectionDAGISel won't mutate them if they are Legal or Custom. X86 has custom code for mutating them due to missing isel patterns. When the isel patterns are added Legal will be the right answer. So go ahead a change it now since that's where we'll end up.	2019-11-20 13:03:51 -08:00
Reid Kleckner	606a2bd621	[musttail] Don't forward AL on Win64 AL is only used for varargs on SysV platforms. Don't forward it on Windows. This allows control flow guard to set up an extra hidden parameter in RAX, as described in PR44049. This also has the effect of freeing up RAX for use in virtual member pointer thunks, which may also be a nice little code size improvement on Win64. Fixes PR44049 Reviewers: ajpaverd, efriedma, hans Differential Revision: https://reviews.llvm.org/D70413	2019-11-19 16:54:00 -08:00
Craig Topper	c4b41e8d1d	[LegalizeDAG][X86] Enable STRICT_FP_TO_SINT/UINT to be promoted Differential Revision: https://reviews.llvm.org/D70220	2019-11-19 16:14:37 -08:00
Craig Topper	85589f8077	[X86] Add custom type legalization and lowering for scalar STRICT_FP_TO_SINT/UINT This is a first pass at Custom lowering for these operations. I also updated some of the vector code where it was obviously easy and straightforward. More work needed in follow up. This enables these operations to be handled with X87 where special rounding control adjustments are needed to perform a truncate. Still need to fix Promotion in the target independent code in LegalizeDAG. llrint/llround split into separate test file because we can't make a strict libcall properly yet either and we need to do that when i64 isn't a legal type. This does not include any isel support. So we still rely on the mutation in SelectionDAGIsel to remove the strict from this stuff later. Except for the X87 stuff which goes through custom nodes that already had chains. Differential Revision: https://reviews.llvm.org/D70214	2019-11-19 16:05:22 -08:00
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Simon Pilgrim	bbf4af3109	[X86][SSE] Remove XFormVExtractWithShuffleIntoLoad to prevent legalization infinite loops (PR43971) As detailed in PR43971/D70267, the use of XFormVExtractWithShuffleIntoLoad causes issues where we end up in infinite loops of extract(targetshuffle(vecload)) -> extract(shuffle(vecload)) -> extract(vecload) -> extract(targetshuffle(vecload)), there are just too many legalization checks at every stage that we can't guarantee that extract(shuffle(vecload)) -> scalarload can occur. At the moment we see a number of minor regressions as we don't fold extract(shuffle(vecload)) -> scalarload before legal ops, these can be addressed in future patches and extension of X86ISelLowering's combineExtractWithShuffle.	2019-11-19 11:55:44 +00:00
Graham Hunter	3f08ad611a	[SVE][CodeGen] Scalable vector MVT size queries * Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871	2019-11-18 12:30:59 +00:00
Craig Topper	f7e9d81a8e	[X86] Don't set the operation action for i16 SINT_TO_FP to Promote just because SSE1 is enabled. Instead do custom promotion in the handler so that we can still allow i16 to be used with fp80. And f64 without sse2.	2019-11-13 14:07:56 -08:00
Craig Topper	787595b2e7	[X86] Fix typo in comment. NFC	2019-11-13 14:07:55 -08:00
Craig Topper	fee9067261	[X86] Move all the FP_TO_XINT/XINT_TO_FP setOperationActions into the same !useSoftFloat block. Qualify all of the Promote actions for these with !useSoftFloat too. NFCI The Promote action doesn't apply until LegalizeDAG. By the time we get there, we would have already softened all the FP operations if useSoftFloat was true. So there wouldn't be any operation left to Promote.	2019-11-13 14:07:54 -08:00
Craig Topper	a4b7613a49	[X86] Remove setOperationAction for FP_TO_SINT v8i16. This is no longer needed after widening legalization as we custom legalize v8i8 ourselves. Added entries to the cost model, but bumped the cost slightly to account for the truncate shuffle that wasn't costed before.	2019-11-12 22:45:52 -08:00
Craig Topper	3e1aee2ba7	[X86] Don't consider v64i1 as a legal type unless v64i8 is also a legal type. This avoids some nasty issues with argument passing and lowering of arbitrary v64i8 shuffles.	2019-11-12 14:56:02 -08:00
Craig Topper	0f04ffc073	[X86] Only pass v64i8/v32i16 as v16i32 on non-avx512bw targets if the v16i32 type won't be split by prefer-vector-width=256 Otherwise just let the v64i8/v32i16 types be split to v32i8/v16i16. In reality this shouldn't happen because it means we have a 512-bit vector argument, but min-legal-vector-width says a value less than 512. But a 512-bit argument should have been factored into the preferred vector width.	2019-11-12 14:56:01 -08:00
Craig Topper	ff1504da6f	[X86] Update stale comment. NFC	2019-11-11 23:55:12 -08:00
Craig Topper	578f3b5dce	[X86] Remove setOperationAction lines that say to promote MVT::i1 MVT::i1 should be removed by type legalization before we reach any code that would act on the promote action. Mainly to avoid replicating this for strict FP versions of these operations.	2019-11-11 18:35:57 -08:00
Craig Topper	6c86d6efaf	[X86] Remove some else branches after checking for !useSoftFloat() that set operations to Expand. If we're using soft floats, then these operations shoudl be softened during type legalization. They'll never get to LegalizeVectorOps or LegalizeDAG so they don't need to be Expanded there.	2019-11-11 16:32:19 -08:00
Eli Friedman	5df3a87224	[AArch64][X86] Don't assume __powidf2 is available on Windows. We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it. (I think this started showing up recently because we added an optimization that converts pow to powi.) Differential Revision: https://reviews.llvm.org/D69013	2019-11-08 12:43:21 -08:00
Craig Topper	17eb12fa6d	[X86] Remove unused variable. NFC	2019-11-06 22:53:48 -08:00
Craig Topper	1c8460d6e1	[X86] Remove dead code from combineStore. Leftovers from before we switched to widening legalization. Fixes PR43919.	2019-11-06 22:24:47 -08:00
Craig Topper	641d2e5232	[X86] Clamp large constant shift amounts for MMX shift intrinsics to 8-bits. The MMX intrinsics for shift by immediate take a 32-bit shift amount but the hardware for shifting by immediate only encodes 8-bits. For the intrinsic we don't require the shift amount to fit in 8-bits in the frontend because we don't check that its an immediate in the frontend. If its is not an immediate we move it to an MMX register and use the shift by register. But if it is an immediate we'll use the shift by immediate instruction. But we need to change the shift amount to 8-bits. We were previously doing this accidentally by masking it in the encoder. But this can make a large shift amount into a small in bounds shift amount. Instead we should clamp larger shift amounts to 255 so that the they don't become in bounds. Fixes PR43922	2019-11-06 13:03:18 -08:00
Dávid Bolvanský	ca7f5becf9	[X86ISelLowering] Fixed typo in assert. NFCI.	2019-11-06 20:04:15 +01:00
Sanjay Patel	8e34dd941c	[x86] avoid crashing when splitting AVX stores with non-simple type (PR43916) The store splitting transform was assuming a simple type (MVT), but that's not necessarily the case as shown in the test.	2019-11-06 09:28:41 -05:00
Simon Pilgrim	37cdac6344	[X86] LowerAVXExtend - fix dodgy self-comparison assert. PVS Studio noticed that we were asserting "VT.getVectorNumElements() == VT.getVectorNumElements()" instead of "VT.getVectorNumElements() == InVT.getVectorNumElements()".	2019-11-06 12:50:29 +00:00
Benjamin Kramer	5f158d8e21	[X86] Gate select->fmin/fmax transform on NoSignedZeros instead of UnsafeFPMath	2019-11-05 21:28:41 +01:00
Philip Reames	027aa27d95	[X86/Atomics] (Semantically) revert G246098, switch back to the old atomic example When writing an email for a follow up proposal, I realized one of the diffs in the committed change was incorrect. Digging into it revealed that the fix is complicated enough to require some thought, so reverting in the meantime. The problem is visible in this diff (from the revert): ; X64-SSE-LABEL: store_fp128: ; X64-SSE: # %bb.0: -; X64-SSE-NEXT: movaps %xmm0, (%rdi) +; X64-SSE-NEXT: subq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 32 +; X64-SSE-NEXT: movaps %xmm0, (%rsp) +; X64-SSE-NEXT: movq (%rsp), %rsi +; X64-SSE-NEXT: movq {{[0-9]+}}(%rsp), %rdx +; X64-SSE-NEXT: callq __sync_lock_test_and_set_16 +; X64-SSE-NEXT: addq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 8 ; X64-SSE-NEXT: retq store atomic fp128 %v, fp128* %fptr unordered, align 16 ret void The problem here is three fold: 1) x86-64 doesn't guarantee atomicity of anything larger than 8 bytes. Some platforms observably break this guarantee, others don't, but the codegen isn't considering this, so it's wrong on at least some platforms. 2) When I started to track down the problem, I discovered that DAGCombiner had stripped the atomicity off the store entirely. This comes down to idiomatic usage of DAG.getStore passing all MMO components separately as opposed to just passing the MMO. 3) On x86 (not -64), there are cases where 8 byte atomiciy is supported, but only for floating point operations. This would seem to imply that operation typing matters for correctness, and DAGCombine happily folds away bitcasts. I'm not 100% sure there's a problem here, but I'm not entirely sure there isn't either. I plan on returning to each issue in turn; sorry for the churn here.	2019-11-05 11:24:27 -08:00
Benjamin Kramer	00e53d912d	[X86] Specifically limit fmin/fmax commutativity to NoNaNs + NoSignedZeros The backend UnsafeFPMath flag is not a superset of all the others, so limit it to the exact bits needed.	2019-11-05 19:34:06 +01:00
Simon Pilgrim	9ad9d1531b	[X86] Convert ShrinkMode to scoped enum class. NFCI.	2019-11-04 15:35:20 +00:00
Simon Pilgrim	31ed36d044	[X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (REAPPLIED) If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts (see D66004). This reapplies rL368307 (reverted at rL369167) after the fix for the infinite loop reported at PR43024 was applied at rG3f087e38a2e7b87a5adaaac1c1b61e51220e7ff3	2019-11-04 11:37:57 +00:00

1 2 3 4 5 ...

6724 Commits