llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	be3348573e	[LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents Summary: When promoting the overflow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it. We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used. Reviewers: spatel, RKSimon, nikic Reviewed By: nikic Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58567 llvm-svn: 354753	2019-02-24 19:23:36 +00:00
Craig Topper	ccc860cb81	Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract" r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit." These were reverted in r354713 as their context depended on other patches that were reverted for a bug. llvm-svn: 354734	2019-02-23 19:51:32 +00:00
Jordan Rupprecht	6387fa2715	[NFC] Fix typos: preceeding -> preceding llvm-svn: 354715	2019-02-23 01:28:32 +00:00
Reid Kleckner	e3876637cf	Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" r354363 caused https://crbug.com/934963#c1, which has a plain C reduced test case. I also had to revert some dependent changes: - r354648 - r354647 - r354640 - r354511 llvm-svn: 354713	2019-02-23 01:19:42 +00:00
Craig Topper	62619d064d	[LegalizeTypes] Use PromoteTargetBoolean in PromoteIntOp_ADDSUBCARRY instead of reimplementing it. NFCI llvm-svn: 354710	2019-02-23 00:38:19 +00:00
Nirav Dave	46f939c118	Disable big-endian constant store merges from rL354676. llvm-svn: 354677	2019-02-22 16:20:34 +00:00
Nirav Dave	44037d7a63	[DAGCombine] Fold overlapping constant stores Fold a smaller constant store into larger constant stores immediately preceding it. Reviewers: rnk, courbet Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58468 llvm-svn: 354676	2019-02-22 16:00:19 +00:00
Craig Topper	fa6187d230	[LegalizeVectorOps] Improve the placement of ANDs in the ExpandLoad path for non-byte-sized loads. When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits. We do need an AND to clear out any bits from the high part. We were ANDing the high part before combining with the low part, but it looks like ANDing after the OR gets better results. So we can just emit the final AND after the optional concatenation is done. That will handle skipping before the OR and get rid of extra high bits after the OR. llvm-svn: 354655	2019-02-22 07:03:25 +00:00
Craig Topper	069cf05e87	[LegalizeVectorOps] Simplify the non-byte sized load handling VectorLegalizer::ExpandLoad. NFCI Remove an if that should always be true. Merge the body of another into the only block that could make the if true. llvm-svn: 354654	2019-02-22 06:18:33 +00:00
Matt Arsenault	0280a5e143	DAG: Add helper for creating shifts with correct type llvm-svn: 354649	2019-02-22 03:38:47 +00:00
Craig Topper	be22f329a9	[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract. Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those. But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results. By promoting the input at the same time we can create only a single any_extend or truncate. There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch. llvm-svn: 354647	2019-02-22 01:49:50 +00:00
Sanjay Patel	ba5ee817e9	[DAGCombiner] prevent infinite looping by truncating 'and' (PR40793) This fold can occur during legalization, so it can fight with promotion to the larger type. It apparently takes a special sequence and subtarget to avoid more basic simplifications that would hide the problem. But there's a bigger question raised here: why does distributeTruncateThroughAnd() even exist? It duplicates functionality from a more minimal pattern that we already have. But getting rid of this function requires some preliminary steps. https://bugs.llvm.org/show_bug.cgi?id=40793 llvm-svn: 354594	2019-02-21 16:01:48 +00:00
Clement Courbet	a0321c23e8	Re-land part of r354244 "[DAGCombiner] Eliminate dead stores to stack." This part introduces the lifetime node. llvm-svn: 354578	2019-02-21 12:59:36 +00:00
Nirav Dave	48cf37b55c	[DAGCombine] Generalize Dead Store to overlapping stores. Summary: Remove stores that are immediately overwritten by larger stores. Reviewers: courbet, rnk Reviewed By: rnk Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58467 llvm-svn: 354518	2019-02-20 21:07:50 +00:00
Craig Topper	8d9c224a8c	[SelectionDAG] Teach GetDemandedBits to look at the known zeros of the LHS when handling ISD::AND If the LHS has known zeros, then the RHS immediate mask might have been simplified to remove those bits. This patch adds a call to computeKnownBits to get the known zeroes to handle that possibility. I left an early out to skip the call if all of the demanded bits are set in the mask. Differential Revision: https://reviews.llvm.org/D58464 llvm-svn: 354514	2019-02-20 20:52:26 +00:00
Nikita Popov	c3b496de7a	[SDAG] Support vector UMULO/SMULO Second part of https://bugs.llvm.org/show_bug.cgi?id=40442. This adds an extra UnrollVectorOverflowOp() method to SDAG, because the general UnrollOverflowOp() method can't deal with multiple results. Additionally we need to expand UMULO/SMULO during vector op legalization, as it may result in unrolling, which may need additional type legalization. Differential Revision: https://reviews.llvm.org/D57997 llvm-svn: 354513	2019-02-20 20:41:44 +00:00
Craig Topper	f4923db5a3	Revert r354498 "[X86] Add test case to show missed opportunity to remove an explicit AND on the bit position from BT when it has known zeros." I accidentally committed more than just the test. llvm-svn: 354499	2019-02-20 18:47:26 +00:00
Craig Topper	f8498a615b	[X86] Add test case to show missed opportunity to remove an explicit AND on the bit position from BT when it has known zeros. If the bit position has known zeros in it, then the AND immediate will likely be optimized to remove bits. This can prevent GetDemandedBits from recognizing that the AND is unnecessary. llvm-svn: 354498	2019-02-20 18:45:38 +00:00
Clement Courbet	62b3b91ab2	Re-land the refactoring part of r354244 "[DAGCombiner] Eliminate dead stores to stack." This is an NFC. llvm-svn: 354476	2019-02-20 15:45:58 +00:00
Chen Zheng	b934fce613	[NFC] add/modify wrapper function for findRegisterDefOperand(). llvm-svn: 354438	2019-02-20 07:01:04 +00:00
Nikita Popov	04e45e9311	[SDAG] Use shift amount type in MULO promotion; NFC Directly use the correct shift amount type if it is possible, and future-proof the code against vectors. The added test makes sure that bitwidths that do not fit into the shift amount type do not assert. Split out from D57997. llvm-svn: 354359	2019-02-19 17:37:55 +00:00
Clement Courbet	292291fb90	Revert r354244 "[DAGCombiner] Eliminate dead stores to stack." Breaks some bots. llvm-svn: 354245	2019-02-18 08:24:29 +00:00
Clement Courbet	57f34dbd3e	[DAGCombiner] Eliminate dead stores to stack. Summary: A store to an object whose lifetime is about to end can be removed. See PR40550 for motivation. Reviewers: niravd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57541 llvm-svn: 354244	2019-02-18 07:59:01 +00:00
Nikita Popov	f62aeda58d	[SelectionDAG] Extract [US]MULO expansion into TL method; NFC In preparation for supporting vector expansion. Add an isPostTypeLegalization flag to makeLibCall(), because this expansion relies on the legalized form using MERGE_VALUES. Drop the corresponding variant of ExpandLibCall, which is no longer used. Differential Revision: https://reviews.llvm.org/D58006 llvm-svn: 354226	2019-02-17 17:40:47 +00:00
Nirav Dave	7875841121	[X86] Fix LowerAsmOutputForConstraint. Summary: Update Flag when generating cc output. Fixes PR40737. Reviewers: rnk, nickdesaulniers, craig.topper, spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58283 llvm-svn: 354163	2019-02-15 20:01:55 +00:00
Simon Pilgrim	a22814a399	Fix 80-column limit in SimplifyDemandedBits/SimplifyDemandedVectorElts. NFCI. llvm-svn: 354152	2019-02-15 18:15:58 +00:00
Jeremy Morse	897a9f8d00	Fix an accidentally flipped pair of arguments, NFCI While rebasing a refactor in r353950 I accidentally swapped two function arguments; one is SelectionDAGBuilders "current" DebugLoc, the other is the one from the "current" debug intrinsic. They're probably always identical, but I haven't proved that yet. llvm-svn: 354019	2019-02-14 11:09:24 +00:00
Philip Reames	e4cfb7dae8	[SelectionDAG] Inline a single use helper function, and remove last non-MMO interface [NFC] For D57601, we need to know whether the instruction is volatile. We'd either have to pass yet another parameter, or just standardize on the MMO interface. I chose the second. llvm-svn: 353989	2019-02-13 23:01:11 +00:00
Philip Reames	41f400c948	[SelectionDAG] Kill last uses of getAtomic w/o a MMO operand [NFC] The helper function was used by only two callers, and largely ended up providing distinct functionality based on optional arguments and opcode. Inline and simplify to make the functionality much more clear. llvm-svn: 353977	2019-02-13 20:42:59 +00:00
Jeremy Morse	291713a596	[DebugInfo][DAG] Either salvage dangling debug info or emit Undef DBG_VALUEs In this patch SelectionDAG tries to salvage any dbg.values that are going to be dropped, in case they can be recovered from Values in the current BB. It also strengthens SelectionDAGs handling of dangling debug data, so that dbg.values are always emitted (as Undef or otherwise) instead of dangling forever. The motivation behind this patch exists in the new test case: a memory address (here a bitcast and GEP) exist in one basic block, and a dbg.value referring to the address is left in the 'next' block. The base pointer is live across all basic blocks. In current llvm trunk the dbg.value cannot be encoded, and it isn't even emitted as an Undef DBG_VALUE. The change is simply: if we're definitely going to drop a dbg.value, repeatedly apply salvageDebugInfo to its operand until either we find something that can be encoded, or we can't salvage any further in which case we produce an Undef DBG_VALUE. To know when we're "definitely going to drop a dbg.value", SelectionDAG signals SelectionDAGBuilder when all IR instructions have been encoded to force salvaging. This ensures that any dbg.value that's dangling after DAG creation will have a corresponding DBG_VALUE encoded. Differential Revision: https://reviews.llvm.org/D57694 llvm-svn: 353954	2019-02-13 16:33:05 +00:00
Jeremy Morse	6d3cd3b4ec	[DebugInfo][DAG] Refactor dbg.value lowering into its own method This is a pure copy-and-paste job, moving the logic for lowering dbg.value intrinsics to SDDbgValues into its own function. This is ahead of adding some more users of this logic. Differential Revision: https://reviews.llvm.org/D57697 llvm-svn: 353950	2019-02-13 15:53:10 +00:00
Jeremy Morse	a9a11aac0f	[DebugInfo][DAG] Limit special-casing of dbg.values for Arguments SelectionDAGBuilder has special handling for dbg.value intrinsics that are understood to define the location of function parameters on entry to the function. To enable this, we avoid recording a dbg.value as a virtual register reference if it might be such a parameter, so that it later hits EmitFuncArgumentDbgValue. This patch reduces the set of circumstances where we avoid recording a dbg.value as a virtual register reference, to allow more "normal" variables to be recorded that way. We now only bypass for potential parameters if: * The dbg.value operand is an Argument, * The Variable is a parameter, and * The Variable is not inlined. meaning it's very likely that the dbg.value is a function-entry parameter location. Differential Revision: https://reviews.llvm.org/D57584 llvm-svn: 353948	2019-02-13 13:37:33 +00:00
Bjorn Pettersson	ecd0960718	[SelectionDAG] Clean up comments in SelectionDAGBuilder.h. NFC Remove redundant function/variable names from doxygen comments (as suggested in https://reviews.llvm.org/D57697). llvm-svn: 353886	2019-02-12 22:11:20 +00:00
Sanjay Patel	86fac11d5a	[DAGCombiner] convert logic-of-setcc into bit magic (PR40611) If we're comparing some value for equality against 2 constants and those constants have an absolute difference of just 1 bit, then we can offset and mask off that 1 bit and reduce to a single compare against zero: and/or (setcc X, C0, ne), (setcc X, C1, ne/eq) --> setcc ((add X, -C1), ~(C0 - C1)), 0, ne/eq https://rise4fun.com/Alive/XslKj This transform is disabled by default using a TLI hook ("convertSetCCLogicToBitwiseLogic()"). That should be overridden for AArch64, MIPS, Sparc and possibly others based on the asm shown in: https://bugs.llvm.org/show_bug.cgi?id=40611 llvm-svn: 353859	2019-02-12 17:07:47 +00:00
whitequark	77ccc2eba4	[SelectionDAG] Fix return calling convention in expansion of ?MULO Summary: The SMULO/UMULO DAG nodes, when not directly supported by the target, expand to a multiplication twice as wide. In case that the resulting type is not legal, the legalizer cannot directly call the intrinsic with the wide arguments; instead, it "pre-lowers" them by splitting them in halves. rL283203 made sure that on big endian targets, the legalizer passes the argument halves in the correct order. It did not do the same for the return value halves because the existing code used a hack; it put an illegal type into DAG and hoped that nothing would break and it would be correctly lowered elsewhere. rL307207 fixed this, handling return value halves similar to how argument halves are handled, but did not take big-endian targets into account. This commit fixes the expansion on big-endian targets, such as the out-of-tree OR1K target. Reviewers: eli.friedman, vadimcn Subscribers: george-hopkins, efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D45355 llvm-svn: 353854	2019-02-12 16:41:50 +00:00
Philip Reames	b6dc6eb8bb	[Statepoint Lowering] Update misleading comments about chains llvm-svn: 353800	2019-02-12 06:25:58 +00:00
Ana Pazos	9a3dc3e60b	[LegalizeTypes] Expand FNEG to bitwise op for IEEE FP types Summary: Except for custom floating point types x86_fp80 and ppc_fp128, expand Y = FNEG(X) to Y = X ^ sign mask to avoid library call. Using bitwise operation can improve code size and performance. Reviewers: efriedma Reviewed By: efriedma Subscribers: efriedma, kpn, arsenm, eli.friedman, javed.absar, rbar, johnrusso, simoncook, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, asb, llvm-commits Differential Revision: https://reviews.llvm.org/D57875 llvm-svn: 353757	2019-02-11 22:10:08 +00:00
Bjorn Pettersson	4892f06e06	[SelectionDAGBuilder] Add restrictions to EmitFuncArgumentDbgValue Summary: This patch fixes PR40587. When a dbg.value intrinsic is emitted to the DAG by using EmitFuncArgumentDbgValue the resulting DBG_VALUE is hoisted to the beginning of the entry block. I think the idea is to be able to locate a formal argument already from the start of the function. However, EmitFuncArgumentDbgValue only checked that the value that was used to describe a variable was originating from a function parameter, not that the variable itself actually was an argument to the function. So when for example assigning a local variable "local" the value from an argument "a", the associated DBG_VALUE instruction would be hoisted to the beginning of the function, even if the scope for "local" started somewhere else (or if "local" was mapped to other values earlier in the function). This patch adds some logic to EmitFuncArgumentDbgValue to check that the variable being described actually is an argument to the function, and that the dbg.value being lowered already is in the entry block. Otherwise we bail out, and the dbg.value will be handled as an ordinary dbg.value (not as a "FuncArgumentDbgValue"). A tricky situation is when both the variable and the value are related to function arguments, but not necessarily the same argument. We make sure that we do not describe the same argument more than once as a "FuncArgumentDbgValue". This solution works as long as opt has injected a "first" dbg.value that corresponds to the formal argument at the function entry. Reviewers: jmorse, aprantl Subscribers: jyknight, hiraditya, fedor.sergeev, dstenb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57702 llvm-svn: 353735	2019-02-11 19:23:30 +00:00
Benjamin Kramer	711950c116	Move some classes into anonymous namespaces. NFC. llvm-svn: 353710	2019-02-11 15:16:21 +00:00
Chandler Carruth	3160734af1	[CallSite removal] Migrate the statepoint GC infrastructure to use the `CallBase` class rather than `CallSite` wrappers. I pushed this change down through most of the statepoint infrastructure, completely removing the use of CallSite where I could reasonably do so. I ended up making a couple of cut-points: generic call handling (instcombine, TLI, SDAG). As soon as it hit truly generic handling with users outside the immediate code, I simply transitioned into or out of a `CallSite` to make this a reasonable sized chunk. Differential Revision: https://reviews.llvm.org/D56122 llvm-svn: 353660	2019-02-11 07:42:30 +00:00
Nikita Popov	a0e96bd56d	[CodeGen][X86] Don't scalarize vector saturating add/sub Now that we have vector support for [US](ADD|SUB)O we no longer need to scalarize when expanding [US](ADD|SUB)SAT. This matches what the cost model already does. Differential Revision: https://reviews.llvm.org/D57348 llvm-svn: 353651	2019-02-10 19:06:38 +00:00
Simon Pilgrim	c5744d4d69	[DAG] Add optional AllowUndefs to isNullOrNullSplat No change in default behaviour (AllowUndefs = false) llvm-svn: 353646	2019-02-10 17:42:15 +00:00
Simon Pilgrim	5a82a788a2	[DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero. Differential Revision: https://reviews.llvm.org/D58009 llvm-svn: 353645	2019-02-10 17:04:00 +00:00
Sanjay Patel	2f319420f9	[TargetLowering] refactor setcc folds to fix another miscompile (PR40657) SimplifySetCC still has much room for improvement, but this should fix the remaining problem examples from: https://bugs.llvm.org/show_bug.cgi?id=40657 The initial fix for this problem was rL353615. llvm-svn: 353639	2019-02-10 14:29:57 +00:00
Sanjay Patel	7467510453	[TargetLowering] add tests to show effect of setcc sub->shift; NFC There's effectively no difference for the cases with variables. We just trade a sub for an add on those. But the case with a subtract from constant would require an extra move instruction on x86, so this looks like a reasonable generic combine. llvm-svn: 353619	2019-02-09 17:03:59 +00:00
Sanjay Patel	887ac1b38c	[TargetLowering] avoid miscompile in setcc transform (PR40657) llvm-svn: 353615	2019-02-09 15:59:02 +00:00
Nikita Popov	37bce93e36	Revert "[SelectionDAG] Extract [US]MULO expansion into TL method; NFC" This reverts commit r353611. Triggers an assertion during the libcall expansion on ARM. llvm-svn: 353612	2019-02-09 13:54:02 +00:00
Nikita Popov	7de44ed945	[SelectionDAG] Extract [US]MULO expansion into TL method; NFC In preparation for supporting vector expansion. Also drop a variant of ExpandLibCall, of which the MULO expansions were the only user. llvm-svn: 353611	2019-02-09 13:29:22 +00:00
Craig Topper	784929d045	Implementation of asm-goto support in LLVM This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563	2019-02-08 20:48:56 +00:00
Nemanja Ivanovic	92a8c36735	[DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X)) The sqrt case is faster and we already do this for the case where the exponent is 0.25. This adds the 0.75 case which is also not sensitive to signed zeros. Patch by Whitney Tsang (Whitney) Differential revision: https://reviews.llvm.org/D57434 llvm-svn: 353557	2019-02-08 19:50:58 +00:00
Simon Pilgrim	eb6a47a462	[TargetLowering] Use ISD::FSHR in expandFixedPointMul Replace OR(SHL,SRL) pattern with ISD::FSHR (legalization expands this later if necessary) - this helps with the scale == 0 'undefined' drop-through case that was discussed on D55720. llvm-svn: 353546	2019-02-08 18:57:38 +00:00
Simon Pilgrim	478bb90779	[TargetLowering] Add SimplifyDemandedBits funnel shift support llvm-svn: 353539	2019-02-08 17:19:01 +00:00
Nirav Dave	97011ccce0	Revert r353416 "[DAG] Cleanup unused nodes on failed store-to-load forward combine." This cleanup causes out-of-tree crashes. llvm-svn: 353527	2019-02-08 15:21:13 +00:00
Nikita Popov	9d7e86a978	[CodeGen] Handle vector UADDO, SADDO, USUBO, SSUBO This is part of https://bugs.llvm.org/show_bug.cgi?id=40442. Vector legalization is implemented for the add/sub overflow opcodes. UMULO/SMULO are also handled as far as legalization is concerned, but they don't support vector expansion yet (so no tests for them). The vector result widening implementation is suboptimal, because it could result in a legalization loop. Differential Revision: https://reviews.llvm.org/D57639 llvm-svn: 353464	2019-02-07 21:02:22 +00:00
Simon Pilgrim	fe3ac70b18	[DAGCombiner] (add (umax X, C), -C) --> (usubsat X, C) (PR40111) Move the (add (umax X, C), -C) --> (usubsat X, C) X86 combine into generic DAGCombiner First of a number of saturated arithmetic folds that can be moved out of X86-specific code for PR40111. Differential Revision: https://reviews.llvm.org/D57754 llvm-svn: 353457	2019-02-07 20:14:43 +00:00
Nirav Dave	9332fc2e19	Revert "[DAG] Cleanup of unused node in SimplifySelectCC." Causes ASAN use-after-poison errors. llvm-svn: 353442	2019-02-07 18:31:05 +00:00
Sanjay Patel	2d4b186844	[DAGCombiner] fold add/sub with bool operand based on target's boolean contents I noticed that we are missing this canonicalization in IR: rL352515 ...and then realized that we don't get this right in SDAG either, so this has to be fixed first regardless of what we choose to do in IR. The existing fold was limited to scalars and using the wrong predicate to guard the transform. We have a boolean contents TLI query that can be used to decide which direction to fold. This may eventually lead back to the problems/question in: https://bugs.llvm.org/show_bug.cgi?id=40486 ...but it makes no difference to that yet. Differential Revision: https://reviews.llvm.org/D57401 llvm-svn: 353433	2019-02-07 17:43:34 +00:00
Nirav Dave	24e60819f6	[DAG] Cleanup of unused node in SimplifySelectCC. llvm-svn: 353428	2019-02-07 17:13:55 +00:00
Nirav Dave	4b12236f7d	[DAG] Cleanup unused node on failed SELECT Combine. llvm-svn: 353426	2019-02-07 16:57:50 +00:00
Nirav Dave	724b81087d	[DAG] Cleanup unused nodes on failed store-to-load forward combine. llvm-svn: 353416	2019-02-07 15:38:14 +00:00
Nirav Dave	b3506bf985	[DAG] Immediately cleanup unused nodes from extend-based combines. llvm-svn: 353338	2019-02-06 20:12:03 +00:00
Bjorn Pettersson	350352c8a5	[SelectionDAG] Cleanup some code comments. NFC Don't repeat the function name in some doxygen comments. (Just a minor cleanup, while testing to push from the git monorepo setup.) llvm-svn: 353317	2019-02-06 17:36:18 +00:00
Nirav Dave	e5c37958f9	[InlineAsm][X86] Add backend support for X86 flag output parameters. Allow custom handling of inline assembly output parameters and add X86 flag parameter support. llvm-svn: 353307	2019-02-06 15:26:29 +00:00
Nirav Dave	54511076d4	[SelectionDAGBuilder] Refactor Inline Asm output check. NFCI. llvm-svn: 353305	2019-02-06 15:12:46 +00:00
Clement Courbet	5a6712b633	[DAGCombine][NFC] GatherAllAliases should take a LSBaseSDNode. GatherAllAliases only makes sense for LSBaseSDNode. Enforce it with static typing instead of runtime cast. llvm-svn: 353291	2019-02-06 12:36:17 +00:00
Clement Courbet	17b51b655e	[DAG] BaseIndexOffset: FrameIndexSDNodes with the same FrameIndex compare equal. Reviewers: niravd Subscribers: arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57692 llvm-svn: 353143	2019-02-05 07:36:20 +00:00
Craig Topper	d4e37afe45	[DAGCombiner] Discard pointer info when combining extract_vector_elt of a vector load when the index isn't constant Summary: If the index isn't constant, this transform inserts a multiply and an add on the index to calculate the base pointer for a scalar load. But we still create a memory operand with an offset of 0 and the size of the scalar access. But the access is really to an unknown offset within the original access size. This can cause the machine scheduler to incorrectly calculate dependencies between this load and other accesses. In the case we saw, there was a 32 byte vector store that was split into two 16 byte stores, one with offset 0 and one with offset 16. The size of the memory operand for both was 16. The scheduler correctly detected the alias with the offset 0 store, but not the offset 16 store. This patch discards the pointer info so we don't incorrectly detect aliasing. I wasn't sure if we could keep using the original offset and size without risking some other transform on the load changing the size. I tried to reduce a test case, but there's still a lot of memory operations needed to get the scheduler to do the bad reordering. So it looked pretty fragile to maintain. Reviewers: efriedma Reviewed By: efriedma Subscribers: arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57616 llvm-svn: 353124	2019-02-05 00:22:23 +00:00
Leonard Chan	68d428e578	[Intrinsic] Unsigned Fixed Point Multiplication Intrinsic Add an intrinsic that takes 2 unsigned integers with the scale of them provided as the third argument and performs fixed point multiplication on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D55625 llvm-svn: 353059	2019-02-04 17:18:11 +00:00
Simon Pilgrim	a536b89fe0	[DAGCombine] Add ADD(SUB,SUB) combines Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case. We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage. llvm-svn: 353044	2019-02-04 13:44:49 +00:00
Clement Courbet	1bb0e5ccfb	[SelectionDAG] Add a BaseIndexOffset::print() method for debugging. llvm-svn: 353028	2019-02-04 09:30:43 +00:00
Simon Pilgrim	bd42f97946	[SDAG] Add SDNode/SDValue getConstantOperandAPInt helper. NFCI. We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer and inserts a i128 -1 constant or something and the whole thing asserts....... I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary. llvm-svn: 352961	2019-02-02 17:35:06 +00:00
Mandeep Singh Grang	70d484d94e	[COFF, ARM64] Fix localaddress to handle stack realignment and variable size objects Summary: This fixes using the correct stack registers for SEH when stack realignment is needed or when variable size objects are present. Reviewers: rnk, efriedma, ssijaric, TomTan Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D57183 llvm-svn: 352923	2019-02-01 21:41:33 +00:00
James Y Knight	7976eb5838	[opaque pointer types] Pass function types to CallInst creation. This cleans up all CallInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57170 llvm-svn: 352909	2019-02-01 20:43:25 +00:00
Sanjay Patel	6502b1444d	[SDAG] improve variable names; NFC The version of FoldConstantArithmetic() that takes arbitrary nodes was confusingly naming those nodes as constants when they might not be; also "Cst" reads like "Cast". llvm-svn: 352884	2019-02-01 16:06:53 +00:00
Sanjay Patel	0279b5b0b8	[TargetLowering] try harder to determine undef elements of vector binops This might be the start of tracking all vector element constants generally if we take it to its logical conclusion, but let's stop here and make sure this is correct/beneficial so far. The affected tests require a convoluted path before they get simplified currently because we don't call SimplifyDemandedVectorElts() from binops directly and don't modify the binop operands directly in SimplifyDemandedVectorElts(). That's why the tests all have a trailing shuffle to induce a chain reaction of transforms. So something like this is happening: 1. Improve the knowledge of undefs in the binop via a SimplifyDemandedVectorElts() call that originates from a shuffle. 2. Transfer that undef knowledge back to the shuffle mask user as more undef lanes. 3. Combine the modified shuffle by calling SimplifyDemandedVectorElts() again. 4. Translate the improved shuffle mask as undemanded lanes of build vector constants causing those to become full undef constants. 5. Simplify the binop now that it has a full undef operand. As we can see from the unchanged 'and' and 'or' tests, tracking undefs alone isn't a full solution. We would need to track zero and all-ones constants to improve those opcodes. We'd probably need to track NaN for FP ops too (assuming we don't have fast-math-flags set). Differential Revision: https://reviews.llvm.org/D57066 llvm-svn: 352880	2019-02-01 15:35:12 +00:00
Alex Bradbury	32b77383ec	[SelectionDAG] Support promotion of the FPOWI integer operand For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the integer operand of ISD::FPOWI. As this is a signed value, it should be sign-extended. This patch enables all tests in test/CodeGen/RISCV/float-intrinsics.ll for RV64, as prior to this patch that file couldn't be compiled for RV64 due to an assertion when performing codegen for fpowi. Differential Revision: https://reviews.llvm.org/D54574 llvm-svn: 352832	2019-02-01 03:46:28 +00:00
Guozhi Wei	0bed9e0453	[DAGCombine] Avoid CombineZExtLogicopShiftLoad if there is free ZEXT This patch fixes pr39098. For the attached test case, CombineZExtLogicopShiftLoad can optimize it to t25: i64 = Constant<1099511627775> t35: i64 = Constant<0> t0: ch = EntryToken t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64 t58: i64 = srl t57, Constant:i8<1> t60: i64 = and t58, Constant:i64<524287> t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64 But later visitANDLike transforms it to t25: i64 = Constant<1099511627775> t35: i64 = Constant<0> t0: ch = EntryToken t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64 t61: i32 = truncate t57 t63: i32 = srl t61, Constant:i8<1> t64: i32 = and t63, Constant:i32<524287> t65: i64 = zero_extend t64 t58: i64 = srl t57, Constant:i8<1> t60: i64 = and t58, Constant:i64<524287> t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64 This triggers CombineZExtLogicopShiftLoad again, causing an infinite loop. Both forms should generate the same instructions, and the IR generated by CombineZExtLogicopShiftLoad looks cleaner. However, it looks more difficult to prevent visitANDLike from doing the transform, so I prevent CombineZExtLogicopShiftLoad from doing the transform if the ZExt is free. Differential Revision: https://reviews.llvm.org/D57491 llvm-svn: 352792	2019-01-31 20:46:42 +00:00
Nirav Dave	4061b44057	[DAG] Aggressively cleanup dangling node in CombineZExtLogicopShiftLoad. While dangling nodes will eventually be pruned when they are considered, leaving them disables combines requiring single-use. Reviewers: Carrot, spatel, craig.topper, RKSimon, efriedma Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D57520 llvm-svn: 352784	2019-01-31 19:35:14 +00:00
Leonard Chan	ae527ac603	[Intrinsic] Expand SMULFIX to MUL, MULH[US], or [US]MUL_LOHI on vector arguments For zero scale SMULFIX, expand into MUL, which produces better code for X86. For vector arguments, expand into MUL if SMULFIX is provided with a zero scale. Otherwise, expand into MULH[US] or [US]MUL_LOHI. Differential Revision: https://reviews.llvm.org/D56987 llvm-svn: 352783	2019-01-31 19:15:37 +00:00
Sjoerd Meijer	f7cc34cae8	[SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS And instead just generate a libcall. My motivating example on ARM was a simple: shl i64 %A, %B for which the code bloat is quite significant. For other targets that also accept __int128/i128 such as AArch64 and X86, it is also beneficial for these cases to generate a libcall when optimising for minsize. On these 64-bit targets, the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS lowering operation action is not set to custom/expand. Differential Revision: https://reviews.llvm.org/D57386 llvm-svn: 352736	2019-01-31 08:07:30 +00:00
Thomas Lively	9510adafe6	[LegalizeVectorTypes] Allow illegal indices when splitting extract_vector_elt Summary: Fixes PR40267, in which the removed assertion was triggering on perfectly valid IR. As far as I can tell, constant out of bounds indices should be allowed when splitting extract_vector_elt, since they will simply be propagated as out of bounds indices in the resulting split vector and handled appropriately elsewhere. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya Differential Revision: https://reviews.llvm.org/D57471 llvm-svn: 352702	2019-01-31 00:35:37 +00:00
Craig Topper	49c4c68919	[LegalizeTypes] Use report_fatal_error instead of llvm_unreachable in the default case of some type legalization handlers that can be reached with intrinsics with result or operands that aren't legal types. These can be triggered by mistakenly using a 64-bit mode only intrinsic with -mtriple=i686. Using report_fatal_error gives a better experience for this mistake in release builds instead of probably crashing. We already do this for some of the vector type legalization handlers. llvm-svn: 352699	2019-01-31 00:04:48 +00:00
Sanjay Patel	9ab23101a8	[DAGCombiner] sub X, 0/1 --> add X, 0/-1 This extends the existing transform for: add X, 0/1 --> sub X, 0/-1 ...to allow the sibling subtraction fold. This pattern could regress with the proposed change in D57401. llvm-svn: 352680	2019-01-30 22:41:35 +00:00
Heejin Ahn	d6f487863d	[WebAssembly] Exception handling: Switch to the new proposal Summary: This switches the EH implementation to the new proposal: https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md (The previous proposal was https://github.com/WebAssembly/exception-handling/blob/master/proposals/old/Exceptions.md) - Instruction changes - Now we have one single `catch` instruction that returns an except_ref value - `throw` now can take a variable number of operations - `rethrow` does not have a 'depth' argument anymore - `br_on_exn` queries an except_ref to see if it matches the tag and branches to the given label if true. - `extract_exception` is a pseudo instruction that simulates popping values from the wasm stack. This is to make `br_on_exn`, a very special instruction, work: `br_on_exn` puts values onto the stack only if it is taken, and the # of values can vary depending on the tag. - Now that there's only one `catch` per `try`, this patch removes all special handling for terminate pads with a call to `__clang_call_terminate`. Before, it was the only case where there were two catch clauses (a normal `catch` and `catch_all` per `try`). - Make `rethrow` act as a terminator like `throw`. This splits the BB after `rethrow` in WasmEHPrepare, and deletes an unnecessary `unreachable` after `rethrow` in LateEHPrepare. - Now we stop at all catchpads (because we add a wasm `catch` instruction that catches all exceptions); this creates a new `findWasmUnwindDestinations` function in SelectionDAGBuilder. - Now we use the `br_on_exn` instruction to figure out if an except_ref matches the current tag or not; LateEHPrepare generates this sequence for catch pads: ``` catch block i32 br_on_exn $__cpp_exception end_block extract_exception ``` - Branch analysis for `br_on_exn` in WebAssemblyInstrInfo - Other various misc. changes to switch to the new proposal. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D57134 llvm-svn: 352598	2019-01-30 03:21:57 +00:00
Sanjay Patel	a61d586f74	[DAGCombiner] fold extract_subvector of extract_subvector This is the sibling fold for insert-of-insert that was added with D56604. Now that we have x86 shuffle narrowing (D57156), this change shows improvements for lots of AVX512 reduction code (not sure that we would ever expect extract-of-extract otherwise). There's a small regression in some of the partial-permute tests (extracting followed by splat). That is tracked by PR40500: https://bugs.llvm.org/show_bug.cgi?id=40500 Differential Revision: https://reviews.llvm.org/D57336 llvm-svn: 352528	2019-01-29 19:13:39 +00:00
James Y Knight	5d71fc5d7b	Adjust documentation for git migration. This fixes most references to the paths: llvm.org/svn/ llvm.org/git/ llvm.org/viewvc/ github.com/llvm-mirror/ github.com/llvm-project/ reviews.llvm.org/diffusion/ to instead point to https://github.com/llvm/llvm-project. This is not a trivial substitution, because additionally, all the checkout instructions had to be migrated to instruct users on how to use the monorepo layout, setting LLVM_ENABLE_PROJECTS instead of checking out various projects into various subdirectories. I've attempted to not change any scripts here, only documentation. The scripts will have to be addressed separately. Additionally, I've deleted one document which appeared to be outdated and unneeded: lldb/docs/building-with-debug-llvm.txt Differential Revision: https://reviews.llvm.org/D57330 llvm-svn: 352514	2019-01-29 16:37:27 +00:00
Nirav Dave	1527c0e727	[SelectionDAGBuilder] Remove redundant variable. NFCI. llvm-svn: 352506	2019-01-29 15:14:07 +00:00
Ayonam Ray	a1f6973ade	Reversing the checkin for version 352484 as tests are failing. llvm-svn: 352504	2019-01-29 15:00:50 +00:00
Ayonam Ray	4272af9b3e	[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Review ID: D52002 Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 352484	2019-01-29 12:01:32 +00:00
Jeremy Morse	66ac86b58d	[DebugInfo][DAG] Process FrameIndex dbg.values unconditionally A FrameIndex should be valid throughout a block regardless of what instructions get selected in that block -- therefore we shouldn't harness dbg.values that refer to FrameIndexes to an SDNode. There are numerous codegen reasons why an SDNode never appears or doesn't become a location that a DBG_VALUE can refer to. None of them actually affect the variable location. Therefore, before any other tests to encode dbg_values in a SelectionDAG, identify FrameIndex operands and encode them unattached to any SDNode. Differential Revision: https://reviews.llvm.org/D57328 llvm-svn: 352467	2019-01-29 09:40:05 +00:00
Craig Topper	390ac61b93	Recommit r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer" This did not cause the buildbot failure it was previously reverted for. Original commit message: I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits. This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization. On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all. llvm-svn: 352433	2019-01-28 21:38:47 +00:00
Nikita Popov	8e1a464e6a	[CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD Followup to D56636, this time handling the UADDSAT case by expanding uadd.sat(a, b) to umin(a, ~b) + b. Differential Revision: https://reviews.llvm.org/D56869 llvm-svn: 352409	2019-01-28 19:19:09 +00:00
Michael Berg	685d5f675e	[NFC] TLI query with default(on) behavior wrt DAG combines for fmin/fmax target control llvm-svn: 352396	2019-01-28 18:03:08 +00:00
Jeremy Morse	8ebffb4b82	[DebugInfo][DAG] Avoid re-ordering of DBG_VALUEs This patch improves the placement of DBG_VALUEs created by SelectionDAG, which as documented in PR40427 can go very wrong. At the core of this is ProcessSourceNode, which assumes the last instruction in a BB is the start of the last processed IR instruction, which isn't always true. Instead, use a helper function to call InstrEmitter::EmitNode that records before-and-after iterators and determines the first of any new instructions created during emission. This is passed to ProcessSourceNode, which can then make more enlightened decisions about ordering for DBG_VALUE placement. Differential revision: https://reviews.llvm.org/D57163 llvm-svn: 352350	2019-01-28 12:08:31 +00:00
Craig Topper	58e6b37e62	Revert r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer" This might be breaking an lldb windows buildbot. llvm-svn: 352268	2019-01-26 02:44:58 +00:00
Craig Topper	b1d3457c03	[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer Summary: I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits. This patch changes this and then modifies the X86 post legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization. On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57186 llvm-svn: 352255	2019-01-26 00:26:37 +00:00
James Y Knight	2c36240a82	Fix emission of _fltused for MSVC. It should be emitted when any floating-point operations (including calls) are present in the object, not just when calls to printf/scanf with floating point args are made. The difference caused by this is very subtle: in static (/MT) builds, on x86-32, in a program that uses floating point but doesn't print it, the default x87 rounding mode may not be set properly upon initialization. This commit also removes the walk of the types pointed to by pointer arguments in calls. (To assist in opaque pointer types migration -- eventually the pointee type won't be available.) That latter implies that it will no longer consider a call like `scanf("%f", &floatvar)` as sufficient to emit _fltused on its own. And without _fltused, `scanf("%f")` will abort with error R6002. This new behavior is unlikely to bite anyone in practice (you'd have to read a float, and do nothing with it!), and also, is consistent with MSVC. Differential Revision: https://reviews.llvm.org/D56548 llvm-svn: 352076	2019-01-24 18:34:00 +00:00
Nirav Dave	58e9833e98	[SelectionDAGBuilder] Simplify HasSideEffect calculation. NFC. llvm-svn: 352067	2019-01-24 17:56:03 +00:00
Nirav Dave	b41a198472	[InlineAsm] Don't calculate registers for inline asm memory operands. NFCI. llvm-svn: 352066	2019-01-24 17:47:18 +00:00
Simon Pilgrim	2f018de6a3	[TargetLowering] Rename getExpandedFixedPointMultiplication to expandFixedPointMul. NFCI. Match the (much shorter) name used in various legalization methods. llvm-svn: 352056	2019-01-24 15:46:54 +00:00
Nirav Dave	bd069f424f	[SelectionDAGBuilder] Fuse inline asm input operand loops passes. NFCI. llvm-svn: 352053	2019-01-24 15:15:32 +00:00
Craig Topper	1e718429c1	[X86] Update SelectionDAGDumper to print the extension type and expanding flag for masked loads. Add truncating and compressing for masked stores. llvm-svn: 352029	2019-01-24 07:51:34 +00:00
Sam Parker	9a2a89d58f	[DAGCombine] Enable more pre-indexed stores The current check in CombineToPreIndexedLoadStore is too conservative, preventing a pre-indexed store when the base pointer is a predecessor of the value being stored. Instead, we should check the pointer operand of the store. Differential Revision: https://reviews.llvm.org/D56719 llvm-svn: 351933	2019-01-23 09:11:49 +00:00
Craig Topper	f0eac9f247	[LegalizeTypes] Add debug prints to the top of PromoteFloatOperand and PromoteFloatResult. Also add debug prints in the default case of the switches in these routines. Most if not all of the type legalization handlers already do this so this makes promoting floats consistent llvm-svn: 351890	2019-01-22 22:33:55 +00:00
Nirav Dave	d0418341fd	[SelectionDAGBuilder] Defer C_Register Assignments to be in line with those of C_RegisterClass. NFCI. llvm-svn: 351854	2019-01-22 18:57:49 +00:00
Matt Arsenault	a5840c3c39	Codegen support for atomicrmw fadd/fsub llvm-svn: 351851	2019-01-22 18:36:06 +00:00
Sanjay Patel	effee52c59	[DAGCombiner] narrow vector binop with 2 insert subvector operands vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z This is another step in generic vector narrowing. It's also a step towards more horizontal op formation specifically for x86 (although we still failed to match those in the affected tests). The scalarization cases are also not optimal (we should be scalarizing those), but it's still an improvement to use a narrower vector op when we know part of the result must be constant because both inputs are undef in some vector lanes. I think a similar match but checking for a constant operand might help some of the cases in D51553. Differential Revision: https://reviews.llvm.org/D56875 llvm-svn: 351825	2019-01-22 14:24:13 +00:00
Sanjay Patel	e713c47d49	[DAGCombiner] fix crash when converting build vector to shuffle The regression test is reduced from the example shown in D56281. This does raise a question as noted in the test file: do we want to handle this pattern? I don't have a motivating example for that on x86 yet, but it seems like we could have that pattern there too, so we could avoid the back-and-forth using a shuffle. llvm-svn: 351753	2019-01-21 17:30:14 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Bjorn Pettersson	d4023bd2cb	[SelectionDAG] Updates for -dag-dump-verbose Summary: This patch makes some changes related to -dag-dump-verbose. The main use case has been debugging how SelectionDAG deals with debug info (SDDbgValue nodes). 1) We now print the number of DbgValues that are mapped to each SDNode. 2) Removed duplicated printing of DebugLoc (nowadays DebugLoc is printed also when not using -dag-dump-verbose). 3) Renamed SDDbgValue::dump to SDDbgValue::print, and added a new SDDbgValue::dump that will start a new line after calling print. 4) SDDbgValue::print now prints "Order", and it also prints some additional information when kind is CONST/FRAMEIX/VREG. 5) SelectionDAG::dump() now dumps all SDDbgValue nodes after the list of SDNodes (both "regular" and "ByVal" SDDbgValue:s). Invalidated nodes are not printed. 6) Prohibit inline printing of SDNode operands that have SDDbgValue nodes associated with them. Reviewers: jmorse, aprantl Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56793 llvm-svn: 351581	2019-01-18 20:06:13 +00:00
Florian Hahn	dc4e154720	[SelectionDAG] Split very large token factors for chained stores to 64k chunks. Similar to D55073. Without this change, the DAG combiner crashes on code with more than 64k stores in a single basic block that form parallelizable chains. No test case, as it would be a very large IR file. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D56740 llvm-svn: 351571	2019-01-18 18:37:38 +00:00
Nirav Dave	0a45bf0e55	[SelectionDAGBuilder] Cleanup InlineAsm Output generation. NFCI. Defer inline asm's output fixup work until after we've generated the inline asm node itself. Remove StoresToEmit, IndirectStoresToEmit, and RetValRegs in favor of using ConstraintOperands. llvm-svn: 351558	2019-01-18 15:57:13 +00:00
Florian Hahn	d2c733b429	[SelectionDAG] Add getTokenFactor, which splits nodes with > 64k operands. This functionality is required at multiple places which potentially create large operand lists, like SelectionDAGBuilder or DAGCombiner. Differential Revision: https://reviews.llvm.org/D56739 llvm-svn: 351552	2019-01-18 14:05:59 +00:00
Florian Hahn	1b81772328	[SelectionDAG] Add static getMaxNumOperands function to SDNode. Summary: Use this helper to make sure we use the same value at various places. This will likely be needed at more places where we currently crash because we use more operands than possible. Also makes it easier to change in the future. Reviewers: RKSimon, craig.topper, efriedma, aemerson Reviewed By: RKSimon Subscribers: hiraditya, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D56859 llvm-svn: 351537	2019-01-18 10:00:38 +00:00
Shiva Chen	e84c729aca	[ScheduleDAGRRList] Do not preschedule a node that has an ADJCALLSTACKDOWN parent We should not preschedule a node that has an ADJCALLSTACKDOWN parent; otherwise, during bottom-up scheduling, ADJCALLSTACKDOWN and ADJCALLSTACKUP may hold CallResource too long and prevent other calls from being scheduled. If there is no other available node to schedule, the scheduler will try to rename the register by creating a copy to avoid the conflict, which will fail because CallResource is not a real physical register. llvm-svn: 351527	2019-01-18 08:36:06 +00:00
Matt Arsenault	0cb08e448a	Allow FP types for atomicrmw xchg llvm-svn: 351427	2019-01-17 10:49:01 +00:00
Mandeep Singh Grang	33c49c0c82	[COFF, ARM64] Implement support for SEH extensions __try/__except/__finally Summary: This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler. We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape. Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric Reviewed By: rnk, efriedma Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53540 llvm-svn: 351370	2019-01-16 19:52:59 +00:00
Jeremy Morse	7dcea5ae3b	[DebugInfo] Allow creation of DBG_VALUEs in blocks where the operand is not used dbg.value intrinsics can appear in blocks where their operand is not used, meaning the operand never receives an SDNode, and thus no DBG_VALUE will be created. Get around this by looking to see whether the operand has already been allocated a virtual register. This allows dbg.values of Phi node and Values that are used across basic blocks to successfully be translated into DBG_VALUEs. Differential Revision: https://reviews.llvm.org/D56678 llvm-svn: 351358	2019-01-16 17:25:27 +00:00
Florian Hahn	e94470f1cc	[SelectionDAG] Update check in createOperands to reflect max() is a valid value. The value returned by max() is the last valid value, adjust the comparison accordingly. The code added in D55073 creates TokenFactors with max() operands. Reviewers: aemerson, efriedma, RKSimon, craig.topper Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D56738 llvm-svn: 351318	2019-01-16 10:06:04 +00:00
Sam Parker	dd8cd6d26b	[DAGCombine] Fix ReduceLoadWidth for shifted offsets ReduceLoadWidth can trigger when a shifted mask is used, and this requires that the function return a shl node to correct for the offset. However, the way that this was implemented meant that the returned result could be an existing node, which would be incorrect. This fixes the method of inserting the new node and replacing uses. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 351310	2019-01-16 08:40:12 +00:00
Nikita Popov	d3b86b79fa	Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors" Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Reapplying with updated SLPVectorizer tests. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351219	2019-01-15 18:43:41 +00:00
Nirav Dave	fcb4a492db	[SelectionDAG] Check membership of register in class for single register constraints. NFCI. Now that X86's ST(7) constraints are fixed this check can be reinstated. llvm-svn: 351207	2019-01-15 17:09:23 +00:00
Sanjay Patel	fad5bdaf95	[DAGCombiner] reduce buildvec of zexted extracted element to shuffle The motivating case for this is shown in the first regression test. We are transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'. That's a special-case for a more general pattern as shown here. In all tests, we're avoiding the vector-scalar-vector moves in favor of vector ops. We aren't producing optimal shuffle code in some cases though, so the patch is limited to reduce regressions. Differential Revision: https://reviews.llvm.org/D56281 llvm-svn: 351198	2019-01-15 16:11:05 +00:00
Nikita Popov	5885eec35a	Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors" This reverts commit r351125. I missed test changes in an SLPVectorizer test, due to the cost model changes. Reverting for now. llvm-svn: 351129	2019-01-14 22:18:39 +00:00
Nikita Popov	8e9a8432a8	[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351125	2019-01-14 21:43:30 +00:00
Nirav Dave	3badfe74a2	Reland "Refactor GetRegistersForValue. NFCI." Remove over-strictification class membership check. llvm-svn: 351074	2019-01-14 17:09:45 +00:00
Simon Pilgrim	a1bd4a6ba4	[DAGCombiner] Add (sub_sat x, x) -> 0 combine llvm-svn: 351073	2019-01-14 15:43:34 +00:00
Simon Pilgrim	fa1f518748	[DAGCombiner] Enable sub saturation constant folding llvm-svn: 351072	2019-01-14 15:28:53 +00:00
Simon Pilgrim	7fc6882374	[DAGCombiner] Add add/sub saturation undef handling Match ConstantFolding.cpp: (add_sat x, undef) -> -1 (sub_sat x, undef) -> 0 llvm-svn: 351070	2019-01-14 14:16:24 +00:00
Simon Pilgrim	cfa5f06dde	[DAGCombiner] Enable add saturation constant folding llvm-svn: 351060	2019-01-14 12:34:31 +00:00
Simon Pilgrim	67610926fc	[DAGCombiner] Add add saturation constant folding tests. Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now. llvm-svn: 351057	2019-01-14 12:12:42 +00:00
Simon Pilgrim	3d42815cd8	[SelectionDAG] Add type sanity assertions for add/sub saturation node creation. llvm-svn: 351055	2019-01-14 11:56:59 +00:00
Simon Pilgrim	56ba1db933	[DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y) NOTE: We need more powerful signed overflow detection in computeOverflowKind llvm-svn: 351026	2019-01-13 22:08:26 +00:00
Simon Pilgrim	888fa8680c	Fix unused variable warning. NFCI. llvm-svn: 351025	2019-01-13 21:53:12 +00:00
Simon Pilgrim	897d4c6fe9	[DAGCombiner] Some very basic add/sub saturation combines. Handle combines with zero and constant canonicalization for adds. llvm-svn: 351024	2019-01-13 21:50:24 +00:00
Craig Topper	4978de36e4	[LegalizeDAG] Remove 'NeedInvert' code from expansion of BR_CC. Replace with an assert. I accidentally triggered this code while doing some experiments and it doesn't look like it could possibly work. It calls 'getNOT' on a node that should be a CondCode. I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC? llvm-svn: 351022	2019-01-13 19:33:30 +00:00
Nikita Popov	0400e50445	[X86] Rename overly verbose method; NFC As suggested on D56636. llvm-svn: 351021	2019-01-13 16:41:26 +00:00
Sanjay Patel	625d5aef62	[DAGCombiner] fold insert_subvector of insert_subvector This pattern: t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0> t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0> ...shows up in PR33758: https://bugs.llvm.org/show_bug.cgi?id=33758 ...although this patch doesn't make any difference to the final result on that yet. In the affected tests here, it looks like it just makes RA wiggle. But we might as well squash this to prevent it interfering with other pattern-matching. Differential Revision: https://reviews.llvm.org/D56604 llvm-svn: 351008	2019-01-12 15:12:28 +00:00
Simon Pilgrim	0d92c4debc	Use getShiftAmountTy for shift amounts. llvm-svn: 351005	2019-01-12 12:00:43 +00:00
Simon Pilgrim	ca0de0363b	[X86][AARCH64] Improve ISD::ABS support This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types. Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350998	2019-01-12 09:59:32 +00:00
Pirama Arumuga Nainar	cc07dabdaa	[Legalizer] Use correct ValueType of SELECT_CC node during Float promotion Summary: When legalizing the result of a SELECT_CC node by promoting the floating-point type, use the promoted-to type rather than the original type. Fix PR40273. Reviewers: efriedma, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56566 llvm-svn: 350951	2019-01-11 18:46:02 +00:00
Martin Storsjo	114ad37c1d	Revert "[SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI." This reverts commit r350841, as it actually had functional changes and broke compilation. See PR40290. llvm-svn: 350921	2019-01-11 07:31:17 +00:00
Sanjay Patel	9b368f39a9	[DAGCombiner] simplify code; NFC llvm-svn: 350844	2019-01-10 16:47:42 +00:00
Nirav Dave	cd18977add	[SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI. llvm-svn: 350841	2019-01-10 16:25:47 +00:00
Nirav Dave	4817c0e46c	[SelectionDAGBuilder] Fix formatting. NFC. llvm-svn: 350839	2019-01-10 16:22:19 +00:00
Nirav Dave	57f2c14860	[SelectionDAGBuilder] Refactor visitInlineAsm. NFC. llvm-svn: 350837	2019-01-10 16:18:18 +00:00
James Y Knight	62df5eed16	[opaque pointer types] Remove some calls to generic Type subtype accessors. That is, remove many of the calls to Type::getNumContainedTypes(), Type::subtypes(), and Type::getContainedType(N). I'm not intending to remove these accessors -- they are useful/necessary in some cases. However, removing the pointee type from pointers would potentially break some uses, and reducing the number of calls makes it easier to audit. llvm-svn: 350835	2019-01-10 16:07:20 +00:00
Stanislav Mekhanoshin	ed0d6c60af	Remove check for single use in ShrinkDemandedConstant This moves the check for single use from the general ShrinkDemandedConstant to the BE because of the AArch64 regression after D56289/rL350475. After several hours of experiments I did not come up with a testcase failing on any other targets if the check is not performed. Moreover, a direct call to ShrinkDemandedConstant is not really needed and is superseded by SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D56406 llvm-svn: 350684	2019-01-09 02:24:22 +00:00
Craig Topper	826f44b550	[TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24. As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when it's not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do, like turning ZEXT/SEXT into AEXT, would result in an increase in instructions. Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node. This patch changes AMDGPU's simplifyI24 to use GetDemandedBits to handle the multiple-use simplifications, and then the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand has only a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input. Differential Revision: https://reviews.llvm.org/D56087 llvm-svn: 350560	2019-01-07 19:30:43 +00:00
Craig Topper	57fc891c1b	[LegalizeVectorOps] Add FSHL/FSHR to the list of vector operations that should be handled. The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too. llvm-svn: 350490	2019-01-06 07:06:35 +00:00
Stanislav Mekhanoshin	35a3a3bd11	Added single use check to ShrinkDemandedConstant Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink source node to demanded bits only even if there are other uses. Differential Revision: https://reviews.llvm.org/D56289 llvm-svn: 350475	2019-01-05 19:20:00 +00:00
Craig Topper	cfeb1cf9af	[X86] Add INSERT_SUBVECTOR to ComputeNumSignBits This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits. My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common. Differential Revision: https://reviews.llvm.org/D56283 llvm-svn: 350432	2019-01-04 20:50:59 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
Craig Topper	8dd7bd2cd7	[DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists. Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced. Improves the test case from PR38217. There may be additional opportunities after this. Differential Revision: https://reviews.llvm.org/D56145 llvm-svn: 350239	2019-01-02 18:19:07 +00:00
Craig Topper	3109f3a4ab	[LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined. By creating the extract with the correct scalar type when we legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend. This fixes the regression on X86 in D56156. Differential Revision: https://reviews.llvm.org/D56176 llvm-svn: 350236	2019-01-02 17:58:30 +00:00
Craig Topper	c562fae02b	[DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them. If x has multiple sign bits then it doesn't matter which one we extend from so we can sext from x's msb instead. The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this. Differential Revision: https://reviews.llvm.org/D56156 llvm-svn: 350235	2019-01-02 17:58:27 +00:00
Ayonam Ray	e00606a1b2	Reversing the commit in revision 350186. Revision causes regression in 4 tests. llvm-svn: 350187	2019-01-01 07:28:55 +00:00
Ayonam Ray	c471bb2e67	Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Review Reference: D52002 llvm-svn: 350186	2019-01-01 06:37:50 +00:00
Craig Topper	ed3ffae4a4	[SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits. Differential Revision: https://reviews.llvm.org/D56168 llvm-svn: 350179	2018-12-31 19:09:30 +00:00
Craig Topper	802c4979ae	[DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform. Found while trying out some other changes so I don't really have a test case. llvm-svn: 350172	2018-12-31 05:40:46 +00:00
Kang Zhang	4aa6453767	[PowerPC] Fix ADDE, SUBE do not know how to promote operator Summary: This patch is created to fix the Bugzilla bug 39815: https://bugs.llvm.org/show_bug.cgi?id=39815 This patch is to support promotion integer result for the instruction ADDE, SUBE. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D56119 llvm-svn: 350161	2018-12-30 07:48:09 +00:00
Richard Trieu	a87b70d1db	Add vtable anchor to classes. llvm-svn: 350142	2018-12-29 02:02:13 +00:00
Justin Lebar	49fac56ea3	[NVPTX] Allow libcalls that are defined in the current module. The patch adds the possibility of making library calls on NVPTX. An important thing about library functions: they must be defined within the current module. This should guarantee that we produce valid PTX assembly (without calls to undefined functions). Anyone who wants to use the libcalls will probably have to link against compiler-rt or any other implementation. Currently, it's completely impossible to make library calls because of the error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can lower ExternalSymbol to TargetExternalSymbol and verify that the function definition is available. Also, there was an issue with the DAG during legalisation. When we expand an instruction into a libcall, the inner call-chain isn't being "integrated" into the outer chain. Since the last "data-flow" (call retval load) node is located in the call-chain earlier than the CALLSEQ_END node, the latter becomes a leaf and therefore a dead node (and is removed quite fast). The solution proposed here relies on additional data-flow pseudo nodes (ProxyReg) whose only purpose is to keep CALLSEQ_END alive during the legalisation and instruction selection phases - we remove the pseudo instructions before the register scheduling phase. Patch by Denys Zariaiev! Differential Revision: https://reviews.llvm.org/D34708 llvm-svn: 350069	2018-12-26 19:12:31 +00:00
Craig Topper	0229da8f07	[X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ. This is an alternative to what I attempted in D56057. GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users. I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits. Based on a patch that Simon Pilgrim sent me in email. Fixes PR40142. llvm-svn: 350059	2018-12-24 19:40:20 +00:00
George Burgess IV	610c76534f	[SelectionDAGBuilder] Use ::precise LocationSizes; NFC More migration so we can disable the implicit int -> LocationSize conversion. All of these are either scatter/gather'ed vector instructions, or direct loads. Hence, they're all precise. Perhaps if we see way more getTypeStoreSize calls, we can make a getTypeStoreLocationSize (or similar) as a wrapper that applies this ::precise. Doesn't appear that it's a good idea to make getTypeStoreSize return a LocationSize itself, however. llvm-svn: 350042	2018-12-24 05:34:21 +00:00
Sanjay Patel	93f1074677	[DAGCombiner] limit shuffle to extend transform (PR40146) It's dangerous to knowingly create an illegal vector type no matter what stage of combining we're in. This prevents the missed folding/scalarization seen in: https://bugs.llvm.org/show_bug.cgi?id=40146 llvm-svn: 350034	2018-12-23 20:48:31 +00:00
Sanjay Patel	9933574ac3	[DAGCombiner] allow hoisting vector bitwise logic ahead of extends llvm-svn: 350032	2018-12-23 19:58:16 +00:00
Sanjay Patel	4b537aaf6d	[DAGCombiner] allow narrowing of add followed by truncate trunc (add X, C ) --> add (trunc X), C' If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type. This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine). This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing. Differential Revision: https://reviews.llvm.org/D55866 llvm-svn: 350006	2018-12-22 17:10:31 +00:00
Sanjay Patel	47a6129e26	[DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFC llvm-svn: 349958	2018-12-21 21:26:30 +00:00
Simon Pilgrim	911dce2f30	[SelectionDAG] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349907	2018-12-21 14:56:18 +00:00
Eli Friedman	b1bbd5dca3	[ARM] Complete the Thumb1 shift+and->shift+shift transforms. This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision: https://reviews.llvm.org/D55630 llvm-svn: 349857	2018-12-20 23:39:54 +00:00
Simon Pilgrim	b208255fe0	[SelectionDAGBuilder] Enable funnel shift building to custom rotates This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations. AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK. Differential Revision: https://reviews.llvm.org/D55747 llvm-svn: 349765	2018-12-20 14:56:44 +00:00
Craig Topper	bd788ce5db	[DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand. llvm-svn: 349726	2018-12-20 05:28:06 +00:00
Simon Pilgrim	2ae3a91656	[SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2) Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs. This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument. I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases. Differential Revision: https://reviews.llvm.org/D55822 llvm-svn: 349629	2018-12-19 14:09:38 +00:00
Simon Pilgrim	47ff0431e9	[SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 1 of 2) Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs. This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument. Differential Revision: https://reviews.llvm.org/D55822 llvm-svn: 349628	2018-12-19 14:09:09 +00:00
Simon Pilgrim	6c95bea072	[TargetLowering] Fix propagation of undefs in zero extension ops (PR40091) As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should be guaranteed to at least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and it's the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision: https://reviews.llvm.org/D55883 llvm-svn: 349625	2018-12-19 13:37:59 +00:00
Simon Pilgrim	2072b5afbe	[SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs. This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument. I've updated SelectionDAG::simplifyShift to demonstrate its use. Differential Revision: https://reviews.llvm.org/D55819 llvm-svn: 349616	2018-12-19 10:41:06 +00:00
Pete Cooper	f86db5ce9e	Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG. SelectionDAG currently changes these intrinsics to function calls, but that won't work for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming from the front-end which we can't do in SelectionDAG. llvm-svn: 349552	2018-12-18 22:20:03 +00:00
Nikita Popov	a7d2a235bb	[SelectionDAG][X86] Fix [US](ADD|SUB)SAT vector legalization, add tests Integer result promotion needs to use the scalar size, and we need support for result widening. This is in preparation for D55787. llvm-svn: 349480	2018-12-18 13:22:53 +00:00
Simon Pilgrim	af6fbbf18b	[TargetLowering] Fallback from SimplifyDemandedVectorElts to SimplifyDemandedBits For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well. llvm-svn: 349466	2018-12-18 09:33:25 +00:00
Krzysztof Parzyszek	5852aa44ae	[SDAG] Clarify the origin of chain in REG_SEQUENCE in comment, NFC llvm-svn: 349391	2018-12-17 20:30:20 +00:00
Craig Topper	15b7246935	[SelectionDAG] Fix noop detection for vectors in AssertZext/AssertSext in getNode The assertion type is always supposed to be a scalar type. So if the result VT of the assertion is a vector, we need to get the scalar VT before we can compare them. Similarly for the assert above it. I don't have a test case because I don't know of any place we violate this today. A coworker found this while trying to use r347287 on the 6.0 branch without also having r336868 llvm-svn: 349390	2018-12-17 20:29:13 +00:00
JF Bastien	1811217e4d	NFC: remove unused variable D55768 removed its use. llvm-svn: 349377	2018-12-17 19:03:24 +00:00
Simon Pilgrim	9274f17a5e	[TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen. I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well. Differential Revision: https://reviews.llvm.org/D55768 llvm-svn: 349374	2018-12-17 18:43:43 +00:00
Tim Northover	256a16d031	FastISel: take care to update iterators when removing instructions. We keep a few iterators into the basic block we're selecting while performing FastISel. Usually this is fine, but occasionally code wants to remove already-emitted instructions. When this happens we have to be careful to update those iterators so they're not pointing at dangling memory. llvm-svn: 349365	2018-12-17 17:25:53 +00:00
Sanjay Patel	f24900b934	[DAGCombiner] allow hoisting vector bitwise logic ahead of truncates The transform performs a bitwise logic op in a wider type followed by truncate when both inputs are truncated from the same source type: logic_op (truncate x), (truncate y) --> truncate (logic_op x, y) There are a bunch of other checks that should prevent doing this when it might be harmful. We already do this transform for scalars in this spot. The vector limitation was shared with a check for the case when the operands are extended. I'm not sure if that limit is needed either, but that would be a separate patch. Differential Revision: https://reviews.llvm.org/D55448 llvm-svn: 349303	2018-12-16 14:57:04 +00:00
Simon Pilgrim	0ef977b83d	[SelectionDAG] Add FSHL/FSHR support to computeKnownBits Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the target's shift amount type). llvm-svn: 349298	2018-12-16 13:33:37 +00:00
Simon Pilgrim	1e1fd9c761	[TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts Differential Revision: https://reviews.llvm.org/D55600 llvm-svn: 349264	2018-12-15 11:36:36 +00:00
Krzysztof Parzyszek	6b01d35497	[SDAG] Ignore chain operand in REG_SEQUENCE when emitting instructions llvm-svn: 349186	2018-12-14 20:14:12 +00:00
Craig Topper	257ce3871e	[DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137	2018-12-14 08:28:24 +00:00
Sanjay Patel	093ab45d4c	[DAGCombiner] clean up visitEXTRACT_VECTOR_ELT This isn't quite NFC, but I don't know how to expose any outward diffs from these changes. Mostly, this was confusing because it used 'VT' to refer to the operand type rather than the usual type of the input node. There's also a large block at the end that is dedicated solely to matching loads, but that wasn't obvious. This could probably be split up into separate functions to make it easier to see. It's still not clear to me when we make certain transforms because the legality and constant conditions are intertwined in a way that might be improved. llvm-svn: 349095	2018-12-14 00:09:08 +00:00
Sanjay Patel	791ae69afe	[DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from number of uses to an opcode test for DELETED_NODE based on existing similar code. Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349058	2018-12-13 17:05:01 +00:00
Sanjay Patel	c56f5728ee	revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract This causes an address sanitizer bot failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio llvm-svn: 349056	2018-12-13 16:32:44 +00:00
Sanjay Patel	a7b115b392	[DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349051	2018-12-13 15:44:26 +00:00
Simon Pilgrim	ab973a45b9	[DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028	2018-12-13 12:23:32 +00:00
Simon Pilgrim	77fc551d1a	[TargetLowering] Add ISD::ROTL/ROTR vector expansion Move existing rotation expansion code into TargetLowering and set it up for vectors as well. Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment. Begun removing x86 vector rotate custom lowering to use the expansion. llvm-svn: 349025	2018-12-13 11:20:48 +00:00
Clement Courbet	76f4ae1092	[CodeGen] Allow memcpy/memset to generate small overlapping stores. Summary: All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here. Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D55365 llvm-svn: 349016	2018-12-13 09:56:19 +00:00
Simon Pilgrim	eb508f8ccb	[SelectionDAG] Add a generic isSplatValue function This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns. It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller. A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS). I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection. Differential Revision: https://reviews.llvm.org/D55426 llvm-svn: 348953	2018-12-12 18:32:29 +00:00
Simon Pilgrim	f6c898e12f	[TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorElts If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef). Differential Revision: https://reviews.llvm.org/D55558 llvm-svn: 348926	2018-12-12 13:43:07 +00:00
Leonard Chan	118e53fd63	[Intrinsic] Signed Fixed Point Multiplication Intrinsic Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D54719 llvm-svn: 348912	2018-12-12 06:29:14 +00:00
Clement Courbet	8b6434bbb9	Revert r348843 "[CodeGen] Allow memcpy/memset to generate small overlapping stores." Breaks ARM/memcpy-inline.ll llvm-svn: 348844	2018-12-11 13:38:43 +00:00
Clement Courbet	93b3445770	[CodeGen] Allow memcpy/memset to generate small overlapping stores. Summary: All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here. Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D55365 llvm-svn: 348843	2018-12-11 13:15:56 +00:00
Simon Pilgrim	f6371f5f23	[TargetLowering] Add ISD::EXTRACT_VECTOR_ELT support to SimplifyDemandedBits Let SimplifyDemandedBits attempt to simplify all elements of a vector extraction. Part of PR39689. llvm-svn: 348839	2018-12-11 11:08:40 +00:00
Simon Pilgrim	fc2c9af99c	[TargetLowering] Add UNDEF folding to SimplifyDemandedVectorElts If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node. Zero constant folding will be handled in a future patch - it's a little trickier as we often have bitcasted zero values. Differential Revision: https://reviews.llvm.org/D55511 llvm-svn: 348784	2018-12-10 18:29:46 +00:00
Simon Pilgrim	c73a955370	[DAGCombiner] Remove unnecessary recursive DAGCombiner::visitINSERT_SUBVECTOR call. As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now. llvm-svn: 348781	2018-12-10 18:18:50 +00:00
Francis Visoiu Mistrih	753efe3584	[DAGCombiner] Use the result value type in visitCONCAT_VECTORS This triggers an assert when combining concat_vectors of a bitcast of merge_values. With asserts disabled, it fails to select: fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8 0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1 0x7ff19d000b50: f64 = Register %1 In function: d Differential Revision: https://reviews.llvm.org/D55507 llvm-svn: 348759	2018-12-10 14:31:34 +00:00
Jeremy Morse	a06b163d5c	[DebugInfo] Don't drop dbg.value's of nullptr Currently, dbg.value's of "nullptr" are dropped when entering a SelectionDAG -- apparently just because of an oversight when recognising Values that are constant (see PR39787). This patch adds ConstantPointerNull to the list of constants that can be turned into DBG_VALUEs. The matter of what bit-value a null pointer constant in LLVM has was raised in this mailing list thread: http://lists.llvm.org/pipermail/llvm-dev/2018-December/128234.html Where it transpires LLVM relies on (IR) null pointers being zero valued, thus I've baked this assumption into the patch. Differential Revision: https://reviews.llvm.org/D55227 llvm-svn: 348753	2018-12-10 12:04:08 +00:00
Jeremy Morse	045c67769d	[DebugInfo] Emit undef DBG_VALUEs when SDNodes are optimised out This is a fix for PR39896, where dbg.value's of SDNodes that have been optimised out do not lead to "DBG_VALUE undef" instructions being created. Such undef instructions are necessary to terminate earlier variable ranges, otherwise variable values leak past the point where they're valid. The "invalidated" flag of SDDbgValue is currently being abused to mean two things: * The corresponding SDNode is now invalid * This SDDbgValue should not be emitted Of which there are several legitimate combinations of meaning: * The SDNode has been invalidated and we should emit "DBG_VALUE undef" * The SDNode has been invalidated but the debug data was salvaged, don't emit anything for this SDDbgValue * This SDDbgValue has been emitted This patch introduces distinct "Emitted" and "Invalidated" fields to the SDDbgValue class, updates users accordingly, and generates "undef" DBG_VALUEs for invalidated records. Awkwardly, there are circumstances where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll which I've preserved. Differential Revision: https://reviews.llvm.org/D55372 llvm-svn: 348751	2018-12-10 11:20:47 +00:00
Sanjay Patel	e767bf4468	[DAGCombiner] re-enable truncation of binops This is effectively re-committing the changes from: rL347917 (D54640) rL348195 (D55126) ...which were effectively reverted here: rL348604 ...because the code had a bug that could induce infinite looping or eventual out-of-memory compilation. The bug was that this code did not guard against transforming opaque constants. More details are in the post-commit mailing list thread for r347917. A reduced test for that is included in the x86 bool-math.ll file. (I wasn't able to reduce a PPC backend test for this, but it was almost the same pattern.) Original commit message for r347917: The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. llvm-svn: 348706	2018-12-08 16:07:38 +00:00
Craig Topper	b4c96f5a32	[SelectionDAG] Remove ISD::ADDC/ADDE from some undef handling code in getNode. NFCI These nodes should have two results. A real VT and a Glue. But this code would have returned Undef which would only be a single result. But we're in the single result version of getNode so these opcodes should never be seen by this function anyway. llvm-svn: 348670	2018-12-08 00:27:34 +00:00
Pete Cooper	782a490dfb	Follow-up from r348441 to add the rest of the objc ARC intrinsics. This adds the other intrinsics used by ARC and codegen's them to their respective runtime methods. llvm-svn: 348646	2018-12-07 21:28:47 +00:00
Sanjay Patel	bc47ff86fe	[DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC This duplicates several shared checks, but we need to split this up to fix underlying bugs in smaller steps. llvm-svn: 348627	2018-12-07 18:51:08 +00:00
Sanjay Patel	3af4ae9735	[DAGCombiner] disable truncation of binops by default As discussed in the post-commit thread of r347917, this transform is fighting with an existing transform causing an infinite loop or out-of-memory, so this is effectively reverting r347917 and its follow-up r348195 while we investigate the bug. llvm-svn: 348604	2018-12-07 15:47:52 +00:00
Sanjay Patel	bb796cd61c	[DAGCombiner] remove explicit calls to AddToWorkList; NFCI As noted in the post-commit thread for rL347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html ...we don't need to repeat these calls because the combiner does it automatically. llvm-svn: 348597	2018-12-07 15:00:56 +00:00
Simon Pilgrim	d498dee7a2	[SelectionDAG] Don't pass on DemandedElts when handling SCALAR_TO_VECTOR Fixes an assertion: llc: lib/CodeGen/SelectionDAG/SelectionDAG.cpp:2200: llvm::KnownBits llvm::SelectionDAG::computeKnownBits(llvm::SDValue, const llvm::APInt&, unsigned int) const: Assertion `(!Op.getValueType().isVector() || NumElts == Op.getValueType().getVectorNumElements()) && "Unexpected vector size"' failed. Committed on behalf of: @pendingchaos (Rhys Perry) Differential Revision: https://reviews.llvm.org/D55223 llvm-svn: 348574	2018-12-07 09:18:44 +00:00
Sanjay Patel	c6441c8547	[DAGCombiner] use root SDLoc for all nodes created by logic fold If this is not a valid way to assign an SDLoc, then we get this wrong all over SDAG. I don't know enough about the SDAG to explain this. IIUC, theoretically, debug info is not supposed to affect codegen. But here it has clearly affected 3 different targets, and the x86 change is an actual improvement. llvm-svn: 348552	2018-12-07 00:01:57 +00:00
Sanjay Patel	86cb679851	[DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI We shouldn't care about the debug location for a node that we're creating, but attaching the root of the pattern should be the best effort. (If this is not true, then we are doing it wrong all over the SDAG). This is no-functional-change-intended, and there are no regression test diffs...and that's what I expected. But there's a similar line above this diff, where those assumptions apparently do not hold. llvm-svn: 348550	2018-12-06 23:53:58 +00:00
Sanjay Patel	276cef343c	[DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC This code can still misbehave. llvm-svn: 348547	2018-12-06 23:39:28 +00:00
Sanjay Patel	70af85b0ac	[DAGCombiner] don't group bswap with casts in logic hoisting fold This was probably organized as it was because bswap is a unary op. But that's where the similarity to the other opcodes ends. We should not limit this transform to scalars, and we should not try it if either input has other uses. This is another step towards trying to clean this whole function up to prevent it from causing infinite loops and memory explosions. Earlier commits in this series: rL348501 rL348508 rL348518 llvm-svn: 348534	2018-12-06 22:10:44 +00:00
Sanjay Patel	03a3ef2a0c	[DAGCombiner] reduce indent; NFC Unlike some of the folds in hoistLogicOpWithSameOpcodeHands() above this shuffle transform, this has the expected hasOneUse() checks in place. llvm-svn: 348523	2018-12-06 20:02:47 +00:00
Andrea Di Biagio	52a2bac583	[DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef. This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes: concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A) This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case posted by Craig; that particular case would probably require a more complicated approach (and knowledge about used bits). Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64 , -mattr=+avx): vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vxorps %xmm1, %xmm1, %xmm1 vblendps $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3] vmovaps %ymm0, (%rsi) vzeroupper retq Now we generate this: vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vmovaps %ymm0, (%rsi) vzeroupper retq As a side note: that VZEROUPPER is completely redundant... I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on -mcpu=btver2, we don't get that vzeroupper because pass vzeroupper insertion pass is disabled. Differential Revision: https://reviews.llvm.org/D55274 llvm-svn: 348522	2018-12-06 19:55:38 +00:00
Sanjay Patel	bfc7ffa40f	[DAGCombiner] don't hoist logic op if operands have other uses, part 2 The PPC test with 2 extra uses seems clearly better by avoiding this transform. With 1 extra use, we also prevent an extra register move (although that might be an RA problem). The general rule should be to only make a change here if it is always profitable. The x86 diffs are all neutral. llvm-svn: 348518	2018-12-06 19:18:56 +00:00
Sanjay Patel	c3717cd0d5	[DAGCombiner] don't hoist logic op if operands have other uses The AVX512 diffs are neutral, but the bswap test shows a clear overreach in hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can increase the instruction count. This could also fight with transforms trying to go in the opposite direction and possibly blow up/infinite loop. This might be enough to solve the bug noted here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html I did not add the hasOneUse() checks to all opcodes because I see a perf regression for at least one opcode. We may decide that's irrelevant in the face of potential compiler crashing, but I'll see if I can salvage that first. llvm-svn: 348508	2018-12-06 18:16:32 +00:00
Sanjay Patel	e9bf78fa23	[DAGCombiner] refactor function that hoists bitwise logic; NFCI Added FIXME and TODO comments for lack of safety checks. This function is a suspect in out-of-memory errors as discussed in the follow-up thread to r347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html llvm-svn: 348501	2018-12-06 17:08:03 +00:00
Simon Pilgrim	105a366254	DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated VT.getVectorNumElements(). NFCI. llvm-svn: 348494	2018-12-06 15:39:25 +00:00
Pete Cooper	e13d0992dc	Add objc.* ARC intrinsics and codegen them to their runtime methods. Reviewers: erik.pilkington, ahatanak Differential Revision: https://reviews.llvm.org/D55233 llvm-svn: 348441	2018-12-06 00:52:54 +00:00
Sanjay Patel	33a448f935	[DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893) Because we're potentially peeking through a bitcast in this transform, we need to use overall bitwidths rather than number of elements to determine when it's safe to proceed. Should fix: https://bugs.llvm.org/show_bug.cgi?id=39893 llvm-svn: 348383	2018-12-05 17:10:30 +00:00
Simon Pilgrim	8fdaf5c915	[TargetLowering] Remove ISD::ANY_EXTEND/ANY_EXTEND_VECTOR_INREG opcodes from SimplifyDemandedVectorElts These have no test coverage and the KnownZero flags can't be guaranteed unlike SIGN/ZERO_EXTEND cases. llvm-svn: 348361	2018-12-05 12:20:05 +00:00
Simon Pilgrim	180639afe5	[SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353	2018-12-05 11:12:12 +00:00
Simon Pilgrim	cd8a152b18	Remove superfluous comments. NFCI. As requested in D54698. llvm-svn: 348350	2018-12-05 10:45:44 +00:00
Simon Pilgrim	d24730cdda	[TargetLowering] SimplifyDemandedVectorElts - don't alter DemandedElts mask Fix potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy - so later uses of the mask use the tweaked version..... Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts and the altered DemandedElts mask was causing mismatches. llvm-svn: 348348	2018-12-05 10:37:45 +00:00
Amara Emerson	814a6794ba	[SelectionDAG] Split very large token factors for loads into 64k chunks. There's a 64k limit on the number of SDNode operands, and some very large functions with 64k or more loads can cause crashes due to this limit being hit when a TokenFactor with this many operands is created. To fix this, create sub-tokenfactors if we've exceeded the limit. No test case as it requires a very large function. rdar://45196621 Differential Revision: https://reviews.llvm.org/D55073 llvm-svn: 348324	2018-12-05 00:41:30 +00:00
Nirav Dave	ce26c27b2a	[SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI. llvm-svn: 348288	2018-12-04 17:59:43 +00:00
Simon Pilgrim	0add090e24	[TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generate much nicer code with it.... Differential Revision: https://reviews.llvm.org/D53794 llvm-svn: 348251	2018-12-04 11:21:30 +00:00
Simon Pilgrim	666261cdc8	[TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes Add support for ISD::_EXTEND and ISD::_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch. llvm-svn: 348246	2018-12-04 10:41:06 +00:00
Sanjay Patel	d24f63477d	[DAGCombiner] narrow truncated vector binops when legal This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195	2018-12-03 21:57:35 +00:00
Craig Topper	e35b01f8ea	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158	2018-12-03 18:26:24 +00:00
Sanjay Patel	b205606d3e	[SelectionDAG] fold constant with undef vector per element This makes the SDAG behavior consistent with the way we do this in IR. It's possible that we were getting the wrong answer before. For example, 'xor undef, undef --> 0' but 'xor undef, C' --> undef. But the most practical improvement is likely as shown in the tests here - for FP, we were overconstraining undef lanes to NaN, and that can prevent vector simplifications/narrowing (see D51553). llvm-svn: 348090	2018-12-02 13:48:42 +00:00
Sanjay Patel	2daceedf92	[DAGCombiner] guard against an oversized shift crash This change prevents the crash noted in the post-commit comments for rL347478 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089	2018-12-02 13:33:56 +00:00
Simon Pilgrim	e017ed3245	[SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element. This patch relaxes this to demanding an element if we need any bit from it. Differential Revision: https://reviews.llvm.org/D54761 llvm-svn: 348073	2018-12-01 12:08:55 +00:00
Nicolai Haehnle	a9cc92c247	AMDGPU: Fix various issues around the VirtReg2Value mapping Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54340 llvm-svn: 348049	2018-11-30 22:55:29 +00:00
Sanjay Patel	1901a12e76	[SelectionDAG] fold FP binops with 2 undef operands to undef llvm-svn: 348016	2018-11-30 18:38:52 +00:00
Than McIntosh	0e0a8a3fee	[CodeGen] Prefer static frame index for STATEPOINT liveness args Summary: If a given liveness arg of STATEPOINT is at a fixed frame index (e.g. a function argument passed on stack), prefer to use this fixed location even if the address is also in a register. If we use the register it will generate a spill, which is not necessary since the fixed frame index can be directly recorded in the stack map. Patch by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, niravd, reames Reviewed By: reames Subscribers: cherryyz, reames, anna, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D53889 llvm-svn: 347998	2018-11-30 16:22:41 +00:00
Nicolai Haehnle	445b0b6260	TableGen/ISel: Allow PatFrag predicate code to access captured operands Summary: This simplifies writing predicates for pattern fragments that are automatically re-associated or commuted. For example, a followup patch adds patterns for fragments of the form (add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are automatically commuted to (add $z, (shl $x, $y)), which makes it basically impossible to refer to $x, $y, and $z generically in the PredicateCode. With this change, the PredicateCode can refer to $x, $y, and $z simply as `Operands[i]`. Test confirmed that there are no changes to any of the generated files when building all (non-experimental) targets. Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D51994 llvm-svn: 347992	2018-11-30 14:15:13 +00:00
Alex Bradbury	fca95cfee9	[SelectionDAG] Support result type promotion for FLT_ROUNDS_ For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_. Differential Revision: https://reviews.llvm.org/D53820 llvm-svn: 347986	2018-11-30 13:18:33 +00:00
Alex Bradbury	bd24c7b045	[SelectionDAG] Support promotion of PREFETCH operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operands of ISD::PREFETCH. Differential Revision: https://reviews.llvm.org/D53281 llvm-svn: 347980	2018-11-30 10:06:31 +00:00
Alex Bradbury	36e0fd1d39	[SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operand. Differential Revision: https://reviews.llvm.org/D53279 llvm-svn: 347978	2018-11-30 10:02:06 +00:00
Alex Bradbury	e0e62e97df	[TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend operands when it is able to do so. For some targets this is more expensive than a sign-extension, which is also a valid choice. Introduce the isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == MVT::i64, as it can be performed using a single instruction. Differential Revision: https://reviews.llvm.org/D52978 llvm-svn: 347977	2018-11-30 09:56:54 +00:00
Sanjay Patel	8d27144251	[DAGCombiner] narrow truncated binops The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. Differential Revision: https://reviews.llvm.org/D54640 llvm-svn: 347917	2018-11-29 20:58:26 +00:00
Craig Topper	129d529ab3	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00

9796 Commits