llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	0add090e24	[TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generates much nicer code with it.... Differential Revision: https://reviews.llvm.org/D53794 llvm-svn: 348251	2018-12-04 11:21:30 +00:00
Simon Pilgrim	666261cdc8	[TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes Add support for ISD::_EXTEND and ISD::_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch. llvm-svn: 348246	2018-12-04 10:41:06 +00:00
Sanjay Patel	d24f63477d	[DAGCombiner] narrow truncated vector binops when legal This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195	2018-12-03 21:57:35 +00:00
Craig Topper	e35b01f8ea	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158	2018-12-03 18:26:24 +00:00
Sanjay Patel	b205606d3e	[SelectionDAG] fold constant with undef vector per element This makes the SDAG behavior consistent with the way we do this in IR. It's possible that we were getting the wrong answer before. For example, 'xor undef, undef --> 0' but 'xor undef, C' --> undef. But the most practical improvement is likely as shown in the tests here - for FP, we were overconstraining undef lanes to NaN, and that can prevent vector simplifications/narrowing (see D51553). llvm-svn: 348090	2018-12-02 13:48:42 +00:00
Sanjay Patel	2daceedf92	[DAGCombiner] guard against an oversized shift crash This change prevents the crash noted in the post-commit comments for rL347478 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089	2018-12-02 13:33:56 +00:00
Simon Pilgrim	e017ed3245	[SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element. This patch relaxes this to demanding an element if we need any bit from it. Differential Revision: https://reviews.llvm.org/D54761 llvm-svn: 348073	2018-12-01 12:08:55 +00:00
Nicolai Haehnle	a9cc92c247	AMDGPU: Fix various issues around the VirtReg2Value mapping Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54340 llvm-svn: 348049	2018-11-30 22:55:29 +00:00
Sanjay Patel	1901a12e76	[SelectionDAG] fold FP binops with 2 undef operands to undef llvm-svn: 348016	2018-11-30 18:38:52 +00:00
Than McIntosh	0e0a8a3fee	[CodeGen] Prefer static frame index for STATEPOINT liveness args Summary: If a given liveness arg of STATEPOINT is at a fixed frame index (e.g. a function argument passed on stack), prefer to use this fixed location even the address is also in a register. If we use the register it will generate a spill, which is not necessary since the fixed frame index can be directly recorded in the stack map. Patch by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, niravd, reames Reviewed By: reames Subscribers: cherryyz, reames, anna, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D53889 llvm-svn: 347998	2018-11-30 16:22:41 +00:00
Nicolai Haehnle	445b0b6260	TableGen/ISel: Allow PatFrag predicate code to access captured operands Summary: This simplifies writing predicates for pattern fragments that are automatically re-associated or commuted. For example, a followup patch adds patterns for fragments of the form (add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are automatically commuted to (add $z, (shl $x, $y)), which makes it basically impossible to refer to $x, $y, and $z generically in the PredicateCode. With this change, the PredicateCode can refer to $x, $y, and $z simply as `Operands[i]`. Test confirmed that there are no changes to any of the generated files when building all (non-experimental) targets. Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D51994 llvm-svn: 347992	2018-11-30 14:15:13 +00:00
Alex Bradbury	fca95cfee9	[SelectionDAG] Support result type promotion for FLT_ROUNDS_ For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_. Differential Revision: https://reviews.llvm.org/D53820 llvm-svn: 347986	2018-11-30 13:18:33 +00:00
Alex Bradbury	bd24c7b045	[SelectionDAG] Support promotion of PREFETCH operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operands of ISD::PREFETCH. Differential Revision: https://reviews.llvm.org/D53281 llvm-svn: 347980	2018-11-30 10:06:31 +00:00
Alex Bradbury	36e0fd1d39	[SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operand. Differential Revision: https://reviews.llvm.org/D53279 llvm-svn: 347978	2018-11-30 10:02:06 +00:00
Alex Bradbury	e0e62e97df	[TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend operands when it is able to do so. For some targets this is more expensive than a sign-extension, which is also a valid choice. Introduce the isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == MVT::i64, as it can be performed using a single instruction. Differential Revision: https://reviews.llvm.org/D52978 llvm-svn: 347977	2018-11-30 09:56:54 +00:00
Sanjay Patel	8d27144251	[DAGCombiner] narrow truncated binops The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. Differential Revision: https://reviews.llvm.org/D54640 llvm-svn: 347917	2018-11-29 20:58:26 +00:00
Craig Topper	129d529ab3	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00
Craig Topper	b955bf382c	[LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision: https://reviews.llvm.org/D54906 llvm-svn: 347593	2018-11-26 21:12:39 +00:00
Craig Topper	923f463ef2	[SelectionDAG] Teach BaseIndexOffset::match to unwrap the base after looking through an add/or We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C) Differential Revision: https://reviews.llvm.org/D54818 llvm-svn: 347591	2018-11-26 20:16:33 +00:00
Sanjay Patel	7336e7c67a	[x86] limit transform for select-of-fp-constants This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526	2018-11-25 17:27:02 +00:00
Sanjay Patel	04435677d0	[SelectionDAG] move constant or splat functions to common location rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible. llvm-svn: 347523	2018-11-25 16:09:32 +00:00
Sanjay Patel	7e119c0400	[DAG] consolidate shift simplifications ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes. llvm-svn: 347502	2018-11-23 20:05:12 +00:00
Craig Topper	0ec17884de	[LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading towards scalarizing the type. This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. Then inserts a follow up truncate or fp_round after concatenating the two halves back together. But if the input type of the original op is being split on its way to ultimately being scalarized we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. Seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then building a vector with large elements only to make the elements smaller again in the vector register. Seems better to just try to get away producing smaller result types in the scalarized code. The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but its not realistic anyway. llvm-svn: 347482	2018-11-23 02:32:13 +00:00
Craig Topper	b239763384	[LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to SplitVecOp_UnaryOp if splitting the output type would be a legal type. SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type we don't need to do that. The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked. llvm-svn: 347479	2018-11-22 22:56:52 +00:00
Sanjay Patel	3e80019275	[DAGCombiner] form 'not' ops ahead of shifts (PR39657) We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops. https://rise4fun.com/Alive/NC1 Name: shl Pre: (-1 << C2) == C1 %shl = shl i8 %x, C2 %r = xor i8 %shl, C1 => %not = xor i8 %x, -1 %r = shl i8 %not, C2 Name: shr Pre: (-1 u>> C2) == C1 %sh = lshr i8 %x, C2 %r = xor i8 %sh, C1 => %not = xor i8 %x, -1 %r = lshr i8 %not, C2 https://bugs.llvm.org/show_bug.cgi?id=39657 llvm-svn: 347478	2018-11-22 19:24:10 +00:00
Sanjay Patel	20935e0ab5	[DAGCombiner] refactor select-of-FP-constants transform This transform needs to be limited. We are converting to a constant pool load very early, and we are turning loads that are independent of the select condition (and therefore speculatable) into a dependent non-speculatable load. We may also be transferring a condition code from an FP register to integer to create that dependent load. llvm-svn: 347424	2018-11-21 20:54:47 +00:00
Sanjay Patel	1c74747478	[DAGCombiner] reduce code duplication; NFC llvm-svn: 347410	2018-11-21 20:00:32 +00:00
Simon Pilgrim	5448339889	[TargetLowering] SimplifyDemandedBits - only reduce known bits for integer constants Avoids fuzzing crash found by Mikael Holmén. llvm-svn: 347393	2018-11-21 14:26:19 +00:00
Nikita Popov	c5b152bdef	Test commit: Delete trailing space in comment llvm-svn: 347385	2018-11-21 10:57:22 +00:00
Sanjay Patel	357053f289	[DAGCombiner] look through bitcasts when trying to narrow vector binops This is another step in vector narrowing - a follow-up to D53784 (and hoping to eventually squash potential regressions seen in D51553). The x86 test diffs are wins, but the AArch64 diff is probably not. That problem already exists independent of this patch (see PR39722), but it went unnoticed in the previous patch because there were no regression tests that showed the possibility. The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling concerns with using wider vector ops, an extra extract to reduce vector width is the right trade-off at this level of codegen. Differential Revision: https://reviews.llvm.org/D54392 llvm-svn: 347356	2018-11-20 22:26:35 +00:00
Simon Pilgrim	3735105961	[DAGCombine] Add calls to SimplifyDemandedVectorElts from visitINSERT_SUBVECTOR (PR37989) This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices. llvm-svn: 347313	2018-11-20 15:23:50 +00:00
Simon Pilgrim	b356d0463e	[TargetLowering] Improve SimplifyDemandedVectorElts/SimplifyDemandedBits support For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask. I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem. Differential Revision: https://reviews.llvm.org/D54679 llvm-svn: 347301	2018-11-20 12:02:16 +00:00
Craig Topper	4954c66430	[SelectionDAG] Compute known bits and num sign bits for live out vector registers. Use it to add AssertZExt/AssertSExt in the live in basic blocks Summary: We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see the upper bits are zero to use less multiply instructions to emulate a 64 bit multiply. This should help with this ispc issue that a coworker pointed me to https://github.com/ispc/ispc/issues/1362 Reviewers: spatel, efriedma, RKSimon, arsenm Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D54725 llvm-svn: 347287	2018-11-20 04:30:26 +00:00
Sanjay Patel	a36c444471	[DAGCombiner] reduce code duplication in visitXOR; NFC llvm-svn: 347278	2018-11-20 00:51:45 +00:00
Stanislav Mekhanoshin	54ebfe8aee	Implement computeKnownBits for scalar_to_vector Differential Revision: https://reviews.llvm.org/D54728 llvm-svn: 347274	2018-11-19 23:34:07 +00:00
Simon Pilgrim	740122fb8c	[DAGCombine] SimplifyNodeWithTwoResults - ensure same legalization for LO/HI operands (PR21207) Consistently use (!LegalOperations \|\| isOperationLegalOrCustom) for all node pairs. Differential Revision: https://reviews.llvm.org/D53478 llvm-svn: 347255	2018-11-19 19:37:59 +00:00
Simon Pilgrim	de3605f56b	[TargetLowering] expandFP_TO_UINT - improve fp16 support As discussed on D53794, for float types with ranges smaller than the destination integer type, then we should be able to just use a regular FP_TO_SINT opcode. I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary. Differential Revision: https://reviews.llvm.org/D54703 llvm-svn: 347251	2018-11-19 19:16:13 +00:00
Sanjay Patel	b25adf5edb	[SelectionDAG] simplify vector select with undef operand(s) llvm-svn: 347227	2018-11-19 17:06:05 +00:00
Sanjay Patel	a1dca3553e	[SelectionDAG] simplify select FP with undef condition llvm-svn: 347212	2018-11-19 14:42:28 +00:00
Sanjay Patel	c036d844be	[SelectionDAG] add simplifySelect() to reduce code duplication; NFC This should be extended to handle FP and vectors in follow-up patches. llvm-svn: 347210	2018-11-19 14:35:22 +00:00
Sanjay Patel	8c0cd77bff	[DAG] add undef simplifications for select nodes Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023 llvm-svn: 347170	2018-11-18 17:36:23 +00:00
Sanjay Patel	42c22a1f87	[SelectionDAG] simplify code; NFC llvm-svn: 347160	2018-11-18 14:39:03 +00:00
Fangrui Song	7570932977	Use llvm::copy. NFC llvm-svn: 347126	2018-11-17 01:44:25 +00:00
Stanislav Mekhanoshin	0ff7c8309d	DAG combiner: fold (select, C, X, undef) -> X Differential Revision: https://reviews.llvm.org/D54646 llvm-svn: 347110	2018-11-16 23:13:38 +00:00
Craig Topper	9e97054211	[LegalizeVectorOps] After custom legalizing an extending load or a truncating store, make sure the custom code is also legal. For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization. Unfortunately, this doesn't seem to matter for the output of any existing lit tests. llvm-svn: 347094	2018-11-16 21:04:58 +00:00
Simon Pilgrim	3c8baf4f90	[TargetLowering] Cleanup more of the EXTEND demanded bits cases so that they match. NFCI. Use the same variable names etc. llvm-svn: 347045	2018-11-16 12:26:26 +00:00
Sam Parker	ab99cfab21	[DAGCombine] Fix non-deterministic debug output PR37970 reported non-deterministic debug output, this was caused by iterating through a set and not a a vector. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37970 Differential Revision: https://reviews.llvm.org/D54570 llvm-svn: 347037	2018-11-16 08:35:19 +00:00
Craig Topper	bac7d9735a	[LegalizeVectorTypes] Teach WidenVecRes_Convert to turn ANY_EXTEND into ANY_EXTEND_VECTOR_INREG when the input and output types need to be widened to the same width. If we don't do it here, DAGCombine will just end up creating it from the scalar any_extend+build_vector so might as well save a step. llvm-svn: 347034	2018-11-16 07:13:34 +00:00
Stanislav Mekhanoshin	35de877e8c	Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handling Legalizer used to request an ext load from i8 to i1 when promoting vector element type to i8. Fixed. Differential Revision: https://reviews.llvm.org/D54440 llvm-svn: 346795	2018-11-13 20:26:27 +00:00
Craig Topper	aca8390216	[SelectionDAG][X86] Relax restriction on the width of an input to _EXTEND_VECTOR_INREG. Use them and regular _EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784	2018-11-13 19:45:21 +00:00
Cameron McInally	cbde0d9c7b	[IR] Add a dedicated FNeg IR Instruction The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision: https://reviews.llvm.org/D53877 llvm-svn: 346774	2018-11-13 18:15:47 +00:00
Craig Topper	0b33b468a1	[DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector ops It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner. Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all. Differential Revision: https://reviews.llvm.org/D54285 llvm-svn: 346728	2018-11-13 01:59:32 +00:00
Nirav Dave	a395e2df56	[DAGCombiner] Fix load-store forwarding of indexed loads. Summary: Handle extra output from index loads in cases where we wish to forward a load value directly from a preceeding store. Fixes PR39571. Reviewers: peter.smith, rengolin Subscribers: javed.absar, hiraditya, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54265 llvm-svn: 346654	2018-11-12 14:05:40 +00:00
Craig Topper	d23cdbbeb2	[DAGCombiner] Make tryToFoldExtendOfConstant return an SDValue instead of an SDNode*. NFC Removes the need to call getNode internally and to recreate an SDValue after the call. llvm-svn: 346600	2018-11-10 23:46:03 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Craig Topper	f2e65f8636	[SelectionDAG] Fix a -Wparentheses warning from gcc in an assert. NFC gcc wants parentheses around the logical OR since there is a logical AND for the string. llvm-svn: 346564	2018-11-09 23:11:30 +00:00
Craig Topper	9a7e19b8f2	[DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530	2018-11-09 18:04:34 +00:00
Serge Guelton	86f8b70f1b	Type safe version of MachinePassRegistry Previous version used type erasure through a `void* (*)()` pointer, which triggered gcc warning and implied a lot of reinterpret_cast. This version should make it harder to hit ourselves in the foot. Differential revision: https://reviews.llvm.org/D54203 llvm-svn: 346522	2018-11-09 17:19:45 +00:00
Alexandros Lamprineas	e15c982f6d	[SelectionDAG] swap select_cc operands to enable folding The DAGCombiner tries to SimplifySelectCC as follows: select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4) It can't cope with the situation of reordered operands: select_cc(x, y, 0, 16, cc) In that case we just need to swap the operands and invert the Condition Code: select_cc(x, y, 16, 0, ~cc) Differential Revision: https://reviews.llvm.org/D53236 llvm-svn: 346484	2018-11-09 11:09:40 +00:00
Craig Topper	8cca8bd4aa	[SelectionDAG] Assert on the width of DemandedElts argument to computeKnownBits for all vector typed operations not just build_vector. Fix AArch64 unit test that fails with the assertion added. llvm-svn: 346437	2018-11-08 20:29:17 +00:00
Nirav Dave	6ce9f72f76	[DAGCombine] Improve alias analysis for chain of independent stores. FindBetterNeighborChains simulateanously improves the chain dependencies of a chain of related stores avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization. This patch directly parallelize the chains of stores before improving each store. This generally improves DAG-level parallelism. Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53552 llvm-svn: 346432	2018-11-08 19:14:20 +00:00
James Y Knight	72f76bf230	Add support for llvm.is.constant intrinsic (PR4898) This adds the llvm-side support for post-inlining evaluation of the __builtin_constant_p GCC intrinsic. Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call to a function where canConstantFoldTo returns true, and one of the arguments is a struct. Updated from patch initially by Janusz Sobczak. Differential Revision: https://reviews.llvm.org/D4276 llvm-svn: 346322	2018-11-07 15:24:12 +00:00
Cameron McInally	9757d5d6c1	[FPEnv] Add constrained CEIL/FLOOR/ROUND/TRUNC intrinsics Differential Revision: https://reviews.llvm.org/D53411 llvm-svn: 346141	2018-11-05 15:59:49 +00:00
Simon Pilgrim	6bd468bd8b	[TargetLowering] Begin generalizing TargetLowering::expandFP_TO_SINT support. NFCI. Prior to initial work to add vector expansion support, remove assumptions that we're working on scalar types. llvm-svn: 346139	2018-11-05 15:49:09 +00:00
Craig Topper	8f2f2a76b9	[DAGCombiner] Use tryFoldToZero to simplify some code and make it work correctly between LegalTypes and LegalOperations. The original code avoided creating a zero vector after type legalization, but if we're after type legalization the type we have is legal. The real hazard we need to avoid is creating a build vector after op legalization. tryFoldToZero takes care of checking for this. llvm-svn: 346119	2018-11-05 05:53:06 +00:00
Craig Topper	8d64abddd1	[DAGCombiner] Remove an unused argument from tryFoldToZero. NFC llvm-svn: 346118	2018-11-05 05:53:03 +00:00
Craig Topper	3292ea03d3	[DAGCombiner] Remove 'else' after return. NFC This makes this code consistent with the nearly identical code in visitZERO_EXTEND. llvm-svn: 346090	2018-11-04 06:56:32 +00:00
Craig Topper	1ba86188cf	[SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode. These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers. The rest of the patch is just changing all callers to use getNode directly. llvm-svn: 346087	2018-11-04 02:10:18 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Simon Pilgrim	cdcbeb4997	[DAGCombiner] Remove reduceBuildVecConvertToConvertBuildVec and rely on the vectorizers instead (PR35732) reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, this will override this. While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here, we should assume that the vectorizers know what they are doing. Differential Revision: https://reviews.llvm.org/D53712 llvm-svn: 345964	2018-11-02 11:06:18 +00:00
Mandeep Singh Grang	547a0d765a	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53996 llvm-svn: 345909	2018-11-01 23:22:25 +00:00
Craig Topper	e2483020f2	[DAGCombiner] Make the isTruncateOf call from visitZERO_EXTEND work for vectors. Remove FIXME. I'm having trouble creating a test case for the ISD::TRUNCATE part of this that shows any codegen differences. But I was able to test the setcc path which is what the test changes here cover. llvm-svn: 345908	2018-11-01 23:21:45 +00:00
Simon Pilgrim	b34a052852	[LegalizeDAG] Add generic vector CTPOP expansion (PR32655) This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion. Differential Revision: https://reviews.llvm.org/D53258 llvm-svn: 345869	2018-11-01 18:22:11 +00:00
Mandeep Singh Grang	b0cdf56dd7	Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64" This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2. llvm-svn: 345863	2018-11-01 17:53:57 +00:00
Sanjay Patel	c5fe3ce2ec	[DAGCombiner] make sure we have a whole-number extract before trying to narrow a vector op (PR39511) The test causes a crash because we were trying to extract v4f32 to v3f32, and the narrowing factor was then 4/3 = 1 producing a bogus narrow type. This should fix: https://bugs.llvm.org/show_bug.cgi?id=39511 llvm-svn: 345842	2018-11-01 15:41:12 +00:00
Mandeep Singh Grang	88ad9ac720	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791	2018-10-31 23:16:20 +00:00
Stanislav Mekhanoshin	222e9c11f7	Check shouldReduceLoadWidth from SimplifySetCC SimplifySetCC could shrink a load without checking for profitability or legality of such shink with a target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword as scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778	2018-10-31 21:24:30 +00:00
Scott Linder	92bb783cfe	[SelectionDAG] Handle constant range [0,1) in lowerRangeToAssertZExt lowerRangeToAssertZExt currently relies on something like EarlyCSE having eliminated the constant range [0,1). At -O0 this leads to an assert. Differential Revision: https://reviews.llvm.org/D53888 llvm-svn: 345770	2018-10-31 19:57:36 +00:00
Craig Topper	eeac12af6d	[SelectionDAGISel] Suppress a -Wunused-but-set-variable warning in release builds. NFC llvm-svn: 345761	2018-10-31 18:46:15 +00:00
Simon Pilgrim	077a9adb00	Fix comment typo. NFCI. llvm-svn: 345758	2018-10-31 18:19:52 +00:00
Simon Pilgrim	805cdcfe73	[SelectionDAG] SelectionDAGLegalize::ExpandBITREVERSE - ensure we use ShiftTy We should be using the getShiftAmountTy value type for shift amounts. llvm-svn: 345756	2018-10-31 18:14:14 +00:00
David Bolvansky	d0080c3a5f	[DAGCombiner] Fold 0 div/rem X to 0 Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover Reviewed By: RKSimon Subscribers: craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52504 llvm-svn: 345721	2018-10-31 14:18:57 +00:00
Cameron McInally	2ad870e785	[FPEnv] [FPEnv] Add constrained intrinsics for MAXNUM and MINNUM Differential Revision: https://reviews.llvm.org/D53216 llvm-svn: 345650	2018-10-30 21:01:29 +00:00
Bjorn Pettersson	fe09a20f09	[DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad Summary: Normalize the offset for endianess before checking if the store cover the load in ForwardStoreValueToDirectLoad. Without this we missed out on some optimizations for big endian targets. If for example having a 4 bytes store followed by a 1 byte load, loading the least significant byte from the store, the STCoversLD check would fail (see @test4 in test/CodeGen/AArch64/load-store-forwarding.ll). This patch also fixes a problem seen in an out-of-tree target. The target has i40 as a legal type, it is big endian, and the StoreSize for i40 is 48 bits. So when normalizing the offset for endianess we need to take the StoreSize into account (assuming that padding added when storing into a larger StoreSize always is added at the most significant end). Reviewers: niravd Reviewed By: niravd Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho Differential Revision: https://reviews.llvm.org/D53776 llvm-svn: 345636	2018-10-30 20:16:39 +00:00
Nirav Dave	eedd2ccd02	[DAG] Add const variants for BaseIndexOffset functions. llvm-svn: 345623	2018-10-30 18:26:43 +00:00
Sanjay Patel	8b207defea	[DAGCombiner] narrow vector binops when extraction is cheap Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs). This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up. Differential Revision: https://reviews.llvm.org/D53784 llvm-svn: 345602	2018-10-30 14:14:34 +00:00
Sanjay Patel	680c9227ca	[SelectionDAG] fix build warning for mismatched signs in compare; NFC llvm-svn: 345598	2018-10-30 13:47:19 +00:00
Simon Pilgrim	858303b827	[SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy. This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle. Differential Revision: https://reviews.llvm.org/D53760 llvm-svn: 345578	2018-10-30 10:32:11 +00:00
David Bolvansky	dfdbb038e8	[DAGCombiner] Improve X div/rem Y fold if single bit element type Summary: Tests by @spatel, thanks Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: sdardis, atanasyan, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D52668 llvm-svn: 345575	2018-10-30 09:07:22 +00:00
Craig Topper	b293322cee	[LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with vector output type and a vector input type that needs to be widened Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86. Reviewers: efriedma, RKSimon Reviewed By: efriedma Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53229 llvm-svn: 345567	2018-10-30 03:27:15 +00:00
Leonard Chan	905abe5b5d	[Intrinsic] Signed and Unsigned Saturation Subtraction Intirnsics Add an intrinsic that takes 2 integers and perform saturation subtraction on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53783 llvm-svn: 345512	2018-10-29 16:54:37 +00:00
Craig Topper	7a18b4bc51	[SelectionDAG] Fix bad indentation. NFC llvm-svn: 345481	2018-10-28 21:24:20 +00:00
Simon Pilgrim	3497d536f7	[TargetLowering] Move i64/vXi64 to f32/vXf32 UINT_TO_FP handling to TargetLowering::expandUINT_TO_FP. llvm-svn: 345478	2018-10-28 15:34:35 +00:00
Simon Pilgrim	9b77f0c291	[VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support. Add vector support to TargetLowering::expandFP_TO_UINT. This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this. llvm-svn: 345473	2018-10-28 13:07:25 +00:00
Craig Topper	c4b785ae1e	[DAGCombiner] Better constant vector support for FCOPYSIGN. Enable constant folding when both operands are vectors of constants. Turn into FNEG/FABS when the RHS is a splat constant vector. llvm-svn: 345469	2018-10-28 01:32:49 +00:00
Simon Pilgrim	3cf33fcdd6	[TargetLowering] Move LegalizeDAG FP_TO_UINT handling to TargetLowering::expandFP_TO_UINT. NFCI. First step towards fixing PR17686 and adding vector support. llvm-svn: 345452	2018-10-27 12:15:58 +00:00
Sanjay Patel	0eddd4730f	[DAGCombiner] rearrange code in narrowExtractedVectorBinOp(); NFC We can extend this code to handle many more cases if an extract is cheap, so prepping for that change. llvm-svn: 345430	2018-10-26 21:32:04 +00:00
Craig Topper	7bf85f5c8d	[LegalizeTypes] Stop DAGTypeLegalizer::getSETCCWidenedResultTy from creating illegal setccs. Add checks for valid setccs The DAGTypeLegalizer::getSETCCWidenedResultTy was widening the MaskVT, but the code in convertMask called after getSETCCWidenedResultTy had no idea this widening had occurred. So none of the operands were widened when convertMask created new setccs with the widened VT. This patch removes the widening and adds some asserts to getNode to validate the types of setccs to prevent issues like this in the future. Differential Revision: https://reviews.llvm.org/D53743 llvm-svn: 345428	2018-10-26 20:59:55 +00:00
Eli Friedman	2ac1162917	[ARM] Make InstrEmitter mark CPSR defs dead for Thumb1. The "dead" markings allow existing target-independent optimizations, like MachineSink, to trigger more frequently. The CPSR defs would have eventually been marked dead by LiveVariables, so this only affects optimizations before regalloc. The ARMBaseInstrInfo.cpp change is fixing a bug which is only visible with this change: the transform adds a use to an otherwise dead def of CPSR. This is covered by existing regression tests. thumb2-tbh.ll breaks for Thumb1 due to MachineLICM changing the generated code; I'll fix it in D53452. Differential Revision: https://reviews.llvm.org/D53453 llvm-svn: 345420	2018-10-26 19:32:24 +00:00
Heejin Ahn	24faf859e5	Reland "[WebAssembly] LSDA info generation" Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info MachineFunction in before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 345345	2018-10-25 23:55:10 +00:00
Cameron McInally	384a74b0e6	[FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes Replacing BinaryOperator::isFNeg(...) to avoid regressions when we separate FNeg from the FSub IR instruction. Differential Revision: https://reviews.llvm.org/D53650 llvm-svn: 345295	2018-10-25 18:09:33 +00:00
Simon Pilgrim	f02c0f8af6	[LegalizeDAG] Remove dead SINT_TO_FP legalization code As noticed on D52965, the SINT_TO_FP i64 to f32 legalization code has been dead for years - protected by an assert. Differential Revision: https://reviews.llvm.org/D53703 llvm-svn: 345290	2018-10-25 17:43:36 +00:00
Simon Pilgrim	49d79a864c	Missing semicolon. llvm-svn: 345257	2018-10-25 11:38:17 +00:00
Simon Pilgrim	838eb24014	[TargetLowering] Improve vXi64 UINT_TO_FP vXf64 support (P38226) As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well. Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets. The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it. Differential Revision: https://reviews.llvm.org/D53649 llvm-svn: 345256	2018-10-25 11:15:57 +00:00
Thomas Lively	30f1d69115	[NFC] Rename minnan and maxnan to minimum and maximum Summary: Changes all uses of minnan/maxnan to minimum/maximum globally. These names emphasize that the semantic difference between these operations is more than just NaN-propagation. Reviewers: arsenm, aheejin, dschuff, javed.absar Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53112 llvm-svn: 345218	2018-10-24 22:49:55 +00:00
Thomas Lively	43bc46207a	[SelectionDAG] DAG combiner for fminnan and fmaxnan Summary: Depends on D52765. Reviewers: aheejin, dschuff Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52768 llvm-svn: 345210	2018-10-24 22:18:54 +00:00
Tim Northover	05fe8f918b	[DAG] check more operands for cycles when merging stores. Until now, we've only checked whether merging stores would cause a cycle via the value argument, but the address and indexed offset arguments are also capable of creating cycles in some situations. The addresses are all base+offset with notionally the same base, but the base SDNode may still be different (e.g. via an indexed load in one case, and an ISD::ADD elsewhere). This allows cycles to creep in if one of these sources depends on another. The indexed offset is usually undef (representing a non-indexed store), but on some architectures (e.g. 32-bit ARM-mode ARM) it can be an arbitrary value, again allowing dependency cycles to creep in. llvm-svn: 345200	2018-10-24 21:36:34 +00:00
Simon Pilgrim	6f53b38fd4	[TargetLowering] Add SimplifyDemandedBitsForTargetNode callback Add a SimplifyDemandedBitsForTargetNode callback to handle target nodes. Differential Revision: https://reviews.llvm.org/D53643 llvm-svn: 345179	2018-10-24 19:00:56 +00:00
Simon Pilgrim	c8c7451063	[LegalizeDAG] ExpandLegalINT_TO_FP - cleanup UINT_TO_FP i64 -> f32 expansion. Use SrcVT/DestVT types and correct shift type. Part of prep work for D52965 llvm-svn: 345158	2018-10-24 16:35:01 +00:00
Matthias Braun	4f82406c46	SelectionDAG: Reuse bigger sized constants in memset expansion. When implementing memset's today we often see this pattern: $x0 = MOV 0xXYXYXYXYXYXYXYXY store $x0, ... $w1 = MOV 0xXYXYXYXY store $w1, ... We first create a 64bit constant in a 64bit register with all bytes the same and then create a 32bit constant with all bytes the same in a 32bit register. In many targets we could just access the lower byte of the 64bit register instead. - Ideally this would be handled by the ConstantHoist pass but it runs too early when memset isn't expanded yet. - The memset expansion code already had this optimization implemented, however SelectionDAG constantfolding would constantfold the "trunc(bigconstnat)" pattern to "smallconstant". - This patch makes the memset expansion mark the constant as Opaque and stop DAGCombiner from constant folding in this situation. (Similar to how ConstantHoisting marks things as Opaque to avoid folding ADD/SUB/etc.) Differential Revision: https://reviews.llvm.org/D53181 llvm-svn: 345102	2018-10-23 23:19:23 +00:00
Simon Pilgrim	8c4796deb4	[LegalizeDAG] Share Vector/Scalar CTPOP Expansion As suggested on D53258, this patch move the CTPOP expansion code from SelectionDAGLegalize to TargetLowering to allow it to be reused by the VectorLegalizer. Proper vector support will be added by D53258. llvm-svn: 345066	2018-10-23 18:28:24 +00:00
Simon Pilgrim	d705ba97dd	[LegalizeDAG] Share Vector/Scalar CTLZ Expansion As suggested on D53258, this patch shares common CTLZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering. Extension to D53474 llvm-svn: 345060	2018-10-23 17:48:30 +00:00
Sanjay Patel	ad12df829c	[SelectionDAG] use 'match' to simplify code; NFC Vector types are not possible here because this code only starts matching from the scalar bool value of a conditional branch, but this is another step towards completely removing the fake binop queries for not/neg/fneg. llvm-svn: 345041	2018-10-23 15:46:10 +00:00
Benjamin Kramer	1e212e8abb	[LegalizeDAG] Remove unused variable llvm-svn: 345040	2018-10-23 15:43:36 +00:00
Simon Pilgrim	b975ff4700	[LegalizeDAG] Share Vector/Scalar CTTZ Expansion As suggested on D53258, this patch demonstrates sharing common CTTZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering. I intend to move CTLZ and (scalar) CTPOP over as well and then update D53258 accordingly. Differential Revision: https://reviews.llvm.org/D53474 llvm-svn: 345039	2018-10-23 15:37:19 +00:00
Leonard Chan	0acfc6be38	[Intrinsic] Unigned Saturation Addition Intrinsic Add an intrinsic that takes 2 integers and perform unsigned saturation addition on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53340 llvm-svn: 344971	2018-10-22 23:08:40 +00:00
Craig Topper	c8e183f9ee	Recommit r344877 "[X86] Stop promoting integer loads to vXi64" I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile. Original commit message: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344965	2018-10-22 22:14:05 +00:00
Matt Arsenault	687ec75d10	DAG: Change behavior of fminnum/fmaxnum nodes Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914	2018-10-22 16:27:27 +00:00
Sanjay Patel	e439cc2745	[DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016) This is a late backend subset of the IR transform added with: D52439 We can confirm that the conversion to a 'trunc' is correct by running: $ opt -instcombine -data-layout="e" (assuming the IR transforms are correct; change "e" to "E" for big-endian) As discussed in PR39016: https://bugs.llvm.org/show_bug.cgi?id=39016 ...the pattern may emerge during legalization, so that's we are waiting for an insertelement to become a scalar_to_vector in the pattern matching here. The DAG allows for fun variations that are not possible in IR. Result types for extracts and scalar_to_vector don't necessarily match input types, so that means we have to be a bit more careful in the transform (see code comments). The tests show that we don't handle cases that require a shift (as we did in the IR version). I've left that as a potential follow-up because I'm not sure if that's a real concern at this late stage. Differential Revision: https://reviews.llvm.org/D53201 llvm-svn: 344872	2018-10-21 20:13:29 +00:00
Krasimir Georgiev	547d824da6	Revert "[WebAssembly] LSDA info generation" This reverts commit r344575. Newly introduced test eh-lsda.ll.test fails with use-after-free under ASAN build. llvm-svn: 344639	2018-10-16 18:50:09 +00:00
Leonard Chan	699b3b54da	[Intrinsic] Signed Saturation Addition Intrinsic Add an intrinsic that takes 2 integers and perform saturation addition on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53053 llvm-svn: 344629	2018-10-16 17:35:41 +00:00
Simon Pilgrim	ac58636ec5	[LegalizeDAG] ExpandLegalINT_TO_FP - cleanup UINT_TO_FP i64 -> f64 expansion. Use SrcVT/DestVT types, correct shift type and AND instead of ZERO_EXTEND_IN_REG. Part of prep work for D52965 llvm-svn: 344602	2018-10-16 10:06:15 +00:00
Heejin Ahn	0981eaab47	[WebAssembly] LSDA info generation Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info MachineFunction in before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 344575	2018-10-16 00:09:12 +00:00
Sanjay Patel	4cf1da0e02	[SelectionDAG] allow FP binops in SimplifyDemandedVectorElts This is intended to make the backend on par with functionality that was added to the IR version of SimplifyDemandedVectorElts in: rL343727 ...and the original motivation is that we need to improve demanded-vector-elements in several ways to avoid problems that would be exposed in D51553. Differential Revision: https://reviews.llvm.org/D52912 llvm-svn: 344541	2018-10-15 18:05:34 +00:00
Sanjay Patel	8bd74785f0	[DAGCombiner] allow undef elts in vector fmul matching llvm-svn: 344534	2018-10-15 16:54:07 +00:00
Sanjay Patel	89e2197c33	[DAGCombiner] refactor folds for fadd (fmul X, -2.0), Y; NFCI The transform doesn't work if the vector constant has undef elements. llvm-svn: 344532	2018-10-15 16:47:01 +00:00
Sanjay Patel	9e7e0fd828	[DAGCombiner] allow undef elts in vector fma matching llvm-svn: 344528	2018-10-15 15:56:39 +00:00
Sanjay Patel	4e970ff022	[DAGCombiner] allow undef elts in vector fma matching llvm-svn: 344525	2018-10-15 15:38:38 +00:00
Chandler Carruth	edb12a838a	[TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502	2018-10-15 10:04:59 +00:00
Simon Pilgrim	a0590a4f7a	[LegalizeDAG] Don't bother with final MUL+SRL stage for byte CTPOP. The final stage of CTPOP expansion (v = (v * 0x01010101...) >> (Len - 8)) is completely pointless for the byte (Len = 8) case as it reduces to (v = (v * 0x01...) >> 0), but annoyingly this doesn't always get optimized away. Found while investigating generic vector CTPOP expansion (PR32655). llvm-svn: 344477	2018-10-14 15:56:28 +00:00
Simon Pilgrim	28a143f738	Pull out repeated variables from SelectionDAGLegalize::ExpandBitCount. The CTPOP case has been changed from VT.getSizeInBits to VT.getScalarSizeInBits - but this fits in with future work for vector support (PR32655) and doesn't affect any current (scalar) uses. llvm-svn: 344461	2018-10-13 18:40:48 +00:00
Craig Topper	189e5b4ab6	[LegalizeTypes] Prevent an assertion from PromoteIntRes_BSWAP and PromoteIntRes_BITREVERSE if the shift amount is too large for the VT returned by getShiftAmountTy Summary: getShiftAmountTy for X86 returns MVT::i8. If a BSWAP or BITREVERSE is created that requires promotion and the difference between the original VT and the promoted VT is more than 255 then we won't able to create the constant. This patch adds a check to replace the result from getShiftAmountTy to MVT::i32 if the difference won't fit. This should get legalized later when the shift is ultimately expanded since its clearly an illegal type that we're only promoting to make it a power of 2 bit width. Alternatively we could base the decision completely on the largest shift amount the promoted VT could use. Vectors should be immune here because getShiftAmountTy always returns the incoming VT for vectors. Only the scalar shift amount can be changed by the targets. Reviewers: eli.friedman, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53232 llvm-svn: 344460	2018-10-13 17:47:20 +00:00
Simon Pilgrim	c5d7c6e5f6	[X86][SSE] Remove most of vector CTTZ custom lowering and use LegalizeDAG instead. There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM. I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen. llvm-svn: 344457	2018-10-13 16:11:15 +00:00
Simon Pilgrim	1c2051ead7	[X86][SSE] Begin removing vector CTTZ custom lowering and use LegalizeDAG instead. Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering. llvm-svn: 344453	2018-10-13 15:16:55 +00:00
Thomas Lively	16c349d892	[Intrinsic] Add llvm.minimum and llvm.maximum instrinsic functions Summary: These new intrinsics have the semantics of the `minimum` and `maximum` operations specified by the latest draft of IEEE 754-2018. Unlike llvm.minnum and llvm.maxnum, these new intrinsics propagate NaNs and always treat -0.0 as less than 0.0. `minimum` and `maximum` lower directly to the existing `fminnan` and `fmaxnan` ISel DAG nodes. It is safe to reuse these DAG nodes because before this patch were only emitted in situations where there were known to be no NaN arguments or where NaN propagation was correct and there were known to be no zero arguments. I know of only four backends that lower fminnan and fmaxnan: WebAssembly, ARM, AArch64, and SystemZ, and each of these lowers fminnan and fmaxnan to instructions that are compatible with the IEEE 754-2018 semantics. Reviewers: aheejin, dschuff, sunfish, javed.absar Subscribers: kristof.beyls, dexonsmith, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D52764 llvm-svn: 344437	2018-10-13 07:21:44 +00:00
Craig Topper	a796580903	[LegalizeVectorTypes] Use TLI.getVectorIdxTy instead of DAG.getIntPtrConstant. There's no guarantee that vector indices should use pointer types. So use the correct query method. llvm-svn: 344428	2018-10-12 22:55:17 +00:00
Craig Topper	435e38a5df	[LegalizeVectorTypes] When widening the result of a bitcast from a scalar type, use a scalar_to_vector to turn the scalar into a vector intead of a build vector full of mostly undefs. This is more consistent with what we usually do and matches some code X86 custom emits in some cases that I think I can cleanup. The MIPS test change just looks to be an instruction ordering change. llvm-svn: 344422	2018-10-12 21:59:55 +00:00
Craig Topper	1bb0c6041a	[LegalizeVectorTypes] When widening the operands to a concat_vectors, see if we can use the widened operand 0 if the width matches and the other operands are undef. This saves a conversion to extracts and build_vector. We already do this when both the result and the input need to be widened to the same type. This changed the sse-intrinsics-fast-isel test because we don't lower (insert_vector_elt (scalar_to_vector X), Y, 1) well. We turn it into (vector_shuffle (scalar_to_vector X), (scalar_to_vector Y), <0, 4, 2, 3>) losing track of the fact that the upper elts could be undef. We should probably find a way to prevent the scalarization of the <2 x f32> load on these tests. llvm-svn: 344404	2018-10-12 19:37:49 +00:00
Craig Topper	05f014a684	[LegalizeVectorTypes] When unrolling in WidenVecRes_Convert, make sure we use the original vector element count. Not min of the widened result type and the possibly widened input type. If the input type is widened as well, but we still were forced to unroll, we shouldn't be considering the widened input element count. We should only create as many scalar operations as the original type called for. This will be important for an upcoming patch. llvm-svn: 344403	2018-10-12 19:37:47 +00:00
Simon Pilgrim	c046b6856e	Pull out repeated value types. NFCI. llvm-svn: 344355	2018-10-12 15:49:19 +00:00
Simon Pilgrim	b926fd7b36	Pull out repeated value types. NFCI. llvm-svn: 344354	2018-10-12 15:48:47 +00:00
Simon Pilgrim	b8339c0167	[SelectionDAG] Move VectorLegalizer::ExpandCTLZ codegen into SelectionDAGLegalize Generalize SelectionDAGLegalize's CTLZ expansion to handle vectors - lets VectorLegalizer::ExpandCTLZ to just pass the expansion on instead of repeating the same codegen. llvm-svn: 344349	2018-10-12 14:45:57 +00:00
Sanjay Patel	56b6660d2e	[DAGCombiner] rearrange extract_element+bitcast fold; NFC I want to add another pattern here that includes scalar_to_vector, so this makes that patch smaller. I was hoping to remove the hasOneUse() check because it shouldn't be necessary for common codegen, but an AMDGPU test has a comment suggesting that the extra check makes things better on one of those targets. llvm-svn: 344320	2018-10-11 23:56:56 +00:00
Nirav Dave	f1f2a2a31a	[DAG] Fix Big Endian in Load-Store forwarding Summary: Correct offset calculation in load-store forwarding for big-endian targets. Reviewers: rnk, RKSimon, waltl Subscribers: sdardis, nemanjai, hiraditya, jrtc27, atanasyan, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53147 llvm-svn: 344272	2018-10-11 18:28:59 +00:00
Sanjay Patel	4875662e57	[DAGCombiner] move comment closer to the corresponding code; NFC llvm-svn: 344255	2018-10-11 16:07:25 +00:00
Nirav Dave	07acc992dc	[DAGCombine] Improve Load-Store Forwarding Summary: Extend analysis forwarding loads from preceeding stores to work with extended loads and truncated stores to the same address so long as the load is fully subsumed by the store. Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are deleted as they've no longer seem to be relevant. Reviewers: RKSimon, rnk, kparzysz, javed.absar Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D49200 llvm-svn: 344142	2018-10-10 14:15:52 +00:00
Simon Pilgrim	bc7b6251b6	[TargetLowering] SimplifyDemandedBits - rename demanded mask args. NFCI. Help stop bugs like rL343935 by making the 'original' DemandedBits arg more obviously not the mask that is actually used. llvm-svn: 344138	2018-10-10 13:00:49 +00:00
Simon Pilgrim	53a503f6ac	[TargetLowering] SimplifyDemandedBits - pull out repeated getOperands. NFCI. Part of a minor cleanup to make all the switch statements more consistent prior to improving vector support. llvm-svn: 344136	2018-10-10 12:32:13 +00:00
Simon Pilgrim	5cb3a82892	[TargetLowering] Add root node back to work list after successful SimplifyDemandedBits/SimplifyDemandedVectorElts Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful. Differential Revision: https://reviews.llvm.org/D53026 llvm-svn: 344132	2018-10-10 10:44:15 +00:00
Nemanja Ivanovic	72d4866e57	[DAGCombiner] Expand combining of FP logical ops to sign-setting FP ops We already do the following combines: (bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X (bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X When the target has "bit preserving fp logic". This patch just extends it to also combine: (bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X) As some targets have fnabs and even those that don't can efficiently lower both the fabs and the fneg. Differential revision: https://reviews.llvm.org/D44548 llvm-svn: 344093	2018-10-09 23:20:11 +00:00
Simon Pilgrim	23f880317a	[SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG and CONCAT_VECTORS support to SimplifyDemandedBits Fix for AVX1 masked load/store regression on D52964 llvm-svn: 344043	2018-10-09 13:13:35 +00:00
Sanjay Patel	b64c0d7b53	[DAGCombiner] simplify code for fmul with constant fold; NFCI llvm-svn: 343997	2018-10-08 21:17:20 +00:00
Alex Bradbury	f27c67af12	[SelectionDAGBuilder][NFC] Pass LHSTy to getShiftAmountTy rather than RHSTy r126518 introduced a a type parameter to the getShiftAmountTy target hook. It produces the type of the shift (RHSTy), parameterised by the type of the value being shifted (LHSTy). SelectionDAGBuilder::visitShift passed RHSTy rather than LHSTy and this patch corrects this. The change is a no-op because in LLVM IR the LHS and RHS types for a shift must be equal anyway. llvm-svn: 343955	2018-10-08 06:24:59 +00:00
Craig Topper	98dd9d6896	Revert r343948 "[LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead a pointer to an array. Add assert on size of array. NFC" The assert is failing some asan tests on the bots. llvm-svn: 343950	2018-10-08 03:12:12 +00:00
Craig Topper	c058a68784	[LegalizeDAG] Make one of the ReplaceNode signatures take an ArrayRef instead a pointer to an array. Add assert on size of array. NFC llvm-svn: 343948	2018-10-08 02:02:08 +00:00
Craig Topper	cd38de8b15	[LegalizeDAG] Move legalization of scatter and masked store from LegalizeVectorOps to LegalizeDAG. This is where we legalize gather and masked load so this is consistent. Since these ops are always on vectors I've chosen to go with LegalizeDAG since that's what we do for other vector only ops like BUILD_VECTOR, VECTOR_SHUFFLE, etc. The ScalarizeMaskedMemIntrinsic pass should take care of scalarizing these before SelectionDAG so hopefully we don't need to worry about illegally typed scalar ops being emitted in the legalizing. If we did we would need to do this in LegalizeVectorOps so we could get the second type legalization that runs between LegalizeVectorOps and LegalizeDAG. llvm-svn: 343947	2018-10-08 00:04:55 +00:00
Sanjay Patel	ecc8af61e7	[DAGCombiner] allow undef elts in vector fadd matching llvm-svn: 343945	2018-10-07 16:30:42 +00:00
Sanjay Patel	ef76e27985	[DAGCombiner] allow undefs when matching vector splats for fmul folds llvm-svn: 343942	2018-10-07 16:05:37 +00:00
Sanjay Patel	0b74c840dd	[DAGCombiner] allow undef elts in vector fabs/fneg matching This change is proposed as a part of D44548, but we need this independently to avoid regressions from improved undef propagation in SimplifyDemandedVectorElts(). llvm-svn: 343940	2018-10-07 15:32:06 +00:00
Sanjay Patel	46a9dc2e3e	[DAGCombiner] shorten code for bitcast+fabs fold; NFC llvm-svn: 343939	2018-10-07 15:18:30 +00:00
Simon Pilgrim	3b04a4e322	[SelectionDAG] Respect multiple uses in SimplifyDemandedBits to SimplifyDemandedVectorElts simplification rL343913 was using SimplifyDemandedBits's original demanded mask instead of the adjusted 'NewMask' that accounts for multiple uses of the op (those variable names really need improving....). Annoyingly many of the test changes (back to pre-rL343913 state) are actually safe - but only because their multiple uses are all by PMULDQ/PMULUDQ. Thanks to Jan Vesely (@jvesely) for bisecting the bug. llvm-svn: 343935	2018-10-07 11:45:46 +00:00
Craig Topper	e4d199e360	[LegalizeVectorOps] Make ExpandStrictFPOp return the result corresponding to the result number of the SDValue passed in. It was always returning the chain which seems to be the result number of the SDValue in the lit tests we have. But I don't know if that's guaranteed. llvm-svn: 343933	2018-10-07 07:16:44 +00:00
Simon Pilgrim	9c9c97bcf4	[SelectionDAG] Add SimplifyDemandedBits to SimplifyDemandedVectorElts simplification This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector. There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify. As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.). Fixes PR39178 Differential Revision: https://reviews.llvm.org/D52935 llvm-svn: 343913	2018-10-06 10:20:04 +00:00
Sanjay Patel	f6a160a102	[SelectionDAG] allow undefs when matching splat constants And use that to transform fsub with zero constant operands. The integer part isn't used yet, but it is proposed for use in D44548, so adding both enhancements here makes that patch simpler. llvm-svn: 343865	2018-10-05 17:42:19 +00:00
Craig Topper	7d2155e3f9	[X86][LegalizeVectorOps] Use MERGE_VALUES to return two results from LowerLoad. Remove special case code in LegalizeVectorOps that allowed us to only return one result. Previously we replaced the chain use ourself and return the data result. LegalizeVectorOps then detected that we'd done this and assumed the chain had already been handled. This commit instead returns a MERGE_VALUES node with two results joined from nodes. This allows LegalizeVectorOps to do all the replacements for us without any special casing. The MERGE_VALUES will be removed by DAG combine. llvm-svn: 343817	2018-10-04 21:24:24 +00:00
Craig Topper	08ae6774eb	[LegalizeIntegerTypes] Fix typo in comment. NFC llvm-svn: 343750	2018-10-04 02:40:35 +00:00
Matthias Braun	004fe6bf83	DAGCombiner: StoreMerging: Fix bad index calculating when adjusting mismatching vector types This fixes a case of bad index calculation when merging mismatching vector types. This changes the existing code to just use the existing extract_{subvector\|element} and a bitcast (instead of bitcast first and then newly created extract_xxx) so we don't need to adjust any indices in the first place. rdar://44584718 Differential Revision: https://reviews.llvm.org/D52681 llvm-svn: 343493	2018-10-01 16:25:50 +00:00
Simon Pilgrim	818cfc40ff	[DAG] Don't perform SINT_TO_FP<->UINT_TO_FP custom conversion after legalization The SINT_TO_FP<->UINT_TO_FP combines for non-negative integers should only occur for legal ops once LegalOperations = true No test case to hand, noticed when investigating PR38226 + PR38970 llvm-svn: 343405	2018-09-30 12:46:42 +00:00
David Bolvansky	8e90bad63d	[DAGCombiner] [NFC] Improve X div/rem 1 fold Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52661 llvm-svn: 343349	2018-09-28 18:40:30 +00:00
Fangrui Song	0cac726a00	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...) Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163	2018-09-27 02:13:45 +00:00
Simon Pilgrim	b0189289bf	[DAG] SelectionDAGLegalize::ExpandLegalINT_TO_FP - use getFPExtendOrRound helper. NFCI. Handles SrcVT == DstVT as well. llvm-svn: 343121	2018-09-26 16:24:07 +00:00
Simon Pilgrim	e2437689a8	[DAG] ExpandLegalINT_TO_FP - pull out repeated getValueType() call. NFCI. llvm-svn: 343101	2018-09-26 12:42:19 +00:00
David Green	353cb3d4e5	[CodeGen] Enable tail calls for functions with NonNull attributes. Adding NonNull as attributes to returned pointers has the unfortunate side effect of disabling tail calls. This patch ignores the NonNull attribute when we decide whether to tail merge, in the same way that we ignore the NoAlias attribute, as it has no affect on the call sequence. Differential Revision: https://reviews.llvm.org/D52238 llvm-svn: 343091	2018-09-26 10:46:18 +00:00
Mikael Nilsson	9c8e35174e	Run VerifyDAGDiverence in debug only VerifyDAGDiverence costs compilation time, avoid running it in non-debug builds. Differential Revision: https://reviews.llvm.org/D52454 llvm-svn: 343086	2018-09-26 09:25:45 +00:00
Craig Topper	b2a00acb24	[DAGCombiner] Remove unnecessary check for visitSDIVLike/visitUDIVLike returning a UDIVREM or SDIVREM node. This shouldn't be possible and is a leftover from when we used to recursively call combine here. llvm-svn: 343049	2018-09-25 23:52:07 +00:00
Heejin Ahn	e41be38efd	Unify landing pad information adding routines (NFC) Summary: We have `llvm::addLandingPadInfo` and `MachineFunction::addLandingPad`, both of which add landing pad information to populate `LandingPadInfo` but are called from different locations, which was confusing. This patch unifies them with one `MachineFunction::addLandingPad` function, which now has functionlities of both functions. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52428 llvm-svn: 343018	2018-09-25 19:56:44 +00:00
Sanjay Patel	10c11b867a	[x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37449) This is the final (I hope!) problem pattern mentioned in PR37749: https://bugs.llvm.org/show_bug.cgi?id=37749 We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops. We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches that can over-reach. Ie, splitting to 128-bit does not look like a win in most cases with >1 256-bit op. The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence: t29: v8i32 = concat_vectors t27, t28 t30: v4i64 = bitcast t29 t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ... t31: v4i64 = bitcast t18 t32: v4i64 = xor t30, t31 t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ... t34: v4i64 = bitcast t9 t35: v4i64 = and t32, t34 t36: v8i32 = bitcast t35 t37: v4i32 = extract_subvector t36, Constant:i64<0> t38: v4i32 = extract_subvector t36, Constant:i64<4> Differential Revision: https://reviews.llvm.org/D52318 llvm-svn: 343008	2018-09-25 19:09:34 +00:00
Nirav Dave	a2f514d672	[LegalizeDAG] Prune Predecessor check in ExpandExtractFromVectorThroughStack. NFCI. llvm-svn: 342985	2018-09-25 15:29:57 +00:00
Nirav Dave	f445a67be4	[DAGCombine] Improve Predecessor check in SimplifySelectOps. NFCI. Reuse search space bookkeeping across multiple predecessor checks qdone to avoid redundancy. This should cut search cost by ~4x. llvm-svn: 342984	2018-09-25 15:29:30 +00:00
Nirav Dave	7373d5e646	[DAGCombine] Share predecessor bookkeeping in CombineToPostIndexedLoadStore. NFCI. llvm-svn: 342983	2018-09-25 15:29:04 +00:00
Nirav Dave	46ab89a0d0	[DAGCombine] Don't fold dependent loads across SELECT_CC. DAGCombine will try to fold two loads that feed a SELECT or SELECT_CC after the select, resulting in a select of an address and a single load after. If either of the loads depend on the other, this is not legal as it could introduce cycles. However, it only checked this if the opcode was a SELECT, and not for a SELECT_CC. Unfortunately, the only reproducer I have for this is for our downstream target. I've tried getting it to trigger on an upstream one but haven't been successful. Patch thanks to Bevin Hansson. llvm-svn: 342980	2018-09-25 14:43:05 +00:00
Sanjay Patel	2c901742ca	[DAGCombiner] use UADDO to optimize saturated unsigned add This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select. As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities. This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we see only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86 which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests. Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place. Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886	2018-09-24 14:47:15 +00:00
Hans Wennborg	83d15dfe2d	Remove debug printf leftover from r342397 llvm-svn: 342863	2018-09-24 08:18:47 +00:00
Craig Topper	5bef27e808	[DAGCombiner] Remove some dead code from ConstantFoldBITCASTofBUILD_VECTOR This code handled SCALAR_TO_VECTOR being returned by the recursion, but the code that used to return SCALAR_TO_VECTOR was removed in 2015. llvm-svn: 342856	2018-09-24 02:03:11 +00:00
Craig Topper	b3b94a8e8b	[DAGCombiner] Clarify a comment. NFC This comment was misleading about why we were restricting to before legalize types. The reason given would only apply to before legalize ops. But there is a before legalize types reason that should also be listed. llvm-svn: 342851	2018-09-23 21:17:56 +00:00
Craig Topper	bec5967176	[LegalizeTypes] Fix bad indentation. NFC llvm-svn: 342850	2018-09-23 21:17:55 +00:00
Sanjay Patel	0027946915	[DAGCombiner][x86] extend decompose of integer multiply into shift/add with negation This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some existing code that overlaps with this transform. This extends D52195 and may resolve PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 (still an open question about transforming legal vector multiplies, but we could open another bug report for those) llvm-svn: 342844	2018-09-23 18:41:38 +00:00
Craig Topper	81f67f7afb	[DAGCombiner] Simplify some code in visitBITCAST. NFCI llvm-svn: 342826	2018-09-22 23:12:34 +00:00
Craig Topper	e79a588cac	[DAGCombiner] Rewrite r331896 in a different way to address a FIXME. NFCI llvm-svn: 342809	2018-09-22 18:03:14 +00:00
Sanjay Patel	8a1227ccc8	[SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCI x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant. Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. No functional change intended. I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps. Differential Revision: https://reviews.llvm.org/D52285 llvm-svn: 342669	2018-09-20 17:34:08 +00:00
Sanjay Patel	fdc0de19cb	[SelectionDAG] allow vector types with isBitwiseNot() The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits, and the test diff in add.ll is from a DAGCombiner transform. llvm-svn: 342594	2018-09-19 21:48:30 +00:00
Michael Berg	894c39f770	Copy utilities updated and added for MI flags Summary: This patch adds a GlobalIsel copy utility into MI for flags and updates the instruction emitter for the SDAG path. Some tests show new behavior and I added one for GlobalIsel which mirrors an SDAG test for handling nsw/nuw. Reviewers: spatel, wristow, arsenm Reviewed By: arsenm Subscribers: wdng Differential Revision: https://reviews.llvm.org/D52006 llvm-svn: 342576	2018-09-19 18:52:08 +00:00
Sanjay Patel	4fd2e2a498	[DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some duplicate code that overlaps with this transform. As a first step, we're only getting the most clear wins on the vector examples requested in PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 As noted in the code comment, it's likely that the x86 constraints are tighter than necessary, but it may not always be a win to replace a pmullw/pmulld. Differential Revision: https://reviews.llvm.org/D52195 llvm-svn: 342554	2018-09-19 15:57:40 +00:00
Matthias Braun	726e12cf0c	ScheduleDAG: Cleanup dumping code; NFC - Instead of having both `SUnit::dump(ScheduleDAG)` and `ScheduleDAG::dumpNode(ScheduleDAG)`, just keep the latter around. - Add `ScheduleDAG::dump()` and avoid code duplication in several places. Implement it for different ScheduleDAG variants. - Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()` functions. They were only ever used for debug dumping and putting the function into ScheduleDAG is consistent with the `dumpNode()` change. llvm-svn: 342520	2018-09-19 00:23:35 +00:00
Amara Emerson	91c2913522	Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index."" Fixed the assertion failure. llvm-svn: 342397	2018-09-17 14:40:13 +00:00
Sanjay Patel	3eaf500a6d	[DAGCombiner] try to convert pow(x, 1/3) to cbrt(x) This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040. Copying the motivational statement by @evandro from that patch: "This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64." I'm proposing to add only the minimum support for a DAG node here. Since we don't have an LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I don't think we need to worry about DAG builder, legalization, a strict variant, etc. We should be able to expand as needed when adding more functionality/transforms. For reference, these are transform suggestions currently listed in SimplifyLibCalls.cpp: // * cbrt(expN(X)) -> expN(x/3) // * cbrt(sqrt(x)) -> pow(x,1/6) // * cbrt(cbrt(x)) -> pow(x,1/9) Also, given that we bail out on long double for now, there should not be any logical differences between platforms (unless there's some platform out there that has pow() but not cbrt()). Differential Revision: https://reviews.llvm.org/D51753 llvm-svn: 342348	2018-09-16 16:50:26 +00:00
Reid Kleckner	4d1b75c6b7	Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index." Causes 'isVector() && "Invalid vector type!"' assertion when building Skia in Chrome. llvm-svn: 342265	2018-09-14 19:39:40 +00:00
Adrian Prantl	16f58d1850	Fix debug info for SelectionDAG legalization of DAG nodes with two results. This patch fixes the debug info handling for SelectionDAG legalization of DAG nodes with two results. When an replaced SDNode has more than one result, transferDbgValues was always copying the SDDbgValue from the first result and attaching them to all members. In reality SelectionDAG::ReplaceAllUsesWith() is given an array of SDNodes (though the type signature doesn't make this obvious (cf. the call site code in ReplaceNode()). rdar://problem/44162227 Differential Revision: https://reviews.llvm.org/D52112 llvm-svn: 342264	2018-09-14 19:38:45 +00:00
Adrian Prantl	66945cf6e3	fix noasserts build llvm-svn: 342247	2018-09-14 17:32:52 +00:00
Adrian Prantl	55b8756b8a	SelectionDAG: Add compact SDDbgValue representation to -dag-dump-verbose output llvm-svn: 342245	2018-09-14 17:08:02 +00:00

... 2 3 4 5 6 ...

9513 Commits