llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	986d73cc1d	[SelectionDAG] Pull out repeated getValueType calls. NFCI. Noticed in D32391. llvm-svn: 301308	2017-04-25 13:39:07 +00:00
Simon Pilgrim	7d65b66962	[DAGCombiner] Add vector support for (srl (trunc (srl x, c1)), c2) combine. llvm-svn: 301305	2017-04-25 12:40:45 +00:00
Simon Pilgrim	ab0446332e	[SelectionDAG] Recognise splat vector isKnownToBeAPowerOfTwo one/sign bit shift cases. llvm-svn: 301303	2017-04-25 12:29:07 +00:00
Simon Pilgrim	96611aa30c	[DAGCombiner] Use SDValue::getConstantOperandVal helper where possible. NFCI. llvm-svn: 301300	2017-04-25 10:47:35 +00:00
Simon Pilgrim	93da6660a2	[DAGCombiner] Use APInt::intersects to avoid tmp variable. NFCI. llvm-svn: 301258	2017-04-24 21:43:21 +00:00
Krzysztof Parzyszek	c8e8e2a046	Move value type list from TargetRegisterClass to TargetRegisterInfo Differential Revision: https://reviews.llvm.org/D31937 llvm-svn: 301234	2017-04-24 19:51:12 +00:00
Krzysztof Parzyszek	98ab4c64c4	Revert r301231: Accidentally committed stale files. I forgot to commit local changes before committing. llvm-svn: 301232	2017-04-24 19:48:51 +00:00
Krzysztof Parzyszek	c0197066d7	Move value type list from TargetRegisterClass to TargetRegisterInfo Differential Revision: https://reviews.llvm.org/D31937 llvm-svn: 301231	2017-04-24 19:43:45 +00:00
Yaxun Liu	fd23a0c095	CodeGen: Add a hook for getFenceOperandTy. Currently the operand type for ATOMIC_FENCE assumes the value type of a pointer in address space 0. This is fine for most targets. However, for the amdgcn target, the size of a pointer in address space 0 depends on the triple environment: for the amdgiz environment it is 64 bits, but for other environments it is 32 bits. On the other hand, the amdgcn target expects 32-bit fence operands independent of the target triple environment. Therefore a hook is needed in target lowering for getting the fence operand type. This patch has no effect on targets other than amdgcn. Differential Revision: https://reviews.llvm.org/D32186 llvm-svn: 301215	2017-04-24 18:26:27 +00:00
Simon Pilgrim	f60f57e6e8	[DAGCombiner] Updated bswap byte offset variable names to be more descriptive. NFC As discussed on D32039, use MaskByteOffset to describe the variable and also pull out repeated getOpcode() calls. llvm-svn: 301193	2017-04-24 17:05:14 +00:00
Nirav Dave	c799f3a809	[SDAG] Teach Chain Analysis about BaseIndexOffset addressing. While we use BaseIndexOffset in FindBetterNeighborChains to appropriately realize they're almost the same address and should be improved concurrently, we do not use it in isAlias, which uses the non-index-aware FindBaseOffset instead. Adding a BaseIndexOffset check in isAlias should allow indexed stores to be merged. FindBaseOffset is to be excised in a subsequent patch. Reviewers: jyknight, aditya_nandakumar, bogner Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31987 llvm-svn: 301187	2017-04-24 15:37:20 +00:00
Renato Golin	4abfb3d741	Revert "[APInt] Fix a few places that use APInt::getRawData to operate within the normal API." This reverts commits r301105, r301104, r301103 and r301101, as a follow-up to the previous revert, which broke even more bots. For reference: Revert "[APInt] Use operator<<= where possible. NFC" Revert "[APInt] Use operator<<= instead of shl where possible. NFC" Revert "[APInt] Use ashInPlace where possible." PR32754. llvm-svn: 301111	2017-04-23 12:15:30 +00:00
Artyom Skrobov	53cf1897cc	[ARM] ScheduleDAGRRList::DelayForLiveRegsBottomUp must consider OptionalDefs Summary: D30400 has enabled tADC and tSBC instructions to be unglued, thereby allowing CPSR to remain live between Thumb1 scheduling units. Most Thumb1 instructions have an OptionalDef for CPSR; but the scheduler ignored the OptionalDefs, and could unwittingly insert a flag-setting instruction in between an ADDS and the corresponding ADC. Reviewers: javed.absar, atrick, MatzeB, t.p.northover, jmolloy, rengolin Reviewed By: javed.absar Subscribers: rogfer01, efriedma, aemerson, rengolin, llvm-commits, MatzeB Differential Revision: https://reviews.llvm.org/D31081 llvm-svn: 301106	2017-04-23 06:58:08 +00:00
Craig Topper	474e5de72d	[APInt] Fix a few places that use APInt::getRawData to operate within the normal API. getRawData exposes the internal type of the APInt class directly to its users. Ideally we wouldn't expose such an implementation detail. This patch fixes a few of the easy cases by using truncate, extract, or a rotate. llvm-svn: 301105	2017-04-23 06:41:11 +00:00
Craig Topper	cdd5ae6676	[APInt] Use operator<<= where possible. NFC llvm-svn: 301104	2017-04-23 05:43:02 +00:00
Craig Topper	5f68af0806	[APInt] Use operator<<= instead of shl where possible. NFC llvm-svn: 301103	2017-04-23 05:18:31 +00:00
Craig Topper	ae9672c96d	[APInt] Use ashInPlace where possible. llvm-svn: 301101	2017-04-23 03:45:59 +00:00
Akira Hatanaka	22e839f4b2	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which was causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 301019	2017-04-21 18:53:12 +00:00
Akira Hatanaka	78ccba6a20	Revert r300932 and r300930. It seems that r300930 was creating an infinite loop in dag-combine when compiling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940	2017-04-21 01:31:50 +00:00
Akira Hatanaka	19077aaee0	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300930	2017-04-21 00:05:16 +00:00
Akira Hatanaka	7b06cebe73	Revert "[AArch64] Improve code generation for logical instructions taking" This reverts r300913. This broke bots. llvm-svn: 300916	2017-04-20 23:03:30 +00:00
Akira Hatanaka	e327f09832	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300913	2017-04-20 22:47:56 +00:00
Benjamin Kramer	997fd5eeb4	[Recycler] Add asan/msan annotations. This enables use after free and uninit memory checking for memory returned by a recycler. SelectionDAG currently relies on the opcode of a free'd node being ISD::DELETED_NODE, so poke a hole in the asan poison for SDNode opcodes. This means that we won't find some issues, but only in SDag. llvm-svn: 300868	2017-04-20 18:29:37 +00:00
Yaxun Liu	5d977f8ed4	CodeGen: Let frame index value type match alloca addr space. Recently, the alloca address space was added to the data layout. Due to this change, a pointer returned by alloca may have a different size than a pointer in address space 0. However, the value type of a frame index is currently assumed to have the same size as a pointer in address space 0. This patch fixes that. Most targets assume alloca returns a pointer in address space 0, which is the default alloca address space, so this is NFC for them. The AMDGCN target with the amdgiz environment requires this change, since it assumes alloca returns a pointer to addr space 5 whose size is 32 bits, which differs from the 64-bit size of a pointer in addr space 0. Differential Revision: https://reviews.llvm.org/D32021 llvm-svn: 300864	2017-04-20 18:15:34 +00:00
Sanjay Patel	13985cd111	[DAGCombiner] use more local variables in isAlias(); NFCI llvm-svn: 300860	2017-04-20 18:02:27 +00:00
Craig Topper	bcfd2d1789	[APInt] Rename getSignBit to getSignMask getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask. Differential Revision: https://reviews.llvm.org/D32108 llvm-svn: 300856	2017-04-20 16:56:25 +00:00
Sanjay Patel	2d0e88fb9b	[DAGCombiner] fix variable names in isAlias(); NFCI We started with zero-based params and switched to one-based locals... Also, variables start with a capital and functions do not. llvm-svn: 300854	2017-04-20 16:36:37 +00:00
Sanjay Patel	b7701bc9af	[DAGCombiner] give names to repeated calcs in isAlias(); NFCI llvm-svn: 300850	2017-04-20 16:15:08 +00:00
Amara Emerson	23e79ec2b3	[MVT][SVE] Scalable vector MVTs (3/3) Adds MVT::ElementCount to represent the length of a vector which may be scalable, then adds helper functions that work with it. Patch by Graham Hunter. Differential Revision: https://reviews.llvm.org/D32019 llvm-svn: 300842	2017-04-20 13:54:09 +00:00
Amara Emerson	5054782052	[MVT][SVE] Scalable vector MVTs (1/3) This patch adds a few helper functions to obtain new vector value types based on existing ones without needing to care about whether they are scalable or not. I've confined their use to a few common locations right now, and targets that don't have scalable vectors should never need to care about these. Patch by Graham Hunter. Differential Revision: https://reviews.llvm.org/D32017 llvm-svn: 300838	2017-04-20 13:08:17 +00:00
Craig Topper	9ce5ef9475	[SelectionDAG] Fix another place that was passing a large value to APInt::lshrInPlace. llvm-svn: 300821	2017-04-20 04:55:01 +00:00
Craig Topper	d3884b8402	[SelectionDAG] Use getActiveBits() and countTrailingZeros() to avoid creating temporary APInts with lshr and trunc. NFCI llvm-svn: 300819	2017-04-20 04:23:43 +00:00
Craig Topper	4db0c69373	Recommit "[APInt] Add back the asserts that check that the APInt shift methods aren't called with values larger than BitWidth." This includes a fix to clamp a right shift of larger than BitWidth in DAG combining. llvm-svn: 300816	2017-04-20 03:49:18 +00:00
Galina Kistanova	2cc97d92ce	Temporarily revert r299221 to fix nondeterminism in ThinLTO builder. llvm-svn: 300783	2017-04-19 23:16:14 +00:00
Sanjay Patel	0658a95a35	[DAG] add splat vector support for 'or' in SimplifyDemandedBits I've changed one of the tests to not fold away, but we didn't and still don't do the transform that the comment claims we do (and I don't know why we'd want to do that). Follow-up to: https://reviews.llvm.org/rL300725 https://reviews.llvm.org/rL300763 llvm-svn: 300772	2017-04-19 22:00:00 +00:00
Sanjay Patel	ae382bb6af	[DAG] add splat vector support for 'xor' in SimplifyDemandedBits This allows forming more 'not' ops, so we get improvements for ISAs that have and-not. Follow-up to: https://reviews.llvm.org/rL300725 llvm-svn: 300763	2017-04-19 21:23:09 +00:00
Craig Topper	9b71a402c2	[APInt] Cast calls to add/sub/mul overflow methods to void if only their overflow bool out param is used. This is preparation for a clang change to improve the [[nodiscard]] warning to not be ignored on methods that return a class marked [[nodiscard]] that are defined in the class itself. See D32207. We should consider adding wrapper methods to APInt that return the overflow flag directly and discard the APInt result. This would eliminate the void casts and the need to create a bool before the call to pass to the out param. llvm-svn: 300758	2017-04-19 21:09:45 +00:00
Sanjay Patel	ded7d59f0e	[DAG] add splat vector support for 'and' in SimplifyDemandedBits The patch itself is simple: stop discriminating against vectors in visitAnd() and again in SimplifyDemandedBits(). Some notes for reference: 1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions. Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in. 2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could make similar simultaneous changes in other places if we think that would be better. 3. I don't know what the intent of the changed tests in this patch was supposed to be, but since they wiggled in a positive way, I'm just going with that. :) 4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253. 5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards improving the vector codegen in that patch without writing any actual new code. Differential Revision: https://reviews.llvm.org/D32230 llvm-svn: 300725	2017-04-19 18:05:06 +00:00
Nirav Dave	8563fc4664	[DAG] Loop over remaining candidates on successful merge of stores of extracted vectors types. NFCI. llvm-svn: 300688	2017-04-19 13:52:38 +00:00
Chih-Hung Hsieh	877923a87f	[X86] Keep EXTRACT_VECTOR_ELT result type as f128 for Android x86_64. Android x86_64 target uses f128 type and stores f128 values in %xmm* registers. SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value from f128 to i128. Differential Revision: http://reviews.llvm.org/D32102 llvm-svn: 300583	2017-04-18 20:15:18 +00:00
Craig Topper	fc947bcfba	[APInt] Use lshrInPlace to replace lshr where possible This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result. This adds an lshrInPlace(const APInt &) version as well. Differential Revision: https://reviews.llvm.org/D32155 llvm-svn: 300566	2017-04-18 17:14:21 +00:00
Nirav Dave	855ef45602	[DAG] Improve store merge candidate pruning. Remove non-consecutive stores from store merge candidate search as they cannot be merged and will prevent us from finding subsequent mergeable store cases. Reviewers: jyknight, bogner, javed.absar, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32086 llvm-svn: 300561	2017-04-18 15:36:34 +00:00
Adrian Prantl	6825fb64e9	PR32382: Fix emitting complex DWARF expressions. The DWARF specification knows 3 kinds of non-empty simple location descriptions: 1. Register location descriptions: describe a variable in a register; consist of only a DW_OP_reg. 2. Memory location descriptions: describe the address of a variable. 3. Implicit location descriptions: describe the value of a variable; end with DW_OP_stack_value & friends. The existing DwarfExpression code is pretty much ignorant of these restrictions. This used to not matter because we only emitted very short expressions that we happened to get right by accident. This patch makes DwarfExpression aware of the rules defined by the DWARF standard and now chooses the right kind of location description for each expression being emitted. This would have been an NFC commit (for the existing testsuite) if not for the way that clang describes captured block variables. Based on how the previous code in LLVM emitted locations, DW_OP_deref operations that should have come at the end of the expression are put at its beginning. Fixing this means changing the semantics of DIExpression, so this patch bumps the version number of DIExpression and implements a bitcode upgrade. There are two major changes in this patch: I had to fix the semantics of dbg.declare for describing function arguments. After this patch a dbg.declare always takes the address of a variable as the first argument, even if the argument is not an alloca. When lowering a DBG_VALUE, the decision of whether to emit a register location description or a memory location description depends on the MachineLocation: register machine locations may get promoted to memory locations based on their DIExpression. (Future) optimization passes that want to salvage implicit debug location for variables may do so by appending a DW_OP_stack_value. For example: DBG_VALUE, [RBP-8] --> DW_OP_fbreg -8; DBG_VALUE, RAX --> DW_OP_reg0 +0; DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0. All testcases that were modified were regenerated from clang. I also added source-based testcases for each of these to the debuginfo-tests repository over the last week to make sure that no synchronized bugs slip in. The debuginfo-tests compile from source and run the debugger. https://bugs.llvm.org/show_bug.cgi?id=32382 <rdar://problem/31205000> Differential Revision: https://reviews.llvm.org/D31439 llvm-svn: 300522	2017-04-18 01:21:53 +00:00
Reid Kleckner	fb502d2f5e	[IR] Make paramHasAttr to use arg indices instead of attr indices This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern. Previously we were testing return value attributes with index 0, so I introduced hasReturnAttr() for that use case. llvm-svn: 300367	2017-04-14 20:19:02 +00:00
Nirav Dave	642ed1ef7e	Reorder StoreMergeCandidates to run faster. NFCI. llvm-svn: 300321	2017-04-14 13:34:30 +00:00
Andrew V. Tischenko	4e7bcd5216	Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG. Patch by Dinar Temirbulatov llvm-svn: 300314	2017-04-14 09:17:09 +00:00
Nirav Dave	9acd2fd9d9	[DAG] Fold away temporary vector in store candidate merge NFC. llvm-svn: 300241	2017-04-13 20:00:27 +00:00
Craig Topper	8b459c24f3	[SelectionDAG] Use APInt move assignment to avoid 2 memory allocations and copies when bit width is larger than 64-bits. llvm-svn: 300091	2017-04-12 18:39:27 +00:00
Serge Pavlov	2757afdb85	Remove redundant type casts llvm-svn: 300063	2017-04-12 14:13:00 +00:00
Nirav Dave	a55dad3c33	[SDAG] Factor CandidateMatch check into lambda. NFC. llvm-svn: 299939	2017-04-11 13:41:19 +00:00
Nirav Dave	83defd1902	[SDAG] Factor ChainMerge into helper function NFCI. llvm-svn: 299938	2017-04-11 13:41:17 +00:00
Nirav Dave	233eb7a636	[SDAG] Reorder expensive StoreMerge Check after cheaper one. NFC llvm-svn: 299937	2017-04-11 13:41:16 +00:00
Sam Parker	4fc5f3c02e	[SelectionDAG] Check CALLSEQ_BEGIN nodes in DelayForLiveRegs. A fix for the bug reported in PR30911. The issue arises when multiple CALLSEQ_BEGIN nodes are unscheduled, as the last node to be unscheduled will gain access to the CallResource register. But when a node is being picked, only CALLSEQ_END nodes are checked against the CallResource and have their chains evaluated. This means that other CALLSEQ_BEGIN nodes can be scheduled before the existing call sequence has been finalised. This patch adds a check against the FrameSetup nodes in DelayForLiveRegs to prevent this from happening. Differential Revision: https://reviews.llvm.org/D31536 llvm-svn: 299926	2017-04-11 08:43:32 +00:00
Craig Topper	3606e732dd	[SelectionDAG] Teach TargetLowering::SimplifyDemandedBits how to properly calculate KnownZero bits for ISD::SETCC and ISD::AssertZExt. Summary: For SETCC we aren't calculating the KnownZero bits at all. I've copied the code from computeKnownZero over for this. For AssertZExt we were only setting KnownZero for bits that were demanded. But the upper bits are zero whether they were demanded or not. I'm interested in fixing this because my belief is the first part of the ISD::AND handling code in SimplifyDemandedBits largely exists because of these two bugs. In that code we go to computeKnownBits for the LHS and optimize a RHS constant. Because computeKnownBits handles SETCC and AssertZExt correctly, we sometimes get better information than when we call SimplifyDemandedBits on the LHS later. With these two issues fixed in SimplifyDemandedBits, I was able to remove that computeKnownBits call and still pass all X86 tests. I'll submit that change in a separate patch. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31715 llvm-svn: 299839	2017-04-10 07:06:44 +00:00
Simon Dardis	f7e4388e3b	Revert "[SelectionDAG] Enable target specific vector scalarization of calls and returns" This reverts commit r299766. This change appears to have broken the MIPS buildbots. Reverting while I investigate. Revert "[mips] Remove usage of debug only variable (NFC)" This reverts commit r299769. Follow up commit. llvm-svn: 299788	2017-04-07 17:25:05 +00:00
Simon Dardis	6470ff0b24	[SelectionDAG] Enable target specific vector scalarization of calls and returns. By target hookifying getRegisterType, getNumRegisters and getVectorBreakdown, backends can request that LLVM scalarize vector types for calls and returns. The MIPS vector ABI requires that vector arguments and returns are passed in integer registers. With SelectionDAG's new hooks, the MIPS backend can now handle LLVM-IR with vector types in calls and returns. E.g. 'call @foo(<4 x i32> %4)'. Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for calls and returns if vector types were not legal. If vector types were legal, a single 128-bit vector argument would be assigned to a single 32-bit / 64-bit integer register. By teaching the MIPS backend to inspect the original types, it can now implement the MIPS vector ABI, which requires a particular method of scalarizing vectors. Previously, the MIPS backend relied on clang to scalarize types such as "call @foo(<4 x float> %a)" into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3, i32 inreg %4)". This patch enables the MIPS backend to take either form for vector types. Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur Differential Revision: https://reviews.llvm.org/D27845 llvm-svn: 299766	2017-04-07 13:03:52 +00:00
Nirav Dave	974f7c23ae	[SDAG] Fix visitAND optimization to deal with vector extract case again. Summary: Fix case elided by rL298920. Fixes PR32545. Reviewers: eli.friedman, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31759 llvm-svn: 299688	2017-04-06 19:05:41 +00:00
Jonas Paulsson	45c936ef86	[SelectionDAG] NFC patch removing a redundant check. Since the BUILD_VECTOR has already been checked by isBuildVectorOfConstantSDNodes() in SelectionDAG::getNode() for a SIGN_EXTEND_INREG, it can be assumed that Op is always either undef or a ConstantSDNode, and Ops.size() will always equal VT.getVectorNumElements(). llvm-svn: 299647	2017-04-06 13:00:37 +00:00
Craig Topper	2ca72f4971	Revert accidental commit of r299619. llvm-svn: 299622	2017-04-06 04:04:10 +00:00
Craig Topper	6b15606051	Revert accidental commit of r299618 llvm-svn: 299621	2017-04-06 04:03:34 +00:00
Craig Topper	5d7ece8895	bar llvm-svn: 299619	2017-04-06 04:02:31 +00:00
Craig Topper	faf5a8553c	foo llvm-svn: 299618	2017-04-06 04:02:28 +00:00
Adam Nemet	d5ffdd3605	[DAGCombine] Support FMF contract in fused multiply-and-sub too. This is a follow-on to r299096, which added support for fmadd. Subtract does not have the case where, with two multiply operands, we commute in order to fuse with the multiply that has fewer uses. llvm-svn: 299572	2017-04-05 17:58:48 +00:00
Adam Nemet	99e347fc35	[DAGCombine] Remove commented-out code from r299096 llvm-svn: 299571	2017-04-05 17:58:44 +00:00
Sanjay Patel	b2f1621bb1	[DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401) This is a generic combine enabled via target hook to reduce icmp logic as discussed in: https://bugs.llvm.org/show_bug.cgi?id=32401 It's likely that other targets will want to enable this hook for scalar transforms, and there are probably other patterns that can use bitwise logic to reduce comparisons. Note that we are missing an IR canonicalization for these patterns, and we will probably prefer the pair-of-compares form in IR (shorter, more likely to fold). Differential Revision: https://reviews.llvm.org/D31483 llvm-svn: 299542	2017-04-05 14:09:39 +00:00
Jonas Paulsson	38a2da92bc	[DAGCombiner] Don't make a BUILD_VECTOR with operands of illegal type. When DAGCombiner visits a SIGN_EXTEND_INREG of a BUILD_VECTOR with constant operands, a new BUILD_VECTOR node will be created with transformed constants. llvm-stress found a case where the new BUILD_VECTOR had constant operands of an illegal type, because the (legal) element type is in fact not a legal scalar type. This patch changes this so that the new BUILD_VECTOR has the same operand type as the old one. Review: Eli Friedman, Nirav Dave https://bugs.llvm.org//show_bug.cgi?id=32422 llvm-svn: 299540	2017-04-05 13:45:37 +00:00
Matt Arsenault	c82768290d	DAG: Fix missing legalization for any_extend_vector_inreg operands llvm-svn: 299389	2017-04-03 21:28:13 +00:00
Craig Topper	3882613956	[DAGCombine][InstCombine] Fix inverted if condition in equivalent comments in DAGCombine and InstCombine. NFC llvm-svn: 299378	2017-04-03 19:18:48 +00:00
Zvi Rackover	d76a4d0ac6	Revert "[DAGCombine] A shuffle of a splat is always the splat itself". This reverts commit r299047, which is incorrect because the simplification may result in incorrect propagation of undefs to users of the folded shuffle. Thanks to Andrea Di Biagio for pointing this out. llvm-svn: 299368	2017-04-03 17:41:19 +00:00
Craig Topper	d33ee1b960	[APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword. This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temporary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation. Differential Revision: https://reviews.llvm.org/D31565 llvm-svn: 299362	2017-04-03 16:34:59 +00:00
Simon Pilgrim	9daf9c047d	[DAGCombiner] Check limits before accessing array element (PR32502) llvm-svn: 299361	2017-04-03 15:27:49 +00:00
Sanjay Patel	665021e7ee	[DAGCombiner] enable vector transforms for any/all {sign} bits set/clear The code already allowed vector types in via "isInteger" (which might want a more specific name), so use splat-friendly constant predicates to match those types. llvm-svn: 299304	2017-04-01 15:05:54 +00:00
Craig Topper	73250168e7	[DAGCombiner] Fix fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask) to explicitly ensure that only one of the inputs of each shuffle is a zero vector. This can only happen when we have a mix of zero and undef elements and the two vectors have a different arrangement of zeros/undefs. The shuffle should eventually be constant folded to all zeros. Fixes PR32484. llvm-svn: 299291	2017-04-01 04:26:20 +00:00
Quentin Colombet	35a47010b1	Revert "Instrument SDISel C++ patterns" This reverts commit r299284. Didn't intend to commit this :( llvm-svn: 299286	2017-04-01 01:26:17 +00:00
Quentin Colombet	b43da15602	Instrument SDISel C++ patterns llvm-svn: 299284	2017-04-01 01:21:32 +00:00
Sanjay Patel	16d458ea0d	[DAGCombiner] refactor and/or-of-setcc to get rid of duplicated code; NFCI llvm-svn: 299266	2017-03-31 21:30:50 +00:00
Sanjay Patel	34da36e74f	[DAGCombiner] add fold for 'All sign bits set?' (and (setlt X, 0), (setlt Y, 0)) --> (setlt (and X, Y), 0) We have 7 similar folds, but this one got away. The fact that the x86 test with a branch didn't change is probably a separate bug. We may also be missing this and the related folds in instcombine. llvm-svn: 299252	2017-03-31 20:28:06 +00:00
Sanjay Patel	61d3409535	[DAGCombiner] remove redundant code and add comments; NFCI llvm-svn: 299241	2017-03-31 18:18:58 +00:00
Simon Pilgrim	1cdbfe44b1	[DAGCombiner] Add ComputeNumSignBits vector demanded elements support to ASHR and INSERT_VECTOR_ELT Followup to D31311 llvm-svn: 299221	2017-03-31 14:21:50 +00:00
Simon Pilgrim	3c81c34d8d	[DAGCombiner] Add vector demanded elements support to ComputeNumSignBits Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 llvm-svn: 299219	2017-03-31 13:54:09 +00:00
Simon Pilgrim	37b536e4b3	[DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201	2017-03-31 11:24:16 +00:00
Adam Nemet	edaec6de73	[DAGCombiner] Initial support for the fast-math flag contract. As an alternative to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD can now also use the per-operation FMF to allow fusion. The idea here is not to port everything to the new scheme (e.g. fused multiply-and-sub will be ported later) but to make this work all the way from clang. The transformation is conditionalized on both the FADD and the FMUL having the FMF contract flag. Differential Revision: https://reviews.llvm.org/D31169 llvm-svn: 299096	2017-03-30 18:53:04 +00:00
Ahmed Bougacha	6dd6082472	[CodeGen] Pass SDAG an ORE, and replace FastISel stats with remarks. In the long-term, we want to replace statistics with something finer-grained that lets us gather per-function data. Remarks are that replacement. Create an ORE instance in SelectionDAGISel, and pass it to SelectionDAG. SelectionDAG was used so that we can emit remarks from all SelectionDAG-related code, including TargetLowering and DAGCombiner. This isn't used in the current patch but Adam tells me he's interested for the fp-contract combines. Use the ORE instance to emit FastISel failures as remarks (instead of the mix of dbgs() dumps and statistics that we currently have). Eventually, we want to have an API that tells us whether remarks are enabled (http://llvm.org/PR32352) so that we don't emit expensive remarks (in this case, dumping IR) when it's not needed. For now, use 'isEnabled' as a crude replacement. This does mean that the replacement for '-fast-isel-verbose' is now '-pass-remarks-missed=isel'. Additionally, clang users also need to enable remark diagnostics, using '-Rpass-missed=isel'. This also removes '-fast-isel-verbose2': there are no static statistics that we want to only enable in asserts builds, so we can always use the remarks regardless of the build type. Differential Revision: https://reviews.llvm.org/D31405 llvm-svn: 299093	2017-03-30 17:49:58 +00:00
Sanjay Patel	6d5ba061f8	[DAGCombiner] add helper function for visitORLike; NFCI This combines all of the equivalent clean-ups for foldAndOfSetCCs: https://reviews.llvm.org/rL298938 https://reviews.llvm.org/rL298940 https://reviews.llvm.org/rL298944 https://reviews.llvm.org/rL298949 https://reviews.llvm.org/rL298950 https://reviews.llvm.org/rL299002 https://reviews.llvm.org/rL299013 The sins of code duplication are on full display here: each function is missing a fold that wasn't copied over from its logical sibling. llvm-svn: 299091	2017-03-30 17:32:42 +00:00
Craig Topper	eafcbe2d10	[APInt] Remove references to integerPartWidth outside of APFloat implementation. Turns out integerPartWidth only explicitly defines the width of the tc functions in the APInt class. Functions that aren't used by APInt implementation itself. Many places in the code base already assume APInt is made up of 64-bit pieces. Explicitly assuming 64-bit here doesn't make that situation much worse. A full audit would need to be done if it ever changes. llvm-svn: 299059	2017-03-30 05:49:03 +00:00
Zvi Rackover	7569436f81	[DAGCombine] A shuffle of a splat is always the splat itself Summary: Add a simplification: shuffle (splat-shuffle), undef, M --> splat-shuffle Fixes pr32449 Patch by Sanjay Patel Reviewers: eli.friedman, RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31426 llvm-svn: 299047	2017-03-30 01:42:57 +00:00
Davide Italiano	cdcdc97879	[DAGCombiner] Remove else after return. NFCI. llvm-svn: 299022	2017-03-29 19:39:46 +00:00
Sanjay Patel	ff211bb5a6	[DAGCombiner] unify type checks and add asserts; NFCI We had a mix of type checks and usage that wasn't very clear. llvm-svn: 299013	2017-03-29 18:08:01 +00:00
Sanjay Patel	087e922328	[DAGCombiner] reduce code duplication by rearranging checks; NFCI llvm-svn: 299002	2017-03-29 15:37:33 +00:00
Adam Nemet	92a5cf4366	[SDAG] Remove -enable-fmf-dag This is no longer needed as spotted by Sanjay in https://reviews.llvm.org/D31165. llvm-svn: 298963	2017-03-28 23:46:14 +00:00
Adam Nemet	6820f391eb	[SDAG] Add AllowContract to SNodeFlags Properly propagate the FMF from the LLVM IR to this flag. This is toward moving fp-contraction=fast from an LLVM TargetOption to a FastMathFlag in order to fix PR25721. Differential Revision: https://reviews.llvm.org/D31165 llvm-svn: 298961	2017-03-28 23:46:08 +00:00
Sanjay Patel	a41a5c29f0	[DAGCombiner] reduce code duplication with local variables; NFCI llvm-svn: 298954	2017-03-28 22:45:53 +00:00
Sanjay Patel	9747d8070b	[DAG] fix formatting; NFC llvm-svn: 298950	2017-03-28 22:25:25 +00:00
Sanjay Patel	d832eddde5	[DAGCombiner] remove redundant conditions and duplicated code; NFCI llvm-svn: 298949	2017-03-28 22:22:50 +00:00
Sanjay Patel	d2a26db991	[DAGCombiner] rename variables in foldAndOfSetCCs for easier reading; NFCI llvm-svn: 298944	2017-03-28 21:40:41 +00:00
Sanjay Patel	3230e4be11	[DAGCombiner] clean up foldAndOfSetCCs; NFCI 1. Fix bogus comment. 2. Early exit to reduce indent. 3. Change node pointer param to what it really is: an SDLoc. llvm-svn: 298940	2017-03-28 20:28:16 +00:00
Sanjay Patel	16af53a395	[DAGCombiner] add helper function for and-of-setcc folds; NFC This is just a cut and paste followed by clang-format. Clean up to follow. llvm-svn: 298938	2017-03-28 19:58:46 +00:00
Sanjay Patel	f01a1dad7f	[x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality Follow-up to: https://reviews.llvm.org/rL298775 llvm-svn: 298933	2017-03-28 17:23:49 +00:00
Nirav Dave	472b5efc8b	[SDAG] Deal with deleted node in PromoteIntShiftOp Deal with case that initial node is deleted during dag-combine leading to an assertion failure in promoteIntShiftOp. Fixes PR32420. Reviewers: spatel, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31403 llvm-svn: 298931	2017-03-28 17:09:49 +00:00
Nirav Dave	5b414ebe63	[SDAG] Avoid deleted SDNodes PromoteIntBinOp Reorder work in PromoteIntBinOp to prevent stale (deleted) nodes from being used. Fixes PR32340 and PR32345. Reviewers: hfinkel, dbabokin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31148 llvm-svn: 298923	2017-03-28 15:41:12 +00:00
Nirav Dave	9b5563c52c	[SDAG] Fix Stale SDNode usage in visitAND Reorder CombineTo Calls to prevent potential use of deleted node. Fixes PR32372. Reviewers: jnspaulsson, RKSimon, uweigand, jonpa Reviewed By: jonpa Subscribers: jonpa, llvm-commits Differential Revision: https://reviews.llvm.org/D31346 llvm-svn: 298920	2017-03-28 14:11:20 +00:00
Nirav Dave	423b24ae76	[SDAG] Minor cleanup of variable usage. NFC. llvm-svn: 298916	2017-03-28 13:39:50 +00:00
Sanjay Patel	9ebb68843e	[x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, we can replace memcmp calls with inline code that is both smaller and faster. Differential Revision: https://reviews.llvm.org/D31290 llvm-svn: 298775	2017-03-25 16:05:33 +00:00
Simon Pilgrim	dbc94db3f3	Apply clang-format as commented in D31311. NFCI. llvm-svn: 298751	2017-03-24 23:47:41 +00:00
Nirav Dave	e9ca32ae52	[SDAG] Fix zeroExtend assertion error Move CombineTo preventing deleted node from being returned in visitZERO_EXTEND. Fixes PR32284. Reviewers: RKSimon, bogner Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31254 llvm-svn: 298604	2017-03-23 15:01:50 +00:00
Reid Kleckner	b518054b87	Rename AttributeSet to AttributeList Summary: This class is a list of AttributeSetNodes corresponding to the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393	2017-03-21 16:57:19 +00:00
Matt Arsenault	dce313c3cf	DAG: Fold bitcast/extract_vector_elt of undef to undef Fixes not eliminating store when intrinsic is lowered to undef. llvm-svn: 298385	2017-03-21 16:20:16 +00:00
Jonas Paulsson	54c7680e1f	[DAGTypeLegalizer] Handle widening truncate to vector of i1. Previously, PromoteIntRes_TRUNCATE() did not handle the case where the operand needs widening, which resulted in llvm_unreachable(). This patch adds the needed handling, along with a test case. Review: Eli Friedman, Simon Pilgrim. https://reviews.llvm.org/D31077 llvm-svn: 298357	2017-03-21 10:24:14 +00:00
Simon Pilgrim	8424df7dea	Fix constant folding of fp2int to large integers We make the assumption in most of our constant folding code that a fp2int will target an integer of 128-bits or less, calling the APFloat::convertToInteger with only uint64_t[2] of raw bits for the result. Fuzz testing (PR24662) showed that we don't handle other cases at all, resulting in stack overflows and all sorts of crashes. This patch uses the APSInt version of APFloat::convertToInteger instead to better handle such cases. Differential Revision: https://reviews.llvm.org/D31074 llvm-svn: 298226	2017-03-19 16:50:25 +00:00
Nirav Dave	ac6081cb67	Make library calls sensitive to regparm module flag (Fixes PR3997). Reviewers: mkuper, rnk Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D27050 llvm-svn: 298179	2017-03-18 00:44:07 +00:00
Nirav Dave	6de2c77944	Capitalize ArgListEntry fields. NFC. llvm-svn: 298178	2017-03-18 00:43:57 +00:00
Eli Friedman	46ddab3810	[SelectionDAG] Remove redundant stores more aggressively. Handle TokenFactors more aggressively in SDValue::reachesChainWithoutSideEffects. This isn't really a very effective change anymore because of other changes to chain handling, but it's a cheap check, and the expanded comments are still useful. It might be possible to loosen the hasOneUse() requirement with a deeper analysis, but a naive implementation of that check would be expensive. Differential Revision: https://reviews.llvm.org/D29845 llvm-svn: 298156	2017-03-17 22:15:50 +00:00
Simon Pilgrim	5a68d401c7	[SelectionDAG] Add SelectionDAG.computeKnownBits test support for ISD::ABS llvm-svn: 298108	2017-03-17 17:45:36 +00:00
Reid Kleckner	45707d4d5a	Remove getArgumentList() in favor of arg_begin(), args(), etc Users often call getArgumentList().size(), which is a linear way to get the number of function arguments. arg_size(), on the other hand, is constant time. In general, the fact that arguments are stored in an iplist is an implementation detail, so I've removed it from the Function interface and moved all other users to the argument container APIs (arg_begin(), arg_end(), args(), arg_size()). Reviewed By: chandlerc Differential Revision: https://reviews.llvm.org/D31052 llvm-svn: 298010	2017-03-16 22:59:15 +00:00
Jonas Paulsson	84319bfc40	[SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types. Don't scalarize VSELECT->SETCC when operands/results needs to be widened, or when the type of the SETCC operands are different from those of the VSELECT. (VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled. The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is no longer needed and has been removed. Updated tests: test/CodeGen/ARM/vuzp.ll test/CodeGen/NVPTX/f16x2-instructions.ll test/CodeGen/X86/2011-10-19-widen_vselect.ll test/CodeGen/X86/2011-10-21-widen-cmp.ll test/CodeGen/X86/psubus.ll test/CodeGen/X86/vselect-pcmp.ll Review: Eli Friedman, Simon Pilgrim https://reviews.llvm.org/D29489 llvm-svn: 297930	2017-03-16 07:17:12 +00:00
Zvi Rackover	48cdde0e59	[DAGCombine] Bail out if can't create a vector with at least two elements Summary: Fixes pr32278 Reviewers: igorb, craig.topper, RKSimon, spatel, hfinkel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30978 llvm-svn: 297878	2017-03-15 19:48:36 +00:00
Craig Topper	bcb6093610	[CodeGen] Use APInt::setLowBits/setHighBits/setBitsFrom in more places This patch replaces ORs with getHighBits/getLowBits etc. with setLowBits/setHighBits/setBitsFrom. In a few of the places we weren't ORing, but the KnownZero/KnownOne vectors were already initialized to zero. We exploit this in most places already there were just some that were inconsistent. Differential Revision: https://reviews.llvm.org/D30965 llvm-svn: 297860	2017-03-15 16:53:53 +00:00
Simon Pilgrim	018eedd9a5	[SelectionDAG] Support BUILD_VECTOR implicit truncation in SelectionDAG::ComputeNumSignBits (PR32273) llvm-svn: 297852	2017-03-15 16:22:24 +00:00
Nuno Lopes	ae455c562d	fix gcc -Wmisleading-indentation [NFC] llvm-svn: 297816	2017-03-15 09:33:33 +00:00
Simon Pilgrim	cf2da96c82	[SelectionDAG] Add a signed integer absolute ISD node Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if it's legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780	2017-03-14 21:26:58 +00:00
Sanjay Patel	8dd99dce6c	[DAG] vector div/rem with any zero element in divisor is undef This is the backend counterpart to: https://reviews.llvm.org/rL297390 https://reviews.llvm.org/rL297409 and follow-up to: https://reviews.llvm.org/rL297384 It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic, but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those someday. Differential Revision: https://reviews.llvm.org/D30826 llvm-svn: 297762	2017-03-14 18:06:28 +00:00
Sam Parker	916b1ba617	[ARM] Move SMULW[B|T] isel to DAG Combine Create nodes for smulwb and smulwt and move their selection from DAGToDAG to DAG combine. smlawb and smlawt can then be selected using tablegen. Added some helper functions to detect shift patterns as well as a wrapper around SimplifyDemandBits. Added a couple of extra tests. Differential Revision: https://reviews.llvm.org/D30708 llvm-svn: 297716	2017-03-14 09:13:22 +00:00
Nirav Dave	4fc8401abf	Recommitting Craig Topper's patch now that r296476 has been recommitted. When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node. This recovers a test case that was severely broken by r296476, by making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency. llvm-svn: 297698	2017-03-14 01:42:23 +00:00
Nirav Dave	54e22f33d9	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting with compile time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695	2017-03-14 00:34:14 +00:00
Amaury Sechet	d1ec5d54cf	Use setBits in SelectionDAG Summary: As per title. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30836 llvm-svn: 297559	2017-03-11 11:24:03 +00:00
Simon Pilgrim	7dedbfa89d	[SelectionDAG] Add support for BUILD_VECTOR to ComputeNumSignBits llvm-svn: 297492	2017-03-10 18:36:46 +00:00
Amaury Sechet	62e0759d56	[SelectionDAG] Make SelectionDAG aware of the known bits in USUBO and SSUBO and SUBC. Summary: Depends on D30379 This improves the state of things for the sub class of operation. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30436 llvm-svn: 297482	2017-03-10 17:26:44 +00:00
Amaury Sechet	69fa16c810	[SelectionDAG] Make SelectionDAG aware of the known bits in UADDO and SADDO. Summary: As per title. This is extracted from D29872 and I threw SADDO in. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30379 llvm-svn: 297479	2017-03-10 17:06:52 +00:00
Simon Pilgrim	b02667c469	[APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64). This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target. Differential Revision: https://reviews.llvm.org/D30780 llvm-svn: 297458	2017-03-10 13:44:32 +00:00
Amaury Sechet	e7d102cf02	[DAGCombiner] Do various combine on uaddo. Summary: This essentially does the same transform as for ADC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30417 llvm-svn: 297416	2017-03-09 22:47:00 +00:00
Amaury Sechet	10425de063	[DAGCombiner] Do various combine on usubo. Summary: This essentially does the same transform as for SUBC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30437 llvm-svn: 297404	2017-03-09 19:28:00 +00:00
Sanjay Patel	df21979db7	[DAG] recognize div/rem by 0 as undef before trying constant folding As discussed in the review thread for rL297026, this is actually 2 changes that would independently fix all of the test cases in the patch: 1. Return undef in FoldConstantArithmetic for div/rem by 0. 2. Move basic undef simplifications for div/rem (simplifyDivRem()) before foldBinopIntoSelect() as a matter of efficiency. I will handle the case of vectors with any zero element as a follow-up. That change is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic(). I'm deleting the test for PR30693 because it does not test for the actual bug any more (dangers of using bugpoint). Differential Revision: https://reviews.llvm.org/D30741 llvm-svn: 297384	2017-03-09 15:02:25 +00:00
Matt Arsenault	9a3fd87523	DAG: Check no signed zeros instead of unsafe math attribute llvm-svn: 297354	2017-03-09 01:36:39 +00:00
Eli Friedman	c2c2e21d77	[DAGCombine] Simplify ISD::AND in GetDemandedBits. This helps in cases involving bitfields where an AND is exposed by legalization. Differential Revision: https://reviews.llvm.org/D30472 llvm-svn: 297249	2017-03-08 00:56:35 +00:00
Sanjay Patel	7f18ec50ba	[DAG] refactor related div/rem folds; NFCI This is known incomplete and not called in the right order relative to other folds, but that's the current behavior. I'm just trying to clean this up before making actual functional changes to make the patch smaller. The logic here should mimic the IR equivalents that are in InstSimplify's simplifyDivRem(). llvm-svn: 297086	2017-03-06 22:32:40 +00:00
Sanjay Patel	7f7947bf41	[DAGCombiner] simplify div/rem-by-0 Refactoring of duplicated code and more fixes to follow. This is motivated by the post-commit comments for r296699: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html Ie, we can crash if we're missing obvious simplifications like this that exist in the IR simplifier or if these occur later than expected. The x86 change for non-splat division shows a potential opportunity to improve vector codegen: we assumed that since only one lane had meaningful results, we should do the math in scalar. But that means moving back and forth from vector registers. llvm-svn: 297026	2017-03-06 16:36:42 +00:00
Sanjay Patel	6b029a5380	[DAG] fix formatting; NFC llvm-svn: 297015	2017-03-06 15:27:57 +00:00
Sanjay Patel	5273afd4bb	[DAG] fix typo in comment; NFC llvm-svn: 297011	2017-03-06 15:07:43 +00:00
Simon Pilgrim	584d6d9d91	[SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions Found by fuzz testing after rL296985 landed llvm-svn: 296989	2017-03-05 15:52:18 +00:00
Simon Pilgrim	9f5c251d57	[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985	2017-03-05 09:57:20 +00:00
Craig Topper	6ffc044b2f	[DAGCombine] Use APInt::operator|(uint64_t) instead of creating a temporary APInt and calling APInt::Or. NFC This is more efficient by itself. But this is prep for a future patch that may remove APInt::Or while making operator| support rvalue references similar to add/sub. llvm-svn: 296981	2017-03-05 01:08:16 +00:00
Sanjay Patel	066f3208bf	[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C) select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977	2017-03-04 19:18:09 +00:00
Florian Hahn	6406f98342	[legalize-types] Remove stale entries from SoftenedFloats. Summary: When replacing a SDValue, we should remove the replaced value from SoftenedFloats (and possibly the other maps as well?). When we revisit a Node because it needs analyzing again, we have to remove all result values from SoftenedFloats (and possibly other maps?). This fixes the fp128 test failures with expensive checks for X86. I think we probably should also remove the values from the other maps (PromotedIntegers and so on), let me know what you think. Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon Reviewed By: chh Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits Differential Revision: https://reviews.llvm.org/D29265 llvm-svn: 296964	2017-03-04 12:00:35 +00:00
Simon Pilgrim	6dfab414db	Use APInt::setBits instead of OR'ing in a separate APInt::getBitsSet call llvm-svn: 296886	2017-03-03 17:03:52 +00:00
Simon Pilgrim	cf12b5e1a6	Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation Avoids all the unnecessary extra bitrange creation/shift stages. llvm-svn: 296879	2017-03-03 16:35:57 +00:00
Simon Pilgrim	10754abe7e	Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation Avoids all the unnecessary extra bitrange creation/shift stages. llvm-svn: 296871	2017-03-03 14:25:46 +00:00
Chandler Carruth	ce52b80744	[SDAG] Revert r296476 (and r296486, r296668, r296690). This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862	2017-03-03 10:02:25 +00:00
Taewook Oh	96c6415697	[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y) Summary: Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it makes more sense to take DebugLoc from 't1' than 't4'. For the code below ``` extern int bar(); extern int baz(); int foo(int x, int y) { if (x != y) return bar(); else return baz(); } ``` , following is the bitcode representation of 'foo' at the end of llvm-ir level optimization: ``` define i32 @foo(i32 %x, i32 %y) !dbg !4 { entry: tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12 tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13 %cmp = icmp ne i32 %x, %y, !dbg !14 br i1 %cmp, label %if.then, label %if.else, !dbg !16 if.then: ; preds = %entry %call = tail call i32 (...) @bar() #3, !dbg !17 br label %return, !dbg !18 if.else: ; preds = %entry %call1 = tail call i32 (...) @baz() #3, !dbg !19 br label %return, !dbg !20 return: ; preds = %if.else, %if.then %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ] ret i32 %retval.0, !dbg !21 } !14 = !DILocation(line: 5, column: 9, scope: !15) !16 = !DILocation(line: 5, column: 7, scope: !4) ``` As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc', 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLoc of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue. Reviewers: atrick, bogner, andreadb, craig.topper, aprantl Reviewed By: andreadb Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D29813 llvm-svn: 296825	2017-03-02 21:58:35 +00:00
Sanjay Patel	7884dcb788	[DAG] early exit to improve readability and formatting of visitMemCmpCall(); NFCI llvm-svn: 296824	2017-03-02 21:56:43 +00:00
Sanjay Patel	209b0f9aad	[DAG] improve documentation comments; NFC llvm-svn: 296808	2017-03-02 20:48:08 +00:00
Sanjay Patel	fffa179837	[DAGCombiner] avoid assertion when folding binops with opaque constants This bug was introduced with: https://reviews.llvm.org/rL296699 There may be a way to loosen the restriction, but for now just bail out on any opaque constant. The tests show that opacity is target-specific. This goes back to cost calculations in ConstantHoisting based on TTI->getIntImmCost(). llvm-svn: 296768	2017-03-02 17:18:56 +00:00
Sanjay Patel	f7aba7ba22	fix typo in comment; NFC llvm-svn: 296760	2017-03-02 16:37:24 +00:00
Amaury Sechet	71f511fd1e	[DAGCombiner] mulhi + 1 never overflow. Summary: This can be used to optimize large multiplications after legalization. Depends on D29565 Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29587 llvm-svn: 296711	2017-03-01 23:44:17 +00:00
Sanjay Patel	92938657a0	[DAGCombiner] fold binops with constant into select-of-constants This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 llvm-svn: 296699	2017-03-01 22:51:31 +00:00
Benjamin Kramer	0e429606b0	[DAGCombiner] Remove non-ascii character and reflow comment. llvm-svn: 296690	2017-03-01 22:10:43 +00:00
Reid Kleckner	f7c0980c10	Elide argument copies during instruction selection Summary: Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters. This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it. Supersedes D28388 Fixes PR26328 Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29668 llvm-svn: 296683	2017-03-01 21:42:00 +00:00
Nirav Dave	0a4703b5ec	[DAG] Prevent Stale nodes from entering worklist Add check that deleted nodes do not get added to worklist. This can occur when a node's operand is simplified to an existing node. This fixes PR32108. Reviewers: jyknight, hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30506 llvm-svn: 296668	2017-03-01 20:19:38 +00:00
Artur Pilipenko	e1b2d31468	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336). Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 296651	2017-03-01 18:12:29 +00:00
Ahmed Bougacha	20b3e9a835	[CodeGen] Remove dead FastISel code after SDAG emitted a tailcall. When SDAGISel (top-down) selects a tail-call, it skips the remainder of the block. If, before that, FastISel (bottom-up) selected some of the (no-op) next few instructions, we can end up with dead instructions following the terminator (selected by SDAGISel). We need to erase them, as we know they aren't necessary (in addition to being incorrect). We already do this when FastISel falls back on the tail-call itself. Also remove the FastISel-emitted code if we fallback on the instructions between the tail-call and the return. llvm-svn: 296552	2017-03-01 00:43:42 +00:00
Sanjay Patel	ea61ea9f19	[DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFC llvm-svn: 296502	2017-02-28 18:41:49 +00:00
Craig Topper	419f145ebb	[DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results, not just the result that was used to reach the chain node. This recovers a test case that was severely broken by r296476, by making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency. llvm-svn: 296486	2017-02-28 16:52:05 +00:00
Nirav Dave	f830dec3f2	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyFromReg nodes as these are captured by data dependence 6. Forward load-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296476	2017-02-28 14:24:15 +00:00
Eugene Zelenko	fa912a7151	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 296404	2017-02-27 22:45:06 +00:00
Arnold Schwaighofer	b2605f31ed	ISel: We need to notify FastIS of the IMPLICIT_DEF we created in createSwiftErrorEntriesInEntryBlock Otherwise, it will insert instructions before it. rdar://30536186 llvm-svn: 296395	2017-02-27 22:12:06 +00:00
Matt Arsenault	4a7cc16e89	Revert "DAG: Check if extract_vector_elt is legal or custom" This reverts r295782. This could potentially result in some legalization loops and I avoided the need for this. llvm-svn: 296393	2017-02-27 21:59:07 +00:00
Simon Pilgrim	5c4efcdddf	[X86][SSE] Attempt to extract vector elements through target shuffles DAGCombiner already supports peeking through shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB; there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where it's worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381	2017-02-27 21:01:57 +00:00
Artur Pilipenko	f7196c8d9e	[DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets This pattern is essentially an i16 load from address p+1: %p1.i16 = bitcast i8* %p to i16* %p2.i8 = getelementptr i8, i8* %p, i64 2 %v1 = load i16, i16* %p1.i16 %v2.i8 = load i8, i8* %p2.i8 %v2 = zext i8 %v2.i8 to i16 %v1.shl = shl i16 %v1, 8 %res = or i16 %v1.shl, %v2 The current implementation would identify the %v1 load as the first byte load and would mistakenly emit an i16 load from the %p1.i16 address. This patch adds a check on whether the first byte is loaded from a non-zero offset of the first load address. Only when that offset is zero can this address be used as the base address for the combined value; otherwise just give up combining. llvm-svn: 296336	2017-02-27 13:04:23 +00:00
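A small C++ model makes the big-endian mis-combine above concrete. This is a sketch, not LLVM code: it assumes a big-endian target, models each load with explicit byte arithmetic so the host's byte order does not matter, and the helper names `load_be16` and `or_shl_pattern` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: a big-endian i16 load, written with explicit byte
// arithmetic so the result does not depend on the host's byte order.
uint16_t load_be16(const uint8_t *p) {
  return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

// The IR pattern from the commit message, evaluated with BE semantics:
//   %v1  = load i16 from p
//   %v2  = zext (load i8 from p+2)
//   %res = or (shl %v1, 8), %v2
uint16_t or_shl_pattern(const uint8_t *p) {
  uint16_t v1 = load_be16(p);
  uint16_t v2 = p[2];
  // The i16 shift drops p[0]; only p[1] survives, now in the high byte.
  uint16_t v1_shl = static_cast<uint16_t>(v1 << 8);
  return static_cast<uint16_t>(v1_shl | v2);
}
```

For the bytes {0x11, 0x22, 0x33}, `or_shl_pattern` yields 0x2233, the same value as `load_be16(p + 1)`, while an i16 load from `p` (the mis-combined form) yields 0x1122.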
Artur Pilipenko	c43b20a43b	[DAGCombine] NFC. MatchLoadCombine extract MemoryByteOffset lambda helper This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets. llvm-svn: 296332	2017-02-27 11:42:54 +00:00
Artur Pilipenko	f2c26e0bf2	[DAGCombine] NFC. MatchLoadCombine remember the first byte provider, not the load node This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets. llvm-svn: 296331	2017-02-27 11:40:14 +00:00
Nirav Dave	73cd0194cf	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279	2017-02-26 01:27:32 +00:00
Artyom Skrobov	ac56719231	No need to copy the variable [NFC] llvm-svn: 296259	2017-02-25 17:18:09 +00:00
Nirav Dave	beabf456df	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyFromReg nodes as these are captured by data dependence 6. Forward load-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252	2017-02-25 11:43:58 +00:00
Sanjay Patel	832b1622d8	[DAGCombiner] add missing folds for scalar select of {-1,0,1} The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in this patch). I.e., the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops than for a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization are: 1. Have DAGCombiner convert any select of constants into ext/add/not ops. 2. Have InstCombine canonicalize in the other direction (create more selects). Differential Revision: https://reviews.llvm.org/D30180 llvm-svn: 296137	2017-02-24 17:17:33 +00:00
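Because an i1 condition has only two values, the equivalences behind select-of-constants folds can be checked exhaustively. This is a hedged C++ sketch: an i1 condition is modelled as `bool` and i32 values as `int32_t`, all helper names are hypothetical, and the identities shown are illustrative rather than the exact list of folds added by this commit.

```cpp
#include <cstdint>

// Hypothetical models of DAG operations on an i1 condition.
int32_t zext_i1(bool c) { return c ? 1 : 0; }   // zext i1 to i32
int32_t sext_i1(bool c) { return c ? -1 : 0; }  // sext i1 to i32
int32_t select_i32(bool c, int32_t t, int32_t f) { return c ? t : f; }

// Select-free forms of selects between constants in {-1, 0, 1}.
int32_t sel_1_0(bool c)  { return zext_i1(c); }   // select c, 1, 0
int32_t sel_m1_0(bool c) { return sext_i1(c); }   // select c, -1, 0
int32_t sel_0_1(bool c)  { return zext_i1(!c); }  // select c, 0, 1  -> zext (not c)
int32_t sel_0_m1(bool c) { return sext_i1(!c); }  // select c, 0, -1 -> sext (not c)

// The same select c, 0, -1 expressed with an add instead of a not:
// add (zext c), -1
int32_t sel_0_m1_add(bool c) { return zext_i1(c) - 1; }
```

Each helper agrees with `select_i32` for both `c = false` and `c = true`; which form a combine prefers is a per-target cost question that the sketch does not model.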
Sanjay Patel	4a4fbe162f	[DAG] add convenience function to get -1 constant; NFCI llvm-svn: 296004	2017-02-23 19:02:33 +00:00
Bill Seurer	8e48f416ad	[DAGCombiner] revert r295336 r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849	2017-02-22 16:27:33 +00:00
Matt Arsenault	f0a4823b91	DAG: Check if extract_vector_elt is legal or custom Avoids test regressions in future AMDGPU commits when more vector types are custom lowered. llvm-svn: 295782	2017-02-21 22:47:27 +00:00
Simon Pilgrim	c0dc9a4913	Strip trailing whitespace. llvm-svn: 295653	2017-02-20 11:56:43 +00:00
Simon Pilgrim	50b958c07a	[SelectionDAG] Add scalarization support for ISD::*_EXTEND_VECTOR_INREG opcodes. Thanks to Mikael Holmén for the initial test case llvm-svn: 295652	2017-02-20 11:55:58 +00:00
Artyom Skrobov	be31754094	Remove redundant call to GluedNodes.back() [NFC] llvm-svn: 295607	2017-02-19 16:56:18 +00:00
Sanjay Patel	7f2e58972c	[DAGCombiner] split i1 select-of-constants from non-i1 case; NFCI I can't find any tests of the non-i1 code path, so it may be unnecessary at this point. llvm-svn: 295463	2017-02-17 17:13:27 +00:00
Simon Pilgrim	0429c0cf8b	Fix signed/unsigned comparison warning. llvm-svn: 295453	2017-02-17 16:01:16 +00:00
Simon Pilgrim	511d788a95	[DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle masks During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing). This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles, resulting in an unnecessary extension+truncation stage for widened illegal types. The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_INREG legal in SSE instead of lowering it to X86ISD::VSEXT (PR31712). Differential Revision: https://reviews.llvm.org/D29454 llvm-svn: 295451	2017-02-17 15:14:48 +00:00
Sanjay Patel	5573042035	[DAGCombiner] improve readability; NFCI llvm-svn: 295447	2017-02-17 14:21:59 +00:00
Artur Pilipenko	85d758299e	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Resubmit -r295314 with PowerPC and AMDGPU tests updated. Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295336	2017-02-16 17:07:27 +00:00
Artur Pilipenko	a1b384c4ce	Revert -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine" This change causes some of the AMDGPU and PowerPC tests to fail. llvm-svn: 295316	2017-02-16 13:04:46 +00:00
Artur Pilipenko	daaa0c0f7d	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295314	2017-02-16 12:53:26 +00:00
Matt Arsenault	5de8dc9cf5	DAG: Do not scalarize fsub if fneg is legal Tests will be included with future commit. llvm-svn: 295242	2017-02-15 22:02:42 +00:00
Michael Kuperstein	ba80db39d7	[DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source We currently can't legalize those, but we should really not be creating them in the first place, since legalization would probably look similar to the way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD. This fixes PR311956. Differential Revision: https://reviews.llvm.org/D29961 llvm-svn: 295213	2017-02-15 18:37:26 +00:00
Craig Topper	96ec7a23e3	[SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where inputs are larger than the mask Summary: The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in an extract and creates a properly aligned start index for the extract. This range finding is unnecessary; we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each index, we can't use an extract. Reviewers: zvi, RKSimon Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29926 llvm-svn: 295152	2017-02-15 05:57:16 +00:00
Aditya Nandakumar	bb0483bc8e	[Tablegen] Instrumenting table gen DAGGenISelDAG To assist in debugging ISEL or to prioritize GlobalISel backend work, this patch adds two more tables to <Target>GenISelDAGISel.inc - one which contains the patterns that are used during selection and the other containing the include source locations of the patterns. Enabled through CMake variable LLVM_ENABLE_DAGISEL_COV. llvm-svn: 295081	2017-02-14 18:32:41 +00:00
Artyom Skrobov	dc66a82dc7	Removing a redundant assignment llvm-svn: 295055	2017-02-14 14:44:01 +00:00
Arnold Schwaighofer	8f3df731dc	swiftcc: Don't emit tail calls from callers with swifterror parameters Backends don't support this yet. They would have to move to the swifterror register before the tail call to make sure it is live-in to the call. rdar://30495920 llvm-svn: 294982	2017-02-13 19:58:28 +00:00
Quentin Colombet	fbae5fcb96	[FastISel] Add a diagnostic to warn on fallback. This is consistent with what we do for GlobalISel. That way, it is easy to see whether or not FastISel is able to fully select a function. At some point we may want to switch that to an optimization remark. llvm-svn: 294970	2017-02-13 17:38:59 +00:00
Craig Topper	3668bde371	[DAGCombiner] Teach DAG combine that inserting an extract_subvector result into the same location of an undef vector can just use the original input to the extract. llvm-svn: 294932	2017-02-13 04:53:33 +00:00
Craig Topper	aa46204ed9	[DAGCombiner] Remove the half vector width check for the combine of EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR. This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors. llvm-svn: 294930	2017-02-12 23:49:49 +00:00
Sanjay Patel	0557a44287	[TargetLowering] fix SETCC SETLT folding with FP types The bug was introduced with: https://reviews.llvm.org/rL294863 ...and manifests as a selection failure in x86, but that's actually another bug. This fix prevents wrong codegen with -0.0, but in the more common case when we have NSZ and NNAN (-ffast-math), we should still be able to fold this setcc/compare. llvm-svn: 294924	2017-02-12 23:07:52 +00:00
Craig Topper	b633adedc7	[DAGCombiner] Make the combine of INSERT_SUBVECTOR into a CONCAT_VECTOR more generic to support larger concats. llvm-svn: 294875	2017-02-11 22:57:09 +00:00
Sanjay Patel	63499b61c9	[TargetLowering] check for sign-bit comparisons in SimplifyDemandedBits I don't know if anything other than x86 vectors is affected by this change, but this may allow us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises from the fact that blendv* instructions only use the sign-bit when deciding which vector element to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already exists in x86 lowering; this demanded bits change just enables the transform to fire more often. The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, but I've explained the likely steps in this series here: https://llvm.org/bugs/show_bug.cgi?id=11210 Differential Revision: https://reviews.llvm.org/D29687 llvm-svn: 294863	2017-02-11 18:01:55 +00:00
Simon Pilgrim	bfb1747806	[DAGCombine] Allow vector constant folding of any value type before type legalization The patch comes in 2 parts: 1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types. 2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled. Fix for PR30760 Differential Revision: https://reviews.llvm.org/D29568 llvm-svn: 294749	2017-02-10 14:37:25 +00:00
Craig Topper	a9f1121896	[SelectionDAG] Dump the DAG after legalizing vector ops and after the second type legalization Summary: With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization. I'm only dumping if the legalize vector ops changes something since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump happens. So if nothing changed this second dump is unnecessary. Having said that though, I think we should probably fix legalize vector ops to log what its doing. Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc Reviewed By: RKSimon Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D29554 llvm-svn: 294711	2017-02-10 05:05:57 +00:00
Geoff Berry	7e320c2485	[SelectionDAG] Fix bugs in inverted condition splitting code. Summary: Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by Mikael Holmen. Handle non-canonicalized xor not operation correctly (was assuming operand 0 was always the non-constant operand) and check that the negated condition is also in the same block as the original and/or instruction (as is done for and/or operands already) before proceeding with optimization. Reviewers: bogner, MatzeB, qcolombet Subscribers: mcrosier, uabelho, llvm-commits Differential Revision: https://reviews.llvm.org/D29680 llvm-svn: 294605	2017-02-09 18:28:17 +00:00
Artur Pilipenko	4a64031954	[DAGCombiner] Support non-zero offset in load combine Enable folding patterns which load the value from a non-zero offset: i8 *a = ... i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24) => i32 val = *((i32*)(a+4)) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D29394 llvm-svn: 294582	2017-02-09 12:06:01 +00:00
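A C++ sketch of the equivalence the non-zero-offset load combine relies on. Assumptions: `a` is a byte array, the target is little-endian, and a four-byte `std::memcpy` stands in for the single combined i32 load; the helper names are hypothetical. The wide-load comparison only holds when the host is also little-endian, so the sketch checks that at run time.

```cpp
#include <cstdint>
#include <cstring>

// The byte-wise pattern from the commit message (a points to i8 data).
uint32_t combine_bytes(const uint8_t *a) {
  return uint32_t(a[4]) | (uint32_t(a[5]) << 8) |
         (uint32_t(a[6]) << 16) | (uint32_t(a[7]) << 24);
}

// The combined form: a single 4-byte load at a+4, in host byte order.
uint32_t wide_load_at_offset(const uint8_t *a) {
  uint32_t v;
  std::memcpy(&v, a + 4, sizeof v);
  return v;
}

// True if the host stores the low byte of an integer first.
bool host_is_little_endian() {
  const uint16_t probe = 1;
  uint8_t first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

With `a = {0, 1, 2, 3, 0x78, 0x56, 0x34, 0x12}`, `combine_bytes(a)` is 0x12345678, and on a little-endian host `wide_load_at_offset(a)` returns the same value; the first four bytes play no part, which is exactly the base-plus-offset case this commit enables.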
Artur Pilipenko	045ab08252	[DAGCombiner] NFC. Mark ByteProvider accessors as const llvm-svn: 294494	2017-02-08 17:59:34 +00:00
Amaury Sechet	4b946916ac	[DAGCombiner] Push truncate through adde when the carry isn't used. Summary: As per title. Reviewers: mkuper, spatel, bkramer, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29528 llvm-svn: 294394	2017-02-08 00:32:36 +00:00
Reid Kleckner	0887d44a61	[SDAGISel] Simplify some SDAGISel code, NFC Hoist entry block code for arguments and swift error values out of the basic block instruction selection loop. Lowering arguments once up front seems much more readable than doing it conditionally inside the loop. It also makes it clear that argument lowering can update StaticAllocaMap because no instructions have been selected yet. Also use range-based for loops where possible. llvm-svn: 294329	2017-02-07 18:42:53 +00:00
Sanjay Patel	8c99ca3df0	[TargetLowering] fix formatting and comments for ShrinkDemandedConstant; NFC llvm-svn: 294325	2017-02-07 18:04:26 +00:00
Daniel Jasper	84b3cc394d	Revert "[DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)" This reverts commit r294186. On an internal test, this triggers an out-of-memory error on PPC, presumably because there is another dagcombine that does the exact opposite, triggering an endless loop consuming more and more memory. Chandler has started creating a reduced test case and we'll attach it as soon as possible. llvm-svn: 294288	2017-02-07 08:57:50 +00:00
Artur Pilipenko	d3464bf9ad	[DAGCombiner] Support bswap as a part of load combine patterns Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D29397 llvm-svn: 294201	2017-02-06 17:48:08 +00:00
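A hedged C++ sketch of what supporting bswap in a load combine pattern means: on a little-endian target, a big-endian read written byte by byte can be replaced by one i32 load followed by a byte swap (ISD::BSWAP at the DAG level). The helper names are hypothetical, `std::memcpy` stands in for the combined load, and the final comparison is only checked on a little-endian host.

```cpp
#include <cstdint>
#include <cstring>

// A big-endian read written byte by byte: on a little-endian target this
// is the kind of pattern that can become bswap (load i32 a).
uint32_t read_be32_bytes(const uint8_t *a) {
  return (uint32_t(a[0]) << 24) | (uint32_t(a[1]) << 16) |
         (uint32_t(a[2]) << 8) | uint32_t(a[3]);
}

// Portable 32-bit byte swap, standing in for ISD::BSWAP.
uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

// The combined form: one 4-byte load in host byte order, then a byte swap.
uint32_t load_then_bswap(const uint8_t *a) {
  uint32_t v;
  std::memcpy(&v, a, sizeof v);
  return bswap32(v);
}

// True if the host stores the low byte of an integer first.
bool host_is_little_endian() {
  const uint16_t probe = 1;
  uint8_t first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

For `a = {0x12, 0x34, 0x56, 0x78}`, the byte-wise read gives 0x12345678; a little-endian load of the same bytes gives 0x78563412, and swapping it recovers 0x12345678.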
Amaury Sechet	e674f5c758	Add ADDC to SelectionDAG::computeKnownBits and ComputeNumSignBits. Summary: As per title. Reviewers: bkramer, sunfish, lattner, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29521 llvm-svn: 294188	2017-02-06 14:59:06 +00:00
Amaury Sechet	8a3b32941d	[DAGCombiner] Make DAGCombiner smarter about overflow Summary: Leverage it to transform addc into add. Reviewers: mkuper, spatel, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29524 llvm-svn: 294187	2017-02-06 14:54:49 +00:00
Amaury Sechet	1d466f598e	[DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry) Summary: This is extracted from D29443 . Reviewers: mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29564 llvm-svn: 294186	2017-02-06 14:28:39 +00:00
Simon Pilgrim	bfd4495512	[X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined. Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines. We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree. This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list. Differential Revision: https://reviews.llvm.org/D29399 llvm-svn: 294183	2017-02-06 13:44:45 +00:00
Geoff Berry	76ca8c2b34	[SelectionDAG] In InstrEmitter, handle EXTRACT_SUBREG of a physical register. Summary: Without this change, the getVR() call would hit an assert since it was being passed a physical register. Update the AArch64/ldst-opt.ll test with a case that triggers this behavior by adding a run with strict-align, which causes an unaligned STR XZR instruction to be split into byte stores, creating an EXTRACT_SUBREG of XZR that triggers the original problem. Reviewers: bogner, qcolombet, MatzeB, atrick Subscribers: aemerson, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D29495 llvm-svn: 294129	2017-02-05 18:28:14 +00:00
Amaury Sechet	143902c29f	[DAGCombiner] Leverage add's commutativity Summary: This avoids the need to duplicate every pattern and actually ends up exposing some opportunities to optimize existing patterns that did not exist in both directions on an existing test case. Reviewers: mkuper, spatel, bkramer, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29541 llvm-svn: 294125	2017-02-05 14:22:20 +00:00
Craig Topper	42b83f8d6e	[DAGCombiner] Canonicalize the order of a chain of INSERT_SUBVECTORs. Based on similar code for INSERT_VECTOR_ELT. llvm-svn: 294110	2017-02-04 23:26:39 +00:00
Craig Topper	04dce84ead	[DAGCombiner] Use DAG.getAnyExtOrTrunc to simplify some code. NFC llvm-svn: 294109	2017-02-04 23:26:37 +00:00
Craig Topper	ceaf9c1633	[DAGCombiner] In visitINSERT_VECTOR_ELT, move check for BUILD_VECTOR being legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS. llvm-svn: 294108	2017-02-04 23:26:34 +00:00
Amaury Sechet	6e2d8e49ec	Formatting in DAGCombiner. NFC llvm-svn: 294091	2017-02-04 13:01:53 +00:00
Eugene Zelenko	502d0bc28e	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). This is preparation to reduce TargetInstrInfo.h dependencies. llvm-svn: 294084	2017-02-04 02:00:53 +00:00
Ahmed Bougacha	9677cc6fb7	[TLI] Robustize SDAG LibFunc proto checking by merging it into TLI. This re-applies commit r292189, reverted in r292191. SelectionDAGBuilder recognizes libfuncs using some homegrown parameter type-checking. Use TLI instead, removing another heap of redundant code. This isn't strictly NFC, as the SDAG code was too lax. Concretely, this means changes are required to a few tests: - calling a non-variadic function via a variadic prototype isn't OK; it just happens to work on x86_64 (but not on, e.g., aarch64). - mempcpy has a size_t parameter; the SDAG code accepts any integer type, which meant using i32 on x86_64 worked. - a handful of SystemZ tests check the SDAG support for lax prototype checking: Ulrich agrees on removing them. I don't think it's worth supporting any of these (IMO) invalid testcases. Instead, fix them to be more meaningful. llvm-svn: 294028	2017-02-03 19:11:19 +00:00
Alexey Bataev	a0d9f2582b	[SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() != ISD::DELETED_NODE && "NodeToMatch was removed partway through selection"' failed. NodeToMatch can be modified during matching, but code does not handle this situation. Differential Revision: https://reviews.llvm.org/D29292 llvm-svn: 294003	2017-02-03 12:28:40 +00:00
Nirav Dave	93f9d5ce04	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915	2017-02-02 18:24:55 +00:00
Amaury Sechet	f3e421d6e9	Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC llvm-svn: 293903	2017-02-02 16:07:44 +00:00
Nirav Dave	4442667fc5	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyFromReg nodes as these are captured by data dependence 6. Forward load-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893	2017-02-02 14:39:42 +00:00
Florian Hahn	7a5ec55fb3	[legalizetypes] Push fp16 -> fp32 extension node to worklist. Summary: This way, the type legalization machinery will take care of registering the result of this node properly. This patch fixes all failing fp16 test cases with expensive checks (CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll, CodeGen/X86/soft-fp.ll). Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel Reviewed By: hfinkel Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28195 llvm-svn: 293765	2017-02-01 13:01:33 +00:00
Nicolai Haehnle	8813d5d221	[DAGCombine] require UnsafeFPMath for re-association of addition Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to a fmul+fadd sequence. Fixes Bug 31626. Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD Subscribers: jholewinski, nemanjai, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28675 llvm-svn: 293635	2017-01-31 14:35:37 +00:00
Simon Pilgrim	ffe2535cf6	Use SelectionDAG::getBuildVector helper function where possible. NFCI. llvm-svn: 293532	2017-01-30 18:53:45 +00:00
Justin Bogner	8f520a73b2	SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we happened to replace a node during UpdateChains, because it would be left in the list we were iterating over. This nulls out the pointer when that happens so that we can avoid the issue. Fixes llvm.org/PR31710 llvm-svn: 293522	2017-01-30 18:29:46 +00:00
Simon Pilgrim	0a5ab5c4db	Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI. llvm-svn: 293520	2017-01-30 18:20:42 +00:00
Matt Arsenault	32e6bfa20f	DAG: Fold fneg into compare with constant into the constant fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred) InstCombine already does this. llvm-svn: 293512	2017-01-30 17:57:28 +00:00
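Why negating the constant and swapping the predicate is safe: IEEE negation is exact, so `-x < c` holds exactly when `x > -c`, and both comparisons are false when either operand is NaN. The C++ sketch below checks this for the OLT/OGT pair only (a simplification; the other ordered predicates swap the same way), assumes default floating-point semantics (no fast-math), and uses hypothetical function names.

```cpp
#include <limits>

// Hypothetical models of two ordered fcmp predicates. C++ relational
// operators on double are false when either operand is NaN, like OLT/OGT.
bool fcmp_olt(double a, double b) { return a < b; }
bool fcmp_ogt(double a, double b) { return a > b; }

// Before: fcmp olt (fneg x), c
bool before_fold(double x, double c) { return fcmp_olt(-x, c); }

// After: fcmp ogt x, -c   (constant negated, predicate swapped)
bool after_fold(double x, double c) { return fcmp_ogt(x, -c); }

// Compare both forms on a grid of values, including signed zeros,
// infinities and NaN.
bool fold_holds_on_samples() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double vals[] = {-inf, -2.5, -0.0, 0.0, 1.0, 3.0, inf, nan};
  for (double x : vals)
    for (double c : vals)
      if (before_fold(x, c) != after_fold(x, c)) return false;
  return true;
}
```

For example, with `x = 2.0` and `c = -1.0`, both `-2.0 < -1.0` and `2.0 > 1.0` are true, so the folded compare gives the same answer without materializing the fneg.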
Matt Arsenault	0c687390fe	DAG: Constant fold fp16_to_fp/fp_to_fp16 This fixes emitting conversions of constants on targets without legal f16 that need to use these for legalization. llvm-svn: 293499	2017-01-30 16:57:41 +00:00
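Constant folding these conversion nodes amounts to evaluating an IEEE half-precision conversion at compile time. The Python sketch below uses the standard `struct` module's `e` (binary16) format. The function names echo the ISD opcodes FP16_TO_FP and FP_TO_FP16, but they are illustrative only and are not the DAG code.

```python
import struct


def fp16_to_fp(bits: int) -> float:
    """Fold FP16_TO_FP: reinterpret 16 bits as IEEE half and widen it."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def fp_to_fp16(value: float) -> int:
    """Fold FP_TO_FP16: round to IEEE half (ties to even), return the bits.

    Note: struct raises OverflowError for finite values too large for half
    precision, whereas a compiler fold would typically produce infinity.
    """
    return struct.unpack("<H", struct.pack("<e", value))[0]
```

Because every half value is exactly representable as a double, `fp_to_fp16(fp16_to_fp(b))` round-trips any finite half bit pattern `b`.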
Craig Topper	135da1faf5	[SelectionDAG] Make SDNode::getConstantOperandVal an inline method. Its operation is already written out manually in many places without using the method. llvm-svn: 293421	2017-01-29 06:08:02 +00:00
Craig Topper	4753736abf	[DAGCombiner] Use unsigned for a constant vector index instead of APInt. The type system requires that the number of vector elements should fit in 32 bits, so this should be safe. llvm-svn: 293414	2017-01-29 04:38:21 +00:00
Craig Topper	d15730902b	[DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR. The type system already requires that the number of vector elements must fit in 32 bits, so an index should as well. Even if the type of the index were larger, all we care about is that the constant index can fit in 64 bits so that we can call getZExtValue. llvm-svn: 293413	2017-01-29 04:38:19 +00:00
Craig Topper	24cdbe8fa6	[DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal. llvm-svn: 293412	2017-01-29 04:38:16 +00:00
Matthias Braun	8c209aa877	Cleanup dump() functions. We had various variants of defining dump() functions in LLVM. Normalize them (this should just consistently implement the things discussed in http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html). For reference: - Public headers should just declare the dump() method but not use LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) - The definition of a dump method should look like this: #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) LLVM_DUMP_METHOD void MyClass::dump() { // print stuff to dbgs()... } #endif llvm-svn: 293359	2017-01-28 02:02:38 +00:00
Jonas Paulsson	bb0ed3e732	[DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert(). In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a partial number of available vector elements), WidenVecRes_Convert() used to resort to scalarization. This patch adds a handling of the (common) case where an input vector can be found of same width as the widened result vector, by converting the node to SIGN/ZERO_EXTEND_VECTOR_INREG. Review: Eli Friedman llvm-svn: 293268	2017-01-27 07:46:26 +00:00
Andrew Kaylor	a0a1164ce4	Add intrinsics for constrained floating point operations This commit introduces a set of experimental intrinsics intended to prevent optimizations that make assumptions about the rounding mode and floating point exception behavior. These intrinsics will later be extended to specify flush-to-zero behavior. More work is also required to model instruction dependencies in machine code and to generate these instructions from clang (when required by pragmas and/or command line options that are not currently supported). Differential Revision: https://reviews.llvm.org/D27028 llvm-svn: 293226	2017-01-26 23:27:59 +00:00
Nirav Dave	d32a421f75	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188	2017-01-26 16:46:13 +00:00
Nirav Dave	de6516c466	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output CodeGen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations, which appear to require more expensive constant generation). Some minor peephole optimizations deal with improved SubDAG shapes (listed below). Additional Minor Changes: 1. Finishes removing unused AliasLoad code. 2. Unifies the chain aggregation in the merged stores across code paths. 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyFromReg nodes, as these are captured by data dependence. 6. Forward load-store values through TokenFactors containing {CopyToReg,CopyFromReg} values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll). 8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. 
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184	2017-01-26 16:02:24 +00:00
Craig Topper	001aad7da7	[DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors. llvm-svn: 293152	2017-01-26 05:38:46 +00:00
Tim Northover	470f070b7d	SDag: fix how initial loads are formed when splitting vector ops. Later code expects the vector loads produced to be directly concatenable, which means we shouldn't pad anything except the last load produced with UNDEF. llvm-svn: 293088	2017-01-25 20:58:26 +00:00
Krzysztof Parzyszek	ee9aa3ffee	Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFC llvm-svn: 293077	2017-01-25 19:29:04 +00:00
Artur Pilipenko	bc93452420	Fix buildbot failures introduced by 293036 Fix unused variable, specify types explicitly to make VC compiler happy. llvm-svn: 293039	2017-01-25 09:10:07 +00:00
Artur Pilipenko	41c0005aa3	[DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2. The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm. http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows the traversal to stop earlier if we know that the result won't be used. From the original commit: Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it. Assuming little endian target: i8 *a = ... i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24) => i32 val = *((i32*)a) i8 *a = ... i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3] => i32 val = BSWAP(*((i32*)a)) This optimization was discussed on llvm-dev some time ago in the "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in the presence of atomic loads, load widening is an irreversible transformation and it might hinder other optimizations. Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part: i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24) Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses. The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. 
If all the OR bytes come from memory, verify that they are adjacent and match with the little or big endian encoding of a wider value. If so, and the load of the wider type (and bswap if needed) is allowed by the target, generate a load and a bswap if needed. Reviewed By: RKSimon, filcab, chandlerc Differential Revision: https://reviews.llvm.org/D27861 llvm-svn: 293036	2017-01-25 08:53:31 +00:00
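The matching step described in this commit can be sketched in Python. The sketch assumes that the byte-origin analysis (calculateByteProvider in the patch) has already produced, for each result byte, the memory offset that byte was loaded from. The helpers are hypothetical: they only classify the layout and show that the folded load (plus BSWAP) computes the same value as the OR of byte loads on a little-endian target.

```python
from typing import List, Optional


def classify_byte_providers(offsets: List[int]) -> Optional[str]:
    """Decide how an OR of narrow loads could be folded.

    offsets[i] is the memory offset of the byte that ends up in byte i of
    the result (i = 0 is the least significant byte). Assumes a non-empty list.
    """
    base = min(offsets)
    rel = [o - base for o in offsets]
    n = len(offsets)
    if rel == list(range(n)):
        return "load"            # little-endian layout: one wide load
    if rel == list(range(n - 1, -1, -1)):
        return "load+bswap"      # big-endian layout: wide load plus BSWAP
    return None                  # bytes not adjacent or in another order


def load_i32_le(mem: bytes, off: int = 0) -> int:
    """A single 32-bit load on a little-endian target."""
    return int.from_bytes(mem[off:off + 4], "little")


def bswap32(v: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(v.to_bytes(4, "little"), "big")
```

The first pattern in the message maps result bytes to offsets `[0, 1, 2, 3]` and classifies as a plain load; the second maps them to `[3, 2, 1, 0]` and needs the extra BSWAP.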
Matt Arsenault	732a531506	DAG: Recognize no-signed-zeros-fp-math attribute clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it. llvm-svn: 293024	2017-01-25 06:08:42 +00:00
Matt Arsenault	8a27aee6ae	DAGCombiner: Allow negating ConstantFP after legalize llvm-svn: 293019	2017-01-25 04:54:34 +00:00
Geoff Berry	92a286ae5a	[SelectionDAG] Handle inverted conditions when splitting into multiple branches. Summary: When conditional branches with complex conditions are split into multiple branches in SelectionDAGBuilder::FindMergedConditions, also handle inverted conditions. These may sometimes appear without having been optimized by InstCombine when CodeGenPrepare decides to sink and duplicate cmp instructions, causing them to have only one use. This problem can be exacerbated by, e.g., GVNHoist hiding more cmps from InstCombine by combining equivalent cmps from different blocks. For example, codegen X & !(Y | Z) as: jmp_if_X TmpBB jmp FBB TmpBB: jmp_if_notY Tmp2BB jmp FBB Tmp2BB: jmp_if_notZ TBB jmp FBB Reviewers: bogner, MatzeB, qcolombet Subscribers: llvm-commits, hiraditya, mcrosier, sebpop Differential Revision: https://reviews.llvm.org/D28380 llvm-svn: 292944	2017-01-24 16:36:07 +00:00
Craig Topper	ff272ad4f3	[SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR Summary: This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These seem like simple non-target specific optimizations. For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there. Ultimately, my motivation here is to simplify extractSubvector and remove custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle undef and BUILD_VECTOR optimizations, but those should be DAG combines. Reviewers: RKSimon, delena Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29000 llvm-svn: 292876	2017-01-24 02:36:59 +00:00
David L. Jones	d21529fa0d	[Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC) Summary: The LibFunc::Func enum holds enumerators named for libc functions. Unfortunately, there are real situations, including libc implementations, where function names are actually macros (musl uses "#define fopen64 fopen", for example; any other transitively visible macro would have similar effects). Strictly speaking, a conforming C++ Standard Library should provide any such macros as functions instead (via <cstdio>). However, there are some "library" functions which are not part of the standard, and thus not subject to this rule (fopen64, for example). So, in order to be both portable and consistent, the enum should not use the bare function names. The old enum naming used a namespace LibFunc and an enum Func, with bare enumerators. This patch changes LibFunc to be an enum with enumerators prefixed with "LibFunc_". (Unfortunately, a scoped enum is not sufficient to override macros.) There are additional changes required in clang. Reviewers: rsmith Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28476 llvm-svn: 292848	2017-01-23 23:16:46 +00:00
Matt Arsenault	4e305c6c1e	DAG: Don't fold vector extract into load if target doesn't want to Fixes turning a 32-bit scalar load into an extending vector load for AMDGPU when dynamically indexing a vector. llvm-svn: 292842	2017-01-23 22:48:53 +00:00
Matt Arsenault	7030661364	DAG: Allow legalization of fcanonicalize vector types llvm-svn: 292814	2017-01-23 18:52:26 +00:00
Simon Pilgrim	fb32eea1b4	[SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293) This patch improves the knownbits logic for unsigned integer min/max opcodes. For UMIN, the result will have at least the maximum of the inputs' known leading zero bits; similarly, for UMAX, at least the maximum of the inputs' known leading one bits. This is particularly useful for simplifying clamping patterns; e.g., as SSE doesn't have a uitofp instruction, we want to use sitofp instead where possible, and for that we need to confirm that the top bit is not set. Differential Revision: https://reviews.llvm.org/D28853 llvm-svn: 292528	2017-01-19 22:41:22 +00:00
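The reasoning behind the UMIN/UMAX rule can be checked exhaustively on small widths. In the Python sketch below, the helpers are hypothetical and work on leading-bit counts rather than LLVM's KnownBits masks. umin(a, b) is no larger than either input, so it keeps the larger leading-zero count; umax(a, b) is no smaller than either input, so it keeps the larger leading-one count.

```python
def leading_zeros(v: int, width: int) -> int:
    """Count leading zero bits of a width-bit unsigned value."""
    return width - v.bit_length()


def leading_ones(v: int, width: int) -> int:
    """Count leading one bits by inverting and counting leading zeros."""
    return leading_zeros(v ^ ((1 << width) - 1), width)


def umin_known_leading_zeros(lz_a: int, lz_b: int) -> int:
    # umin(a, b) <= a and <= b, so it lies below both 2**(width - lz_a)
    # and 2**(width - lz_b): the larger leading-zero count survives.
    return max(lz_a, lz_b)


def umax_known_leading_ones(lo_a: int, lo_b: int) -> int:
    # umax(a, b) >= a and >= b, so it keeps the larger leading-one count.
    return max(lo_a, lo_b)
```

For the clamping case in the commit message, `umin(x, 0x7FFFFFFF)` on i32 has at least one known leading zero because the constant does, which is exactly the clear sign bit that makes sitofp usable in place of uitofp.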
Mikael Holmen	2074e7497b	[DAG] Don't increase SDNodeOrder for dbg.value/declare. Summary: The SDNodeOrder is saved in the IROrder field in the SDNode, and this field may affect scheduling. Thus, letting dbg.value/declare increase the order numbers may in turn affect scheduling. Because of this change we also need to update the code deciding when dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues. Dbg values now have the same order as the SDNode they are connected to, not the following orders. Test cases provided by Florian Hahn. Reviewers: bogner, aprantl, sunfish, atrick Reviewed By: atrick Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB Differential Revision: https://reviews.llvm.org/D25318 llvm-svn: 292485	2017-01-19 13:55:55 +00:00
Matt Arsenault	f411071d63	DAG: Consider nnan in isKnownNeverNaN llvm-svn: 292328	2017-01-18 02:10:08 +00:00
Ahmed Bougacha	9e5a085cf1	Revert "[TLI] Robustize SDAG proto checking by merging it into TLI." This reverts commit r292189, as it causes issues on SystemZ bots. llvm-svn: 292191	2017-01-17 03:31:00 +00:00
Ahmed Bougacha	c018efd680	[TLI] Robustize SDAG proto checking by merging it into TLI. SelectionDAGBuilder recognizes libfuncs using some homegrown parameter type-checking. Use TLI instead, removing another heap of redundant code. This isn't strictly NFC, as the SDAG code was too lax. Concretely, this means changes are required to two tests: - calling a non-variadic function via a variadic prototype isn't OK; it just happens to work on x86_64 (but not on, e.g., aarch64). - mempcpy has a size_t parameter; the SDAG code accepts any integer type, which meant using i32 on x86_64 worked. I don't think it's worth supporting either of these (IMO) broken testcases. Instead, fix them to be more correct. llvm-svn: 292189	2017-01-17 03:10:06 +00:00
Simon Pilgrim	3e91519a1c	[SelectionDAG] Add knownbits support for BITREVERSE llvm-svn: 292130	2017-01-16 14:49:26 +00:00
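Known bits propagate through BITREVERSE by mirroring: bit i of the result is bit (width - 1 - i) of the input, so both the known-zero and known-one masks are simply bit-reversed. The Python sketch below models this with plain integer masks; the function names are hypothetical and only illustrate the rule.

```python
def bitreverse(v: int, width: int) -> int:
    """Reverse the low `width` bits of v (models ISD::BITREVERSE)."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (v & 1)
        v >>= 1
    return r


def known_bits_bitreverse(known_zero: int, known_one: int, width: int):
    # Whatever is known about an input bit is known about its mirrored
    # result bit, so both masks are reversed the same way.
    return bitreverse(known_zero, width), bitreverse(known_one, width)
```

For a 4-bit value whose two low bits are known zero and whose bit 2 is known one, the result has its two high bits known zero and bit 1 known one.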
Simon Pilgrim	db73dbcc7c	[SelectionDAG] Add support for BITREVERSE constant folding We were relying on constant folding of the legalized instructions to do what constant folding we had previously llvm-svn: 292114	2017-01-16 13:39:00 +00:00
Malcolm Parsons	17d266bc96	Remove unused lambda captures. NFC llvm-svn: 291916	2017-01-13 17:12:16 +00:00
Benjamin Kramer	061f4a5fe6	Apply clang-tidy's performance-unnecessary-value-param to LLVM. With some minor manual fixes for using function_ref instead of std::function. No functional change intended. llvm-svn: 291904	2017-01-13 14:39:03 +00:00
Diana Picus	116bbab4e4	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891	2017-01-13 09:58:52 +00:00
Craig Topper	7af39837a9	Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent." Some test appears to be hanging on the build bots. llvm-svn: 291650	2017-01-11 04:59:25 +00:00
Craig Topper	577d258569	[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent. llvm-svn: 291645	2017-01-11 04:02:23 +00:00
Matt Arsenault	e482403e1c	DAGCombiner: Add hasOneUse checks to fadd/fma combine Even with aggressive fusion enabled, this requires duplicating the fmul, or increases an fadd to another fma which is not an improvement. llvm-svn: 291642	2017-01-11 02:02:12 +00:00
Eugene Zelenko	c4ad1ce068	[Target] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 291641	2017-01-11 01:45:03 +00:00
Matt Arsenault	def496c04b	Remove unused CONVERT_RNDSAT intrinsics llvm-svn: 291607	2017-01-10 22:38:02 +00:00
Matt Arsenault	0b382a7cb8	DAG: Avoid OOB when legalizing vector indexing If a vector index is out of bounds, the result is supposed to be undefined but is not undefined behavior. Change the legalization for indexing the vector on the stack so that an out of bounds index does not create an out of bounds memory access. llvm-svn: 291604	2017-01-10 22:02:30 +00:00
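The commit does not spell out which clamp it uses, so the following is a sketch of one plausible scheme, stated as an assumption: mask the index when the element count is a power of two, otherwise take an unsigned min with the last valid index. Either way, an out-of-bounds index reads some unspecified element instead of touching memory past the stack slot. Both helper names are hypothetical.

```python
def clamp_vector_index(idx: int, num_elts: int) -> int:
    """Clamp an unsigned dynamic index so it always names a real element.

    Assumption: num_elts > 0. A power-of-two element count lets a cheap
    mask do the job; otherwise fall back to an unsigned min.
    """
    if (num_elts & (num_elts - 1)) == 0:
        return idx & (num_elts - 1)
    return min(idx, num_elts - 1)


def element_address(base: int, idx: int, num_elts: int, elt_size: int) -> int:
    # Address of the selected element in the vector's stack slot; it can
    # never point past the end of the vector.
    return base + clamp_vector_index(idx, num_elts) * elt_size
```

The mask is cheaper than a compare-and-select, which is why a power-of-two fast path seems natural here; since the result for an out-of-range index is undefined anyway, it does not matter that the two paths pick different elements.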
Simon Dardis	548a53f5ee	[mips] Fix Mips MSA intrinsics The usage of some MIPS MSA intrinsics that took immediates could crash LLVM during lowering. This patch addresses that behaviour. Crucially, this patch also makes the use of intrinsics with out-of-range immediates produce an internal error. The ld and st intrinsics would trigger an assertion failure for MIPS64 as their lowering would attempt to add an i32 offset to an i64 pointer. Reviewers: vkalintiris, slthakur Differential Revision: https://reviews.llvm.org/D25438 llvm-svn: 291571	2017-01-10 16:40:57 +00:00
Craig Topper	588c734b0f	[DAGCombiner] Merge together duplicate checks for folding (select C, 1, X) -> (or C, X) and folding (select C, X, 0) -> (and C, X). Also be consistent about checking that both the condition and the result type are i1. NFC. I guess previously we just assumed that if the result type was i1, then the condition type must also be i1? llvm-svn: 291548	2017-01-10 07:42:57 +00:00
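Both folds are only valid when the condition and the result are i1, which is why the commit checks both types. Here `select` is a hypothetical Python model of the SELECT node on i1 values (0 and 1). The sketch runs an exhaustive truth-table check of the two folds.

```python
def select(c: int, t: int, f: int) -> int:
    """i1 select: returns t when c is 1, else f."""
    return t if c else f


# Every i1 combination, so the folds are checked exhaustively.
BITS = (0, 1)
OR_FOLD_HOLDS = all(select(c, 1, x) == (c | x) for c in BITS for x in BITS)
AND_FOLD_HOLDS = all(select(c, x, 0) == (c & x) for c in BITS for x in BITS)
```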
Craig Topper	d915d6ba57	[DAGCombiner] Remove code for optimizing select (xor Cond, 0), X, Y -> select Cond, X, Y. Just let combine on the xor itself take care of it. llvm-svn: 291534	2017-01-10 04:12:19 +00:00
Bjorn Pettersson	b14afd452d	[SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486. Summary: Originally i64 = umax t8, Constant:i64<4> was expanded into i32,i32 = umax Constant:i32<0>, Constant:i32<0> i32,i32 = umax t7, Constant:i32<4> Now the two resulting umax nodes each return i32 instead of i32, i32. Thanks to Jan Vesely for help with the test case. Patch by mikael.holmen at ericsson.com Reviewers: bogner, jvesely, tstellarAMD, arsenm Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D28135 llvm-svn: 291441	2017-01-09 12:03:50 +00:00
David Majnemer	9e04befb09	[SelectionDAG] Rework lowerRangeToAssertZExt Utilize ConstantRange to make it easier to interpret range metadata. llvm-svn: 291211	2017-01-06 02:43:28 +00:00
David Majnemer	eaba06cffa	[SelectionDAG] Correctly transform range metadata to AssertZExt We used the logBase2 of the high instead of the ceilLogBase2 resulting in the wrong result for certain values. For example, it resulted in an i1 AssertZExt when the exclusive portion of the range was 3. llvm-svn: 291196	2017-01-06 00:11:46 +00:00
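For range metadata with lower bound 0 and an exclusive upper bound `hi`, the AssertZExt width must cover the largest member, `hi - 1`, which equals ceilLogBase2(hi). The Python sketch below compares that with the floor-log2 computation the commit describes as buggy. Both helper names are hypothetical.

```python
def assert_zext_bits(hi: int) -> int:
    """Width needed for every value in [0, hi) (assumes hi >= 2).

    Equivalent to ceilLogBase2(hi): the active bits of hi - 1.
    """
    return (hi - 1).bit_length()


def buggy_bits(hi: int) -> int:
    # floor(log2(hi)), i.e. logBase2 of the exclusive bound.
    return hi.bit_length() - 1
```

With `hi = 3` the range is {0, 1, 2}: the buggy version yields 1 bit (the i1 AssertZExt from the commit message), although 2 needs two bits. The two formulas agree only when `hi` is a power of two.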
Tim Shen	5480eb8445	[Legalizer] Fix fp-to-uint to fp-to-sint promotion assertion. Summary: When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero extended. For example, given double 65534.0, without legalization: fp-to-uint16: 65534.0 -> 0xfffe With the legalization: fp-to-sint32: 65534.0 -> 0x0000fffe Without this patch, legalization wrongly emits a sign-extend assertion, which is consumed by a later icmp instruction and causes a miscompile. Note that the floating point value must be in [0, 65535), otherwise the behavior is undefined. This patch reverts r279223 behavior and adds more tests and documentation. In PR29041's context, James Molloy mentioned that: We don't need to mask because conversion from float->uint8_t is undefined if the integer part of the float value is not representable in uint8_t. Therefore we can assume this doesn't happen! which is totally true and good, because fptoui is documented clearly to have undefined behavior when overflow/underflow happens. We should take advantage of this behavior so that we can save unnecessary mask instructions. Reviewers: jmolloy, nadav, echristo, kbarton Subscribers: mehdi_amini, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28284 llvm-svn: 291015	2017-01-04 22:11:42 +00:00
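The 65534.0 example from this commit can be replayed in Python. The sketch assumes the input lies in the range where fptoui to i16 is defined. It shows that the promoted fptosi result always satisfies a 16-bit zero-extension assertion but can violate a sign-extension one. The helper names are hypothetical.

```python
def fptosi_i32(x: float) -> int:
    """fptosi to i32 (truncation toward zero), as a raw 32-bit pattern.

    Assumes x is within the i32 range; out-of-range input is UB in LLVM IR.
    """
    return int(x) & 0xFFFFFFFF


def is_zext_from_16(bits32: int) -> bool:
    """True if the upper 16 bits are zero (an AssertZext i16 would hold)."""
    return bits32 >> 16 == 0


def is_sext_from_16(bits32: int) -> bool:
    """True if the value equals the sign extension of its low 16 bits."""
    low = bits32 & 0xFFFF
    sext = (low | 0xFFFF0000) if low & 0x8000 else low
    return sext == bits32
```

`fptosi_i32(65534.0)` gives `0x0000FFFE`: zero-extended from 16 bits, but a sign-extension assertion would claim `0xFFFFFFFE`, which is the miscompile the patch fixes.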
Evgeny Stupachenko	c88697dc16	The patch fixes (base, index, offset) match. Summary: Instead of matching: (a + i) + 1 -> (a + i, undef, 1) Now it matches: (a + i) + 1 -> (a, i, 1) Reviewers: rengolin Differential Revision: http://reviews.llvm.org/D26367 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 291012	2017-01-04 21:43:39 +00:00
Florian Hahn	f872d230ad	[selectiondag] Check PromotedFloats map during expensive checks. Summary: `PromotedFloats` needs to be checked in `DAGTypeLegalizer::PerformExpensiveChecks`. This patch fixes a few type legalization failures with expensive checks for ARM fp16 tests. Reviewers: baldrick, bogner, arsenm Subscribers: arsenm, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28187 llvm-svn: 290796	2017-01-01 13:58:27 +00:00
Reid Kleckner	0e7c84c682	Simplify FunctionLoweringInfo.cpp with range for loops I'm preparing to add some pattern matching code here, so simplify the code before I do. NFC llvm-svn: 290731	2016-12-30 00:21:38 +00:00
Igor Laevsky	4f31e52f94	Introduce element-wise atomic memcpy intrinsic This change adds a new intrinsic which is intended to provide memcpy functionality with additional atomicity guarantees. Please refer to the review thread or language reference for further details. Differential Revision: https://reviews.llvm.org/D27133 llvm-svn: 290708	2016-12-29 14:31:07 +00:00
Simon Pilgrim	0d66d29678	[SelectionDAG] Early out from computeKnownBits when we know we will have no common bits. Avoid extra (recursive) calls to computeKnownBits if we already know that there are no common known bits. llvm-svn: 290490	2016-12-24 12:59:35 +00:00
Zijiao Ma	bf6007bd1b	Make the canonicalisation on shifts benefit more cases. 1. Fix pessimized case in FIXME. 2. Add tests for it. 3. The canonicalisation on shifts results in different sequences for tests of machine-licm. Correct some check lines. Differential Revision: https://reviews.llvm.org/D27916 llvm-svn: 290410	2016-12-23 02:56:07 +00:00
Wei Mi	f3f01aba48	Change the interface of TLI.isMultiStoresCheaperThanBitsMerge. This is for splitMergedValStore in DAG Combine to share the target query interface with similar logic in CodeGenPrepare. Differential Revision: https://reviews.llvm.org/D24707 llvm-svn: 290363	2016-12-22 19:38:22 +00:00
Matt Arsenault	485dacd90c	DAG: Add helper for testing constant values There are helpers for testing for constant or constant build_vector, and for splat ConstantFP vectors, but not for a constantfp or non-splat ConstantFP vector. llvm-svn: 290317	2016-12-22 04:39:45 +00:00
Oren Ben Simhon	3b95157090	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible. vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above. The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it. This submit also includes additional lit tests to better cover HVA corner cases. Differential Revision: https://reviews.llvm.org/D27392 llvm-svn: 290240	2016-12-21 08:31:45 +00:00
Joel Jones	8980ba643e	Fix name typo in SelectionDAG llvm-svn: 289969	2016-12-16 18:22:54 +00:00
Chandler Carruth	ba5de63bc3	Add extra headers that got deleted by my revert in r289916 but for which new usage had already grown in the file. llvm-svn: 289917	2016-12-16 04:08:31 +00:00
Chandler Carruth	4154062b69	Revert patch series introducing the DAG combine to match a load-by-bytes idiom. r289538: Match load by bytes idiom and fold it into a single load r289540: Fix a buildbot failure introduced by r289538 r289545: Use more detailed assertion messages in the code ... r289646: Add a couple of assertions to the load combine code ... This DAG combine has a bad crash in it that is quite hard to trigger sadly -- it relies on sneaking code with UB through the SDAG build and into this particular combine. I've responded to the original commit with a test case that reproduces it. However, the code also has other problems that will require substantial changes to address and so I'm going ahead and reverting it for now. This should unblock us and perhaps others that are hitting the crash in the wild and will let a fresh patch with updated approach come in cleanly afterward. Sorry for any trouble or disruption! llvm-svn: 289916	2016-12-16 04:05:22 +00:00
Eli Friedman	379294676d	Don't combine splats with other shuffles. We sometimes end up creating shuffles which are worse than the obvious translation of the IR. Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 . Differential Revision: https://reviews.llvm.org/D27793 llvm-svn: 289882	2016-12-15 22:41:40 +00:00
Eli Friedman	34505083c6	Don't combine a shuffle of two BUILD_VECTORs with duplicate elements. Targets can't handle this case well in general; we often transform a shuffle of two cheap BUILD_VECTORs to element-by-element insertion, which is very inefficient. Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially fixes https://llvm.org/bugs/show_bug.cgi?id=31301. Differential Revision: https://reviews.llvm.org/D27787 llvm-svn: 289874	2016-12-15 21:36:59 +00:00
Sanjay Patel	afee21a5b2	[DAG] allow more select folding for targets that have 'and not' (PR31175) The original motivation for this patch comes from wanting to canonicalize more IR to selects and also canonicalizing min/max. If we're going to do that, we need more backend fixups to undo select codegen when simpler ops will do. I chose AArch64 for the tests because that shows the difference in the simplest way. This should fix: https://llvm.org/bugs/show_bug.cgi?id=31175 Differential Revision: https://reviews.llvm.org/D27489 llvm-svn: 289738	2016-12-14 22:59:14 +00:00
Nirav Dave	f5bf03c7ef	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667	2016-12-14 16:43:44 +00:00
Nirav Dave	8527ab0ad2	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after removing load-store factoring through token factors in favor of improved token factor operand pruning. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code. 2. Unifies the chain aggregation in the merged stores across code paths. 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? 
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289659	2016-12-14 15:44:26 +00:00
Simon Pilgrim	05ab8ffc7e	[DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of just APInt::isPowerOf2 Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases. Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value. Differential Revision: https://reviews.llvm.org/D27714 llvm-svn: 289654	2016-12-14 15:08:13 +00:00
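Once a divisor d is known to be a power of two, even when it is not a splat constant, unsigned division and remainder reduce to a shift and a mask. The Python sketch below shows that arithmetic. Computing the logarithm as (bitwidth - 1) - ctlz(d) is one standard formulation; it is assumed here rather than quoted from the patch's BuildLogBase2, and all helper names are hypothetical.

```python
def log_base2(v: int, width: int) -> int:
    """log2 of a known power of two v, computed as (width - 1) - ctlz(v)."""
    ctlz = width - v.bit_length()
    return (width - 1) - ctlz


def udiv_pow2(x: int, d: int, width: int) -> int:
    # udiv x, d  ->  srl x, log2(d)
    return x >> log_base2(d, width)


def urem_pow2(x: int, d: int) -> int:
    # urem x, d  ->  and x, d - 1
    return x & (d - 1)
```

Expressing the logarithm through ctlz keeps it computable at run time, which matters because isKnownToBeAPowerOfTwo can succeed for non-constant values (for example, `shl 1, y`).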
Stephan Bergmann	17c7f70362	Replace APFloatBase static fltSemantics data members with getter functions At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647	2016-12-14 11:57:17 +00:00
Artur Pilipenko	f3ee444010	Add a couple of assertions to the load combine code introduced by r289538 llvm-svn: 289646	2016-12-14 11:55:47 +00:00
Artur Pilipenko	469fcd2afd	Use more detailed assertion messages in the code introduced by r289538 llvm-svn: 289545	2016-12-13 16:26:15 +00:00
Artur Pilipenko	79d1255e26	Fix a buildbot failure introduced by r289538 Build failed because of unused variable in product mode. llvm-svn: 289540	2016-12-13 14:55:31 +00:00
Artur Pilipenko	c93cc5955f	[DAGCombiner] Match load by bytes idiom and fold it into a single load Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it. Assuming little endian target: i8 *a = ... i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24) => i32 val = *((i32)a) i8 *a = ... i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3] => i32 val = BSWAP(*((i32)a)) This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is an irreversible transformation and it might hinder other optimizations. Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part: i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24) Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses. The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed. Reviewed By: hfinkel, RKSimon, filcab Differential Revision: https://reviews.llvm.org/D26149 llvm-svn: 289538	2016-12-13 14:21:14 +00:00
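The equivalence this combine relies on can be checked in plain C++, assuming a little-endian host as the commit message does. The four narrow loads joined by shifts and ors produce the same value as one 32-bit load, and the reversed pattern equals that load byte-swapped. The function names are illustrative. `std::memcpy` stands in for the single wide load, and `bswap32` is a portable stand-in for a target BSWAP.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Little-endian byte pattern: a[0] is the least significant byte.
uint32_t load_le_by_bytes(const uint8_t *a) {
  return uint32_t(a[0]) | (uint32_t(a[1]) << 8) | (uint32_t(a[2]) << 16) |
         (uint32_t(a[3]) << 24);
}

// Big-endian byte pattern: a[0] is the most significant byte.
uint32_t load_be_by_bytes(const uint8_t *a) {
  return (uint32_t(a[0]) << 24) | (uint32_t(a[1]) << 16) |
         (uint32_t(a[2]) << 8) | uint32_t(a[3]);
}

// Single 32-bit load (host byte order; assumed little-endian).
uint32_t wide_load(const uint8_t *a) {
  uint32_t v;
  std::memcpy(&v, a, sizeof v);
  return v;
}

// Portable byte swap standing in for a BSWAP node.
uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0xFF00u) | ((v << 8) & 0xFF0000u) | (v << 24);
}
```

On a little-endian host, `load_le_by_bytes(p) == wide_load(p)` and `load_be_by_bytes(p) == bswap32(wide_load(p))` for any 4 readable bytes at `p`. That is why the DAG can replace the OR tree with one load, or with a load plus a bswap.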
Artur Pilipenko	01e86444a0	Move BaseIndexOffset in DAGCombiner.cpp so it will be available for the upcoming user llvm-svn: 289537	2016-12-13 14:16:02 +00:00
Simon Pilgrim	9dc67c0101	[SelectionDAG] computeKnownBits - simplified knownbits sign extension. NFCI. We don't need to extract+test the sign bit of the known ones/zeros, we can use sext which will handle all of this. llvm-svn: 289534	2016-12-13 13:36:27 +00:00
Philip Reames	51387a8c28	[Statepoints] Reuse stack slots more than once within a basic block The stack slot reuse code had a really amusing bug. We ended up only reusing a stack slot exactly once (initial use + reuse) within a basic block. If we had a third statepoint to process, we ended up allocating a new set of stack slots. If we crossed a basic block boundary, the set got cleared. As a result, code which is invoke heavy doesn't see the problem, but multiple calls within a basic block does. Net result: as we optimize invokes into calls, lowering gets worse. The root error here is that the bitmap used by the custom allocator wasn't kept in sync. The result was that we ended up resizing the bitmap on the next statepoint (to handle the cross block case), reset the bit once, but then never reset it again. Differential Revision: https://reviews.llvm.org/D25243 llvm-svn: 289509	2016-12-13 01:21:15 +00:00
Simon Pilgrim	040a36c176	[SelectionDAG] Add support for EXTRACT_SUBVECTOR to ComputeNumSignBits Pre-commit as discussed on D27657 llvm-svn: 289425	2016-12-12 10:29:43 +00:00
Simon Pilgrim	54945a12ec	[SelectionDAG] Add ability for computeKnownBits to peek through bitcasts from 'large element' scalar/vector to 'small element' vector. Extension to D27129 which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types. llvm-svn: 289329	2016-12-10 17:00:00 +00:00
Simon Pilgrim	017b7a71d8	[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED) Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type. llvm-svn: 289232	2016-12-09 17:53:11 +00:00
Matt Arsenault	38d8ed2b75	AMDGPU: Fix i128 mul llvm-svn: 289231	2016-12-09 17:49:14 +00:00
Nirav Dave	bedb5d906c	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226	2016-12-09 17:18:24 +00:00
Nirav Dave	fd51ff4fd8	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after fixing overly aggressive load-store forwarding optimization. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289221	2016-12-09 16:15:12 +00:00
Simon Pilgrim	b9eb99f570	Use SelectionDAG.getSplatBuildVector helper. NFCI. llvm-svn: 289220	2016-12-09 16:01:50 +00:00
Simon Pilgrim	bf9c0e7434	[SelectionDAG] Use SelectionDAG.getBuildVector helper. NFCI. Makes interception of BUILD_VECTOR creation easier for debugging. llvm-svn: 289218	2016-12-09 15:23:41 +00:00
Simon Pilgrim	15f1f828b5	[SelectionDAG] Add additional checks to CONCAT_VECTORS creation Part of the work for PR31323 - add extra asserts checking that the input vectors are of consistent type and result in the correct number of vector elements. llvm-svn: 289214	2016-12-09 14:27:52 +00:00
Simon Pilgrim	e4050a2961	[SelectionDAG] Add partial BITCAST support to computeKnownBits Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together. We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them. Differential Revision: https://reviews.llvm.org/D27129 llvm-svn: 289200	2016-12-09 10:13:45 +00:00
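For the little-endian v2i32 to i64 case mentioned above, "concatenate the results together" means element 0's known bits fill the low half and element 1's fill the high half. Below is a minimal sketch with hypothetical `Known32`/`Known64` structs: a set bit in `zero` means "known 0", and a set bit in `one` means "known 1". It mirrors the idea, not LLVM's KnownBits class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical known-bits pairs (not LLVM's types).
struct Known32 { uint32_t zero, one; };
struct Known64 { uint64_t zero, one; };

// Little-endian v2i32 -> i64 bitcast: element 0 supplies bits 0..31,
// element 1 supplies bits 32..63.
Known64 bitcast_v2i32_to_i64(Known32 elt0, Known32 elt1) {
  return {static_cast<uint64_t>(elt0.zero) | (static_cast<uint64_t>(elt1.zero) << 32),
          static_cast<uint64_t>(elt0.one) | (static_cast<uint64_t>(elt1.one) << 32)};
}
```

A big-endian target would reverse the element order, which matches the commit's note that big endian is left for later.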
Daniel Jasper	f51e05ffbc	Revert "[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes" This reverts commit r288916 as it is currently causing a crasher in Halide. Reproducer on llvm.org/PR31323. While it might be that Halide is generating invalid IR, llc shouldn't crash. llvm-svn: 289194	2016-12-09 09:04:51 +00:00
Nicolai Haehnle	f08dc90253	[SelectionDAG] Add expansion and promotion of [US]MUL_LOHI Summary: Most targets set the action for these nodes to Expand even though there isn't actually any code for them in ExpandNode. Instead, targets simply relied on the fact that no code generates these nodes as long as the nodes aren't legal or custom. However, generating these nodes can be useful e.g. for divide-by-constant in wider integer types. Expand of [US]MUL_LOHI will use MULH[US] when legal or custom, and a sequence of half-width multiplications otherwise. Promote uses a wider multiply. This patch intends to not change the generated code, but indirect effects are possible since expansions/promotions that were previously done in DAGCombine may now be done in LegalizeDAG. See D24822 for a change that actually uses the new expansion. Reviewers: spatel, bkramer, venkatra, efriedma, hfinkel, ast, nadav, tstellarAMD Subscribers: arsenm, jyknight, nemanjai, wdng, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D24956 llvm-svn: 289050	2016-12-08 14:08:14 +00:00
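The "sequence of half-width multiplications" fallback can be sketched for the unsigned case. Split each 32-bit operand into 16-bit halves, form the four partial products, and carry the middle column into the high half. This is a hypothetical scalar model (`LoHi` and `umul_lohi_halves` are illustrative names), not the LegalizeDAG code. The signed variant needs an extra correction of the high half, which is not shown.

```cpp
#include <cassert>
#include <cstdint>

struct LoHi { uint32_t lo, hi; };

// Full 64-bit product of two 32-bit values using only 16x16->32 multiplies.
LoHi umul_lohi_halves(uint32_t a, uint32_t b) {
  uint32_t aL = a & 0xFFFFu, aH = a >> 16;
  uint32_t bL = b & 0xFFFFu, bH = b >> 16;
  uint32_t ll = aL * bL;  // contributes to bits 0..31
  uint32_t lh = aL * bH;  // contributes to bits 16..47
  uint32_t hl = aH * bL;  // contributes to bits 16..47
  uint32_t hh = aH * bH;  // contributes to bits 32..63
  // Middle column: at most 3 * 0xFFFF, so it cannot overflow 32 bits.
  uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);
  uint32_t lo = (ll & 0xFFFFu) | (mid << 16);
  uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
  return {lo, hi};
}
```

When MULH[US] is legal, the expansion described above can use it for the high half instead, so this longer sequence is only the last resort.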
Simon Pilgrim	ba05d41095	[SelectionDAG] Add knownbits support for vector demandedelts in SMAX/SMIN/UMAX/UMIN opcodes llvm-svn: 288926	2016-12-07 17:54:00 +00:00
Simon Pilgrim	967325b373	[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes llvm-svn: 288916	2016-12-07 16:28:21 +00:00
Simon Pilgrim	ff79f31328	[SelectionDAG] Removed old knownbits TODO comment. NFCI. EXTRACT_VECTOR_ELT does support demanded elts if the element index is known and in range. llvm-svn: 288913	2016-12-07 15:31:12 +00:00
Eli Friedman	0a76e3241f	[CodeGen] Fix result type for SMULO/UMULO legalization On some platforms (like MSP430) the second element of the result structure for SMULO/UMULO may have a shorter type than the one returned by SetCC. We need to truncate it to the right type, or else some incorrect code may be generated later on. This fixes issue https://github.com/rust-lang/rust/issues/37829 Patch by Vadzim Dambrouski! Differential Revision: https://reviews.llvm.org/D27154 llvm-svn: 288857	2016-12-06 22:49:36 +00:00
Simon Pilgrim	dd6ca639d5	[DAGCombine] Add (sext_in_reg (zext x)) -> (sext x) combine Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted. Differential Revision: https://reviews.llvm.org/D27461 llvm-svn: 288842	2016-12-06 19:09:37 +00:00
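Why `(sext_in_reg (zext x)) -> (sext x)` is valid can be shown with a scalar sketch. It assumes an i8 source and i32 result and uses hypothetical helper names. Zero extension keeps the low 8 bits intact, and sign-extending in register from bit 7 then rebuilds exactly what a direct sign extension would have produced.

```cpp
#include <cassert>
#include <cstdint>

// zext i8 -> i32: upper 24 bits become zero.
uint32_t zext8to32(int8_t x) { return static_cast<uint8_t>(x); }

// sext_in_reg from 8 bits: replicate bit 7 of v into bits 8..31.
int32_t sext_in_reg8(uint32_t v) {
  int32_t low = static_cast<int32_t>(v & 0xFFu);
  return low >= 128 ? low - 256 : low;
}

// sext i8 -> i32 directly.
int32_t sext8to32(int8_t x) { return x; }
```

The combine therefore only needs the sext_in_reg width to match the zext source width. The exhaustive check over all 256 inputs below exercises that case.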
Simon Pilgrim	1577b39f51	[SelectionDAG] We can ignore knownbits from an undef shuffle vector index if we don't actually demand that element llvm-svn: 288839	2016-12-06 18:58:25 +00:00
Simon Pilgrim	29c17f3f58	Avoid repeated calls to Op.getOpcode(). NFCI. llvm-svn: 288814	2016-12-06 14:50:09 +00:00
Sanjay Patel	1f158d6955	[TargetLowering] add special-case for demanded bits analysis of 'not' We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask. Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get folded into another logic op if the target has those. However, if we can remove a logic instruction by changing the xor's constant mask value, that should always be a win. Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case currently (although that's marked with a FIXME). So if you run this IR through -instcombine, you should get the same end result. I'm hoping to add a different backend transform that will expose this problem though, so I need to solve this first. Differential Revision: https://reviews.llvm.org/D27356 llvm-svn: 288676	2016-12-05 15:58:21 +00:00
Matt Arsenault	92fede361f	DAG: Fold out out-of-bounds insert_vector_elt getNode already prevents formation of out-of-bounds constant extract_vector_elts. Do the same for insert_vector_elt. llvm-svn: 288603	2016-12-03 23:03:26 +00:00
Nicolai Haehnle	33ca182c91	[DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default Summary: When X = 0 and Y = inf, the original code produces inf, but the transformed code produces nan. So this transform (and its relatives) should only be used when the no-infs-fp-math flag is explicitly enabled. Also disable the transform using fmad (intermediate rounding) when unsafe-math is not enabled, since it can reduce the precision of the result; consider this example with binary floating point numbers with two bits of mantissa: x = 1.01 y = 111 x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step) x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero) The example relies on rounding towards zero at least in the second step. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578 Reviewers: RKSimon, tstellarAMD, spatel, arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26602 llvm-svn: 288506	2016-12-02 16:06:18 +00:00
Peter Collingbourne	ab85225be4	IR: Change the gep_type_iterator API to avoid always exposing the "current" type. Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458	2016-12-02 02:24:42 +00:00
Justin Bogner	35c5e58f8c	SDAG: Avoid a large, usually empty SmallVector in a recursive function This SmallVector is using up 128 bytes on the stack every time despite almost always being empty[1], and since this function can recurse quite deeply that adds up to a lot of overhead. We've seen this run afoul of ulimits in some cases with ASAN on. Replacing the SmallVector with a std::vector trades an occasional heap allocation for vastly less stack usage. [1]: I gathered some stats on an internal test suite and the vector was non-empty in only 45,000 of 10,000,000 calls to this function. llvm-svn: 288441	2016-12-02 00:11:01 +00:00
Matthias Braun	d0ee66c2e9	Move most EH from MachineModuleInfo to MachineFunction Recommitting r288293 with some extra fixes for GlobalISel code. Most of the exception handling members in MachineModuleInfo are actually per-function data (talks about the "current function") so it is better to keep it at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288405	2016-12-01 19:32:15 +00:00
Nicolai Haehnle	da7e4017c6	[SelectionDAG] Rename and clarify visitFMULForFMADCombine (NFC) Summary: Suggested by @spatel in D26602. Reviewers: spatel, hfinkel Subscribers: spatel, llvm-commits Differential Revision: https://reviews.llvm.org/D27260 llvm-svn: 288336	2016-12-01 14:04:13 +00:00
Eric Christopher	e70b7c3dfb	Temporarily Revert "Move most EH from MachineModuleInfo to MachineFunction" This appears to have broken the global isel bot: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel_build/5174/console This reverts commit r288293. llvm-svn: 288322	2016-12-01 07:50:12 +00:00
Matthias Braun	ed14cb0604	Move most EH from MachineModuleInfo to MachineFunction Most of the exception handling members in MachineModuleInfo are actually per-function data (talks about the "current function") so it is better to keep it at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288293	2016-11-30 23:49:01 +00:00
Matthias Braun	ef331eff5a	Move VariableDbgInfo from MachineModuleInfo to MachineFunction VariableDbgInfo is per function data, so it makes sense to have it with the function instead of the module. This is a necessary step to have machine module passes work properly. Differential Revision: https://reviews.llvm.org/D27186 llvm-svn: 288292	2016-11-30 23:48:50 +00:00
Nicolai Haehnle	73a9a27b5a	[SelectionDAG] Refactor TargetLowering::expandMUL (NFC) Summary: Further preparation for the expansion of MUL_LOHI added in D24956. Reviewers: efriedma, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27064 llvm-svn: 288248	2016-11-30 16:26:33 +00:00
Warren Ristow	d9777c1dbb	Test commit. Comment changes. NFC. llvm-svn: 288100	2016-11-29 02:37:13 +00:00
Sanjay Patel	2bd32b05fb	[DAG] clean up foldSelectCCToShiftAnd(); NFCI llvm-svn: 288088	2016-11-28 23:05:55 +00:00
Sanjay Patel	1cf9aff659	[DAG] add helper function for selectcc --> and+shift transforms; NFC llvm-svn: 288073	2016-11-28 21:47:41 +00:00
Nirav Dave	a413361798	Revert "[DAG] Improve loads-from-store forwarding to handle TokenFactor" This reverts commit r287773 which caused issues with ppc64le builds. llvm-svn: 288035	2016-11-28 14:30:29 +00:00
Simon Pilgrim	c5fb167df0	Use SDValue helpers instead of explicitly going via SDValue::getNode(). NFCI llvm-svn: 287941	2016-11-25 17:25:21 +00:00
Craig Topper	8c4cdf06db	[DAGCombine] Teach DAG combine that if both inputs of a vselect are the same, then the condition doesn't matter and the vselect can be removed. Selects with scalar condition already handle this correctly. llvm-svn: 287904	2016-11-24 21:48:52 +00:00
Nicolai Haehnle	934470f536	[SelectionDAG] Early-out in TargetLowering::expandMUL (NFC) Summary: Reduce indentation level; preparation for D24956. Reviewers: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27063 llvm-svn: 287831	2016-11-23 22:14:20 +00:00
Nirav Dave	cf34556330	[DAG] Improve loads-from-store forwarding to handle TokenFactor Forward store values to matching loads down through token factors. Factored from D14834. Reviewers: jyknight, hfinkel Subscribers: hfinkel, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D26080 llvm-svn: 287773	2016-11-23 16:48:35 +00:00
John Brawn	150addb45c	[DAGCombiner] Fix infinite loop in vector mul/shl combining We have the following DAGCombiner transformations: (mul (shl X, c1), c2) -> (mul X, c2 << c1) (mul (shl X, C), Y) -> (shl (mul X, Y), C) (shl (mul x, c1), c2) -> (mul x, c1 << c2) Usually the constant shift is optimised by SelectionDAG::getNode when it is constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing with vectors and one of those vector constants contains an undef element FoldConstantArithmetic does not fold and we enter an infinite loop. Fix this by making FoldConstantArithmetic use getNode to decide how to fold each vector element, the same as FoldConstantVectorArithmetic does, and rather than adding the constant shift to the work list instead only apply the transformation if it's already been folded into a constant, as if it's not we're going to loop endlessly. Additionally add missing NoOpaques to one of those transformations, which I noticed when writing the tests for this. Differential Revision: https://reviews.llvm.org/D26605 llvm-svn: 287766	2016-11-23 16:05:51 +00:00
Elena Demikhovsky	09375d98b8	Type legalization for compressstore and expandload intrinsics. Implemented widening (v2f32) and splitting (v16f64). On splitting, I use "popcnt" to calculate memory increment. More type legalization work will come in the next patches. llvm-svn: 287761	2016-11-23 13:58:24 +00:00
Simon Pilgrim	72e43570b7	[SelectionDAG] ComputeNumSignBits of TRUNCATE operations Add basic ComputeNumSignBits support for TRUNCATE ops for cases where the source's number of sign bits overlaps with the truncated size. Improves X86 SIGN_EXTEND_IN_REG vector cases which were needlessly sign extending boolean vector results. Differential Revision: https://reviews.llvm.org/D26851 llvm-svn: 287635	2016-11-22 11:29:19 +00:00
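The condition "the source's number of sign bits overlaps with the truncated size" can be written as a one-line bound. Truncation drops the top `src_bits - dst_bits` bits, so any sign bits below that cut survive. Otherwise only the trivial bound of 1 is known. Below is a sketch over plain integers, with hypothetical helpers `num_sign_bits` and `trunc_sign_bits` modelling ComputeNumSignBits on a known value (not SelectionDAG API).

```cpp
#include <cassert>
#include <cstdint>

// Leading bits of the low `width` bits of `bits` that equal the sign bit
// (always at least 1). Assumes 1 <= width <= 64.
unsigned num_sign_bits(uint64_t bits, unsigned width) {
  uint64_t sign = (bits >> (width - 1)) & 1u;
  unsigned n = 1;
  for (int i = static_cast<int>(width) - 2; i >= 0 && ((bits >> i) & 1u) == sign; --i)
    ++n;
  return n;
}

// Lower bound on sign bits after truncating from src_bits to dst_bits.
unsigned trunc_sign_bits(unsigned src_sign_bits, unsigned src_bits, unsigned dst_bits) {
  unsigned dropped = src_bits - dst_bits;
  return src_sign_bits > dropped ? src_sign_bits - dropped : 1;
}
```

For -3 as i64 (62 sign bits), truncating to i32 gives 62 - 32 = 30, which is exact. For 0x00000000FFFFFFFF (32 sign bits), the bound falls back to 1, although the i32 result 0xFFFFFFFF actually has 32. The result is conservative, never wrong.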
Matt Arsenault	b30d2aca58	DAG: Ignore call site attributes when emitting target intrinsic A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used. llvm-svn: 287593	2016-11-21 22:56:42 +00:00
Simon Pilgrim	5662074ba3	[VectorLegalizer] Remove EVT::getSizeInBits code duplications. NFCI. We were calling SVT.getSizeInBits() several times in a row - just call it once and reuse the result. llvm-svn: 287556	2016-11-21 18:24:44 +00:00
Simon Pilgrim	49d7eda968	[SelectionDAG] Add ComputeNumSignBits support for CONCAT_VECTORS opcode llvm-svn: 287541	2016-11-21 14:36:19 +00:00
Simon Pilgrim	7a6b6d5656	Fix spelling mistakes in SelectionDAG comments. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287487	2016-11-20 13:14:57 +00:00
Simon Pilgrim	e40900dddd	[SelectionDAG] Add knownbits support for CONCAT_VECTOR opcode llvm-svn: 287387	2016-11-18 22:21:22 +00:00
Matthias Braun	9f15a79e5d	Timer: Track name and description. The previously used "names" are rather descriptions (they use multiple words and contain spaces), use short programming language identifier like strings for the "names" which should be used when exporting to machine parseable formats. Also removed an unused TimerGroup from Hexagon. Differential Revision: https://reviews.llvm.org/D25583 llvm-svn: 287369	2016-11-18 19:43:18 +00:00
Simon Pilgrim	c4d733cd6a	Fix spelling in comment. NFC. llvm-svn: 287222	2016-11-17 12:03:05 +00:00
Chris Bieneman	05c279fc4b	[CMake] NFC. Updating CMake dependency specifications This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system. llvm-svn: 287206	2016-11-17 04:36:50 +00:00
Ahmed Bougacha	bd6ce9a247	[CodeGen] Pass references, not pointers, to MMI helpers. NFC. While there, rename them to follow the coding style. llvm-svn: 287169	2016-11-16 22:25:03 +00:00
Ahmed Bougacha	456dce8a84	[CodeGen] Pull MMI helpers from FunctionLoweringInfo to MMI. NFC. They're not SelectionDAG- or FunctionLoweringInfo-specific. They are, however, specific to building MMI from IR. We could make them members, but it's nice having MMI be a "simple" data structure and this logic kept separate. This also lets us reuse them from GlobalISel. llvm-svn: 287167	2016-11-16 22:24:56 +00:00
Pawel Bylica	c3f6c97f71	Integer legalization: fix MUL expansion Summary: This fixes the runtime results produced by the fallback multiplication expansion introduced in r270720. For tests I created a fuzz tester that compares the results with Boost.Multiprecision. Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26628 llvm-svn: 286998	2016-11-15 18:29:24 +00:00
Joerg Sonnenberger	1a7eec68a9	Introduce TLI predicate for base-relative Jump Tables. For 64bit ABIs it is common practice to use relative Jump Tables with potentially different relocation bases. As the logic for the jump table itself doesn't depend on the relocation base, make it easier for targets to use the generic logic. Start by dropping the now redundant MIPS logic. Differential Revision: https://reviews.llvm.org/D26578 llvm-svn: 286951	2016-11-15 12:39:46 +00:00
Asaf Badouh	b573553424	DAGCombiner: fix combine of trunc and select bugzilla: https://llvm.org/bugs/show_bug.cgi?id=29002 pr29002 Differential Revision: https://reviews.llvm.org/D26449 llvm-svn: 286938	2016-11-15 07:55:22 +00:00
Simon Pilgrim	807f9cf243	[SelectionDAG] Add support for vector demandedelts in BSWAP opcodes llvm-svn: 286582	2016-11-11 11:51:29 +00:00
Simon Pilgrim	813721e98a	[SelectionDAG] Add support for vector demandedelts in UREM/SREM opcodes llvm-svn: 286578	2016-11-11 11:23:43 +00:00
Simon Pilgrim	0652227814	[SelectionDAG] Add support for vector demandedelts in UDIV opcodes llvm-svn: 286576	2016-11-11 10:47:24 +00:00
Evandro Menezes	21f9ce1a0d	[DAG Combiner] Fix the native computation of the Newton series for reciprocals The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own. Differential revision: https://reviews.llvm.org/D22975 llvm-svn: 286523	2016-11-10 23:31:06 +00:00
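The series in question is the Newton-Raphson refinement of an initial estimate. The commit is about who computes it (generic code or the target), not the formulas. The sketch below shows the standard refinement steps in double precision. The starting estimate is passed in to stand in for a hardware estimate instruction, and the step count is chosen by the caller. Both are assumptions, since neither is specified in the commit. `recip_newton` and `rsqrt_newton` are illustrative names.

```cpp
#include <cassert>
#include <cmath>

// Reciprocal: refine an estimate x of 1/a with x' = x * (2 - a * x).
double recip_newton(double a, double est, int steps) {
  double x = est;
  for (int i = 0; i < steps; ++i) x = x * (2.0 - a * x);
  return x;
}

// Reciprocal square root: refine x of 1/sqrt(a) with x' = x * (3 - a*x*x) / 2.
double rsqrt_newton(double a, double est, int steps) {
  double x = est;
  for (int i = 0; i < steps; ++i) x = x * (1.5 - 0.5 * a * x * x);
  return x;
}
```

Each reciprocal step squares the relative error `1 - a*x`, so a rough estimate (0.2 for 1/4) reaches double precision in about five steps. This is why targets pick the step count based on how accurate their estimate instruction is.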
Simon Pilgrim	38f0045cb0	[SelectionDAG] Add support for vector demandedelts in ADD/SUB opcodes llvm-svn: 286516	2016-11-10 22:41:49 +00:00
Simon Pilgrim	fe3a54371d	[SelectionDAG] Add support for splatted vectors in SUB opcode llvm-svn: 286509	2016-11-10 21:57:42 +00:00
Simon Pilgrim	d67af68f06	[SelectionDAG] Add support for vector demandedelts in TRUNCATE opcodes llvm-svn: 286481	2016-11-10 17:43:52 +00:00
Simon Pilgrim	33fef8e865	Use common SDLoc. NFCI. llvm-svn: 286473	2016-11-10 16:47:09 +00:00
Simon Pilgrim	ee187fd6e7	[SelectionDAG] Add support for vector demandedelts in MUL opcodes llvm-svn: 286471	2016-11-10 16:27:42 +00:00
Simon Pilgrim	ca57e53ded	[SelectionDAG] Add support for vector demandedelts in SRA opcodes llvm-svn: 286461	2016-11-10 15:05:09 +00:00
Simon Pilgrim	37c9034bd6	[DAGCombiner] Correctly extract the ConstOrConstSplat shift value for SHL nodes We were failing to extract a constant splat shift value if the shifted value was being masked. The (shl (and (setcc) N01CV) N1CV) -> (and (setcc) N01CV<<N1CV) combine was unnecessarily preventing this. llvm-svn: 286454	2016-11-10 14:35:09 +00:00
Simon Pilgrim	3bf99c056a	[SelectionDAG] Add support for vector demandedelts in SHL/SRL opcodes llvm-svn: 286448	2016-11-10 13:52:42 +00:00
Simon Pilgrim	778596bf59	[TargetLowering] Fix undef vector element issue with true/false result handling Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching. The comment said we shouldn't handle constant splat vectors with undef elements. But the actual code was returning false if the build vector contained no undef elements.... This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs). The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed. Differential Revision: https://reviews.llvm.org/D26031 llvm-svn: 286238	2016-11-08 15:07:01 +00:00
Simon Pilgrim	d02c55204b	[VectorLegalizer] Expansion of CTLZ using CTPOP when possible This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233	2016-11-08 14:10:28 +00:00
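The usual Hacker's Delight formulation, assumed here to be the one this expansion follows, has two steps. First, smear the highest set bit into every lower position. Then the count of leading zeros is the population count of the complement. The scalar sketch below assumes 32-bit lanes. A vector expansion would apply the same shifts, ORs, NOT and CTPOP per lane. `popcount32` is a portable SWAR stand-in for a CTPOP instruction, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// SWAR population count standing in for a CTPOP node.
unsigned popcount32(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;
  return (x * 0x01010101u) >> 24;
}

// ctlz = ctpop(~smear(x)), where smear sets every bit below the top set bit.
unsigned ctlz_via_ctpop(uint32_t x) {
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return popcount32(~x);
}
```

This form also yields 32 for a zero input, which matches CTLZ's defined result for zero (as opposed to CTLZ_ZERO_UNDEF).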
Richard Smith	857efb0880	Add -O0 support for @llvm.invariant.group.barrier by discarding it if it gets to ISel. Differential Revision: https://reviews.llvm.org/D26292 llvm-svn: 286119	2016-11-07 16:47:20 +00:00
Simon Pilgrim	39df78e384	[SelectionDAG] Add support for vector demandedelts in XOR opcodes llvm-svn: 286075	2016-11-06 16:49:19 +00:00
Simon Pilgrim	dd4809a603	[SelectionDAG] Add support for vector demandedelts in OR opcodes llvm-svn: 286071	2016-11-06 16:29:09 +00:00
Nicolai Haehnle	bea772c6dc	DAGCombiner: fix use-after-free when merging consecutive stores Summary: Have MergeConsecutiveStores explicitly return information about the stores that were merged, so that we can safely determine whether the starting node has been freed. Reviewers: chandlerc, bogner, niravd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25601 llvm-svn: 285916	2016-11-03 14:25:04 +00:00
Elena Demikhovsky	caaceef4b3	Expandload and Compressstore intrinsics 2 new intrinsics covering AVX-512 compress/expand functionality. This implementation includes syntax, DAG builder, operation lowering and tests. Does not include: handling of illegal data types, codegen prepare pass and the cost model. llvm-svn: 285876	2016-11-03 03:23:55 +00:00
Simon Pilgrim	93f2f7fb6c	Use !operator to test if APInt is zero/non-zero. NFCI. Avoids APInt construction and slower comparisons. llvm-svn: 285822	2016-11-02 15:41:15 +00:00
Joerg Sonnenberger	5e31b3ad93	Simplify. llvm-svn: 285802	2016-11-02 12:45:28 +00:00
Sanjay Patel	70c5f02d25	[DAG] disable nsw/nuw for add/sub/mul when simplifying based on demanded bits (PR30841) This bug was exposed by using nsw/nuw for more aggressive folds in: https://reviews.llvm.org/rL284844 The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(), but we can't just flip flag bits in the DAG; we have to create a new node that has the bits cleared. This should fix: https://llvm.org/bugs/show_bug.cgi?id=30841 llvm-svn: 285656	2016-10-31 23:28:45 +00:00
Sanjay Patel	339a51ac13	[DAG] x | x --> x llvm-svn: 285522	2016-10-30 18:19:35 +00:00
Sanjay Patel	13aee345ca	[DAG] x & x --> x llvm-svn: 285521	2016-10-30 18:13:30 +00:00
Simon Pilgrim	75a697a17e	[DAGCombiner] (REAPPLIED) Add vector demanded elements support to computeKnownBits Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used. I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course. DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit. It looked like this had caused compile-time regressions on some buildbots (and was reverted in rL285381), but it appears to have just been a harmless bystander! Differential Revision: https://reviews.llvm.org/D25691 llvm-svn: 285494	2016-10-29 11:29:39 +00:00
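The effect of a DemandedElts mask is easiest to see on a constant vector. Known bits become the intersection over only the demanded elements, rather than over all of them. The sketch below is hypothetical: `KnownBits` is a plain struct, not LLVM's class, and the mask is a `uint64_t`, which assumes at most 64 elements and at least one demanded element. It illustrates the idea described in the message, not the SelectionDAG implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct KnownBits { uint32_t zero, one; };

// Known bits common to the demanded elements of a constant vector.
// Bit i of `demanded` selects element i.
KnownBits known_bits_of_constant_vector(const std::vector<uint32_t>& elts,
                                        uint64_t demanded) {
  KnownBits k{~0u, ~0u};  // start from "everything known", then intersect
  for (std::size_t i = 0; i < elts.size(); ++i) {
    if (((demanded >> i) & 1u) == 0) continue;  // element not demanded
    k.zero &= ~elts[i];
    k.one &= elts[i];
  }
  return k;
}
```

For `<1, 2, 3, 4>`, demanding every element leaves no known-one bits. Demanding only element 0 makes the value fully known as 1. A scalar corresponds to a single element with mask 1, matching "Scalar types set this to 1".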
Davide Italiano	86168b23cf	[DAGCombiner] Fix a crash visiting `AND` nodes. Instead of asserting that the shift count is != 0 we just bail out as it's not profitable trying to optimize a node which will be removed anyway. Differential Revision: https://reviews.llvm.org/D26098 llvm-svn: 285480	2016-10-28 23:55:32 +00:00
Justin Bogner	db6b6a7f0c	SDAG: Make sure we use an allocatable reg class when we create this vreg As per the discussion on r280783, if constrainRegClass fails we need to call getAllocatableClass like we did before that commit. llvm-svn: 285467	2016-10-28 22:42:54 +00:00
Simon Pilgrim	d9189891fc	[SelectionDAG] computeKnownBits - early-out if any BUILD_VECTOR element has no known bits No need to check the remaining elements - no common known bits are available. llvm-svn: 285399	2016-10-28 14:07:44 +00:00
Simon Pilgrim	8c043061e5	[SelectionDAG] Tidyup UDIV computeKnownBits implementation No need to clear KnownOne2/KnownZero2 bits as the next call to computeKnownBits will overwrite them anyway llvm-svn: 285398	2016-10-28 13:42:23 +00:00
Simon Pilgrim	755cef1ba8	[SelectionDAG] Increment computeKnownBits recursion depth for SMIN/SMAX/UMIN/UMAX like all other ops llvm-svn: 285397	2016-10-28 13:13:16 +00:00
Juergen Ributzka	5cee232be4	Revert "[DAGCombiner] Add vector demanded elements support to computeKnownBits" This seems to have increased LTO compile time beyond 2x that of previous builds. See http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto/10676/ llvm-svn: 285381	2016-10-28 04:01:12 +00:00
Simon Pilgrim	01e755eab1	[DAGCombiner] Add vector demanded elements support to computeKnownBits Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used. I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course. DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit. Differential Revision: https://reviews.llvm.org/D25691 llvm-svn: 285296	2016-10-27 14:29:28 +00:00
Nemanja Ivanovic	275853e777	Do not assume that FP vector operands are never legalized by expanding This patch ensures that if a floating point vector operand is legalized by expanding, it is legalized through the stack rather than by calling DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand is a non-integer type. This fixes PR 30715. llvm-svn: 285231	2016-10-26 19:51:35 +00:00
Tom Stellard	284cf32ab4	LegalizeDAG: Support promoting [US]DIV and [US]REM operations Summary: AMDGPU will need this once i16 is added as a legal type. This is tested by: test/CodeGen/AMDGPU/sdiv.ll test/CodeGen/AMDGPU/sdivrem24.ll test/CodeGen/AMDGPU/udiv.ll test/CodeGen/AMDGPU/udivrem24.ll Reviewers: bogner, efriedma Subscribers: efriedma, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D25699 llvm-svn: 285199	2016-10-26 14:52:25 +00:00
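A sketch of why promoting [US]DIV and [US]REM is sound, assuming an i16 operation promoted to a wider type: unsigned operands are zero-extended and signed operands sign-extended so the wide division sees the same values, and the results are truncated back. The function names are hypothetical, and Python integers stand in for DAG values.

```python
def to_signed(v: int, width: int) -> int:
    """Interpret the low `width` bits of v as a two's-complement value."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def sdiv_trunc(a: int, b: int) -> int:
    """Signed division rounding toward zero, like SDIV (Python's // floors)."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def promote_udiv_urem(a: int, b: int, narrow: int = 16):
    """Zero-extend, divide in the wide type, truncate back to `narrow` bits."""
    mask = (1 << narrow) - 1
    za, zb = a & mask, b & mask  # zero-extension keeps the unsigned value
    return (za // zb) & mask, (za % zb) & mask


def promote_sdiv_srem(a: int, b: int, narrow: int = 16):
    """Sign-extend, divide in the wide type, truncate back to `narrow` bits.
    The overflowing MIN / -1 case is not modeled."""
    mask = (1 << narrow) - 1
    sa, sb = to_signed(a, narrow), to_signed(b, narrow)  # sign-extension keeps the signed value
    q = sdiv_trunc(sa, sb)
    r = sa - sb * q  # remainder takes the dividend's sign, like SREM
    return q & mask, r & mask
```

The same bit pattern `0xFFF9` divides differently depending on the extension used: as an unsigned value it is 65529, as a signed value it is -7.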
Simon Pilgrim	de86241a09	[DAGCombiner] Enable (urem x, (shl pow2, y)) -> (and x, (add (shl pow2, y), -1)) combine for splatted vectors llvm-svn: 285129	2016-10-25 22:01:09 +00:00
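The identity behind this combine can be checked exhaustively for 8-bit values. This sketch assumes the shifted divisor `pow2 << y` is a nonzero power of two, which is what the combine requires; for a splatted vector the same rewrite would simply apply in every lane. `urem_shifted_pow2` is a hypothetical name.

```python
def urem_shifted_pow2(x: int, pow2: int, y: int, width: int = 8) -> int:
    """(urem x, (shl pow2, y)) rewritten as (and x, (add (shl pow2, y), -1))."""
    mask = (1 << width) - 1
    divisor = (pow2 << y) & mask
    assert divisor != 0 and divisor & (divisor - 1) == 0, "needs a power-of-two divisor"
    minus_one = (divisor + mask) & mask  # add divisor, -1 modulo 2**width
    return x & minus_one
```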
Simon Pilgrim	f534573e8c	[DAGCombiner] Enable srem(x,y) -> urem(x,y) combine for vectors SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927 llvm-svn: 285123	2016-10-25 21:20:18 +00:00
Simon Pilgrim	4ebb04510a	[DAGCombiner] Enable sdiv(x,y) -> udiv(x,y) combine for vectors SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927 llvm-svn: 285118	2016-10-25 20:56:42 +00:00
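Both combines rest on the same fact: when the sign bits of both operands are known zero, signed and unsigned division (and remainder) agree. A quick exhaustive check for 8-bit values, using truncating division to mimic SDIV/SREM (Python's `//` rounds toward negative infinity); the helper names are hypothetical.

```python
def sdiv_trunc(a: int, b: int) -> int:
    """Signed division rounding toward zero, like SDIV."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def srem_trunc(a: int, b: int) -> int:
    """Signed remainder with the dividend's sign, like SREM."""
    return a - b * sdiv_trunc(a, b)


def signed_matches_unsigned(width: int = 8) -> bool:
    """True if sdiv/srem equal udiv/urem whenever both sign bits are zero."""
    limit = 1 << (width - 1)  # values whose sign bit is clear
    for x in range(limit):
        for y in range(1, limit):
            if sdiv_trunc(x, y) != x // y or srem_trunc(x, y) != x % y:
                return False
    return True
```

For a negative dividend the two disagree (truncating -7 / 2 gives -3, flooring gives -4), which is why the combine needs the known-zero sign bits.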
Evandro Menezes	601f4cb9f7	Switch lowering: improve partitioning of jump tables When there's a tie between partitionings of jump tables, consider also cases that result in no jump tables, but in one or a few cases. The motivation is that many contemporary processors typically perform case switches fairly quickly. Differential revision: https://reviews.llvm.org/D25212 llvm-svn: 285099	2016-10-25 19:11:43 +00:00
Zvi Rackover	124470a202	[DAGCombine] Preserve shuffles when one of the vector operands is constant Summary: Do not perform combines such as: vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1, C2), scalar_to_vector(X)) -> build_vector(X, C0, C1, C2) Keeping the shuffle allows lowering the constant build_vector to a materialized constant vector (such as a vector-load from the constant-pool or some other idiom). Reviewers: delena, igorb, spatel, mkuper, andreadb, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25524 llvm-svn: 285063	2016-10-25 12:14:19 +00:00
Simon Pilgrim	e3e6585c2d	[SelectionDAG] Update ComputeNumSignBits SRA/SHL handlers to accept scalar or vector splats Use isConstOrConstSplat helper. Also use APInt instead of getZExtValue directly to avoid out of range issues. llvm-svn: 285033	2016-10-24 21:47:19 +00:00
Simon Pilgrim	d8ec09c74f	Use SDValue::getConstantOperandVal() helper. NFCI. llvm-svn: 285025	2016-10-24 20:56:52 +00:00
Peter Collingbourne	16e9b944e9	CodeGen: Do not add a global's address space to the folding set profile. It is already part of the type (which is part of the global, which is already being added), so there's no need to do it. llvm-svn: 285002	2016-10-24 18:56:09 +00:00
Sanjay Patel	9ca028c2d6	[DAG] enhance computeKnownBits to handle SRL/SRA with vector splat constant llvm-svn: 284953	2016-10-23 23:13:31 +00:00
Simon Pilgrim	d06641d3dc	Use SDValue::getConstantOperandVal() helper. NFCI. llvm-svn: 284949	2016-10-23 20:17:21 +00:00
Sanjay Patel	ca92c36e01	[DAG] enhance computeKnownBits to handle SHL with vector splat constant Also, use APInt to avoid crashing on types larger than vNi64. llvm-svn: 284874	2016-10-21 20:16:27 +00:00
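A sketch of the SHL known-bits rule for a constant (or splatted) shift amount, assuming the (known_zero, known_one) mask representation: both masks shift left, and the vacated low bits become known zero. Shift amounts of at least the bit width are not modeled. The helpers are hypothetical, not the `computeKnownBits` signature.

```python
def known_bits_shl(known_zero: int, known_one: int, amt: int, width: int):
    """Known bits of (shl x, amt) given known bits of x."""
    assert 0 <= amt < width  # out-of-range shift amounts are not modeled
    mask = (1 << width) - 1
    vacated = (1 << amt) - 1  # bits shifted in from the right are zero
    zero = ((known_zero << amt) | vacated) & mask
    one = (known_one << amt) & mask
    return zero, one


def consistent(x: int, known_zero: int, known_one: int) -> bool:
    """Does the concrete value x agree with the given known bits?"""
    return (x & known_zero) == 0 and (x & known_one) == known_one
```

For a splat shift amount, the same rule applies to every lane, so the common known bits of the result follow from the common known bits of the input.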
Sanjay Patel	81029f6a76	[DAG] fold negation of sign-bit 0 - X --> 0, if the sub is NUW 0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW 0 - X --> X, if X is 0 or the minimum signed value This is the DAG equivalent of: https://reviews.llvm.org/rL284649 plus the fold for the NUW case which already existed in InstSimplify. Note that we miss a vector fold because of a deficiency in the DAG version of computeKnownBits(). llvm-svn: 284844	2016-10-21 17:24:26 +00:00
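The folds rely on 0 and the minimum signed value being the only fixed points of two's-complement negation, and on 0 being the only value that an unsigned (NUW) `0 - X` can subtract without wrapping. A small check of both facts; the helper names are hypothetical.

```python
def neg(x: int, width: int = 8) -> int:
    """0 - x, wrapping modulo 2**width."""
    return (-x) & ((1 << width) - 1)


def negation_fixed_points(width: int = 8):
    """Values X for which 0 - X == X."""
    return [x for x in range(1 << width) if neg(x, width) == x]


def nuw_safe_negations(width: int = 8):
    """Values X for which 0 - X does not wrap as an unsigned subtraction."""
    return [x for x in range(1 << width) if 0 - x >= 0]
```

Under NSW, negating the minimum signed value overflows, so of the two fixed points only 0 yields a well-defined result, which is why the NSW fold can produce 0.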
Sanjay Patel	cbaba93ce8	[DAG] use SDNode flags 'nsz' to enable fadd/fsub with zero folds As discussed in D24815, let's start the process of killing off the broken fast-math global state housed in TargetOptions and eliminate the need for function-level fast-math attributes. Here we enable two similar folds that are possible when we don't care about signed-zero: fadd nsz x, 0 --> x fsub nsz 0, x --> -x Note that although the test cases include a 'sin' function call, I'm side-stepping the FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these tests - isNegatibleForFree/GetNegatedExpression just look through an ISD::FSIN node. Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't actually do anything today because Flags are silently dropped for any node that is not a binary operator. Differential Revision: https://reviews.llvm.org/D25297 llvm-svn: 284824	2016-10-21 14:36:58 +00:00
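Why these folds need `nsz`: under IEEE 754 round-to-nearest, `-0.0 + 0.0` is `+0.0`, so rewriting `fadd x, 0` to `x` flips the sign when x is -0.0; likewise `0.0 - 0.0` is `+0.0` while `-x` is `-0.0`. A Python check that compares bit patterns so the sign of zero is visible; the helper names are hypothetical.

```python
import struct


def same_bits(a: float, b: float) -> bool:
    """Compare IEEE-754 double bit patterns, so +0.0 and -0.0 differ."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def fadd_zero_fold_exact(x: float) -> bool:
    """Is 'fadd x, 0 --> x' exact for this x?"""
    return same_bits(x + 0.0, x)


def fsub_zero_fold_exact(x: float) -> bool:
    """Is 'fsub 0, x --> -x' exact for this x?"""
    return same_bits(0.0 - x, -x)
```

Both folds are exact for ordinary values and break only at a signed zero, which is exactly what the `nsz` flag permits the combiner to ignore.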
Pirama Arumuga Nainar	05b0f93ad3	Fix _EXTEND_VECTOR_INREG legalization Summary: While promoting _EXTEND_VECTOR_INREG nodes whose inputs are already promoted, perform the appropriate sign extension for the promoted node before doing the *_EXTEND_VECTOR_INREG operation. If not, the undefined high-order bits of the promoted operand may (a) be garbage (in case of zext) or (b) contribute the wrong sign-bit (in case of sext). Updated the promote-vec3.ll test after this change. The diff shows explicit zeroing in case of zext and intermediate sign extension in case of sext. Reviewers: RKSimon Subscribers: llvm-commits, srhines Differential Revision: https://reviews.llvm.org/D25790 llvm-svn: 284752	2016-10-20 17:56:36 +00:00
Sanjay Patel	0051efcf97	[Target] remove TargetRecip class; 2nd try This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284746	2016-10-20 16:55:45 +00:00
Simon Pilgrim	618d3aedaf	[DAGCombiner] Add general constant vector support to (srl (shl x, c), c) -> (and x, cst2) We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector llvm-svn: 284717	2016-10-20 11:10:21 +00:00
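A lane-by-lane sketch of the rewrite with a different shift amount in each lane, assuming 8-bit lanes; the `and` mask for a lane is the all-ones value shifted right by that lane's amount. Function names are hypothetical, and opaque constants are not modeled.

```python
from typing import List


def srl_of_shl(xs: List[int], amts: List[int], width: int = 8) -> List[int]:
    """(srl (shl x, c), c) evaluated lane by lane."""
    mask = (1 << width) - 1
    return [((x << c) & mask) >> c for x, c in zip(xs, amts)]


def and_with_mask(xs: List[int], amts: List[int], width: int = 8) -> List[int]:
    """(and x, cst2) where each lane of cst2 is all-ones shifted right by c."""
    mask = (1 << width) - 1
    return [x & (mask >> c) for x, c in zip(xs, amts)]
```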
Simon Pilgrim	e32d0f8413	Merged nested ifs. NFCI. llvm-svn: 284616	2016-10-19 17:30:24 +00:00
Simon Pilgrim	a20aeea998	[DAGCombiner] Add general constant vector support to (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector llvm-svn: 284613	2016-10-19 17:12:22 +00:00
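An 8-bit check of the distributive rewrite, which holds because addition and left shift both wrap modulo 2^width; for a vector, the same equality applies independently in each lane. The helper names are hypothetical.

```python
def shl_of_add(x: int, c1: int, c2: int, width: int = 8) -> int:
    """(shl (add x, c1), c2)"""
    mask = (1 << width) - 1
    return (((x + c1) & mask) << c2) & mask


def add_of_shl(x: int, c1: int, c2: int, width: int = 8) -> int:
    """(add (shl x, c2), c1 << c2), where c1 << c2 folds to a constant"""
    mask = (1 << width) - 1
    return (((x << c2) & mask) + ((c1 << c2) & mask)) & mask
```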
Reid Kleckner	f7ad5341d0	[WinEH] Allow catchpads to reuse the same catch object This code used a regular map when it should have used a multimap. llvm-svn: 284612	2016-10-19 17:08:23 +00:00
Sanjay Patel	3a3aaf67e0	[DAG] optimize negation of bool Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG. With the mask, this should be no worse than 2 shifts. The mask can be eliminated in some cases, so that should be better than 2 shifts. This change exposed some missing folds related to negation: https://reviews.llvm.org/rL284239 https://reviews.llvm.org/rL284395 There may be others, so please let me know if you see any regressions. Differential Revision: https://reviews.llvm.org/D25485 llvm-svn: 284611	2016-10-19 16:58:59 +00:00
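A sketch comparing the two legalizations of SIGN_EXTEND_INREG from i1 discussed above: the shift pair (shl, then sra by width - 1) and the mask-and-negate form. The 8-bit width is an assumption for the exhaustive check; the helpers are hypothetical.

```python
def to_signed(v: int, width: int) -> int:
    """Interpret the low `width` bits of v as a two's-complement value."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def sext_inreg_i1_shifts(x: int, width: int = 8) -> int:
    """shl x, width-1 followed by sra ..., width-1"""
    mask = (1 << width) - 1
    shifted = (x << (width - 1)) & mask  # move bit 0 into the sign position
    return (to_signed(shifted, width) >> (width - 1)) & mask  # arithmetic shift back


def sext_inreg_i1_mask_neg(x: int, width: int = 8) -> int:
    """and x, 1 followed by sub 0, ..."""
    mask = (1 << width) - 1
    return (0 - (x & 1)) & mask
```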
Simon Pilgrim	4554e161be	[DAGCombiner] Add general constant vector support to (shl (sra x, c1), c1) -> (and x, (shl -1, c1)) We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector llvm-svn: 284608	2016-10-19 16:15:30 +00:00
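The rewrite works because an arithmetic right shift followed by a left shift by the same amount only clears the low c1 bits; the sign copies shifted in at the top are discarded again. An exhaustive 8-bit check, applied per lane for vectors; the names are hypothetical.

```python
def to_signed(v: int, width: int) -> int:
    """Interpret the low `width` bits of v as a two's-complement value."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def shl_of_sra(x: int, c: int, width: int = 8) -> int:
    """(shl (sra x, c), c)"""
    mask = (1 << width) - 1
    sra = (to_signed(x, width) >> c) & mask  # Python's >> on ints is arithmetic
    return (sra << c) & mask


def and_high_mask(x: int, c: int, width: int = 8) -> int:
    """(and x, (shl -1, c))"""
    mask = (1 << width) - 1
    return x & ((-1 << c) & mask)
```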
Simon Pilgrim	c2e9724909	[DAGCombiner] Add general constant vector support to (shl (mul x, c1), c2) -> (mul x, c1 << c2) We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector llvm-svn: 284607	2016-10-19 15:59:28 +00:00
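Shifting a product left by c2 is the same as multiplying by 2^c2, so the two constants can be combined, modulo 2^width. An 8-bit check over a sample of constants, with hypothetical helper names:

```python
def shl_of_mul(x: int, c1: int, c2: int, width: int = 8) -> int:
    """(shl (mul x, c1), c2)"""
    mask = (1 << width) - 1
    return (((x * c1) & mask) << c2) & mask


def mul_by_shifted(x: int, c1: int, c2: int, width: int = 8) -> int:
    """(mul x, c1 << c2), with c1 << c2 folded to a width-bit constant"""
    mask = (1 << width) - 1
    return (x * ((c1 << c2) & mask)) & mask
```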
Simon Pilgrim	7dcb6e572e	[DAGCombiner] Just call isConstOrConstSplat directly. NFCI. This will get the same ConstantSDNode scalar or vector splat value as the current separate dyn_cast<ConstantSDNode> / isVector() approach. llvm-svn: 284578	2016-10-19 11:28:15 +00:00
Simon Pilgrim	b2ca2505cc	[DAGCombine] Generalize distributeTruncateThroughAnd to work with any non-opaque constant or constant vector llvm-svn: 284574	2016-10-19 08:57:37 +00:00
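`distributeTruncateThroughAnd` relies on truncation commuting with a bitwise and. A sampled check truncating 16-bit values to 8 bits; the helper names and widths are assumptions for illustration.

```python
def trunc(v: int, bits: int) -> int:
    """Keep the low `bits` bits, like ISD::TRUNCATE."""
    return v & ((1 << bits) - 1)


def trunc_of_and(x: int, c: int, bits: int = 8) -> int:
    return trunc(x & c, bits)


def and_of_truncs(x: int, c: int, bits: int = 8) -> int:
    return trunc(x, bits) & trunc(c, bits)
```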
Sanjay Patel	19601fa587	revert r284495: [Target] remove TargetRecip class There's something wrong with the StringRef usage while parsing the attribute string. llvm-svn: 284513	2016-10-18 18:36:49 +00:00
Sanjay Patel	08fff9ca81	[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284495	2016-10-18 17:05:05 +00:00
Simon Pilgrim	25e9628978	[DAGCombiner] Add splatted vector support to (udiv x, (shl pow2, y)) -> x >>u (log2(pow2)+y) llvm-svn: 284491	2016-10-18 16:36:00 +00:00
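For a power-of-two divisor `pow2 << y`, unsigned division is a logical right shift by `log2(pow2) + y`. An exhaustive 8-bit check that keeps the divisor in range; `udiv_shifted_pow2` is a hypothetical name.

```python
def udiv_shifted_pow2(x: int, pow2: int, y: int) -> int:
    """(udiv x, (shl pow2, y)) rewritten as x >>u (log2(pow2) + y)."""
    assert pow2 > 0 and pow2 & (pow2 - 1) == 0, "pow2 must be a power of two"
    shift = pow2.bit_length() - 1 + y  # log2(pow2) + y
    return x >> shift
```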
Simon Pilgrim	65e0c73875	Strip trailing whitespace (NFCI) llvm-svn: 284478	2016-10-18 13:44:00 +00:00
Sanjay Patel	523cd8290a	[DAG] use isConstOrConstSplat in ComputeNumSignBits to optimize SRA The scalar version of this pattern was noted in: https://reviews.llvm.org/D25485 and fixed with: https://reviews.llvm.org/rL284395 More refactoring of the constant/splat helpers is needed and will happen in follow-up patches. Differential Revision: https://reviews.llvm.org/D25685 llvm-svn: 284424	2016-10-17 20:41:39 +00:00
Sanjay Patel	a7cab58055	[DAG] make isConstOrConstSplat and isConstOrConstSplatFP more accessible; NFC As noted in: https://reviews.llvm.org/D25685 This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch. In a minor attempt to keep some structure, we're pulling the FP helper over along with its integer sibling, but clearly we can and should do more refactoring of the similar helper functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality. llvm-svn: 284421	2016-10-17 20:26:46 +00:00
Sanjay Patel	2cf6bfaf73	[DAG] optimize away an arithmetic-right-shift of a 0 or -1 value This came up as part of: https://reviews.llvm.org/D25485 Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors. llvm-svn: 284395	2016-10-17 15:58:28 +00:00
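When every bit of a value equals its sign bit (the value is 0 or -1, so ComputeNumSignBits reports the full width), an arithmetic right shift returns the value unchanged. A small 8-bit check, with a counterexample for a value that has fewer sign bits; the helper names are hypothetical.

```python
def to_signed(v: int, width: int) -> int:
    """Interpret the low `width` bits of v as a two's-complement value."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def sra(x: int, c: int, width: int = 8) -> int:
    """Arithmetic shift right of a width-bit value."""
    mask = (1 << width) - 1
    return (to_signed(x, width) >> c) & mask
```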
James Molloy	aa79b19a3e	[SDAG] Use ABI type alignment for constant pools when optimizing for size SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128. In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding. llvm-svn: 284381	2016-10-17 12:54:07 +00:00
Konstantin Zhuravlyov	8ea0246e93	[MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG. Differential Revision: https://reviews.llvm.org/D24577 llvm-svn: 284312	2016-10-15 22:01:18 +00:00
Sanjay Patel	72b5ff646d	[DAG] avoid creating illegal node when transforming negated shifted sign bit Eli noted this potential bug in the post-commit thread for: https://reviews.llvm.org/rL284239 ...but I'm not sure how to trigger it, so there's no test case yet. llvm-svn: 284268	2016-10-14 19:46:31 +00:00
Tom Stellard	ab61007914	TargetLowering: Add SimplifyDemandedBits() helper to TargetLoweringOpt Summary: The main purpose of this new helper is to enable simplifying operations that have multiple uses. SimplifyDemandedBits does not handle multiple uses currently, and this new function makes it possible to optimize: and v1, v0, 0xffffff mul24 v2, v1, v1 ; Multiply ignoring high 8-bits. To: mul24 v2, v0, v0 Where before this would not be optimized, because v1 has multiple uses. Reviewers: bogner, arsenm Subscribers: nhaehnle, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24964 llvm-svn: 284266	2016-10-14 19:14:26 +00:00
Sanjay Patel	00fc7a6159	[DAG] add folds for negated shifted sign bit The same folds exist in InstCombine already. This came up as part of: https://reviews.llvm.org/D25485 llvm-svn: 284239	2016-10-14 14:26:47 +00:00
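Assuming the folds in question are the InstCombine pair `sub 0, (lshr X, w-1) --> ashr X, w-1` and `sub 0, (ashr X, w-1) --> lshr X, w-1` (the message only says the same folds exist there), an exhaustive 8-bit check shows why they hold: the logical shift yields 0 or 1, the arithmetic shift yields 0 or -1, and negation swaps the two. The helpers are hypothetical.

```python
def to_signed(v: int, width: int) -> int:
    """Interpret the low `width` bits of v as a two's-complement value."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def srl(x: int, c: int, width: int = 8) -> int:
    return (x & ((1 << width) - 1)) >> c


def sra(x: int, c: int, width: int = 8) -> int:
    return (to_signed(x, width) >> c) & ((1 << width) - 1)


def neg(x: int, width: int = 8) -> int:
    return (-x) & ((1 << width) - 1)
```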
Nicolai Haehnle	86e72d98dd	Fix use-after-frees Extracted from D25313, as suggested by Justin Bogner. llvm-svn: 284220	2016-10-14 09:49:51 +00:00
Craig Topper	40feb7f157	[DAGCombiner] Teach createBuildVecShuffle to handle cases where input vectors are less than half of the output vector size. This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated up to v64i8 and goes through this code. llvm-svn: 284204	2016-10-14 06:00:42 +00:00
Sanjay Patel	98d0ea64ca	[DAG] hoist DL(N) and fix formatting; NFC llvm-svn: 284170	2016-10-13 22:27:10 +00:00
Tom Stellard	f80c1875a3	LegalizeDAG: Implement PROMOTE for ISD::BITREVERSE Summary: This operation is promoted the same way as ISD::BSWAP. This will prevent a regression in test/Target/AMDGPU/bitreverse.ll when i16 support is implemented. Reviewers: bogner, hfinkel Subscribers: hfinkel, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D25202 llvm-svn: 284163	2016-10-13 21:03:49 +00:00
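A sketch of promoting BITREVERSE from 8 to 16 bits, assuming the promoted lowering extends the operand, reverses it in the wide type, and shifts the result right by the width difference. Any bits above the narrow width land in the low part of the reversed value and are shifted out, so the extension's high bits do not matter. The helper names are hypothetical.

```python
def bitreverse(x: int, width: int) -> int:
    """Reverse the low `width` bits of x."""
    r = 0
    for i in range(width):
        if (x >> i) & 1:
            r |= 1 << (width - 1 - i)
    return r


def promoted_bitreverse(x: int, narrow: int = 8, wide: int = 16) -> int:
    """Extend, reverse in the wide type, then shift right by (wide - narrow)."""
    wide_rev = bitreverse(x & ((1 << narrow) - 1), wide)
    return wide_rev >> (wide - narrow)
```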
Nirav Dave	a81682aad4	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r284151 which appears to be triggering LTO failures on Hexagon llvm-svn: 284157	2016-10-13 20:23:25 +00:00
Nirav Dave	4b36957243	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after upstream changes. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 284151	2016-10-13 19:20:16 +00:00
Simon Pilgrim	cb59b5257c	[DAGCombiner] Add vector support to (mul (shl X, Y), Z) -> (shl (mul X, Z), Y) style combines llvm-svn: 284122	2016-10-13 14:04:35 +00:00
Simon Pilgrim	fa8fadc0e5	[DAGCombiner] Add vector support to C2-(A+C1) -> (C2-C1)-A folding llvm-svn: 284117	2016-10-13 12:49:31 +00:00
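The reassociation holds for any values because wrapping addition and subtraction obey the usual group laws modulo 2^width. A sampled 8-bit check with hypothetical helper names:

```python
def sub_of_add(a: int, c1: int, c2: int, width: int = 8) -> int:
    """C2 - (A + C1)"""
    mask = (1 << width) - 1
    return (c2 - ((a + c1) & mask)) & mask


def sub_of_folded_const(a: int, c1: int, c2: int, width: int = 8) -> int:
    """(C2 - C1) - A, with C2 - C1 folded to a constant"""
    mask = (1 << width) - 1
    return (((c2 - c1) & mask) - a) & mask
```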
Simon Pilgrim	833b8a2071	[DAGCombiner] Add vector support to (sub -1, x) -> (xor x, -1) canonicalization Improves commutation potential llvm-svn: 284113	2016-10-13 12:05:20 +00:00
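In two's complement, `-1 - x` and `x ^ -1` are both the bitwise complement of x, which is why this canonicalization is exact. An exhaustive 8-bit check with hypothetical helper names:

```python
def sub_from_all_ones(x: int, width: int = 8) -> int:
    """(sub -1, x), with -1 represented as the all-ones width-bit value"""
    mask = (1 << width) - 1
    return (mask - x) & mask


def xor_all_ones(x: int, width: int = 8) -> int:
    """(xor x, -1)"""
    return x ^ ((1 << width) - 1)
```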
Albert Gutowski	795d7d6381	Create llvm.addressofreturnaddress intrinsic Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows. Reviewers: majnemer, rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25293 llvm-svn: 284061	2016-10-12 22:13:19 +00:00
Simon Pilgrim	08190943cb	[DAGCombiner] Update most ADD combines to support general vector combines Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors. Differential Revision: https://reviews.llvm.org/D25374 llvm-svn: 284015	2016-10-12 13:48:10 +00:00
Konstantin Zhuravlyov	081385a74e	[DAGCombiner] Do not remove the load of stored values when optimizations are disabled This combiner breaks debug experience and should not be run when optimizations are disabled. For example: int main() { int j = 0; j += 2; if (j == 2) return 0; return 5; } When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function. Differential Revision: https://reviews.llvm.org/D19268 llvm-svn: 284014	2016-10-12 13:44:24 +00:00
Michael Kuperstein	7adbf6b042	[DAG] Fix crash in build_vector -> vector_shuffle combine Fixes a crash in the build_vector -> vector_shuffle combine when the first vector input is twice as wide as the output, and the second input vector is even wider. llvm-svn: 283953	2016-10-11 22:44:31 +00:00
Arnold Schwaighofer	9103e268cf	Silence -Wunused-but-set-variable warning llvm-svn: 283927	2016-10-11 19:49:29 +00:00
Sanjay Patel	8253e15ef3	[DAG] add fold for masked negated sign-extended bool This enhances the fold added with: https://reviews.llvm.org/rL283900 llvm-svn: 283905	2016-10-11 17:05:52 +00:00
Sanjay Patel	8384703d9b	[DAG] add fold for masked negated extended bool The non-obvious motivation for adding this fold (which already happens in InstCombine) is that we want to canonicalize IR towards select instructions and canonicalize DAG nodes towards boolean math. So we need to recreate some folds in the DAG to handle that change in direction. An interesting implementation difference for cases like this is that InstCombine generally works top-down while the DAG goes bottom-up. That means we need to detect different patterns. In this case, the SimplifyDemandedBits fold prevents us from performing a zext to sext fold that would then be recognized as a negation of a sext. llvm-svn: 283900	2016-10-11 16:26:36 +00:00
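Both masked-negation folds can be checked over the two possible boolean values, assuming the patterns are `and (sub 0, zext b), 1 --> zext b` and `and (sub 0, sext b), 1 --> zext b`: negating 0 or 1 gives 0 or -1, negating 0 or -1 gives 0 or 1, and masking with 1 keeps only bit 0. The 8-bit width and the helper names are assumptions.

```python
MASK = 0xFF  # 8-bit lanes, an assumption for illustration


def zext_bool(b: int) -> int:
    return b & 1


def sext_bool(b: int) -> int:
    return MASK if b & 1 else 0


def masked_neg(x: int) -> int:
    """and (sub 0, x), 1"""
    return ((-x) & MASK) & 1
```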
Sanjay Patel	38a42e4bfa	[DAG] simplify logic; NFC llvm-svn: 283885	2016-10-11 14:14:30 +00:00
Sanjay Patel	907ae69125	[DAG] hoist DL(N) and fix formatting; NFC llvm-svn: 283884	2016-10-11 14:04:24 +00:00
Sanjay Patel	9609f3d6c7	[DAG] fix formatting; NFC llvm-svn: 283878	2016-10-11 13:47:43 +00:00
Hal Finkel	fcd2421667	[SelectionDAGBuilder] Support llvm.flt.rounds on targets where i32 is not legal Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal type. Patch by Edward Jones, thanks! Differential Revision: https://reviews.llvm.org/D24459 llvm-svn: 283797	2016-10-10 20:45:15 +00:00
Elena Demikhovsky	5b10aa1f1e	DAG: Setting Masked-Expand-Load as a variant of Masked-Load node Masked-expand-load node represents a load operation that loads a variable number of elements from memory according to the number of "true" bits in the mask and expands the loaded elements according to their position in the mask vector. Right now, the node is used in intrinsics for VEXPAND* instructions. The work is done towards implementation of masked.expandload and masked.compressstore intrinsics. Differential Revision: https://reviews.llvm.org/D25322 llvm-svn: 283694	2016-10-09 10:48:52 +00:00
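A reference model of the masked expand-load semantics described above: consecutive memory elements are consumed only for enabled lanes, and disabled lanes keep a pass-through value. The pass-through operand and the function name are assumptions for illustration, not the SelectionDAG node's operand list.

```python
from typing import List, Sequence


def masked_expand_load(memory: Sequence[int], base: int, mask: Sequence[bool],
                       passthru: Sequence[int]) -> List[int]:
    """Read one consecutive element per true mask bit and place it in that lane."""
    result = list(passthru)
    addr = base
    for lane, enabled in enumerate(mask):
        if enabled:
            result[lane] = memory[addr]
            addr += 1  # only enabled lanes consume memory
    return result
```

With `memory = [10, 20, 30, 40]`, base 1 and mask `[True, False, True, False]`, lanes 0 and 2 receive 20 and 30: the loaded elements are packed in memory but spread out in the result.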
Arnold Schwaighofer	3f25658143	swifterror: Don't compute swifterror vregs during instruction selection The code used llvm basic block predecessors to decide where to insert phi nodes. Instruction selection can and will liberally insert new machine basic block predecessors. There is not a guaranteed one-to-one mapping from pred. llvm basic blocks and machine basic blocks. Therefore the current approach does not work as it assumes we can mark predecessor machine basic block as needing a copy, and needs to know the set of all predecessor machine basic blocks to decide when to insert phis. Instead of computing the swifterror vregs as we select instructions, propagate them at the end of instruction selection when the MBB CFG is complete. When an instruction needs a swifterror vreg and we don't know the value yet, generate a new vreg and remember this "upward exposed" use, and reconcile this at the end of instruction selection. This will only happen if the target supports promoting swifterror parameters to registers and the swifterror attribute is used. rdar://28300923 llvm-svn: 283617	2016-10-07 22:06:55 +00:00
Sanjay Patel	14c02052d6	[DAG] clean up foldSelectOfConstants(); NFCI Rename variables, simplify logic. Not clear yet why we don't handle a target with ZeroOrNegativeOneBooleanContent too. llvm-svn: 283613	2016-10-07 21:55:42 +00:00
Sanjay Patel	ecaf343fe7	[DAG] move fold (select C, 0, 1 -> xor C, 1) to a helper function; NFC We're missing at least 3 other similar folds based on what we have in InstCombine. llvm-svn: 283596	2016-10-07 20:47:51 +00:00
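For a boolean (i1) condition, `select C, 0, 1` yields 1 exactly when C is false, which is `xor C, 1`. A two-case check, assuming an i1 condition and result; the helpers are hypothetical.

```python
def select(cond: int, true_val: int, false_val: int) -> int:
    """ISD::SELECT on an i1 condition."""
    return true_val if cond & 1 else false_val


def xor_one(cond: int) -> int:
    """xor C, 1 on an i1 value."""
    return (cond & 1) ^ 1
```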
Vedant Kumar	7beb423765	Delete some dead code in SelectionDAG (NFC) Differential Revision: https://reviews.llvm.org/D24435 llvm-svn: 283505	2016-10-06 22:53:43 +00:00
Pirama Arumuga Nainar	cc152ac794	Handle *_EXTEND_VECTOR_INREG during Integer Legalization Summary: These nodes need legalization for 3-element vectors. This commit handles the legalization and adds tests for zext and sext. This fixes PR30614. Reviewers: RKSimon, srhines Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25268 llvm-svn: 283496	2016-10-06 21:27:05 +00:00
Michael Kuperstein	7cc2123847	[DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs This generalizes the build_vector -> vector_shuffle combine to support any number of inputs. The idea is to create a binary tree of shuffles, where the first layer performs pairwise shuffles of the input vectors placing each input element into the correct lane, and the rest of the tree blends these shuffles together. This doesn't try to be smart and create any sort of "optimal" shuffles. The assumption is that even a "poor" shuffle sequence is better than extracting and inserting the elements one by one. Differential Revision: https://reviews.llvm.org/D24683 llvm-svn: 283480	2016-10-06 18:58:24 +00:00
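A simplified Python model of the shuffle-tree construction described above, assuming every input vector already has the output width (the real combine also handles differing widths) and representing undef lanes as `None`. The first layer shuffles pairs of inputs so each element lands in its final lane; later layers blend neighbours, taking each lane from whichever side defines it. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec = List[Optional[int]]  # None models an undef lane


def shuffle(a: Vec, b: Vec, mask: Sequence[int]) -> Vec:
    """vector_shuffle: m < len(a) picks a[m], otherwise b[m - len(a)]; -1 is undef."""
    n = len(a)
    return [None if m < 0 else (a[m] if m < n else b[m - n]) for m in mask]


def build_vector_as_shuffle_tree(inputs: List[Vec], elts: List[Tuple[int, int]]) -> Vec:
    """elts[lane] = (input index, source lane)."""
    width = len(elts)
    layer = []  # (partial vector, set of lanes it defines)
    # Layer 1: pairwise shuffles that move each element into its final lane.
    for base in range(0, len(inputs), 2):
        a = inputs[base]
        b = inputs[base + 1] if base + 1 < len(inputs) else [None] * width
        mask, defined = [], set()
        for lane, (src, src_lane) in enumerate(elts):
            if src == base:
                mask.append(src_lane)
            elif src == base + 1:
                mask.append(width + src_lane)
            else:
                mask.append(-1)
                continue
            defined.add(lane)
        layer.append((shuffle(a, b, mask), defined))
    # Later layers: blend neighbours; each lane is defined by at most one side.
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            (a, da), (b, db) = layer[i], layer[i + 1]
            mask = [lane if lane in da else (width + lane if lane in db else -1)
                    for lane in range(width)]
            nxt.append((shuffle(a, b, mask), da | db))
        if len(layer) % 2:
            nxt.append(layer[-1])  # an odd partial is carried up unchanged
        layer = nxt
    return layer[0][0]
```

As the commit message notes, this makes no attempt to find an optimal sequence; it only guarantees a fixed number of shuffle layers instead of one insert per element.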
Peter Collingbourne	d799d28540	FastISel: Remove unused/un-overridden entry points. NFCI. llvm-svn: 283366	2016-10-05 19:25:20 +00:00
Bjorn Pettersson	12559441bd	[DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT. Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple look-through of EXTRACT_VECTOR_ELT. It will compute the result based on the known bits (or known sign bits) for the vector that the element is extracted from. Reviewers: bogner, tstellarAMD, mkuper Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25007 llvm-svn: 283347	2016-10-05 17:40:27 +00:00
Mehdi Amini	3e021be3b6	Use StringRef in FastISel API (NFC) llvm-svn: 283291	2016-10-05 01:37:29 +00:00
whitequark	7c4fe0e9a3	[SelectionDAG] Fix calling convention in expansion of ?MULO. The SMULO/UMULO DAG nodes, when not directly supported by the target, expand to a multiplication twice as wide. In case that the resulting type is not legal, an __mul?i3 intrinsic is used. Since the type is not legal, the legalizer cannot directly call the intrinsic with the wide arguments; instead, it "pre-lowers" them by splitting them in halves. The "pre-lowering" code in essence made assumptions about the calling convention, specifically that i(N*2) values will be split into two iN values and passed in consecutive registers in little-endian order. This, naturally, breaks on a big-endian system, such as our OR1K out-of-tree backend. Thanks to James Miller <james@aatch.net> for help in debugging. Differential Revision: https://reviews.llvm.org/D25223 llvm-svn: 283203	2016-10-04 09:07:49 +00:00
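A sketch of the two pieces the message describes, under stated assumptions: UMULO computed through one double-width product (overflow when the high half is nonzero), and a 2N-bit argument split into two N-bit halves whose order depends on the assumed calling convention. The helper names are hypothetical; actual register assignment is decided by the target's calling convention, not by this model.

```python
def umulo_via_wide(a: int, b: int, n: int):
    """Unsigned multiply-with-overflow of two n-bit values via one 2n-bit product."""
    mask = (1 << n) - 1
    prod = (a & mask) * (b & mask)  # always fits in 2n bits
    return prod & mask, (prod >> n) != 0  # (low-half result, overflow flag)


def split_wide_arg(value: int, n: int, big_endian: bool):
    """Split a 2n-bit argument into n-bit halves in an assumed register order."""
    lo = value & ((1 << n) - 1)
    hi = (value >> n) & ((1 << n) - 1)
    return (hi, lo) if big_endian else (lo, hi)
```

Hard-coding the little-endian order `(lo, hi)` is the assumption the commit removes: on a big-endian target the halves would be passed as `(hi, lo)`.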
Sanjay Patel	f7df85af87	fix formatting; NFC llvm-svn: 283115	2016-10-03 15:18:36 +00:00
Nirav Dave	e524f50882	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r282600 due to test failures with MCJIT llvm-svn: 282604	2016-09-28 16:37:50 +00:00
Nirav Dave	e17e055b75	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 282600	2016-09-28 15:50:43 +00:00
Michael Kuperstein	3e06eafc20	[DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine This check currently doesn't seem to do anything useful on any in-tree target: On non-x86, it always evaluates to false, so we never hit the code path that creates the shuffle with zero. On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to query in general, but doesn't make sense if only restricted to zero blends. Differential Revision: https://reviews.llvm.org/D24625 llvm-svn: 282567	2016-09-28 06:13:58 +00:00
Evandro Menezes	e45de8a5ec	Add support to optionally limit the size of jump tables. Many high-performance processors have a dedicated branch predictor for indirect branches, commonly used with jump tables. As sophisticated as such branch predictors are, they tend to have well-defined limits beyond which their effectiveness is hampered or even nullified. One such limit is the number of possible destinations for a given indirect branch that such branch predictors can handle. This patch considers a limit that a target may set to the number of destination addresses in a jump table. Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar <aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>. Differential revision: https://reviews.llvm.org/D21940 llvm-svn: 282412	2016-09-26 15:32:33 +00:00
Ayman Musa	d7a5ed4141	[X86][avx512] Fix bug in masked compress store. Differential Revision: https://reviews.llvm.org/D23984 llvm-svn: 282381	2016-09-26 06:22:08 +00:00
Nirav Dave	9011da3d44	[DAG] Fix incorrect alignment of ext load. Correctly use alignment size from loaded size not output value size. Reviewers: jyknight, tstellarAMD, arsenm Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23356 llvm-svn: 282177	2016-09-22 17:28:43 +00:00
Arnold Schwaighofer	de2490d0dc	Disable tail calls if there is a swifterror argument ISel does not handle them correctly yet, i.e., we crash trying to emit tail call code. radar://28407842 llvm-svn: 282088	2016-09-21 16:53:36 +00:00
Craig Topper	af5ee86bc9	[AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision. Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future. llvm-svn: 281868	2016-09-18 21:49:32 +00:00
Simon Pilgrim	6c21e6a54e	[X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations. This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly supported on pre-AVX512 hardware). While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions. Differential Revision: https://reviews.llvm.org/D24343 llvm-svn: 281852	2016-09-18 12:45:23 +00:00
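A minimal scalar sketch of why the uitofp to sitofp rewrite is sound, assuming 32-bit lanes: when the sign bit is zero, the unsigned and signed interpretations denote the same number, so both conversions round the same value. signBitIsZero() is a hypothetical stand-in for SelectionDAG::SignBitIsZero, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the known-bits query: is bit 31 known to be zero?
bool signBitIsZero(uint32_t x) { return (x & 0x80000000u) == 0; }

float uintToFloat(uint32_t x) {
  if (signBitIsZero(x))
    // Same numeric value either way, so a signed conversion (sitofp) suffices.
    return static_cast<float>(static_cast<int32_t>(x));
  // Otherwise fall back to a genuine unsigned conversion (uitofp).
  return static_cast<float>(x);
}
```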
Wei Mi	ab24cd189f	Change the order of the split store from high - low to low - high. It is a trivial change which could make the testcase easier to reuse for the store splitting in CodeGenPrepare. llvm-svn: 281846	2016-09-18 06:10:32 +00:00
Matt Arsenault	e8e0f5cac6	Make analyzeBranch family of instruction names consistent analyzeBranch was renamed to use lowercase first, rename the related set to match. llvm-svn: 281506	2016-09-14 17:24:15 +00:00
Sanjay Patel	284582b6d4	getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits(), round 2 ; NFCI llvm-svn: 281498	2016-09-14 16:54:10 +00:00
Sanjay Patel	1ed771f5d7	getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI llvm-svn: 281495	2016-09-14 16:37:15 +00:00
Sanjay Patel	b1f0a0f4a8	getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI llvm-svn: 281493	2016-09-14 16:05:51 +00:00
Sanjay Patel	5f6bb6cd24	getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI llvm-svn: 281490	2016-09-14 15:43:44 +00:00
Sanjay Patel	bd6fca1419	getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI llvm-svn: 281489	2016-09-14 15:21:00 +00:00
Pawel Bylica	c397f0b272	[CodeGen] Fix invalid shift in mul expansion Summary: When expanding mul in type legalization make sure the type for shift amount can actually fit the value. This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354. Reviewers: hfinkel, majnemer, RKSimon Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D24478 llvm-svn: 281403	2016-09-13 21:55:41 +00:00
Michael Kuperstein	59f8305305	[DAG] Allow build-to-shuffle combine to combine builds from two wide vectors. This allows us to, in some cases, create a vector_shuffle out of a build_vector, when the inputs to the build are extract_elements from two different vectors, at least one of which is wider than the output. (E.g. a <8 x i16> being constructed out of elements from a <16 x i16> and a <8 x i16>). Differential Revision: https://reviews.llvm.org/D24491 llvm-svn: 281402	2016-09-13 21:53:32 +00:00
Simon Pilgrim	4a8eba3e96	[DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range. Followup to D23007 llvm-svn: 281362	2016-09-13 18:33:29 +00:00
Simon Pilgrim	bd28a85d14	[DAGCombiner] Use APInt directly in (shl (ext (shl x, c1)), c2) combine Fix failure to detect out of range shift constants leading to assert in ConstantSDNode::getZExtValue() Followup to D23007 llvm-svn: 281354	2016-09-13 17:15:28 +00:00
Ayman Musa	0c2da88f82	Remove MVT::i1 xor instruction before SELECT. (Performance improvement). Differential Revision: https://reviews.llvm.org/D23764 llvm-svn: 281308	2016-09-13 09:12:45 +00:00
Michael Kuperstein	efc0667583	[DAG] Refactor BUILD_VECTOR combine to make it easier to extend. NFCI. This should make it easier to add cases that we currently don't cover, like supporting more kinds of type mismatches and more than 2 input vectors. llvm-svn: 281283	2016-09-13 00:57:43 +00:00
Justin Lebar	adbf09e8cf	[CodeGen] Split out the notions of MI invariance and MI dereferenceability. Summary: An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable. This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D23371 llvm-svn: 281151	2016-09-11 01:38:58 +00:00
Arnold Schwaighofer	7d7b4b4014	Create phi nodes for swifterror values at the end of the phi instructions list ISel makes assumption about the order of phi nodes. rdar://28190150 llvm-svn: 281095	2016-09-09 21:18:47 +00:00
Simon Pilgrim	153b408433	[SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type Fixes issue with rL280927 identified by Mikael Holmén llvm-svn: 281042	2016-09-09 13:31:52 +00:00
James Molloy	c6a6144966	[SDAGBuilder] Don't create a binary tree for switches in minsize mode This bloats codesize - all of the non-leaf nodes are extra code. llvm-svn: 280932	2016-09-08 13:12:22 +00:00
Simon Pilgrim	cc7b4b511b	[SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element. This is an initial step towards determining the sign bits of a vector (PR29079). Differential Revision: https://reviews.llvm.org/D24253 llvm-svn: 280927	2016-09-08 12:57:51 +00:00
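A minimal sketch of the "shared by every element" rule for an all-constant BUILD_VECTOR, assuming 8-bit elements: a bit is known for the vector only if it has the same known value in every element, so the per-element facts are intersected. The KnownBits8 struct and knownBitsOfBuildVector() are hypothetical; LLVM at the time tracked APInt KnownZero/KnownOne masks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical known-bits pair for 8-bit lanes.
struct KnownBits8 {
  uint8_t zero;  // bits known to be 0 in every element
  uint8_t one;   // bits known to be 1 in every element
};

// Assumes every element is a constant; start from "everything known" and
// intersect with each element's exact bits.
KnownBits8 knownBitsOfBuildVector(const std::vector<uint8_t>& elts) {
  assert(!elts.empty());
  KnownBits8 k{0xFF, 0xFF};
  for (uint8_t e : elts) {
    k.zero &= static_cast<uint8_t>(~e);  // still zero only if zero here too
    k.one &= e;                          // still one only if one here too
  }
  return k;
}
```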
Simon Pilgrim	a01ee07a19	[DAGCombiner] Enable AND combines of splatted constant vectors Allow AND combines to use a vector splatted constant as well as a constant scalar. Preliminary part of D24253. llvm-svn: 280926	2016-09-08 12:36:39 +00:00
Elena Demikhovsky	dcc86d5bb6	Shift-left (ISD::SHL) operation crashes on "DAG Legalization" phase. https://llvm.org/bugs/show_bug.cgi?id=29058. During node legalization we try to legalize its operands. If an operand node is replaced during legalization, the user node may be destroyed. Differential Revision: https://reviews.llvm.org/D24244 llvm-svn: 280862	2016-09-07 20:54:33 +00:00
Matt Arsenault	6cda10c950	Remove unnecessary call to getAllocatableRegClass This reapplies r252565 and r252674, effectively reverting r252956. This allows VS_32/VS_64 to be unallocatable like they should be. llvm-svn: 280783	2016-09-07 06:16:45 +00:00
Hal Finkel	8ca2ed22b2	[DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike) I might have called this "r246507, the sequel". It fixes the same issue, as the issue has cropped up in a few more places. The underlying problem is that isSetCCEquivalent can pick up select_cc nodes with a result type that is not legal for a setcc node to have, and if we use that type to create new setcc nodes, nothing fixes that (and so we've violated the contract that the infrastructure has with the backend regarding setcc node types). Fixes PR30276. For convenience, here's the commit message from r246507, which explains the problem in greater detail: [DAGCombine] Fixup SETCC legality checking SETCC is one of those special node types for which operation actions (legality, etc.) are keyed off of an operand type, not the node's value type. This makes sense because the value type of a legal SETCC node is determined by its operands' value type (via the TLI function getSetCCResultType). When the SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value type, or directly with the value type provided by TLI.getSetCCResultType. The first problem being fixed here is that DAGCombine had several places querying TLI.isOperationLegal on SETCC, but providing the return of getSetCCResultType, instead of the operand type directly. This does not mean what the author thought, and "luckily", most in-tree targets have SETCC with Custom lowering, instead of marking them Legal, so these checks return false anyway. 
The second problem being fixed here is that two of the DAGCombines could create SETCC nodes with arbitrary (integer) value types; specifically, those that would simplify: (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3 (which is possible for some combinations of (op1, op2)) If the operands of the and|or node are actual setcc nodes, then this is not an issue (because the and|or must share the same type), but, the relevant code in DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls DAGCombiner::isSetCCEquivalent on each operand, and that function will recognise setcc-like select_cc nodes with other return types. And, thus, when creating new SETCC nodes, we need to be careful to respect the value-type constraint. This is even true before type legalization, because it is quite possible for the SELECT_CC node to have a legal type that does not happen to match the corresponding TLI.getSetCCResultType type. To be explicit, there is nothing that later fixes the value types of SETCC nodes (if the type is legal, but does not happen to match TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to work only because, either MVT::i1 is not legal, or it is what TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change, however. For the time being, restrict the relevant transformations to produce only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1 prior to type legalization). Fixes PR24636. llvm-svn: 280767	2016-09-06 23:02:23 +00:00
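The predicate-merging half of the combine described in this commit, (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3, can be modeled by treating each signed integer predicate as the set of outcomes {LT, EQ, GT} it accepts: "and" intersects the sets, "or" unions them. This is a sketch under that assumption; the Pred bit layout below is hypothetical and is not ISD::CondCode's encoding, and it ignores the value-type issue the commit actually fixes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: each predicate is the set of comparison outcomes it accepts.
enum Pred : uint8_t {
  LT = 1, EQ = 2, GT = 4,
  LE = LT | EQ, GE = GT | EQ, NE = LT | GT,
  NEVER = 0, ALWAYS = LT | EQ | GT
};

// Both setccs hold exactly when the outcome is in both sets.
Pred foldAnd(Pred p1, Pred p2) { return static_cast<Pred>(p1 & p2); }
// Either setcc holds when the outcome is in either set.
Pred foldOr(Pred p1, Pred p2) { return static_cast<Pred>(p1 | p2); }

// Evaluate a predicate on concrete operands.
bool evalPred(Pred p, int64_t a, int64_t b) {
  uint8_t outcome = a < b ? LT : (a == b ? EQ : GT);
  return (p & outcome) != 0;
}
```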
Simon Pilgrim	1b4462b7c1	[SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ), Idx ) -> In If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector. This has come up in several x86 shuffle lowering cases where we are crossing 128-bit lanes. Differential Revision: https://reviews.llvm.org/D24254 llvm-svn: 280715	2016-09-06 16:42:05 +00:00
Wei Mi	c54d1298f5	Split the store of a wide value merged from an int-fp pair into multiple stores. For the store of a wide value merged from a pair of values, especially an int-fp pair, it is sometimes more efficient to split it into separate narrow stores, which can remove the bitwise instructions or sink them to colder places. Currently the feature is only enabled on the x86 target, and only stores of int-fp pairs are split. The scope may be extended in the future if performance evidence supports it. Differential Revision: https://reviews.llvm.org/D22840 llvm-svn: 280505	2016-09-02 17:17:04 +00:00
Andrea Di Biagio	fd503e5af3	[DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift. This fixes a regression introduced by revision 268094. Revision 268094 added the following dag combine rule: // trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2 That rule converts a truncate of a shift-by-constant into a shift of a truncated value. We do this only if the shift count is less than half the size in bits of the truncated value (K < vt.size / 2). The problem is that the constraint on the shift count is incorrect, so the rule doesn't work well in some cases involving vector types. The combine rule should have been written instead like this: // trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits() Basically, if K is smaller than the "scalar size in bits" of the truncated value then we know that by "sinking" the truncate into the operand of the shift we would never accidentally make the shift undefined. This patch fixes the check on the shift count, and adds test cases to make sure that we don't regress the behavior. Differential Revision: https://reviews.llvm.org/D24154 llvm-svn: 280482	2016-09-02 11:29:09 +00:00
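A minimal sketch contrasting the two shift-count checks, assuming a <4 x i32> value truncated to <4 x i16>: the old bound (half the total vector width, 32) accepts K = 20, which is out of range for 16-bit lanes, while the fixed bound (the element width, 16) rejects it. The VT struct and the helper names are hypothetical stand-ins, not LLVM's EVT API; the scalar helpers show the fold itself for an i32 to i8 truncation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an EVT: NumElts lanes of ScalarBits each.
struct VT {
  unsigned NumElts;
  unsigned ScalarBits;
  unsigned sizeInBits() const { return NumElts * ScalarBits; }
};

// Bound from r268094: half the total width, which is too loose for vectors.
bool oldCheck(unsigned K, VT TruncVT) { return K < TruncVT.sizeInBits() / 2; }

// Fixed bound: a shift in the narrow type is only defined for K < element width.
bool newCheck(unsigned K, VT TruncVT) { return K < TruncVT.ScalarBits; }

// Scalar illustration: trunc(shl x, K) == shl(trunc x, K) for i32 -> i8, K < 8.
uint8_t truncOfShl(uint32_t x, unsigned K) { return static_cast<uint8_t>(x << K); }
uint8_t shlOfTrunc(uint32_t x, unsigned K) {
  assert(K < 8 && "shift would be out of range in the narrow type");
  return static_cast<uint8_t>(static_cast<uint8_t>(x) << K);
}
```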
Aditya Kumar	356f79d535	[SelectionDAGBuilder] Add const to relevant places Reviewers: hans, evandro, sebpop Differential Revision: https://reviews.llvm.org/D24112 llvm-svn: 280430	2016-09-01 23:35:26 +00:00
Michael Kuperstein	7bc54cebea	[Legalizer] Don't throw away false low half when expanding GT/LT SETCC When expanding a SETCC for which the low half is known to evaluate to false, we can only throw it away for LT/GT comparisons, not LE/GE. This fixes PR29170. Differential Revision: https://reviews.llvm.org/D24151 llvm-svn: 280424	2016-09-01 23:02:32 +00:00
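A minimal sketch of the usual hi/lo expansion of a signed 64-bit comparison, assuming i64 is split into two i32 halves; it is not the legalizer's code. If the high halves are equal, an unsigned comparison of the low halves decides; otherwise the high halves decide. When the low-half result is known false, SLT correctly collapses to hi <s hi, but SLE must not collapse to hi <=s hi, since that is true when the high halves are equal. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Signed high half, unsigned low half (what type legalization produces).
struct Halves { int32_t hi; uint32_t lo; };
Halves split(int64_t v) {
  return {static_cast<int32_t>(v >> 32), static_cast<uint32_t>(v)};
}

bool expandedSLT(int64_t a, int64_t b) {
  Halves x = split(a), y = split(b);
  return x.hi == y.hi ? x.lo < y.lo : x.hi < y.hi;
}

bool expandedSLE(int64_t a, int64_t b) {
  Halves x = split(a), y = split(b);
  return x.hi == y.hi ? x.lo <= y.lo : x.hi < y.hi;
}

// The invalid shortcut for LE when the low half is known false:
// it answers true whenever the high halves are equal.
bool wrongSLEWhenLowFalse(int64_t a, int64_t b) {
  return split(a).hi <= split(b).hi;
}
```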
Michael Kuperstein	5f17d08f49	[SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes Prior to this, we could generate a vector_shuffle from an IR shuffle when the size of the result was exactly the sum of the sizes of the input vectors. If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts and inserts. Instead, we can form a larger vector_shuffle, and then extract a subvector of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8> and then extract a <12 x i8>. This also includes a target-specific X86 combine that in the presence of AVX2 combines: (vector_shuffle <mask> (concat_vectors t1, undef) (concat_vectors t2, undef)) into: (vector_shuffle <mask> (concat_vectors t1, t2), undef) in cases where this allows us to form VPERMD/VPERMQ. (This is not a separate commit, as that pattern does not appear without the DAGBuilder change.) llvm-svn: 280418	2016-09-01 21:32:09 +00:00
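A minimal sketch of the "shuffle wide, then extract" strategy, assuming two 8-element inputs and a 12-element result: the mask is padded with undef lanes to 16, the shuffle is done at that width, and the first 12 lanes are kept (the EXTRACT_SUBVECTOR step). shuffle() and narrowShuffleViaWide() are hypothetical models, with undef lanes modeled as 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// VECTOR_SHUFFLE model: mask entries index into concat(a, b); -1 means undef.
Vec shuffle(const Vec& a, const Vec& b, const std::vector<int>& mask) {
  Vec out;
  for (int m : mask) {
    if (m < 0) out.push_back(0);  // undef lane, modeled as 0
    else if (static_cast<std::size_t>(m) < a.size()) out.push_back(a[m]);
    else out.push_back(b[m - a.size()]);
  }
  return out;
}

// <12 x T> from two <8 x T>: widen the mask to 16 lanes, shuffle, extract 12.
Vec narrowShuffleViaWide(const Vec& a, const Vec& b, std::vector<int> mask12) {
  assert(a.size() == 8 && b.size() == 8 && mask12.size() == 12);
  mask12.resize(16, -1);
  Vec wide = shuffle(a, b, mask12);
  return Vec(wide.begin(), wide.begin() + 12);
}
```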
Michael Kuperstein	b4743597bd	Rename some variables to have meaningful names. NFC. llvm-svn: 280391	2016-09-01 18:24:42 +00:00
Michael Kuperstein	65bc3c89ff	[DAGCombine] Don't fold a trunc if it feeds an anyext Legalization tends to create anyext(trunc) patterns. This should always be combined - into either a single trunc, a single ext, or nothing if the types match exactly. But if we happen to combine the trunc first, we may pull the trunc away from the anyext or make it implicit (e.g. the truncate(extract) -> extract(bitcast) fold). To prevent this, we can avoid doing the fold, similarly to how we already handle fpround(fpextend). Differential Revision: https://reviews.llvm.org/D23893 llvm-svn: 280386	2016-09-01 17:59:24 +00:00
Hal Finkel	5081ac27c7	Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is lowered using: ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET) where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86, FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not work for PowerPC. Because of the way that the stack layout works, the canonical frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC (there is a lower save-area offset as well), so it is not just a matter of implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its semantics -- We can do that, since it is currently used only for @llvm.eh.dwarf.cfa lowering, but it is better to directly lower the CFA construct itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips currently does this, but by using a custom lowering for ADD that specifically recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern. This change introduces an ISD::EH_DWARF_CFA node, which by default expands using the existing logic, but can be directly lowered by the target. Mips is updated to use this method (which simplifies its implementation, and I suspect makes it more robust), and PowerPC is updated to do the same. Fixes PR26761. Differential Revision: https://reviews.llvm.org/D24038 llvm-svn: 280350	2016-09-01 10:28:47 +00:00
Philip Reames	2b1084ac93	[statepoints][experimental] Add support for live-in semantics of values in deopt bundles This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes. The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate. Today, we only fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.) My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. 
Note: I don't have any known miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default. Differential Revision: https://reviews.llvm.org/D24000 llvm-svn: 280250	2016-08-31 15:12:17 +00:00
Krzysztof Parzyszek	354832e585	Propagate TBAA info in SelectionDAG::getIndexedLoad Patch by Pranav Bhandarkar. llvm-svn: 279998	2016-08-29 19:50:15 +00:00
Igor Breger	24281b4740	Fixed a bug in type legalizer for masked gather. The problem occurs when the Node isn't updated in place: UpdateNodeOperation() returns a node that already exists. In this case the assert in PromoteIntegerOperand() fails, because N has 2 results (val + chain). Differential Revision: http://reviews.llvm.org/D23756 llvm-svn: 279961	2016-08-29 09:12:31 +00:00
Quentin Colombet	e063e1f68a	[SelectionDAG] Do not run the ISel process on already selected code. Right now, this cannot happen, but with the fall back path of GlobalISel it will show up eventually. llvm-svn: 279877	2016-08-26 22:32:55 +00:00
Michael Kuperstein	260daed147	Reuse an SDLoc throughout a function. NFC. llvm-svn: 279767	2016-08-25 18:50:56 +00:00
Justin Lebar	1972e222ea	[SelectionDAG] Use a union of bitfield structs for SDNode::SubclassData. Summary: This greatly simplifies our handling of SDNode::SubclassData. NFC, hopefully. :) See discussion in D23035 for discussion about the design API of these bitfields. Reviewers: chandlerc Subscribers: llvm-commits, rnk Differential Revision: https://reviews.llvm.org/D23036 llvm-svn: 279537	2016-08-23 17:18:11 +00:00
Pete Cooper	036b94dad3	Fix some more asserts after r279466. That commit added a new version of Intrinsic::getName which should only be called when the intrinsic has no overloaded types. There are several debugging paths, such as SDNode::dump which are printing the name of the intrinsic but don't have the overloaded types. These paths should be ok to just print the name instead of crashing. The fix here is ultimately to just add a 'None' second argument as that calls the overload capable getName, which is less efficient, but this is a debugging path anyway, and not perf critical. Thanks to Björn Pettersson for pointing out that there were more crashes. llvm-svn: 279528	2016-08-23 16:23:45 +00:00
Simon Pilgrim	02b13d4d3c	Use SDValue::getOpcode() helper instead of via SDValue::getNode() llvm-svn: 279381	2016-08-20 20:04:18 +00:00
James Molloy	7ee640f9b6	[CodeGen] Fix a trivial type conversion bug dating back to pre-2008 The heuristic above this code is incredibly suspect, but disregarding that it mutates the cast opcode so we need to check the mutated opcode later to see if we need to emit an AssertSext or AssertZext node. Fixes PR29041. llvm-svn: 279223	2016-08-19 08:38:50 +00:00
Justin Bogner	cd1d5aaf2e	Replace a few more "fall through" comments with LLVM_FALLTHROUGH Follow up to r278902. I had missed "fall through", with a space. llvm-svn: 278970	2016-08-17 20:30:52 +00:00
Ayman Musa	71b43c5c1d	Fix bug in DAGBuilder for getelementptr with expanded vector. Replacing the usage of MVT with EVT in case the vector type is expanded. Differential Revision: https://reviews.llvm.org/D23306 llvm-svn: 278913	2016-08-17 07:52:15 +00:00
Ayman Musa	c96f421ad4	First commit (test commit) - Adding empty line. llvm-svn: 278910	2016-08-17 07:37:34 +00:00
Justin Bogner	b03fd12cef	Replace "fallthrough" comments with LLVM_FALLTHROUGH This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902	2016-08-17 05:10:15 +00:00
Pierre Gousseau	051db7d838	[x86] Refactor a PowerPC specific ctlz/srl transformation (NFC). Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets. Differential Revision: https://reviews.llvm.org/D23445 llvm-svn: 278799	2016-08-16 13:53:53 +00:00
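A minimal sketch of the setcc -> srl(ctlz) idea for a 32-bit value: since ISD::CTLZ is defined to return the bit width (32) for zero and something below 32 otherwise, shifting the count right by log2(32) = 5 yields 1 exactly when x == 0. ctlz32() is a hypothetical portable helper, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros with ctlz32(0) == 32, matching ISD::CTLZ
// (not CTLZ_ZERO_UNDEF).
unsigned ctlz32(uint32_t x) {
  unsigned n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// seteq x, 0  ->  srl (ctlz x), 5: only a count of 32 survives the shift.
uint32_t isZeroViaCtlz(uint32_t x) { return ctlz32(x) >> 5; }
```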
Eli Friedman	98151d6440	Fix typo in lowering for fp128 ueq. Regression from r259791. Differential Revision: https://reviews.llvm.org/D23374 llvm-svn: 278750	2016-08-15 21:46:19 +00:00
Wolfgang Pieb	dfad9b20c9	Local variables whose address is taken and passed on to a call are described in debug info using their stack slots instead of as an indirection of param reg + 0 offset. This is done by detecting FrameIndexSDNodes in SelectionDAG and generating FrameIndexDbgValues for them. This ultimately generates DBG_VALUEs with stack location operands. Differential Revision: http://reviews.llvm.org/D23283 llvm-svn: 278703	2016-08-15 18:18:26 +00:00
Duncan P. N. Exon Smith	f197b1f78f	ADT: Remove all ilist_iterator => pointer casts, NFC Remove all ilist_iterator to pointer casts. There were two reasons for casts: - Checking for an uninitialized (i.e., null) iterator. I added MachineInstrBundleIterator::isValid() to check for that case. - Comparing an iterator against the underlying pointer value while avoiding converting the pointer value to an iterator. This is occasionally necessary in MachineInstrBundleIterator, since there is an assertion in the constructors that the underlying MachineInstr is not bundled (but we don't care about that if we're just checking for pointer equality). To support the latter case, I rewrote the == and != operators for ilist_iterator and MachineInstrBundleIterator. - The implicit constructors now use enable_if to exclude const-iterator => non-const-iterator conversions from overload resolution (previously it was a compiler error on instantiation, now it's SFINAE). - The == and != operators are now global (friends), and are not templated. - MachineInstrBundleIterator has overloads to compare against both const_pointer and const_reference. This avoids the implicit conversions to MachineInstrBundleIterator that assert, instead just checking the address (and I added unit tests to confirm this). Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked are in ilist.h, and no code outside of ilist.h directly relies on this UB end-iterator-to-pointer conversion anymore. It's still needed for ilist_sentinel_traits, but I'll clean that up soon. llvm-svn: 278478	2016-08-12 05:05:36 +00:00
David Majnemer	0d955d0bf5	Use the range variant of find instead of unpacking begin/end If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended. llvm-svn: 278433	2016-08-11 22:21:41 +00:00
David Majnemer	0a16c22846	Use range algorithms instead of unpacking begin/end No functionality change is intended. llvm-svn: 278417	2016-08-11 21:15:00 +00:00
Simon Pilgrim	85c7ea86ae	[DAGCombine] Avoid INSERT_SUBVECTOR reinsertions (PR28678) If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector. i.e. INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx ) Differential Revision: https://reviews.llvm.org/D23330 llvm-svn: 278211	2016-08-10 10:50:53 +00:00
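A minimal sketch of why the reinsertion fold holds, assuming SubOld and SubNew have the same length and are inserted at the same index: the second insertion overwrites every lane the first one wrote, so the first is dead. insertSubvector() is a hypothetical model of INSERT_SUBVECTOR over std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Model of INSERT_SUBVECTOR(Vec, Sub, Idx): overwrite lanes [Idx, Idx + |Sub|).
Vec insertSubvector(Vec v, const Vec& sub, std::size_t idx) {
  assert(idx + sub.size() <= v.size());
  std::copy(sub.begin(), sub.end(),
            v.begin() + static_cast<std::ptrdiff_t>(idx));
  return v;
}
```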
Simon Pilgrim	76964e3140	[DAGCombiner] Better support for shifting large value type by constants As detailed on D22726, much of the shift combining code assumes constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths. This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required. Differential Revision: https://reviews.llvm.org/D23007 llvm-svn: 278141	2016-08-09 17:39:11 +00:00
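A minimal scalar sketch of the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combine for shl on a 32-bit value: if c1 + c2 reaches the bit width, every bit has been shifted out and the result is 0. Both amounts are widened to 64 bits before adding, loosely echoing the commit's point that the constants must share a width before being combined; foldShlShl() is a hypothetical name and this is not the DAGCombiner code.

```cpp
#include <cassert>
#include <cstdint>

// Fold two in-range left shifts of a 32-bit value into one.
uint32_t foldShlShl(uint32_t x, uint32_t c1, uint32_t c2) {
  assert(c1 < 32 && c2 < 32 && "each original shift must be in range");
  uint64_t sum = static_cast<uint64_t>(c1) + c2;  // common, wider width
  if (sum >= 32) return 0;                        // all bits shifted out
  return x << sum;
}
```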
Diana Picus	4dd6c249ac	[SelectionDAG] Refactor visitInlineAsm a bit. NFCI. This shaves off ~100 lines from visitInlineAsm. llvm-svn: 277987	2016-08-08 08:54:39 +00:00
Nikolai Bozhenov	f679530ba1	[X86] Heuristic to selectively build Newton-Raphson SQRT estimation On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725	2016-08-04 12:47:28 +00:00
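A minimal sketch of the software sequence the heuristic weighs against a hardware SQRT: refine an estimate y of 1/sqrt(x) with the Newton-Raphson step y' = y * (1.5 - 0.5 * x * y * y), then form sqrt(x) = x * y. The rough starting estimate stands in for a hardware RSQRT result; function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~= 1/sqrt(x).
float refineRsqrt(float x, float y) { return y * (1.5f - 0.5f * x * y * y); }

// sqrt(x) = x * rsqrt(x), starting from a caller-supplied estimate.
float sqrtViaNR(float x, float estimate, int steps) {
  assert(x > 0.0f && steps >= 0);
  float y = estimate;
  for (int i = 0; i < steps; ++i) y = refineRsqrt(x, y);
  return x * y;
}
```

Each step roughly doubles the number of correct bits, which is why one or two steps after a hardware estimate are typical.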
Diana Picus	ddddbc2440	Typo fix in comment. NFC llvm-svn: 277704	2016-08-04 08:25:08 +00:00
Elliot Colp	82b1468a4d	Disable shrinking of SNaN constants When expanding FP constants, we attempt to shrink doubles to floats and perform an extending load. However, on SystemZ, and possibly on other targets (I've only confirmed the problem on SystemZ), the FP extending load instruction may convert SNaN into QNaN, or may cause an exception. So in the general case, we would still like to shrink FP constants, but SNaNs should be left as doubles. Differential Revision: https://reviews.llvm.org/D22685 llvm-svn: 277602	2016-08-03 15:09:21 +00:00
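A minimal sketch of how a constant could be recognised as a signaling NaN before deciding whether to shrink it, assuming IEEE 754 binary64 with the common convention that a clear top mantissa bit means "signaling". The check works on the raw bit pattern, because moving an SNaN through floating-point registers can itself quiet it, which is the hazard this commit describes. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// NaN: exponent all ones and non-zero mantissa; signaling: quiet bit clear.
bool isSignalingNaNBits(uint64_t bits) {
  const uint64_t expMask = 0x7FF0000000000000ull;   // exponent all ones
  const uint64_t mantMask = 0x000FFFFFFFFFFFFFull;  // 52-bit mantissa
  const uint64_t quietBit = 0x0008000000000000ull;  // top mantissa bit
  return (bits & expMask) == expMask && (bits & mantMask) != 0 &&
         (bits & quietBit) == 0;
}

bool isSignalingNaN(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return isSignalingNaNBits(bits);
}
```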
Michael Kuperstein	c97da7f3a4	[DAGCombine] Make sext(setcc) combine respect getBooleanContents We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)" Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value of T is 1 or -1, depending on the type of the setcc, and getBooleanContents() for the type if it is not i1. This fixes PR28504. llvm-svn: 277371	2016-08-01 19:39:49 +00:00
Weiming Zhao	812fde3603	DAG: avoid duplicated truncating for sign extended operand Summary: When performing cmp for EQ/NE and the operand is sign extended, we can avoid the truncation if the bits to be tested are no fewer than the original bits. Reviewers: eli.friedman Subscribers: eli.friedman, aemerson, nemanjai, t.p.northover, llvm-commits Differential Revision: https://reviews.llvm.org/D22933 llvm-svn: 277252	2016-07-29 23:33:48 +00:00
Andrew Kaylor	b99d1cc7ed	Recommitting r275284: add support to inline __builtin_mempcpy Patch by Sunita Marathe Third try, now following fixes to MSan to handle mempcpy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!) llvm-svn: 277189	2016-07-29 18:23:18 +00:00
Nirav Dave	563d6f8614	Cleanup TransferDbgValues [DAG] Check debug values for invalidation before transferring and mark old debug values invalid when transferring to another SDValue. This fixes PR28613. Reviewers: jyknight, hans, dblaikie, echristo Subscribers: yaron.keren, ismail, llvm-commits Differential Revision: https://reviews.llvm.org/D22858 llvm-svn: 277135	2016-07-29 11:49:32 +00:00
Nirav Dave	b7c72717c9	Fix DbgValue handling in SelectionDAG. [DAG] Relocate TransferDbgValues in ReplaceAllUsesWith(SDValue, SDValue) to before we modify the CSE maps. llvm-svn: 277027	2016-07-28 19:48:39 +00:00
Matthias Braun	941a705b7b	MachineFunction: Return reference for getFrameInfo(); NFC getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017	2016-07-28 18:40:00 +00:00
Simon Pilgrim	10bf0ff879	[DAGCombiner] Use APInt directly to detect out of range shift constants Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match As mentioned on D22726 llvm-svn: 276855	2016-07-27 10:30:55 +00:00
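Comparing the whole APInt matters because a shift amount wider than 64 bits can be out of range even when its low 64 bits are small. A C++ sketch, assuming a hypothetical little-endian array of 64-bit words in place of `APInt`. In LLVM itself the comparison is along the lines of `APInt::uge`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an arbitrary-width shift amount:
// words[0] holds the least significant 64 bits.
bool isShiftAmountOutOfRange(const std::vector<std::uint64_t>& words,
                             unsigned bitWidth) {
  for (std::size_t i = 1; i < words.size(); ++i)
    if (words[i] != 0)
      return true;  // value is at least 2^64, so certainly >= bitWidth
  return !words.empty() && words[0] >= bitWidth;
}
```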
Andrew Kaylor	f990fa5f7b	Reverting r276771 due to MSan failures. llvm-svn: 276824	2016-07-27 01:19:24 +00:00
Andrew Kaylor	3104a6bad0	Re-committing r275284: add support to inline __builtin_mempcpy Patch by Sunita Marathe Differential Revision: http://reviews.llvm.org/D21920 llvm-svn: 276771	2016-07-26 17:23:13 +00:00
Simon Pilgrim	820f87a72d	[SelectionDAG] Optimization of BITREVERSE legalization for power-of-2 integer scalar/vector types An extension of D19978, this patch replaces the default BITREVERSE evaluation of individual bit masks+shifts with block mask+shifts when we have integer elements of power-of-2 bits in size. After calling BSWAP to reverse the order of the constituent bytes (which typically follows a similar approach), every neighbouring 4-bits, 2-bits and finally 1-bit pairs are masked off and swapped over with shifts. In doing so we can significantly reduce the number of operations required. Differential Revision: https://reviews.llvm.org/D21578 llvm-svn: 276432	2016-07-22 16:46:25 +00:00
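For a 32-bit scalar, the block mask and shift approach described above can be sketched in C++ as a byte swap followed by swapping neighbouring 4-bit, 2-bit and 1-bit groups. This is a hand-written illustration of the technique, not the legalizer's output. `bswap32` and `bitreverse32` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the order of the four bytes.
std::uint32_t bswap32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// After BSWAP, swap neighbouring nibbles, then bit pairs, then single bits.
std::uint32_t bitreverse32(std::uint32_t v) {
  v = bswap32(v);
  v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);
  v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
  v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
  return v;
}
```

Each swap step costs a fixed handful of masks, shifts and an OR regardless of element width, versus one mask and shift per bit in the per-bit approach.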
Ahmed Bougacha	29333c9de6	[FastISel] Ignore @llvm.assume. llvm-svn: 276410	2016-07-22 12:54:53 +00:00
Elena Demikhovsky	2c0780b8e5	AVX-512: Fixed BT instruction selection. The following condition expression (a >> n) & 1 is converted to "bt a, n" instruction. It works on all Intel targets. But on AVX-512 it was broken because the expression is modified to (truncate (a >> n) to i1). I added the new sequence (truncate (a >> n) to i1) to the BT pattern. Differential Revision: https://reviews.llvm.org/D22354 llvm-svn: 275950	2016-07-19 07:14:21 +00:00
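The equivalence the pattern relies on is that `(a >> n) & 1` and `truncate (a >> n) to i1` both extract bit n of a, which is the value BT places in the carry flag. A C++ sketch assuming 32-bit operands. With a register bit offset, BT uses n modulo the operand width, so the two DAG forms match it only for n < 32. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (a >> n) & 1
bool viaMask(std::uint32_t a, unsigned n) { return ((a >> n) & 1u) != 0; }

// truncate (a >> n) to a narrow type, keeping only the least significant bit
bool viaTruncToI1(std::uint32_t a, unsigned n) {
  return (static_cast<std::uint8_t>(a >> n) & 1u) != 0;
}

// Models the carry flag of "bt a, n" with a register bit offset.
bool btCarry(std::uint32_t a, unsigned n) {
  return ((a >> (n % 32)) & 1u) != 0;
}
```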
Chih-Hung Hsieh	4d9f2c154d	[X86] Accept SELECT op code for x86-64 fp128 type DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow SELECT op code for x86_64 fp128 type for MME targets, so SoftenFloatOperand does not abort on SELECT op code. Differential Revision: http://reviews.llvm.org/D21758 llvm-svn: 275818	2016-07-18 17:20:09 +00:00
Simon Dardis	d32a2d30cb	[inlineasm] Propagate operand constraints to the backend When SelectionDAGISel transforms a node representing an inline asm block, memory constraint information is not preserved. This can cause constraints to be broken when a memory offset is of the form: offset + frame index when the frame is resolved. By propagating the constraints all the way to the backend, targets can require inline assembly memory operands to conform to their constraints. For MIPSR6, some instructions, such as ll/sc, had their offsets reduced from 16 bits to 9 bits. This becomes problematic when using inline assembly to perform atomic operations, as an offset can be generated that is too big to encode in the instruction. Reviewers: dsanders, vkalintris Differential Review: https://reviews.llvm.org/D21615 llvm-svn: 275786	2016-07-18 13:17:31 +00:00
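To make the encoding limit concrete: a signed 9-bit offset field holds -256 to 255, while a signed 16-bit field holds -32768 to 32767. A resolved offset of 300 therefore fits the older encoding but not the MIPSR6 ll/sc one. A C++ sketch of the range check. LLVM's own helper for this is `isInt<N>` in MathExtras; `fitsInSignedBits` here is an illustrative reimplementation.

```cpp
#include <cassert>
#include <cstdint>

// True if v is representable as a two's-complement integer of `bits` bits
// (bits must be at least 1).
constexpr bool fitsInSignedBits(std::int64_t v, unsigned bits) {
  return bits >= 64 || (v >= -(std::int64_t{1} << (bits - 1)) &&
                        v < (std::int64_t{1} << (bits - 1)));
}
```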
Justin Lebar	9c375817ac	[SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends. Summary: Instead, we take a single flags arg (a bitset). Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags. This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D22249 llvm-svn: 275592	2016-07-15 18:27:10 +00:00
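The move from several bool parameters to one flags bitset can be sketched in C++ as below. `MemFlags`, `LoadDesc` and `makeLoad` are hypothetical. The real change uses `MachineMemOperand::Flags`, and this only illustrates the calling convention: alignment first, then flags, both defaulted in this sketch, so call sites name only what they set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical memory-operand flags, one bit each.
enum class MemFlags : std::uint16_t {
  None = 0,
  Volatile = 1u << 0,
  NonTemporal = 1u << 1,
  Invariant = 1u << 2,
};

constexpr MemFlags operator|(MemFlags a, MemFlags b) {
  return static_cast<MemFlags>(static_cast<std::uint16_t>(a) |
                               static_cast<std::uint16_t>(b));
}

constexpr bool hasFlag(MemFlags set, MemFlags f) {
  return (static_cast<std::uint16_t>(set) & static_cast<std::uint16_t>(f)) != 0;
}

struct LoadDesc {
  unsigned alignment;
  MemFlags flags;
};

// Alignment precedes flags; 0 alignment means "use the default".
LoadDesc makeLoad(unsigned alignment = 0, MemFlags flags = MemFlags::None) {
  return {alignment, flags};
}
```

Swapping two adjacent bools at a call site compiles silently, whereas a typed flags value makes the intent explicit at each caller.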
Justin Lebar	0af80cd6f0	[CodeGen] Take a MachineMemOperand::Flags in MachineFunction::getMachineMemOperand. Summary: Previously we took an unsigned. Hooray for type-safety. Reviewers: chandlerc Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D22282 llvm-svn: 275591	2016-07-15 18:26:59 +00:00
Michael Kuperstein	4d36e77048	Fix copy/paste bug in r275340. llvm-svn: 275343	2016-07-13 23:28:00 +00:00
Michael Kuperstein	be837fa40f	[DAG] Correctly chain masked loads If a masked load is not added to the chain, it should not reset the chain's root. This fixes the remaining part of PR28515. llvm-svn: 275340	2016-07-13 23:23:40 +00:00
Andrew Kaylor	346dd7f1bd	Reverting r275284 due to platform-specific test failures llvm-svn: 275304	2016-07-13 19:09:16 +00:00
Andrew Kaylor	12cccdd731	Fix for Bug 26903, adds support to inline __builtin_mempcpy Patch by Sunita Marathe Differential Revision: http://reviews.llvm.org/D21920 llvm-svn: 275284	2016-07-13 17:25:11 +00:00
Sanjay Patel	bb7d87ee25	fix documentation comments; NFC llvm-svn: 275101	2016-07-11 20:50:39 +00:00
Sanjay Patel	fedc01ad76	[DAG] make isConstantSplatVector() available to the rest of lowering llvm-svn: 275025	2016-07-10 21:27:06 +00:00
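A constant splat vector is a vector whose defined elements all hold the same constant. A C++17 sketch of that test, assuming undefined lanes are modelled as `std::nullopt` and may match any value. `constantSplatValue` is a hypothetical helper, not the `ISD::isConstantSplatVector` implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the splatted value if all defined lanes agree, else nullopt.
std::optional<std::int64_t> constantSplatValue(
    const std::vector<std::optional<std::int64_t>>& lanes) {
  std::optional<std::int64_t> splat;
  for (const auto& lane : lanes) {
    if (!lane)
      continue;  // undef lane: compatible with any splat value
    if (splat && *splat != *lane)
      return std::nullopt;
    splat = lane;
  }
  return splat;  // nullopt if every lane was undef
}
```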
Sanjay Patel	9bedcdb5f5	fix documentation comments; NFC llvm-svn: 275021	2016-07-10 21:02:16 +00:00
Sanjay Patel	303326541b	reformat, fix comments/names; NFCI llvm-svn: 275015	2016-07-10 13:05:57 +00:00
Benjamin Kramer	4d09892e9a	Give helper classes/functions internal linkage. NFC. llvm-svn: 275014	2016-07-10 11:28:51 +00:00
Sanjay Patel	6170b4bebd	fix documentation comments; NFC llvm-svn: 274981	2016-07-09 18:52:07 +00:00
Matt Arsenault	3fb8f9eabf	Reapply r274829 with fix for FP vectors llvm-svn: 274937	2016-07-08 21:25:33 +00:00
Nico Weber	28410c6846	Revert r274829, it caused PR28472. llvm-svn: 274916	2016-07-08 19:52:19 +00:00

8683 Commits