llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	5a68d401c7	[SelectionDAG] Add SelectionDAG.computeKnownBits test support for ISD::ABS llvm-svn: 298108	2017-03-17 17:45:36 +00:00
Matthias Braun	f0b68d3fbc	SplitKit: Correctly implement partial subregister copies - This fixes a bug where subregisters incompatible with the vreg's register class were used. - Implement the case where multiple copies are necessary to cover a given lanemask. Differential Revision: https://reviews.llvm.org/D30438 llvm-svn: 298025	2017-03-17 00:41:39 +00:00
Matthias Braun	fa289ec7f0	VirtRegMap: Correctly deal with bundles when deleting identity copies. This fixes two problems when VirtRegMap encounters bundles: - When substituting a vreg subregister def with an actual register the internal read flag must be cleared. - Removing an identity COPY from a bundle needs to use removeFromBundle() and a newly introduced function to update SlotIndexes. No testcase here, because none of the in-tree targets trigger this, however an upcoming commit of mine will need this and the testcase there will trigger this. Differential Revision: https://reviews.llvm.org/D30925 llvm-svn: 298024	2017-03-17 00:41:33 +00:00
Eric Christopher	53da761570	Remove LessPreciseFPMADOption from TargetOptions along with all of the associated command line options and functions - it's currently unused in all of llvm and clang other than being set and reset. llvm-svn: 298023	2017-03-17 00:38:03 +00:00
Reid Kleckner	45707d4d5a	Remove getArgumentList() in favor of arg_begin(), args(), etc Users often call getArgumentList().size(), which is a linear way to get the number of function arguments. arg_size(), on the other hand, is constant time. In general, the fact that arguments are stored in an iplist is an implementation detail, so I've removed it from the Function interface and moved all other users to the argument container APIs (arg_begin(), arg_end(), args(), arg_size()). Reviewed By: chandlerc Differential Revision: https://reviews.llvm.org/D31052 llvm-svn: 298010	2017-03-16 22:59:15 +00:00
Simon Pilgrim	fbfb19b1d7	Remove redundant conditions (PR31753). NFCI. llvm-svn: 297976	2017-03-16 19:52:00 +00:00
Adrian Prantl	dc855221af	Attempt to fix bot failure on Windows. Looks like this expression was accidentally using 32-bit arithmetic. llvm-svn: 297969	2017-03-16 18:06:04 +00:00
Adrian Prantl	3621309e8d	Rearrange fields. NFC. llvm-svn: 297967	2017-03-16 17:42:47 +00:00
Adrian Prantl	a63b8e8227	Rename methods in DwarfExpression to adhere to the LLVM coding guidelines. NFC. llvm-svn: 297966	2017-03-16 17:42:45 +00:00
Adrian Prantl	981f03e6a2	PR32288: More efficient encoding for DWARF expr subregister access. Citing http://bugs.llvm.org/show_bug.cgi?id=32288 The DWARF generated by LLVM includes this location: 0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply 0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable to assume the DWARF consumer knows which part of a register logically holds the value (low bytes, high bytes, how many bytes, etc) for a primitive value like an integer. This patch gets rid of the redundant DW_OP_piece when a subregister is at offset 0. It also adds previously missing subregister masking when a subregister is followed by another operation. (This reapplies r297960 with two additional testcase updates). rdar://problem/31069390 https://reviews.llvm.org/D31010 llvm-svn: 297965	2017-03-16 17:14:56 +00:00
Adrian Prantl	c5b3351750	Revert "PR32288: More efficient encoding for DWARF expr subregister access." This reverts commit 2bf453116889a576956892ea9683db4fcd96e30e while investigating buildbot breakage. llvm-svn: 297962	2017-03-16 16:38:22 +00:00
Adrian Prantl	8508e87998	PR32288: More efficient encoding for DWARF expr subregister access. Citing http://bugs.llvm.org/show_bug.cgi?id=32288 The DWARF generated by LLVM includes this location: 0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply 0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable to assume the DWARF consumer knows which part of a register logically holds the value (low bytes, high bytes, how many bytes, etc) for a primitive value like an integer. This patch gets rid of the redundant DW_OP_piece when a subregister is at offset 0. It also adds previously missing subregister masking when a subregister is followed by another operation. rdar://problem/31069390 https://reviews.llvm.org/D31010 llvm-svn: 297960	2017-03-16 16:34:14 +00:00
Oren Ben Simhon	da59ffae91	Fixing typos. llvm-svn: 297932	2017-03-16 08:15:52 +00:00
Jonas Paulsson	84319bfc40	[SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types. Don't scalarize VSELECT->SETCC when operands/results needs to be widened, or when the type of the SETCC operands are different from those of the VSELECT. (VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled. The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is no longer needed and has been removed. Updated tests: test/CodeGen/ARM/vuzp.ll test/CodeGen/NVPTX/f16x2-instructions.ll test/CodeGen/X86/2011-10-19-widen_vselect.ll test/CodeGen/X86/2011-10-21-widen-cmp.ll test/CodeGen/X86/psubus.ll test/CodeGen/X86/vselect-pcmp.ll Review: Eli Friedman, Simon Pilgrim https://reviews.llvm.org/D29489 llvm-svn: 297930	2017-03-16 07:17:12 +00:00
Kyle Butt	08655997eb	CodeGen: BlockPlacement: Reduce TriangleChainCount to 2 This produces a 1% speedup on an important internal Google benchmark (protocol buffers), with no other regressions in google or in the llvm test-suite. Only 5 targets in the entire llvm test-suite are affected, and on those 5 targets the size increase is 0.027% llvm-svn: 297925	2017-03-16 01:32:29 +00:00
Craig Topper	6c66bbca4a	[StackColoring] Remove unused header file for post-order traversal. Update comment that indicated we were using it when we really use a depth-first search. NFC llvm-svn: 297904	2017-03-15 22:40:26 +00:00
Matt Arsenault	02d915be90	CodeGenPrepare: Sink addressing modes for atomics llvm-svn: 297903	2017-03-15 22:35:20 +00:00
Eric Christopher	17ce8a2f5e	Fix up grammar in a comment. llvm-svn: 297898	2017-03-15 21:50:46 +00:00
Zvi Rackover	48cdde0e59	[DAGCombine] Bail out if can't create a vector with at least two elements Summary: Fixes pr32278 Reviewers: igorb, craig.topper, RKSimon, spatel, hfinkel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30978 llvm-svn: 297878	2017-03-15 19:48:36 +00:00
Ahmed Bougacha	2fb8030748	[GlobalISel] Avoid translating synthetic constants to new G_CONSTANTS. Currently, we create a G_CONSTANT for every "synthetic" integer constant operand (for instance, for the G_GEP offset). Instead, share the G_CONSTANTs we might have created by going through the ValueToVReg machinery. When we're emitting synthetic constants, we do need to get Constants from the context. One could argue that we shouldn't modify the context at all (for instance, this means that we're going to use a tad more memory if the constant wasn't used elsewhere), but constants are mostly harmless. We currently do this for extractvalue and all. For constant fcmp, this does mean we'll emit an extra COPY, which is not necessarily more optimal than an extra materialized constant. But that preserves the current intended design of uniqued G_CONSTANTs, and the rematerialization problem exists elsewhere and should be resolved with a single coherent solution. llvm-svn: 297875	2017-03-15 19:21:11 +00:00
Tim Northover	0d98b03b9f	ARM: avoid clobbering register in v6 jump-table expansion. If we got unlucky with register allocation and actual constpool placement, we could end up producing a tTBB_JT with an index that's already been clobbered. Technically, we might be able to fix this situation up with a MOV, but I think the constant islands pass is complex enough without having to deal with more weird edge-cases. llvm-svn: 297871	2017-03-15 18:38:13 +00:00
Ahmed Bougacha	07f247b6c2	[GlobalISel] Insert translated switch icmp blocks after switch parent. Now that we preserve the IR layout, we would end up with all the newly synthesized switch comparison blocks at the end of the function. Instead, use a hopefully more reasonable layout, with the comparison blocks immediately following the switch comparison blocks. llvm-svn: 297869	2017-03-15 18:22:37 +00:00
Ahmed Bougacha	a61c214f51	[GlobalISel] Preserve IR block layout. It makes the output function layout more predictable; the layout has an effect on performance, we don't want it to be at the mercy of the translator's visitation order and such. The predictable output is also easier to digest. getOrCreateBB isn't appropriately named anymore, as it never needs to create anything. Rename it and extract the MBB creation logic out of it. A couple tests were sensitive to the order. Update them. llvm-svn: 297868	2017-03-15 18:22:33 +00:00
Craig Topper	bcb6093610	[CodeGen] Use APInt::setLowBits/setHighBits/setBitsFrom in more places This patch replaces ORs with getHighBits/getLowBits etc. with setLowBits/setHighBits/setBitsFrom. In a few of the places we weren't ORing, but the KnownZero/KnownOne vectors were already initialized to zero. We exploit this in most places already there were just some that were inconsistent. Differential Revision: https://reviews.llvm.org/D30965 llvm-svn: 297860	2017-03-15 16:53:53 +00:00
Ahmed Bougacha	2b7f1377aa	[GlobalISel] Remove dead member. NFC. llvm-svn: 297855	2017-03-15 16:29:32 +00:00
Peter Collingbourne	d44a01aae6	CodeGen: Use the source filename as the argument to .file, rather than the module ID. Using the module ID here is wrong for a couple of reasons: 1) The module ID is not persisted, so we can end up with different object file contents given the same input file (for example if the same file is accessed via different paths). 2) With ThinLTO the module ID field may contain the path to a bitcode file, which is incorrect, as the .file argument is supposed to contain the path to a source file. Differential Revision: https://reviews.llvm.org/D30584 llvm-svn: 297853	2017-03-15 16:24:52 +00:00
Simon Pilgrim	018eedd9a5	[SelectionDAG] Support BUILD_VECTOR implicit truncation in SelectionDAG::ComputeNumSignBits (PR32273) llvm-svn: 297852	2017-03-15 16:22:24 +00:00
Nuno Lopes	ae455c562d	fix gcc -Wmisleading-indentation [NFC] llvm-svn: 297816	2017-03-15 09:33:33 +00:00
Taewook Oh	1b192336d8	NFC: Reformats comments according to the coding guidelines. llvm-svn: 297808	2017-03-15 06:29:23 +00:00
Taewook Oh	fb1833efeb	[BranchFolding] Merge debug locations from common tail instead of removing Summary: D25742 improved the precision of debug locations for PGO by removing debug locations from common tail when tail-merging. However, if identical instructions that are merged into a common tail have the same debug locations, there's no need to remove them. This patch creates a merged debug location of identical instructions across SameTails and assigns it to the instruction in the common tail, so that the debug locations are maintained if they are the same across identical instructions. Reviewers: aprantl, probinson, MatzeB, rob.lougher Reviewed By: aprantl Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D30226 llvm-svn: 297805	2017-03-15 05:44:59 +00:00
Peter Collingbourne	7f6e2c97b8	Ensure that prefix data is preserved with subsections-via-symbols On MachO platforms that use subsections-via-symbols dead code stripping will drop prefix data. Unfortunately there is no great way to convey the relationship between a function and its prefix data to the linker. We are forced to use a bit of a hack: we give the prefix data its own symbol, and mark the actual function entry an .alt_entry. Patch by Moritz Angermann! Differential Revision: https://reviews.llvm.org/D30770 llvm-svn: 297804	2017-03-15 04:18:16 +00:00
Volkan Keles	4862c63594	[GlobalISel] IRTranslator: Return the scalar for <1 x Ty> constant vectors Summary: <1 x Ty> is not a legal vector type in LLT, we shouldn’t build G_MERGE_VALUES instruction for them. Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab, javed.absar Reviewed By: qcolombet Subscribers: dberris, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D30948 llvm-svn: 297792	2017-03-14 23:45:06 +00:00
Daniel Sanders	8a4bae9993	[globalisel][tblgen] Add support for ComplexPatterns Summary: Adds a new kind of MachineOperand: MO_Placeholder. This operand must not appear in the MIR and only exists as a way of creating an 'uninitialized' operand until a matcher function overwrites it. Depends on D30046, D29712 Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet Reviewed By: qcolombet Subscribers: dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D30089 llvm-svn: 297782	2017-03-14 21:32:08 +00:00
Simon Pilgrim	cf2da96c82	[SelectionDAG] Add a signed integer absolute ISD node Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780	2017-03-14 21:26:58 +00:00
Sanjay Patel	8dd99dce6c	[DAG] vector div/rem with any zero element in divisor is undef This is the backend counterpart to: https://reviews.llvm.org/rL297390 https://reviews.llvm.org/rL297409 and follow-up to: https://reviews.llvm.org/rL297384 It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic, but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those someday. Differential Revision: https://reviews.llvm.org/D30826 llvm-svn: 297762	2017-03-14 18:06:28 +00:00
Benjamin Kramer	babcbddae0	[CodeGen] Fix -Wreorder warning. llvm-svn: 297729	2017-03-14 10:29:47 +00:00
Sam Parker	916b1ba617	[ARM] Move SMULW[B|T] isel to DAG Combine Create nodes for smulwb and smulwt and move their selection from DAGToDAG to DAG combine. smlawb and smlawt can then be selected using tablegen. Added some helper functions to detect shift patterns as well as a wrapper around SimplifyDemandBits. Added a couple of extra tests. Differential Revision: https://reviews.llvm.org/D30708 llvm-svn: 297716	2017-03-14 09:13:22 +00:00
Oren Ben Simhon	fe34c5e429	Disable Callee Saved Registers Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller. Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list. The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee. The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee. Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span). The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments. The patch will also assist to implement future no_caller_saved_registers attribute intended for interrupt handler CC. Differential Revision: https://reviews.llvm.org/D28566 llvm-svn: 297715	2017-03-14 09:09:26 +00:00
Nirav Dave	4fc8401abf	Recommitting Craig Topper's patch now that r296476 has been recommitted. When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node. This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency. llvm-svn: 297698	2017-03-14 01:42:23 +00:00
Nirav Dave	54e22f33d9	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695	2017-03-14 00:34:14 +00:00
Adrian Prantl	19aadf57c8	Revert "Debug Info: Add basic support for external types references." This reverts commit r242302. External type refs of this form were never used by any LLVM frontend so this is effectively dead code. (They were introduced to support clang module debug info, but in the end we came up with a better design that doesn't use this feature at all.) rdar://problem/25897929 Differential Revision: https://reviews.llvm.org/D30917 llvm-svn: 297684	2017-03-13 22:56:14 +00:00
Marcello Maggioni	598d89a3f4	[IPRA] Change algorithm for RegUsageInfoCollector. The previous algorithm for RegUsageInfoCollector had pretty bad performance on architectures with a lot of registers that alias a lot one another, because we potentially iterate for every register over all the aliasing registers. This costs even more if the function is small and doesn't define a lot of registers. This patch changes the algorithm to one that while iterating over all the registers it will iterate over the aliasing registers only if the register itself is defined. This should be faster based on the assumption that only a subset of the whole LLVM registers set is actually defined in the function. Differential Revision: https://reviews.llvm.org/D30880 llvm-svn: 297673	2017-03-13 21:42:53 +00:00
Volkan Keles	38a91a0de6	GlobalISel: Translate ConstantDataVector Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, javed.absar, ab Reviewed By: qcolombet, dsanders, ab Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30216 llvm-svn: 297670	2017-03-13 21:36:19 +00:00
Jessica Paquette	c984e21394	[Outliner] Add tail call support This commit adds tail call support to the MachineOutliner pass. This allows the outliner to insert jumps rather than calls in areas where tail calling is possible. Outlined tail calls include the return or terminator of the basic block being outlined from. Tail call support allows the outliner to take returns and terminators into consideration while finding candidates to outline. It also allows the outliner to save more instructions. For example, in the X86-64 outliner, a tail called outlined function saves one instruction since no return has to be inserted. llvm-svn: 297653	2017-03-13 18:39:33 +00:00
Simon Pilgrim	fa97699d09	Fix -Wsentinel warning llvm-svn: 297560	2017-03-11 12:56:02 +00:00
Amaury Sechet	d1ec5d54cf	Use setBits in SelectionDAG Summary: As per title. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30836 llvm-svn: 297559	2017-03-11 11:24:03 +00:00
Quentin Colombet	ee8a4f51c4	[IRTranslator] Simplify error handling for translating constants. NFC. We don't need to check whether the fallback path is enabled to return false. Just do that all the time on error cases, the caller knows (or at least should know!) how to handle the failing case. llvm-svn: 297535	2017-03-11 00:28:33 +00:00
Stanislav Mekhanoshin	b546174b0e	Fix subreg value numbers in handleMoveUp The problem can occur in presence of subregs. If we are swapping two instructions defining different subregs of the same register we will get a new liveout from a block. We need to preserve value number for block's liveout for successor block's livein to match. Differential Revision: https://reviews.llvm.org/D30558 llvm-svn: 297534	2017-03-11 00:14:52 +00:00
Simon Pilgrim	455e2f3313	Strip trailing whitespace. llvm-svn: 297529	2017-03-10 22:53:19 +00:00
Simon Pilgrim	83c37c4dac	Fix redundant condition (PR32138) '!A || (A && B)' is equivalent to '!A || B' llvm-svn: 297527	2017-03-10 22:44:47 +00:00
Volkan Keles	225921ac41	[GlobalISel] LegalizerHelper: Lower (G_FSUB X, Y) to (G_FADD X, (G_FNEG Y)) Summary: No test case as none of the in-tree targets with GlobalISel support has this condition. Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab Reviewed By: qcolombet Subscribers: dberris, rovka, kristof.beyls, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D30786 llvm-svn: 297512	2017-03-10 21:25:09 +00:00
Volkan Keles	970fee4bfe	GlobalISel: Translate ConstantAggregateZero vectors Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar Reviewed By: qcolombet Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30259 llvm-svn: 297509	2017-03-10 21:23:13 +00:00
Volkan Keles	04cb08cc83	[GlobalISel] Translate insertelement and extractelement Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar Reviewed By: qcolombet Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30761 llvm-svn: 297495	2017-03-10 19:08:28 +00:00
Simon Pilgrim	7dedbfa89d	[SelectionDAG] Add support for BUILD_VECTOR to ComputeNumSignBits llvm-svn: 297492	2017-03-10 18:36:46 +00:00
Volkan Keles	685fbda217	[GlobalISel] Make LegalizerInfo accessible in LegalizerHelper Summary: We don’t actually use LegalizerInfo in Legalizer pass, it’s just passed as an argument. In order to check if an instruction is legal or not, we need to get LegalizerInfo by calling `MI.getParent()->getParent()->getSubtarget().getLegalizerInfo()`. Instead, make LegalizerInfo accessible in LegalizerHelper. Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, kristof.beyls Reviewed By: qcolombet Subscribers: dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D30838 llvm-svn: 297491	2017-03-10 18:34:57 +00:00
Amaury Sechet	62e0759d56	[SelectionDAG] Make SelectionDAG aware of the known bits in USUBO and SSUBO and SUBC. Summary: Depends on D30379 This improves the state of things for the sub class of operation. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30436 llvm-svn: 297482	2017-03-10 17:26:44 +00:00
Amaury Sechet	69fa16c810	[SelectionDAG] Make SelectionDAG aware of the known bits in UADDO and SADDO. Summary: As per title. This is extracted from D29872 and I threw SADDO in. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30379 llvm-svn: 297479	2017-03-10 17:06:52 +00:00
Simon Pilgrim	b02667c469	[APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64). This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target. Differential Revision: https://reviews.llvm.org/D30780 llvm-svn: 297458	2017-03-10 13:44:32 +00:00
Ahmed Bougacha	d22b84b9d0	[GlobalISel] Use ImmutableCallSite instead of templates. NFC. ImmutableCallSite abstracts away CallInst and InvokeInst. Use it! llvm-svn: 297426	2017-03-10 00:25:44 +00:00
Ahmed Bougacha	4ec6d5abed	[GlobalISel] Fallback when failing to translate invoke. We unintentionally stopped falling back in r293670. While there, change an unusual construct. llvm-svn: 297425	2017-03-10 00:25:35 +00:00
Tim Northover	aa995c98f4	GlobalISel: support trivial inlineasm calls. They're used for nefarious purposes by ObjC. llvm-svn: 297422	2017-03-09 23:36:26 +00:00
Eli Friedman	93f47e5ffb	Refactor alias check from MISched into common helper. NFC. Differential Revision: https://reviews.llvm.org/D30598 llvm-svn: 297421	2017-03-09 23:33:36 +00:00
Amaury Sechet	e7d102cf02	[DAGCombiner] Do various combine on uaddo. Summary: This essentially does the same transform as for ADC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30417 llvm-svn: 297416	2017-03-09 22:47:00 +00:00
Tim Northover	d1e951e5eb	GlobalISel: inform FrameLowering when we emit a function call. Amongst other things (I expect) this is necessary to ensure decent backtraces when an "unreachable" is involved. llvm-svn: 297413	2017-03-09 22:00:39 +00:00
Tim Northover	7a9ea8f628	GlobalISel: put debug info for static allocas in the MachineFunction. The good reason to do this is that static allocas are pretty simple to handle (especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline should give some kind of performance benefit. The bad reason is that the debug pipeline is an unholy mess of implicit contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a load or not involves the services of at least 3 soothsayers and the sacrifice of at least one chicken. And it still gets it wrong if the variable is at SP directly. llvm-svn: 297410	2017-03-09 21:12:06 +00:00
Amaury Sechet	10425de063	[DAGCombiner] Do various combine on usubo. Summary: This essentially does the same transform as for SUBC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30437 llvm-svn: 297404	2017-03-09 19:28:00 +00:00
Sanjay Patel	df21979db7	[DAG] recognize div/rem by 0 as undef before trying constant folding As discussed in the review thread for rL297026, this is actually 2 changes that would independently fix all of the test cases in the patch: 1. Return undef in FoldConstantArithmetic for div/rem by 0. 2. Move basic undef simplifications for div/rem (simplifyDivRem()) before foldBinopIntoSelect() as a matter of efficiency. I will handle the case of vectors with any zero element as a follow-up. That change is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic(). I'm deleting the test for PR30693 because it does not test for the actual bug any more (dangers of using bugpoint). Differential Revision: https://reviews.llvm.org/D30741 llvm-svn: 297384	2017-03-09 15:02:25 +00:00
Adam Nemet	5361b82d54	[SSP] In opt remarks, stream Function directly With this, it shows up as an attribute in YAML and non-printable characters are properly removed by GlobalValue::getRealLinkageName. llvm-svn: 297362	2017-03-09 06:10:27 +00:00
Matt Arsenault	9a3fd87523	DAG: Check no signed zeros instead of unsafe math attribute llvm-svn: 297354	2017-03-09 01:36:39 +00:00
Konstantin Zhuravlyov	d5561e0a0b	[DebugInfo] Emit address space with DW_AT_address_class attribute for pointer and reference types Differential Revision: https://reviews.llvm.org/D29670 llvm-svn: 297320	2017-03-08 23:55:44 +00:00
Jessica Paquette	d4cb9c6da0	[Outliner] Fix memory leak in suffix tree. This commit changes the BumpPtrAllocator for suffix tree nodes to a SpecificBumpPtrAllocator. Before, node construction was leaking memory because of the DenseMap in SuffixTreeNodes. Changing this to a SpecificBumpPtrAllocator allows this memory to properly be released. llvm-svn: 297319	2017-03-08 23:55:33 +00:00
Tim Northover	7596bd7a27	GlobalISel: correctly handle trivial fcmp predicates. It makes sense to only do them once in IRTranslator rather than making everyone deal with them. llvm-svn: 297304	2017-03-08 18:49:54 +00:00
Volkan Keles	5698b2ae6e	[GlobalISel] Add default action for G_FNEG Summary: rL297171 introduced G_FNEG for floating-point negation instruction and IRTranslator started to translate `FSUB -0.0, X` to `FNEG X`. This patch adds a default action for G_FNEG to avoid breaking existing targets. Reviewers: qcolombet, ab, kristof.beyls, t.p.northover, aditya_nandakumar, dsanders Reviewed By: qcolombet Subscribers: dberris, rovka, llvm-commits Differential Revision: https://reviews.llvm.org/D30721 llvm-svn: 297301	2017-03-08 18:09:14 +00:00
Eli Friedman	c2c2e21d77	[DAGCombine] Simplify ISD::AND in GetDemandedBits. This helps in cases involving bitfields where an AND is exposed by legalization. Differential Revision: https://reviews.llvm.org/D30472 llvm-svn: 297249	2017-03-08 00:56:35 +00:00
Konstantin Zhuravlyov	f9b41cd3d8	[DebugInfo] Make legal and emit DW_OP_swap and DW_OP_xderef Differential Revision: https://reviews.llvm.org/D29672 llvm-svn: 297247	2017-03-08 00:28:57 +00:00
Daniel Sanders	1351db49b2	Fix additional constructor call missed by r297241. It was added between my build+test and my commit. llvm-svn: 297244	2017-03-07 23:32:10 +00:00
Daniel Sanders	52b4ce727a	Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297241	2017-03-07 23:20:35 +00:00
Tim Northover	542d1c1463	GlobalISel: use inserts for landingpad instead of sequences. llvm-svn: 297237	2017-03-07 23:04:06 +00:00
Tim Northover	2eb18d3c4b	GlobalISel: fix legalization of G_INSERT We were calculating incorrect extract/insert offsets by trying to be too tricksy with min/max. It's clearer to just split the logic up into "register starts before this segment" vs "after". llvm-svn: 297226	2017-03-07 21:24:33 +00:00
Yaron Keren	75fadfc774	Implement FreeMachineFunction::getPassName(). llvm-svn: 297222	2017-03-07 20:59:08 +00:00
Ahmed Bougacha	55d10423a6	[GlobalISel] Don't translate intrinsics with metadata parameters. Some intrinsics take metadata parameters. These all need custom handling of some form, and cannot possibly be lowered generically to G_INTRINSIC calls with vreg operands. Reject them, instead of hitting an assert later in getOrCreateVReg. llvm-svn: 297209	2017-03-07 20:53:09 +00:00
Ahmed Bougacha	5c7924fca5	[GlobalISel] Avoid invalidating ValToVReg when translating no-op bitcast. When we translate a no-op (same type) bitcast, we try to be clever and only emit a COPY if we already assigned a vreg to the defined value. However, when we didn't, we tried to assign to a reference into the ValToVReg DenseMap, even though the RHS of the assignment (getOrCreateVReg) could potentially grow that DenseMap, invalidating the reference. Avoid that by getting the source vreg first. I audited the rest of the translator; this is the only tricky case. The test is quite unwieldy, as the problem is caused by the DenseMap growing, which happens after the 47th mapped value. llvm-svn: 297208	2017-03-07 20:53:06 +00:00
Ahmed Bougacha	38455ea8a6	[GlobalISel] Relax vector G_SELECT assertion. For vector operands, the `select` instruction supports both vector and non-vector conditions. The MIR builder had an overly restrictive assertion, that only accepted vector conditions for vector selects (in effect implementing ISD::VSELECT). Make it possible to express the full range of G_SELECTs. llvm-svn: 297207	2017-03-07 20:53:03 +00:00
Ahmed Bougacha	adce3ee219	[GlobalISel] Slightly clean up DBG_VALUE FP build code. I messed up my rebases leading to r297200, and ended up with stale (but working) code. Fix it. llvm-svn: 297205	2017-03-07 20:52:57 +00:00
Ahmed Bougacha	c373262d52	[GlobalISel] Ignore %noreg when applying default regbank mapping. When computing the mapping for non-generic instructions, we skipped %noreg operands, because we can't always reason about their banks. Also skip them when applying the mapping. Otherwise, we could end up with mappings that we can't apply. While there, duplicate an assert to distinguish between the two error conditions. llvm-svn: 297201	2017-03-07 20:34:23 +00:00
Ahmed Bougacha	4826bae8b4	[GlobalISel] Emit DBG_VALUE %noreg for non-int/fp constant values. When a dbg_value has a constant operand that isn't representable in MI, there isn't much we can do. Use %noreg (0) for those situations. This matches the SelectionDAG behavior. llvm-svn: 297200	2017-03-07 20:34:20 +00:00
Arnold Schwaighofer	69e74b48f2	SjLjEHPrepare: Fix the pass for swifterror arguments We cannot leave the identity copies 'select true, arg, undef' that this pass inserts for arguments to simplify handling of values on swifterror arguments. swifterror arguments have restrictions on their uses. rdar://30839288 llvm-svn: 297197	2017-03-07 20:29:02 +00:00
Daniel Sanders	8ebec37d26	Revert r297177: Change LLT constructor string into an LLT-based object ... More module problems. This time it only showed up in the stage 2 compile of clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile. Somehow, this change causes the build to need Attributes.gen before it's been generated. llvm-svn: 297188	2017-03-07 19:21:23 +00:00
Daniel Sanders	8612326a08	[globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297177	2017-03-07 18:32:25 +00:00
Volkan Keles	20d3c4200d	[GlobalISel] Translate floating-point negation Reviewers: qcolombet, javed.absar, aditya_nandakumar, dsanders, t.p.northover, ab Reviewed By: qcolombet Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30671 llvm-svn: 297171	2017-03-07 18:03:28 +00:00
Tim Northover	c2c545b8f7	GlobalISel: restrict G_EXTRACT instruction to just one operand. A bit more painful than G_INSERT because it was more widely used, but this should simplify the handling of extract operations in most locations. llvm-svn: 297100	2017-03-06 23:50:28 +00:00
Sanjay Patel	7f18ec50ba	[DAG] refactor related div/rem folds; NFCI This is known incomplete and not called in the right order relative to other folds, but that's the current behavior. I'm just trying to clean this up before making actual functional changes to make the patch smaller. The logic here should mimic the IR equivalents that are in InstSimplify's simplifyDivRem(). llvm-svn: 297086	2017-03-06 22:32:40 +00:00
Paul Robinson	f96e21ad6d	[DWARFv5] Update definitions to match published spec. Some late additions to DWARF v5 were not in Dwarf.def; also one form was redefined. Add the new cases to relevant switches in different parts of LLVM. Replace DW_FORM_ref_sup with DW_FORM_ref_sup[4,8]. I did not add support for DW_FORM_strx3/addrx3 other than defining the constants. We don't have any infrastructure to support these. Differential Revision: http://reviews.llvm.org/D30664 llvm-svn: 297085	2017-03-06 22:20:03 +00:00
Jessica Paquette	596f483a5e	[Outliner] Fixed Asan bot failure in r296418 Fixed the asan bot failure which led to the last commit of the outliner being reverted. The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector is no longer initialized using reserve but just a standard constructor. llvm-svn: 297081	2017-03-06 21:31:18 +00:00
Krzysztof Parzyszek	5b8fae5edd	[IfConversion] Only renormalize probabilities if branches are analyzable If a block has non-analyzable branches, the listed successors don't need to add up to one. For example, if a block has a conditional tail call, that tail call will not have a corresponding successor in the successor list, but will still be a possible branch. Differential Revision: https://reviews.llvm.org/D30556 llvm-svn: 297054	2017-03-06 19:12:42 +00:00
Tim Northover	95b6d5f2b1	GlobalISel: don't emit degenerate G_INSERT instructions. Before, we were producing G_INSERT instructions that were actually closer to a cast or even a COPY when both input and output sizes are the same. This doesn't really make sense and means that everything interpreting a G_INSERT also has to handle all these kinds of casts. So now we detect these degenerate cases and emit real casts instead. llvm-svn: 297051	2017-03-06 19:04:17 +00:00
Tim Northover	81dafc1c88	GlobalISel: add buildUndef method to MachineIRBuilder. NFC. llvm-svn: 297044	2017-03-06 18:36:40 +00:00
Tim Northover	75e0b91e59	GlobalISel: refactor legalization of G_INSERT. Now that G_INSERT instructions can only insert one register, this code was overly general. In another direction it didn't handle registers that crossed split boundaries properly, which needed to be fixed. llvm-svn: 297042	2017-03-06 18:23:04 +00:00
Sanjay Patel	7f7947bf41	[DAGCombiner] simplify div/rem-by-0 Refactoring of duplicated code and more fixes to follow. This is motivated by the post-commit comments for r296699: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html Ie, we can crash if we're missing obvious simplifications like this that exist in the IR simplifier or if these occur later than expected. The x86 change for non-splat division shows a potential opportunity to improve vector codegen: we assumed that since only one lane had meaningful results, we should do the math in scalar. But that means moving back and forth from vector registers. llvm-svn: 297026	2017-03-06 16:36:42 +00:00
Sanjay Patel	6b029a5380	[DAG] fix formatting; NFC llvm-svn: 297015	2017-03-06 15:27:57 +00:00
Sanjay Patel	5273afd4bb	[DAG] fix typo in comment; NFC llvm-svn: 297011	2017-03-06 15:07:43 +00:00
Dean Michael Berris	7e8eea429f	[XRay] Allow logging the first argument of a function call. Summary: Functions with the "xray-log-args" attribute will have a special XRay sled kind emitted, for compiler-rt to copy any call arguments to your logging handler. For practical and performance reasons, only the first argument is supported, and only up to 64 bits. Reviewers: dberris Reviewed By: dberris Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29702 llvm-svn: 296998	2017-03-06 06:48:56 +00:00
Simon Pilgrim	584d6d9d91	[SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions Found by fuzz testing after rL296985 landed llvm-svn: 296989	2017-03-05 15:52:18 +00:00
Simon Pilgrim	9f5c251d57	[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985	2017-03-05 09:57:20 +00:00
Craig Topper	6ffc044b2f	[DAGCombine] Use APInt::operator|(uint64_t) instead of creating a temporary APInt and calling APInt::Or. NFC This is more efficient by itself. But this is prep for a future patch that may remove APInt::Or while making operator| support rvalue references similar to add/sub. llvm-svn: 296981	2017-03-05 01:08:16 +00:00
Sanjay Patel	066f3208bf	[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C) select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977	2017-03-04 19:18:09 +00:00
Florian Hahn	6406f98342	[legalize-types] Remove stale entries from SoftenedFloats. Summary: When replacing a SDValue, we should remove the replaced value from SoftenedFloats (and possibly the other maps as well?). When we revisit a Node because it needs analyzing again, we have to remove all result values from SoftenedFloats (and possibly other maps?). This fixes the fp128 test failures with expensive checks for X86. I think we probably should also remove the values from the other maps (PromotedIntegers and so on), let me know what you think. Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon Reviewed By: chh Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits Differential Revision: https://reviews.llvm.org/D29265 llvm-svn: 296964	2017-03-04 12:00:35 +00:00
Eli Friedman	49f0220086	[MISched] Remove unused arguments. NFC. llvm-svn: 296934	2017-03-04 00:42:55 +00:00
Matthias Braun	ffe40dd69e	RegAllocGreedy: Follow-up to r296722 We can now end up in situations where we initiate LiveIntervalUnion queries with different SubRanges against the same register unit, so the assert() no longer holds in all cases. Just recalculate now when we know the cache is out of date. llvm-svn: 296928	2017-03-03 23:27:20 +00:00
Tim Northover	3e6a7afd81	GlobalISel: constrain G_INSERT to inserting just one value per instruction. It's much easier to reason about single-value inserts and no-one was actually using the variadic variants before. llvm-svn: 296923	2017-03-03 23:05:47 +00:00
Tim Northover	bf017293af	GlobalISel: add merge/unmerge nodes for legalization. These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which assume the individual parts will be contiguous, homogeneous, and occupy the entirety of the larger register. This makes reasoning about them much easier since you only have to look at the first register being merged and the result to know what the instruction is doing. I intend to gradually replace all uses of the more complicated sequence/extract with these (or single-element insert/extracts), and then remove the older variants. For now we start with legalization. llvm-svn: 296921	2017-03-03 22:46:09 +00:00
Matthias Braun	a04d7ad851	RegisterCoalescer: Simplify subrange splitting code; NFC - Use slightly better variable names / compute in a more direct way. llvm-svn: 296905	2017-03-03 19:05:34 +00:00
Simon Pilgrim	6dfab414db	Use APInt::setBits instead of OR'ing in a separate APInt::getBitsSet call llvm-svn: 296886	2017-03-03 17:03:52 +00:00
Simon Pilgrim	cf12b5e1a6	Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation Avoids all the unnecessary extra bitrange creation/shift stages. llvm-svn: 296879	2017-03-03 16:35:57 +00:00
Simon Pilgrim	10754abe7e	Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation Avoids all the unnecessary extra bitrange creation/shift stages. llvm-svn: 296871	2017-03-03 14:25:46 +00:00
Simon Pilgrim	b01bb3a1b2	Fix Wdocumentation warning llvm-svn: 296866	2017-03-03 12:09:11 +00:00
Chandler Carruth	ce52b80744	[SDAG] Revert r296476 (and r296486, r296668, r296690). This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862	2017-03-03 10:02:25 +00:00
Adrian Prantl	ea8880b81f	LiveDebugValues: Assume calls never clobber SP. A call should never modify the stack pointer, but some backends are not so sure about this and never list SP in the regmask. For the purposes of LiveDebugValues we assume a call never clobbers SP. We already have a similar workaround in DbgValueHistoryCalculator (which we hopefully can retire soon). This fixes the availability of local ASANified variables on AArch64. rdar://problem/27757381 llvm-svn: 296847	2017-03-03 01:08:25 +00:00
Kyle Butt	1fa6030767	CodeGen: BlockPlacement: Precompute layout for chains of triangles. For chains of triangles with small join blocks that can be tail duplicated, a simple calculation of probabilities is insufficient. Tail duplication can be profitable in 3 different ways for these cases: 1) The post-dominators marked 50% are actually taken 56% (This shrinks with longer chains) 2) The chains are statically correlated. Branch probabilities have a very U-shaped distribution. [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805] If the branches in a chain are likely to be from the same side of the distribution as their predecessor, but are independent at runtime, this transformation is profitable. (Because the cost of being wrong is a small fixed cost, unlike the standard triangle layout where the cost of being wrong scales with the # of triangles.) 3) The chains are dynamically correlated. If the probability that a previous branch was taken positively influences whether the next branch will be taken. We believe that 2 and 3 are common enough to justify the small margin in 1. The code pre-scans a function's CFG to identify this pattern and marks the edges so that the standard layout algorithm can use the computed results. llvm-svn: 296845	2017-03-03 01:00:22 +00:00
Taewook Oh	96c6415697	[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y) Summary: Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it makes more sense to take DebugLoc from 't1' than 't4'. For the code below ``` extern int bar(); extern int baz(); int foo(int x, int y) { if (x != y) return bar(); else return baz(); } ``` , following is the bitcode representation of 'foo' at the end of llvm-ir level optimization: ``` define i32 @foo(i32 %x, i32 %y) !dbg !4 { entry: tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12 tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13 %cmp = icmp ne i32 %x, %y, !dbg !14 br i1 %cmp, label %if.then, label %if.else, !dbg !16 if.then: ; preds = %entry %call = tail call i32 (...) @bar() #3, !dbg !17 br label %return, !dbg !18 if.else: ; preds = %entry %call1 = tail call i32 (...) @baz() #3, !dbg !19 br label %return, !dbg !20 return: ; preds = %if.else, %if.then %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ] ret i32 %retval.0, !dbg !21 } !14 = !DILocation(line: 5, column: 9, scope: !15) !16 = !DILocation(line: 5, column: 7, scope: !4) ``` As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLoc of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue. Reviewers: atrick, bogner, andreadb, craig.topper, aprantl Reviewed By: andreadb Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D29813 llvm-svn: 296825	2017-03-02 21:58:35 +00:00
Sanjay Patel	7884dcb788	[DAG] early exit to improve readability and formatting of visitMemCmpCall(); NFCI llvm-svn: 296824	2017-03-02 21:56:43 +00:00
Kyle Butt	1393761e0c	CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic. Outlining optional branches isn't a good heuristic, and it's never been on by default. Remove it to clean things up. llvm-svn: 296818	2017-03-02 21:44:24 +00:00
Zachary Turner	d9dc2829ea	[Support] Move Stream library from MSF -> Support. After several smaller patches to get most of the core improvements finished up, this patch is a straight move and header fixup of the source. Differential Revision: https://reviews.llvm.org/D30266 llvm-svn: 296810	2017-03-02 20:52:51 +00:00
Sanjay Patel	209b0f9aad	[DAG] improve documentation comments; NFC llvm-svn: 296808	2017-03-02 20:48:08 +00:00
Simon Pilgrim	7b227fecb5	Fix some Wdocumentation warnings llvm-svn: 296783	2017-03-02 18:59:07 +00:00
Sanjay Patel	fffa179837	[DAGCombiner] avoid assertion when folding binops with opaque constants This bug was introduced with: https://reviews.llvm.org/rL296699 There may be a way to loosen the restriction, but for now just bail out on any opaque constant. The tests show that opacity is target-specific. This goes back to cost calculations in ConstantHoisting based on TTI->getIntImmCost(). llvm-svn: 296768	2017-03-02 17:18:56 +00:00
Sanjay Patel	f7aba7ba22	fix typo in comment; NFC llvm-svn: 296760	2017-03-02 16:37:24 +00:00
Serge Pavlov	e2bf69715f	Do not verify MachineDominatorTree if it is not calculated If dominator tree is not calculated or is invalidated, set corresponding pointer in the pass state to nullptr. Such pointer value will indicate that operations with dominator tree are not allowed. In particular, it allows verification to be skipped for such pass state. The dominator tree is not calculated if the machine dominator pass was skipped; this occurs in the case of entities with linkage available_externally. The change fixes some test failures observed when expensive checks are enabled. Differential Revision: https://reviews.llvm.org/D29280 llvm-svn: 296742	2017-03-02 12:00:10 +00:00
Matthias Braun	dbcf9e2ee4	LiveRegMatrix: Fix some subreg interference checks Surprisingly, one of the three interference checks in LiveRegMatrix was using the main live range instead of the appropriate subregister range resulting in unnecessarily conservative results. llvm-svn: 296722	2017-03-02 00:35:08 +00:00
Paul Robinson	a94f76b18c	Remove spurious use of LLVM_FALLTHROUGH (NFC) llvm-svn: 296713	2017-03-01 23:59:11 +00:00
Amaury Sechet	71f511fd1e	[DAGCombiner] mulhi + 1 never overflow. Summary: This can be used to optimize large multiplications after legalization. Depends on D29565 Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29587 llvm-svn: 296711	2017-03-01 23:44:17 +00:00
Ahmed Bougacha	120ae22d70	[GlobalISel] Add a way for targets to enable GISel. Until now, we've had to use -global-isel to enable GISel. But using that on other targets that don't support it will result in an abort, as we can't build a full pipeline. Additionally, we want to experiment with enabling GISel by default for some targets: we can't just enable GISel by default, even among those targets that do have some support, because the level of support varies. This first step adds an override for the target to explicitly define its level of support. For AArch64, do that using a new command-line option (I know..): -aarch64-enable-global-isel-at-O=<N> Where N is the opt-level below which GISel should be used. Default that to -1, so that we still don't enable GISel anywhere. We're not there yet! While there, remove a couple LLVM_UNLIKELYs. Building the pipeline is such a cold path that in practice that shouldn't matter at all. llvm-svn: 296710	2017-03-01 23:33:08 +00:00
Sanjay Patel	92938657a0	[DAGCombiner] fold binops with constant into select-of-constants This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 llvm-svn: 296699	2017-03-01 22:51:31 +00:00
Victor Leschuk	d7bfa40ace	[DebugInfo] [DWARFv5] Unique abbrevs for DIEs with different implicit_const values Take DW_FORM_implicit_const attribute value into account when profiling DIEAbbrevData. Currently if we have two similar types with implicit_const attributes and different values we end up with only one abbrev in .debug_abbrev section. For example consider two structures: S1 with implicit_const attribute ATTR and value VAL1 and S2 with implicit_const ATTR and value VAL2. The .debug_abbrev section will contain only 1 related record: [N] DW_TAG_structure_type DW_CHILDREN_yes DW_AT_ATTR DW_FORM_implicit_const VAL1 // .... This is incorrect as struct S2 (with VAL2) will use abbrev record with VAL1. With this patch we will have two different abbreviations here: [N] DW_TAG_structure_type DW_CHILDREN_yes DW_AT_ATTR DW_FORM_implicit_const VAL1 // .... [M] DW_TAG_structure_type DW_CHILDREN_yes DW_AT_ATTR DW_FORM_implicit_const VAL2 // .... llvm-svn: 296691	2017-03-01 22:13:42 +00:00
Benjamin Kramer	0e429606b0	[DAGCombiner] Remove non-ascii character and reflow comment. llvm-svn: 296690	2017-03-01 22:10:43 +00:00
Matthias Braun	173e11439e	LIU::Query: Query LiveRange instead of LiveInterval; NFC - We only need the information from the base class, not the additional details in the LiveInterval class. - Spread more `const` - Some code cleanup llvm-svn: 296684	2017-03-01 21:48:12 +00:00
Reid Kleckner	f7c0980c10	Elide argument copies during instruction selection Summary: Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters. This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it. Supersedes D28388 Fixes PR26328 Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29668 llvm-svn: 296683	2017-03-01 21:42:00 +00:00
Matthias Braun	702f55bb4a	LIU::Query: Remove always false member+getter; NFC llvm-svn: 296675	2017-03-01 21:02:52 +00:00
Nemanja Ivanovic	b223cfabcc	Improve scheduling with branch coalescing This patch adds a MachineSSA pass that coalesces blocks that branch on the same condition. Committing on behalf of Lei Huang. Differential Revision: https://reviews.llvm.org/D28249 llvm-svn: 296670	2017-03-01 20:29:34 +00:00
Nirav Dave	0a4703b5ec	[DAG] Prevent Stale nodes from entering worklist Add check that deleted nodes do not get added to worklist. This can occur when a node's operand is simplified to an existing node. This fixes PR32108. Reviewers: jyknight, hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30506 llvm-svn: 296668	2017-03-01 20:19:38 +00:00
Paul Robinson	d4f1c487f3	Alphabetize some cases (NFC) llvm-svn: 296655	2017-03-01 19:01:47 +00:00
Paul Robinson	91d74813a6	[DWARF] Default lower bound should respect requested DWARF version. DWARF may define a default lower-bound for arrays in languages defined in a particular DWARF version. But the logic to suppress an unnecessary lower-bound attribute was looking at the hard-coded default DWARF version, not the version that had been requested. Also updated the list with all languages defined in DWARF v5. Differential Revision: http://reviews.llvm.org/D30484 llvm-svn: 296652	2017-03-01 18:32:37 +00:00
Artur Pilipenko	e1b2d31468	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336). Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 296651	2017-03-01 18:12:29 +00:00
Ahmed Bougacha	20b3e9a835	[CodeGen] Remove dead FastISel code after SDAG emitted a tailcall. When SDAGISel (top-down) selects a tail-call, it skips the remainder of the block. If, before that, FastISel (bottom-up) selected some of the (no-op) next few instructions, we can end up with dead instructions following the terminator (selected by SDAGISel). We need to erase them, as we know they aren't necessary (in addition to being incorrect). We already do this when FastISel falls back on the tail-call itself. Also remove the FastISel-emitted code if we fallback on the instructions between the tail-call and the return. llvm-svn: 296552	2017-03-01 00:43:42 +00:00
Ahmed Bougacha	67d1c7c3c2	[GlobalISel] Replace all combined G_EXTRACT uses. Iterating on the use-list we're modifying doesn't work: after the first iteration, the use-list iterator will point to a MachineOperand referencing the new register. This caused us to skip the other uses to replace. Instead, use MRI.replaceRegWith(), which accounts for this behavior. llvm-svn: 296551	2017-03-01 00:43:39 +00:00
Paul Robinson	3443575c03	Add missing module/license header. NFC. llvm-svn: 296550	2017-03-01 00:14:42 +00:00
Paul Robinson	cddd60445e	[DWARFv5] Emit new unit header format. Requesting DWARF v5 will now get you the new compile-unit and type-unit headers. llvm-dwarfdump will also recognize them. Differential Revision: http://reviews.llvm.org/D30206 llvm-svn: 296514	2017-02-28 20:24:55 +00:00
Sanjay Patel	ea61ea9f19	[DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFC llvm-svn: 296502	2017-02-28 18:41:49 +00:00
Craig Topper	419f145ebb	[DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node. This recovers a test case that was severely broken by r296476, by making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency. llvm-svn: 296486	2017-02-28 16:52:05 +00:00
David Bozier	5159968786	[Stack Protection] Add diagnostic information for why stack protection was applied to a function Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which functions have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function. This change adds a remark that is reported by the stack protection code when an instruction or attribute is encountered that causes SSP to be applied. Patch by: James Henderson Differential Revision: https://reviews.llvm.org/D29023 llvm-svn: 296483	2017-02-28 16:02:37 +00:00
Daniel Sanders	983c9b98e9	Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1. llvm-svn: 296478	2017-02-28 15:00:27 +00:00
Nirav Dave	f830dec3f2	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear to require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296476	2017-02-28 14:24:15 +00:00
Daniel Sanders	a5afdefec6	[globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 296474	2017-02-28 14:21:31 +00:00
Sanjoy Das	eef785c1a5	[ImplicitNullCheck] Add alias analysis usage Summary: With this change, the ImplicitNullCheck optimization uses alias analysis and can use a load/store memory access for an implicit null check even if other loads/stores precede it, provided the memory accesses do not alias. Patch by Serguei Katkov! Reviewers: sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30331 llvm-svn: 296440	2017-02-28 07:04:49 +00:00
Matthias Braun	81f68ec3a9	Revert "Add MIR-level outlining pass" Revert Machine Outliner for now, as it breaks the asan bot. This reverts commit r296418. llvm-svn: 296426	2017-02-28 02:24:30 +00:00
Matthias Braun	d36410945f	Add MIR-level outlining pass This is a patch for the outliner described in the RFC at: http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html The outliner is a code-size reduction pass which works by finding repeated sequences of instructions in a program, and replacing them with calls to functions. This is useful to people working in low-memory environments, where sacrificing performance for space is acceptable. This adds an interprocedural outliner directly before printing assembly. For reference on how this would work, this patch also includes X86 target hooks and an X86 test. The outliner is run like so: clang -mno-red-zone -mllvm -enable-machine-outliner file.c Patch by Jessica Paquette<jpaquette@apple.com>! rdar://29166825 Differential Revision: https://reviews.llvm.org/D26872 llvm-svn: 296418	2017-02-28 00:33:32 +00:00
Michael Kuperstein	13bf8a2684	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about 100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack. The end result is that a clang-compiled python interpreter can be about 2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296416	2017-02-28 00:11:34 +00:00
Zachary Turner	695ed56ba5	[PDB] Make streams carry their own endianness. Before the endianness was specified on each call to read or write of the StreamReader / StreamWriter, but in practice it's extremely rare for streams to have data encoded in multiple different endiannesses, so we should optimize for the 99% use case. This makes the code cleaner and more general, but is otherwise NFC. llvm-svn: 296415	2017-02-28 00:04:07 +00:00
Eugene Zelenko	fa912a7151	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 296404	2017-02-27 22:45:06 +00:00
Arnold Schwaighofer	b2605f31ed	ISel: We need to notify FastIS of the IMPLICIT_DEF we created in createSwiftErrorEntriesInEntryBlock Otherwise, it will insert instructions before it. rdar://30536186 llvm-svn: 296395	2017-02-27 22:12:06 +00:00
Zachary Turner	120faca41b	[PDB] Partial resubmit of r296215, which improved PDB Stream Library. This was reverted because it was breaking some builds, and because of incorrect error code usage. Since the CL was large and contained many different things, I'm resubmitting it in pieces. This portion is NFC, and consists of: 1) Renaming classes to follow a consistent naming convention. 2) Fixing the const-ness of the interface methods. 3) Adding detailed doxygen comments. 4) Fixing a few instances of passing `const BinaryStream& X`. These are now passed as `BinaryStreamRef X`. llvm-svn: 296394	2017-02-27 22:11:43 +00:00
Matt Arsenault	4a7cc16e89	Revert "DAG: Check if extract_vector_elt is legal or custom" This reverts r295782. This could potentially result in some legalization loops and I avoided the need for this. llvm-svn: 296393	2017-02-27 21:59:07 +00:00
Simon Pilgrim	5c4efcdddf	[X86][SSE] Attempt to extract vector elements through target shuffles DAGCombiner already supports peeking through shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where it's worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381	2017-02-27 21:01:57 +00:00
Taewook Oh	a49eb8578a	[TailDuplicator] Maintain DebugLoc for branch instructions Summary: Existing implementation of duplicateSimpleBB function drops DebugLoc metadata of branch instructions during the transformation. This patch addresses this issue by making newly created branch instructions keep the metadata of replaced branch instructions. Reviewers: qcolombet, craig.topper, aprantl, MatzeB, sanjoy, dblaikie Reviewed By: dblaikie Subscribers: dblaikie, llvm-commits Differential Revision: https://reviews.llvm.org/D30026 llvm-svn: 296371	2017-02-27 19:30:01 +00:00
Artur Pilipenko	f7196c8d9e	[DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets This pattern is essentially an i16 load from p+1 address: %p1.i16 = bitcast i8* %p to i16* %p2.i8 = getelementptr i8, i8* %p, i64 2 %v1 = load i16, i16* %p1.i16 %v2.i8 = load i8, i8* %p2.i8 %v2 = zext i8 %v2.i8 to i16 %v1.shl = shl i16 %v1, 8 %res = or i16 %v1.shl, %v2 Current implementation would identify %v1 load as the first byte load and would mistakenly emit an i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining. llvm-svn: 296336	2017-02-27 13:04:23 +00:00
Artur Pilipenko	c43b20a43b	[DAGCombine] NFC. MatchLoadCombine extract MemoryByteOffset lambda helper This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets. llvm-svn: 296332	2017-02-27 11:42:54 +00:00
Artur Pilipenko	f2c26e0bf2	[DAGCombine] NFC. MatchLoadCombine remember the first byte provider, not the load node This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets. llvm-svn: 296331	2017-02-27 11:40:14 +00:00
Daniel Jasper	3ca4525612	Revert "[CGP] Split some critical edges coming out of indirect branches" This reverts commit r296149 as it leads to crashes when compiling for PPC. llvm-svn: 296295	2017-02-26 11:09:12 +00:00
Nirav Dave	73cd0194cf	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279	2017-02-26 01:27:32 +00:00
Craig Topper	3b8aca2ecf	[ExecutionDepsFix] Don't make copies of LiveReg objects when collecting operands for soft instructions Summary: While collecting operands we make copies of the LiveReg objects which are stored in the LiveRegs array. If the instruction uses the same register multiple times we end up with multiple copies. Later we iterate through the collected list of LiveReg objects and merge DomainValues. In the process of doing this the merge function can change the contents of the original LiveReg object in the LiveRegs array, but not the copies that have been made. So when we get to the second usage of the register we end up seeing a stale copy of the LiveReg object. To fix this I've stopped copying and now just store a pointer to the original LiveReg object. Another option might be to avoid adding the same register to the Regs array twice, but this approach seemed simpler. The included test case exposes this bug due to an AVX-512 masked OR instruction using the same register for the passthru operand and one of the inputs to the OR operation. Fixes PR30284. Reviewers: RKSimon, stoklund, MatzeB, spatel, myatsina Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30242 llvm-svn: 296260	2017-02-25 18:12:25 +00:00
Artyom Skrobov	ac56719231	No need to copy the variable [NFC] llvm-svn: 296259	2017-02-25 17:18:09 +00:00
NAKAMURA Takumi	05a75e40da	Revert r296215, "[PDB] General improvements to Stream library." and follow-ups. r296215, "[PDB] General improvements to Stream library." r296217, "Disable BinaryStreamTest.StreamReaderObject temporarily." r296220, "Re-enable BinaryStreamTest.StreamReaderObject." r296244, "[PDB] Disable some tests that are breaking bots." r296249, "Add static_cast to silence -Wc++11-narrowing." std::errc::no_buffer_space should be used for OS-oriented errors for socket transmission. (Seek discussions around llvm/xray.) I could substitute s/no_buffer_space/others/g, but I am reverting all of them for now. Could we define and use LLVM errors there? llvm-svn: 296258	2017-02-25 17:04:23 +00:00
Nirav Dave	beabf456df	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear to require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252	2017-02-25 11:43:58 +00:00
Junmo Park	061bec802e	Minor code cleanup. NFC. llvm-svn: 296222	2017-02-25 01:50:45 +00:00
Zachary Turner	af299ea5d4	[PDB] General improvements to Stream library. This adds various new functionality and cleanup surrounding the use of the Stream library. Major changes include: * Renaming of all classes for more consistency / meaningfulness * Addition of some new methods for reading multiple values at once. * Full suite of unit tests for reader / writer functionality. * Full set of doxygen comments for all classes. * Streams now store their own endianness. * Fixed some bugs in a few of the classes that were discovered by the unit tests. llvm-svn: 296215	2017-02-25 00:44:30 +00:00
Zachary Turner	d2684b7969	[PDB] Rename Stream related source files. This is part of a larger effort to get the Stream code moved up to Support. I don't want to do it in one large patch, in part because the changes are so big that it will treat everything as file deletions and add, losing history in the process. Aside from that though, it's just a good idea in general to make small changes. So this change only changes the names of the Stream related source files, and applies necessary source fix ups. llvm-svn: 296211	2017-02-25 00:33:34 +00:00
Dan Gohman	82607f56bd	[WebAssembly] Add support for using a wasm global for the stack pointer. This replaces the __stack_pointer variable which was allocated in linear memory. llvm-svn: 296201	2017-02-24 23:46:05 +00:00
Dan Gohman	d934cb8806	[WebAssembly] Basic support for Wasm object file encoding. With the "wasm32-unknown-unknown-wasm" triple, this allows writing out simple wasm object files, and is another step in a larger series toward migrating from ELF to general wasm object support. Note that this code and the binary format itself is still experimental. llvm-svn: 296190	2017-02-24 23:18:00 +00:00
Stanislav Mekhanoshin	42259cf35e	Revert "Correct register pressure calculation in presence of subregs" This reverts commit r296009. It broke one out of tree target and also does not account for all partial lines added or removed when calculating PressureDiff. llvm-svn: 296182	2017-02-24 21:56:16 +00:00
Tim Northover	ef29e7284b	GlobalISel: check for CImm rather than Imm on G_CONSTANTs. All G_CONSTANTS created by the MachineIRBuilder have an operand of type CImm (i.e. a ConstantInt), so that's what the selector needs to look for. llvm-svn: 296176	2017-02-24 21:21:38 +00:00
Eli Friedman	c12a5a7595	[CodeGenPrepare] Make -addr-sink-using-gep work with address spaces. When we construct addressing modes, we use isNoopAddrSpaceCast to ignore addrspacecast instructions. Make sure we insert the correct addrspacecast when we reconstruct the addressing mode. Differential Revision: https://reviews.llvm.org/D30114 llvm-svn: 296167	2017-02-24 20:51:36 +00:00
Michael Kuperstein	46b131e3f8	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about 100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack. The end result is that a clang-compiled python interpreter can be about 2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296149	2017-02-24 18:41:32 +00:00
Sanjay Patel	832b1622d8	[DAGCombiner] add missing folds for scalar select of {-1,0,1} The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in this patch). I.e., the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops rather than a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization: 1. Have DAGCombiner convert any select of constants into ext/add/not ops. 2. Have InstCombine canonicalize in the other direction (create more selects). Differential Revision: https://reviews.llvm.org/D30180 llvm-svn: 296137	2017-02-24 17:17:33 +00:00
Daniel Sanders	066ebbfd46	[globalisel] Decouple src pattern operands from dst pattern operands. Summary: This isn't testable for AArch64 by itself so this patch also adds support for constant immediates in the pattern and physical register uses in the result. The new IntOperandMatcher matches the constant in patterns such as '(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold immediates into an instruction so this is the first rule that will match across multiple BB's. The Renderer hierarchy is responsible for adding operands to the result instruction. Renderers can copy operands (CopyRenderer) or add physical registers (in particular %wzr and %xzr) to the result instruction in any order (OperandMatchers now import the operand names from SelectionDAG to allow renderers to access any operand). This allows us to emit the result instruction for: %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0 %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0 although the latter is untested since the matcher/importer has not been taught about commutativity yet. Added BuildMIAction which can build new instructions and mutate them where possible. W.r.t the mutation aspect, MatchActions are now told the name of an instruction they can recycle and BuildMIAction will emit mutation code when the renderers are appropriate. They are appropriate when all operands are rendered using CopyRenderer and the indices are the same as the matcher. This currently assumes that all operands have at least one matcher. Finally, this change also fixes a crash in AArch64InstructionSelector::select() caused by an immediate operand passing isImm() rather than isCImm(). This was uncovered by the other changes and was detected by existing tests. 
Depends on D29711 Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar Reviewed By: rovka Subscribers: aemerson, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29712 llvm-svn: 296131	2017-02-24 15:43:30 +00:00
Justin Bogner	259a0cf3ee	Add missing initialization for MachineOptimizationRemarkEmitter This was missed in r293110. llvm-svn: 296096	2017-02-24 07:42:35 +00:00
Craig Topper	c446b1ffae	[ExecutionDepsFix] Use range-based for loop. NFC llvm-svn: 296093	2017-02-24 06:38:24 +00:00
Michael Kuperstein	581c9f4b20	Revert r269060 to pacify bots. llvm-svn: 296064	2017-02-24 01:22:19 +00:00
Michael Kuperstein	12e79d5002	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about 100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about 100 constants to stack. The end result is that a clang-compiled python interpreter can be about 2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296060	2017-02-24 00:56:21 +00:00
Ahmed Bougacha	7daaf88c70	[GlobalISel] Use the same name for all remarks. While there, switch to the explicit ctor. llvm-svn: 296059	2017-02-24 00:34:47 +00:00
Ahmed Bougacha	7c88a4e12b	[GlobalISel] Use the DISubprogram for translation failure remarks. Justin added support for DISubprogram locs in r295531 and r296052. Use that instead of no-loc for constants and arguments. llvm-svn: 296058	2017-02-24 00:34:44 +00:00
Ahmed Bougacha	8f9e99bcb6	[GlobalISel] Remove now-unnecessary variable. NFC. Since r296047, we're able to return early on failures. Don't track whether we succeeded. llvm-svn: 296057	2017-02-24 00:34:41 +00:00
Justin Bogner	369ba753aa	OptDiag: Summarize the instruction count in asm-printer Add an optimization remark to asm-printer that summarizes the number of instructions emitted per function. llvm-svn: 296053	2017-02-24 00:19:22 +00:00
Ahmed Bougacha	4f8dd0202d	[GlobalISel] Don't translate other blocks when one failed. We were stopping the translation of the parent block when the translation of an instruction failed, but we were still trying to translate the other blocks of the parent function. Don't do that. llvm-svn: 296047	2017-02-23 23:57:36 +00:00
Ahmed Bougacha	eceabddcfd	[GlobalISel] Finalize translated function on scope exit. NFC. This is the compromise between having a per-function IRTranslator and manually managing the per-function state. llvm-svn: 296046	2017-02-23 23:57:28 +00:00
Kyle Butt	ebe6cc4dad	CodeGen: MachineBlockPlacement: Rename member to more general name. NFC. Rename ComputedTrellisEdges to ComputedEdges to allow for other methods of pre-computing edges. Differential Revision: https://reviews.llvm.org/D30308 llvm-svn: 296018	2017-02-23 21:22:24 +00:00
Ahmed Bougacha	ae9dadecf3	[GlobalISel] Emit opt remarks on isel fallbacks. Having more fine-grained information on the specific construct that caused us to fall back is valuable for large-scale data collection. We still have the fallback warning, that's also used for FastISel. We still need to remove the fallback warning, and teach FastISel to also emit remarks (it currently has a combination of the warning, stats, and debug prints: the remarks could unify all three). The abort-on-fallback path could also be better handled using remarks: one could imagine a "-Rpass-error", analogous to "-Werror", which would promote missed/failed remarks to errors. It's not clear whether that would be useful for other remarks though, so we're not there yet. llvm-svn: 296013	2017-02-23 21:05:42 +00:00
Ahmed Bougacha	272739751d	[CodeGen] Teach opt remarks how to print MI instructions. This will be used with GISel opt remarks. llvm-svn: 296012	2017-02-23 21:05:33 +00:00
Ahmed Bougacha	97119d48db	[CodeGen] Print MI without a newline when skipping debugloc. NFC. This matches the behavior for skip-operands. While there, document it. This is a follow-up to r296007. llvm-svn: 296011	2017-02-23 21:05:29 +00:00
Stanislav Mekhanoshin	ce3ddd2de4	Correct register pressure calculation in presence of subregs If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects improper register pressure calculation by examining operand's lane mask. Differential Revision: https://reviews.llvm.org/D29835 llvm-svn: 296009	2017-02-23 20:19:44 +00:00
Ahmed Bougacha	4319224628	[CodeGen] Add a way to SkipDebugLoc in MachineInstr::print(). NFC. llvm-svn: 296007	2017-02-23 19:17:31 +00:00
Ahmed Bougacha	1b3339447d	[GlobalISel] Simplify Select type cleanup using a ScopeExit. NFC. This lets us use more natural early-returns when selection fails. llvm-svn: 296006	2017-02-23 19:17:24 +00:00
Sanjay Patel	4a4fbe162f	[DAG] add convenience function to get -1 constant; NFCI llvm-svn: 296004	2017-02-23 19:02:33 +00:00
Adam Nemet	b516cf3f3f	[LazyMachineBFI] Reimplement with getAnalysisIfAvailable Since LoopInfo is not available in machine passes as universally as in IR passes, using the same approach for OptimizationRemarkEmitter as we did for IR will run LoopInfo and DominatorTree unnecessarily. (LoopInfo is not used lazily by ORE.) To fix this, I am modifying the approach I took in D29836. LazyMachineBFI now uses its client passes including MachineBFI itself that are available or otherwise compute them on the fly. So for example GreedyRegAlloc, since it's already using MBFI, will reuse that instance. On the other hand, AsmPrinter in Justin's patch will generate DT, LI and finally BFI on the fly. (I am of course wondering now if the simplicity of this approach is even preferable in IR. I will do some experiments.) Testing is provided by an updated version of D29837 which requires Justin's patch to bring ORE to the AsmPrinter. Differential Revision: https://reviews.llvm.org/D30128 llvm-svn: 295996	2017-02-23 17:30:01 +00:00
Simon Pilgrim	858d8e672d	Fix signed/unsigned comparison warning on MSVC llvm-svn: 295962	2017-02-23 12:00:34 +00:00
Eugene Zelenko	db56e5a89a	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 295893	2017-02-22 22:32:51 +00:00
Bill Seurer	8e48f416ad	[DAGCombiner] revert r295336 r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849	2017-02-22 16:27:33 +00:00
Igor Breger	f7359d893a	[X86][GlobalISel] Initial implementation, select G_ADD gpr, gpr Summary: Initial implementation for X86InstructionSelector. Handle selection of COPY and G_ADD/G_SUB gpr, gpr. Reviewers: qcolombet, rovka, zvi, ab Reviewed By: rovka Subscribers: mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29816 llvm-svn: 295824	2017-02-22 12:25:09 +00:00
Dan Gohman	18eafb6c68	[WebAssembly] Add skeleton MC support for the Wasm container format This just adds the basic skeleton for supporting a new object file format. All of the actual encoding will be implemented in followup patches. Differential Revision: https://reviews.llvm.org/D26722 llvm-svn: 295803	2017-02-22 01:23:18 +00:00
Matt Arsenault	f0a4823b91	DAG: Check if extract_vector_elt is legal or custom Avoids test regressions in future AMDGPU commits when more vector types are custom lowered. llvm-svn: 295782	2017-02-21 22:47:27 +00:00
Eugene Zelenko	49e2fc4f5f	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 295773	2017-02-21 22:07:52 +00:00
Geoff Berry	5d534b6a11	[CodeGenPrepare] Sink and duplicate more 'and' instructions. Summary: Rework the code that was sinking/duplicating (icmp and, 0) sequences into blocks where they were being used by conditional branches to form more tbz instructions on AArch64. The new code is more general in that it just looks for 'and's that have all icmp 0's as users, with a target hook used to select which subset of 'and' instructions to consider. This change also enables 'and' sinking for X86, where it is more widely beneficial than on AArch64. The 'and' sinking/duplicating code is moved into the optimizeInst phase of CodeGenPrepare, where it can take advantage of the fact that OptimizeCmpExpression has already sunk/duplicated any icmps into the blocks where they are used. One minor complication from this change is that optimizeLoadExt needed to be updated to always mark 'and's it has determined should be in the same block as their feeding load in the InsertedInsts set to avoid an infinite loop of hoisting and sinking the same 'and'. This change fixes a regression on X86 in the tsan runtime caused by moving GVNHoist to a later place in the optimization pipeline (see PR31382). Reviewers: t.p.northover, qcolombet, MatzeB Subscribers: aemerson, mcrosier, sebpop, llvm-commits Differential Revision: https://reviews.llvm.org/D28813 llvm-svn: 295746	2017-02-21 18:53:14 +00:00
Matthias Braun	9ab403942b	ScheduleDAG: Cleanup; NFC - Fix doxygen comments (do not repeat documented name, remove definition comment if there is already one at the declaration, add \p, ...) - Add some const modifiers - Use range based for llvm-svn: 295688	2017-02-21 01:27:33 +00:00
Taewook Oh	4cf5c1087c	[BranchFolding] Update debug location along with the update of branch instruction. Summary: Currently, BranchFolder drops DebugLoc for branch instructions in some places. For example, for the test code attached, the branch instruction of 'entry' block has a DILocation of ``` !12 = !DILocation(line: 6, column: 3, scope: !11) ``` , but this information is gone when the block is lowered because BranchFolder misses it. This patch is a fix for this issue. Reviewers: qcolombet, aprantl, craig.topper, MatzeB Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29902 llvm-svn: 295684	2017-02-21 00:12:38 +00:00
Simon Pilgrim	c0dc9a4913	Strip trailing whitespace. llvm-svn: 295653	2017-02-20 11:56:43 +00:00
Simon Pilgrim	50b958c07a	[SelectionDAG] Add scalarization support for ISD::*_EXTEND_VECTOR_INREG opcodes. Thanks to Mikael Holmén for the initial test case llvm-svn: 295652	2017-02-20 11:55:58 +00:00
Artyom Skrobov	be31754094	Remove redundant call to GluedNodes.back() [NFC] llvm-svn: 295607	2017-02-19 16:56:18 +00:00
Matthias Braun	431305927f	MachineRegionInfo: Fix pass initialization - Adapt MachineBasicBlock::getName() to have the same behavior as the IR BasicBlock (Value::getName()). - Add it to lib/CodeGen/CodeGen.cpp::initializeCodeGen so that it is linked in the CodeGen library. - MachineRegionInfoPass's name conflicts with RegionInfoPass's name ("region"). - MachineRegionInfo should depend on MachineDominatorTree, MachinePostDominatorTree and MachineDominanceFrontier instead of their respective IR versions. - Since there were no tests for this, add a X86 MIR test. Patch by Francis Visoiu Mistrih<fvisoiumistrih@apple.com> llvm-svn: 295518	2017-02-18 00:41:16 +00:00
Eugene Zelenko	be37db1882	[CodeGen] Revert changes in LowLevelType to pre-r295499 to fix broken buildbots. llvm-svn: 295505	2017-02-17 22:23:34 +00:00
Eugene Zelenko	5db84df728	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 295499	2017-02-17 21:43:25 +00:00
Adrian Prantl	67c2442210	Debug Info: Sort frame index expressions before emitting them. This fixes PR31381, which caused an assertion and/or invalid debug info. This affects debug variables that have multiple fragments in the MMI side (i.e.: in the stack frame) table. rdar://problem/30571676 llvm-svn: 295486	2017-02-17 19:42:32 +00:00
Tim Northover	88634996c7	GlobalISel: verify that generic loads & stores have a mem operand. The mem operand is used by GlobalISel to convey atomic constraints so dropping it is invalid. llvm-svn: 295476	2017-02-17 18:50:15 +00:00
Sanjay Patel	7f2e58972c	[DAGCombiner] split i1 select-of-constants from non-i1 case; NFCI I can't find any tests of the non-i1 code path, so it may be unnecessary at this point. llvm-svn: 295463	2017-02-17 17:13:27 +00:00
Simon Pilgrim	0429c0cf8b	Fix signed/unsigned comparison warning. llvm-svn: 295453	2017-02-17 16:01:16 +00:00
Simon Pilgrim	511d788a95	[DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle masks During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing). This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types. The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712). Differential Revision: https://reviews.llvm.org/D29454 llvm-svn: 295451	2017-02-17 15:14:48 +00:00
Sanjay Patel	5573042035	[DAGCombiner] improve readability; NFCI llvm-svn: 295447	2017-02-17 14:21:59 +00:00
Teresa Johnson	76b5b7493c	Handle link of NoDebug CU with a CU that has debug emission enabled Summary: This is an issue both with regular and Thin LTO. When we link together a DICompileUnit that is marked NoDebug (e.g., when compiling with -g0 but applying an AutoFDO profile, which requires location tracking in the compiler) and a DICompileUnit with debug emission enabled, we can have failures during dwarf debug generation. Specifically, when we have inlined from the NoDebug compile unit into the debug compile unit, we can fail during construction of the abstract and inlined scope DIEs. This is because the SPMap does not include NoDebug CUs (they are skipped in the debug_compile_units_iterator). This patch fixes the failures by skipping locations from NoDebug CUs when extracting lexical scopes. Reviewers: dblaikie, aprantl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D29765 llvm-svn: 295384	2017-02-17 00:21:19 +00:00
Benjamin Kramer	3f6260cab4	[MachinePipeliner] Remove redundant destructor. NFC. llvm-svn: 295372	2017-02-16 20:26:51 +00:00
David Blaikie	b2fbb4b276	Refactor DebugHandlerBase a bit to common non-debug-having-function filtering llvm-svn: 295354	2017-02-16 18:48:33 +00:00
Artur Pilipenko	85d758299e	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Resubmit -r295314 with PowerPC and AMDGPU tests updated. Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295336	2017-02-16 17:07:27 +00:00
Artur Pilipenko	a1b384c4ce	Revert -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine" This change causes some of AMDGPU and PowerPC tests to fail. llvm-svn: 295316	2017-02-16 13:04:46 +00:00
Artur Pilipenko	daaa0c0f7d	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295314	2017-02-16 12:53:26 +00:00
Diana Picus	ca6a890d7f	[ARM] GlobalISel: Lower double precision FP args For the hard float calling convention, we just use the D registers. For the soft-fp calling convention, we use the R registers and move values to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we make sure to honor the endianness of the target, since the CCAssignFn doesn't do that for us. For pure soft float targets, we still bail out because we don't support the libcalls yet. llvm-svn: 295295	2017-02-16 07:53:07 +00:00
Hans Wennborg	a468601e0e	[X86] Re-enable conditional tail calls and fix PR31257. This reverts r294348, which removed support for conditional tail calls due to the PR above. It fixes the PR by marking live registers as implicitly used and defined by the now predicated tailcall. This is similar to how IfConversion predicates instructions. Differential Revision: https://reviews.llvm.org/D29856 llvm-svn: 295262	2017-02-16 00:04:05 +00:00
Tim Northover	9136617a3f	GlobalISel: legalize va_arg on AArch64. Uses a Custom implementation because the slot sizes being a multiple of the pointer size isn't really universal, even for the architectures that do have a simple "void *" va_list. llvm-svn: 295255	2017-02-15 23:22:50 +00:00
Tim Northover	4a652227dd	GlobalISel: support translating va_arg Since (say) i128 and [16 x i8] map to the same type in generic MIR, we also need to attach the required alignment info. llvm-svn: 295254	2017-02-15 23:22:33 +00:00
Matt Arsenault	900b21c350	Fix typos llvm-svn: 295246	2017-02-15 22:19:06 +00:00
Matt Arsenault	5de8dc9cf5	DAG: Do not scalarize fsub if fneg is legal Tests will be included with future commit. llvm-svn: 295242	2017-02-15 22:02:42 +00:00
Kyle Butt	7fbec9bdf1	Codegen: Make chains from trellis-shaped CFGs Lay out trellis-shaped CFGs optimally. A trellis of the shape below: A B |\ /| | \ / | | X | | / \ | |/ \| C D would be laid out A; B->C ; D by the current layout algorithm. Now we identify trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an increasing number of predecessors. A trellis is a group of 2 or more predecessor blocks that all have the same successors. Because of this, we can tail duplicate to extend existing trellises. As an example consider the following CFG: B D F H / \ / \ / \ / \ A---C---E---G---Ret Where A,C,E,G are all small (Currently 2 instructions). The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret. The current code will copy C into B, E into D and G into F and yield the layout A,C,B(C),E,D(E),F(G),G,H,ret define void @straight_test(i32 %tag) { entry: br label %test1 test1: ; A %tagbit1 = and i32 %tag, 1 %tagbit1eq0 = icmp eq i32 %tagbit1, 0 br i1 %tagbit1eq0, label %test2, label %optional1 optional1: ; B call void @a() br label %test2 test2: ; C %tagbit2 = and i32 %tag, 2 %tagbit2eq0 = icmp eq i32 %tagbit2, 0 br i1 %tagbit2eq0, label %test3, label %optional2 optional2: ; D call void @b() br label %test3 test3: ; E %tagbit3 = and i32 %tag, 4 %tagbit3eq0 = icmp eq i32 %tagbit3, 0 br i1 %tagbit3eq0, label %test4, label %optional3 optional3: ; F call void @c() br label %test4 test4: ; G %tagbit4 = and i32 %tag, 8 %tagbit4eq0 = icmp eq i32 %tagbit4, 0 br i1 %tagbit4eq0, label %exit, label %optional4 optional4: ; H call void @d() br label %exit exit: ret void } Here is the layout after D27742: straight_test: # @straight_test ; ... Prologue elided ; BB#0: # %entry ; A (merged with test1) ; ... More prologue elided mr 30, 3 andi. 3, 30, 1 bc 12, 1, .LBB0_2 ; BB#1: # %test2 ; C rlwinm. 3, 30, 0, 30, 30 beq 0, .LBB0_3 b .LBB0_4 .LBB0_2: # %optional1 ; B (copy of C) bl a nop rlwinm. 3, 30, 0, 30, 30 bne 0, .LBB0_4 .LBB0_3: # %test3 ; E rlwinm. 3, 30, 0, 29, 29 beq 0, .LBB0_5 b .LBB0_6 .LBB0_4: # %optional2 ; D (copy of E) bl b nop rlwinm. 3, 30, 0, 29, 29 bne 0, .LBB0_6 .LBB0_5: # %test4 ; G rlwinm. 3, 30, 0, 28, 28 beq 0, .LBB0_8 b .LBB0_7 .LBB0_6: # %optional3 ; F (copy of G) bl c nop rlwinm. 3, 30, 0, 28, 28 beq 0, .LBB0_8 .LBB0_7: # %optional4 ; H bl d nop .LBB0_8: # %exit ; Ret ld 30, 96(1) # 8-byte Folded Reload addi 1, 1, 112 ld 0, 16(1) mtlr 0 blr The tail-duplication has produced some benefit, but it has also produced a trellis which is not laid out optimally. With this patch, we improve the layouts of such trellises, and decrease the cost calculation for tail-duplication accordingly. This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have back edges, which is a negative, but it has a bigger compensating positive, which is that it handles the case where there are long strings of skipped blocks much better than the original layout. Both layouts handle runs of executed blocks equally well. Branch prediction also improves if there is any correlation between subsequent optional blocks. Here is the resulting concrete layout: straight_test: # @straight_test ; BB#0: # %entry ; A (merged with test1) mr 30, 3 andi. 3, 30, 1 bc 12, 1, .LBB0_4 ; BB#1: # %test2 ; C rlwinm. 3, 30, 0, 30, 30 bne 0, .LBB0_5 .LBB0_2: # %test3 ; E rlwinm. 3, 30, 0, 29, 29 bne 0, .LBB0_6 .LBB0_3: # %test4 ; G rlwinm. 3, 30, 0, 28, 28 bne 0, .LBB0_7 b .LBB0_8 .LBB0_4: # %optional1 ; B (Copy of C) bl a nop rlwinm. 3, 30, 0, 30, 30 beq 0, .LBB0_2 .LBB0_5: # %optional2 ; D (Copy of E) bl b nop rlwinm. 3, 30, 0, 29, 29 beq 0, .LBB0_3 .LBB0_6: # %optional3 ; F (Copy of G) bl c nop rlwinm. 3, 30, 0, 28, 28 beq 0, .LBB0_8 .LBB0_7: # %optional4 ; H bl d nop .LBB0_8: # %exit Differential Revision: https://reviews.llvm.org/D28522 llvm-svn: 295223	2017-02-15 19:49:14 +00:00
Xinliang David Li	538d666814	include function name in dot filename Differential Revision: http://reviews.llvm.org/D29975 llvm-svn: 295220	2017-02-15 19:21:04 +00:00
Michael Kuperstein	ba80db39d7	[DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source We currently can't legalize those, but we should really not be creating them in the first place, since legalization would probably look similar to the way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD. This fixes PR311956. Differential Revision: https://reviews.llvm.org/D29961 llvm-svn: 295213	2017-02-15 18:37:26 +00:00
Sagar Thakur	ec65792910	[LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit. Reviewed by sdardis, dberris Differential: D27697 llvm-svn: 295164	2017-02-15 10:48:11 +00:00
Craig Topper	96ec7a23e3	[SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where inputs are larger than the mask Summary: The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in an extract and creates a properly aligned start index for the extract. This range finding is unnecessary, we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each index we can't use an extract. Reviewers: zvi, RKSimon Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29926 llvm-svn: 295152	2017-02-15 05:57:16 +00:00
Reid Kleckner	a622fc9bdf	[BranchFolding] Tail common all identical unreachable blocks Summary: Blocks ending in unreachable are typically cold because they end the program or throw an exception, so merging them with other identical blocks is usually profitable because it reduces the size of cold code. MachineBlockPlacement generally does not arrange to fall through to such blocks, so commoning these blocks will not introduce additional unconditional branches. Reviewers: hans, iteratee, haicheng Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29153 llvm-svn: 295105	2017-02-14 21:02:24 +00:00
Tim Northover	c2f8956313	GlobalISel: introduce G_PTR_MASK to simplify alloca handling. This instruction clears the low bits of a pointer without requiring (possibly dodgy if pointers aren't ints) conversions to and from an integer. Since (as far as I'm aware) all masks are statically known, the instruction takes an immediate operand rather than a register to specify the mask. llvm-svn: 295103	2017-02-14 20:56:18 +00:00
Eric Christopher	14303d1815	Reformat slightly. llvm-svn: 295096	2017-02-14 19:43:50 +00:00
Wolfgang Pieb	399dcfaa2a	Reapply r294532, reverted in r294787. Store instructions can have more than one memory operand as a result of optimizations that fold different stores into one. When we identify spill instructions to generate DBG_VALUE instructions to record the spilling of a variable, we disregard stores with multiple memory operands for now. We may miss some relevant spills but the handling is a bit more complex, so we'll do it in a different patch. This fixes PR31935. llvm-svn: 295093	2017-02-14 19:08:45 +00:00
Aditya Nandakumar	bb0483bc8e	[Tablegen] Instrumenting table gen DAGGenISelDAG To help assist in debugging ISEL or to prioritize GlobalISel backend work, this patch adds two more tables to <Target>GenISelDAGISel.inc - one which contains the patterns that are used during selection and the other containing the include source location of the patterns. Enabled through the CMake variable LLVM_ENABLE_DAGISEL_COV llvm-svn: 295081	2017-02-14 18:32:41 +00:00
Adam Nemet	bbb141c734	Add new pass LazyMachineBlockFrequencyInfo And use it in MachineOptimizationRemarkEmitter. A test will follow on top of Justin's changes to enable MachineORE in AsmPrinter. The approach is similar to the IR-level pass. It's a bit simpler because BPI is immutable at the Machine level so we don't need to make that lazy. Because of this, a new function mapping is introduced (BPIPassTrait::getBPI). This function extracts BPI from the pass. In case of the lazy pass, this is when the calculation of the BFI occurs. For Machine-level, this is the identity function. Differential Revision: https://reviews.llvm.org/D29836 llvm-svn: 295072	2017-02-14 17:21:09 +00:00
Artyom Skrobov	dc66a82dc7	Removing a redundant assignment llvm-svn: 295055	2017-02-14 14:44:01 +00:00
Eugene Zelenko	d96089b248	[MC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). Same changes in files affected by reduced MC headers dependencies. llvm-svn: 295009	2017-02-14 00:33:36 +00:00


22465 Commits