llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	0fd5cdee3a	[X86] Add isel patterns for commuting X86adc_flag with a load in the LHS. The peephole pass likely gets this normally, but we should be doing it during isel. Ideally we'd just make the X86adc_flag pattern SDNPCommutable, but the tablegen doesn't handle that when one of the operands is a register reference. llvm-svn: 341596	2018-09-06 22:41:44 +00:00
Sanjay Patel	40aa86751a	[x86] add debug option for and-immediate shrinking The commit that added this functionality: rL322957 may be causing/exposing a miscompile in PR38648: https://bugs.llvm.org/show_bug.cgi?id=38648 so allow enabling/disabling to make debugging easier. llvm-svn: 340540	2018-08-23 15:58:07 +00:00
Chandler Carruth	ae0cafece8	[x86/retpoline] Split the LLVM concept of retpolines into separate subtarget features for indirect calls and indirect branches. This is in preparation for enabling only the call retpolines when using speculative load hardening. I've continued to use subtarget features for now as they continue to seem the best fit given the lack of other retpoline like constructs so far. The LLVM side is pretty simple. I'd like to eventually get rid of the old feature, but not sure what backwards compatibility issues that will cause. This does remove the "implies" from requesting an external thunk. This always seemed somewhat questionable and is now clearly not desirable -- you specify a thunk the same way no matter which set of things are getting retpolines. I really want to keep this nicely isolated from end users and just an LLVM implementation detail, so I've moved the `-mretpoline` flag in Clang to no longer rely on a specific subtarget feature by that name and instead to be directly handled. In some ways this is simpler, but in order to preserve existing behavior I've had to add some fallback code so that users who relied on merely passing -mretpoline-external-thunk continue to get the same behavior. We should eventually remove this I suspect (we have never tested that it works!) but I've not done that in this patch. Differential Revision: https://reviews.llvm.org/D51150 llvm-svn: 340515	2018-08-23 06:06:38 +00:00
Craig Topper	538f8ab438	[X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware Inspired by what AArch64 does for shifts, this patch attempts to replace shift amounts with neg if we can. This is done directly as part of isel so its as late as possible to avoid breaking some BZHI patterns since those patterns need an unmasked (32-n) to be correct. To avoid manual load folding and custom instruction selection for the negate. I've inserted new nodes in the DAG above the shift node in topological order. Differential Revision: https://reviews.llvm.org/D48789 llvm-svn: 340441	2018-08-22 19:39:09 +00:00
Chandler Carruth	66654b72c9	[SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be surprised at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740	2018-08-14 23:30:32 +00:00
Craig Topper	a7a12399a1	[X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in the Select method in X86ISelDAGToDAG.cpp instead. There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function. The test changes are because we have a some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change. llvm-svn: 338824	2018-08-03 07:01:10 +00:00
Craig Topper	a80352c04e	[X86] When post-processing the DAG to remove zero extending moves for YMM/ZMM, make sure the producing instruction is VEX/XOP/EVEX encoded. If the producing instruction is legacy encoded it doesn't implicitly zero the upper bits. This is important for the SHA instructions which don't have a VEX encoded version. We might also be able to hit this with the incomplete f128 support that hasn't been ported to VEX. llvm-svn: 338812	2018-08-03 04:49:42 +00:00
Reid Kleckner	980c4df037	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine. llvm-svn: 337740	2018-07-23 21:14:35 +00:00
Craig Topper	abc307e6fa	[X86] Connect the flags user from PCMPISTR instructions to the correct node from the instruction. We were accidentally connecting it to result 0 instead of result 1. This was caught by the machine verifier that noticed the flags were dead, but we were using them somehow. I'm still not clear what actually happened downstream. llvm-svn: 336925	2018-07-12 18:04:05 +00:00
Craig Topper	38b290f7d7	[X86] Remove patterns for inserting a load into a zero vector. We can instead block the load folding isProfitableToFold. Then isel will emit a register->register move for the zeroing part and a separate load. The PostProcessISelDAG should be able to remove the register->register move. This saves us patterns and fixes the fact that we only had unaligned load patterns. The test changes show places where we should have been using an aligned load. llvm-svn: 336828	2018-07-11 18:09:04 +00:00
Craig Topper	08b81a5508	[X86] Use IsProfitableToFold to block vinsertf128rm in favor of insert_subreg instead of artifically increasing pattern complexity to give priority. This is a much more direct way to solve the issue than just giving extra priority. llvm-svn: 336639	2018-07-10 06:19:54 +00:00
Craig Topper	90317d1d94	[X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc. This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy. Differential Revision: https://reviews.llvm.org/D48706 llvm-svn: 335895	2018-06-28 17:58:01 +00:00
Jonas Devlieghere	b757fc3878	Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"" Reverting because this is causing failures in the LLDB test suite on GreenDragon. LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression llvm-svn: 335894	2018-06-28 17:56:43 +00:00
Craig Topper	ab70f58891	[X86] Change how we prefer shift by immediate over folding a load into a shift. BMI2 added new shift by register instructions that have the ability to fold a load. Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load. We used to enforce this priority by artificially lowering the complexity of the load pattern. This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky. llvm-svn: 335804	2018-06-28 00:47:41 +00:00
Craig Topper	880e34ed45	[X86] In X86DAGToDAGISel::PreprocessISelDAG, make sure we don't access N after we delete it. If we turn X86ISD::AND into ISD::AND, we delete N. But we were continuing onto the next block of code even though N no longer existed. Just happened to notice it. I assume asan didn't notice it because we explicitly unpoison deleted nodes and give them a DELETE_NODE opcode. llvm-svn: 335787	2018-06-27 20:58:46 +00:00
Reid Kleckner	88fee5fdbc	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC code model is not implemented for MachO yet. Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335508	2018-06-25 18:16:27 +00:00
Reid Kleckner	3a2fd1c2f3	Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models" MCJIT can't handle R_X86_64_GOT64 yet. llvm-svn: 335300	2018-06-21 22:19:05 +00:00
Reid Kleckner	247fe6aeab	[X86] Implement more of x86-64 large and medium PIC code models Summary: The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. Reviewers: chandlerc, echristo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335297	2018-06-21 21:55:08 +00:00
Craig Topper	c2696d577b	[X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code. I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering. Differential Revision: https://reviews.llvm.org/D43608 llvm-svn: 335173	2018-06-20 21:05:02 +00:00
Craig Topper	b0e986f88e	[X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to simplify the hasSingleUseFromRoot handling. Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output. This fixed at least fma-scalar-memfold.ll to succed without the peephole pass. llvm-svn: 334908	2018-06-17 16:29:46 +00:00
Craig Topper	3efdb7ce19	[X86] Push some variable declarations down into the individual switch cases that need them. NFC All of the cases are already wrapped in curly braces so declaring a variable there isn't an issue. And the variables aren't assigned or used in the larger scope. llvm-svn: 334436	2018-06-11 20:50:58 +00:00
Simon Pilgrim	3d14158891	[X86][BMI][TBM] Only demand bottom 16-bits of the BEXTR control op (PR34042) Only the bottom 16-bits of BEXTR's control op are required (0:8 INDEX, 15:8 LENGTH). Differential Revision: https://reviews.llvm.org/D47690 llvm-svn: 334083	2018-06-06 10:52:10 +00:00
Reid Kleckner	537917d13c	[X86] Simplify some X86 address mode folding code, NFCI This code should really do exactly the same thing for 32-bit x86 and 64-bit small code models, with the exception that RIP-relative addressing can't use base and index registers. llvm-svn: 332893	2018-05-21 21:03:19 +00:00
Nicola Zaghen	d34e60ca85	Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' \| xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master \| ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240	2018-05-14 12:53:11 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Nico Weber	432a38838d	IWYU for llvm-config.h in llvm, additions. See r331124 for how I made a list of files missing the include. I then ran this Python script: for f in open('filelist.txt'): f = f.strip() fl = open(f).readlines() found = False for i in xrange(len(fl)): p = '#include "llvm/' if not fl[i].startswith(p): continue if fl[i][len(p):] > 'Config': fl.insert(i, '#include "llvm/Config/llvm-config.h"\n') found = True break if not found: print 'not found', f else: open(f, 'w').write(''.join(fl)) and then looked through everything with `svn diff \| diffstat -l \| xargs -n 1000 gvim -p` and tried to fix include ordering and whatnot. No intended behavior change. llvm-svn: 331184	2018-04-30 14:59:11 +00:00
Craig Topper	d656410293	[X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask instrinsics are also used in the same basic block. Summary: Previously the flag intrinsics always used the index instructions even if a mask instruction also exists. To fix fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a s ingle node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction. Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs. I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions. Reviewers: chandlerc, RKSimon, spatel Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46202 llvm-svn: 331091	2018-04-27 22:15:33 +00:00
Craig Topper	7e42af87a6	[X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits. Prefer to use the 32-bit AND with immediate instead. Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold. Fixes PR37063. llvm-svn: 329667	2018-04-10 03:44:15 +00:00
Craig Topper	88e38e3e3e	[X86] Remove more dead code left over from the handling of i8/i16 UMUL_LOHI/SMUL_LOHI that is no longer needed. NFC llvm-svn: 329152	2018-04-04 07:00:16 +00:00
Craig Topper	afa22edcf0	[X86] Remove dead code for handling i8/i16 UMUL_LOHI/SMUL_LOHI from X86ISelDAGToDAG.cpp. NFC These are promoted to i16/i32 multiplies by a DAG combine. llvm-svn: 329147	2018-04-04 04:38:55 +00:00
Nirav Dave	8c5f47ac40	[DAG, X86] Fix ISel-time node insertion ids As in SystemZ backend, correctly propagate node ids when inserting new unselected nodes into the DAG during instruction Seleciton for X86 target. Fixes PR36865. Reviewers: jyknight, craig.topper Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D44797 llvm-svn: 328233	2018-03-22 19:32:07 +00:00
Craig Topper	ad7c685791	[X86] Rename MOVSX32_NOREXrr8 to MOVSX32rr8_NOREX so that the scheduler model regular expressions will pick it up with the regular version. Do the same for MOVSX32_NOREXrm8, MOVZX32_NOREXrr8, and MOVZX32_NOREXrm8 llvm-svn: 327948	2018-03-20 05:00:20 +00:00
Nirav Dave	3264c1bdf6	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo. llvm-svn: 327898	2018-03-19 20:19:46 +00:00
Nirav Dave	5f0ab71b62	Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"" as it times out building test-suite on PPC. llvm-svn: 327778	2018-03-17 19:24:54 +00:00
Nirav Dave	982d3a56ea	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal. llvm-svn: 327777	2018-03-17 17:42:10 +00:00
Craig Topper	25007c4f32	[X86] Pass SelectionDAG into X86ISelAddressMode::dump and on to SDNode::dump. This prevents a crash in SelectionDAGDumper with -debug when trying to print mem operands if one of the registers in the addressing mode comes from a load. llvm-svn: 327744	2018-03-16 21:10:07 +00:00
Craig Topper	e6913ec340	[X86] Post process the DAG after isel to remove vector moves that were added to zero upper bits. We previously avoided inserting these moves during isel in a few cases which is implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist. Especially with AVX512F without AVX512VL using 512 bit vectors to implement some 128/256 bit operations. Since isel is done bottoms up, we'd have to check the VT and opcode and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations. So instead of doing that, this patch adds a post processing step that detects when the moves are unnecesssary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG. So then we just need to ensure the input to the move isn't one. Differential Revision: https://reviews.llvm.org/D44289 llvm-svn: 327724	2018-03-16 17:13:42 +00:00
Nirav Dave	042678bd55	Revert: r327172 "Correct load-op-store cycle detection analysis" r327171 "Improve Dependency analysis when doing multi-node Instruction Selection" r328170 "[DAG] Enforce stricter NodeId invariant during Instruction selection" Reverting patch as NodeId invariant change is causing pathological increases in compile time on PPC llvm-svn: 327197	2018-03-10 02:16:15 +00:00
Nirav Dave	0fab41782d	Correct load-op-store cycle detection analysis Add missing cycle dependency checks in load-op-store fusion. Fixes PR36274. Reviewers: craig.topper, bogner Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43154 llvm-svn: 327172	2018-03-09 20:58:07 +00:00
Nirav Dave	d668f69ee7	Improve Dependency analysis when doing multi-node Instruction Selection Relanding after fixing NodeId Invariant. Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles / dependencies pruning the search when topological property of NodeId allows. As part of this propogate the NodeId-based cutoffs to narrow hasPreprocessorHelper searches. Reviewers: craig.topper, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41293 llvm-svn: 327171	2018-03-09 20:57:42 +00:00
Nirav Dave	071699bf82	[DAG] Enforce stricter NodeId invariant during Instruction selection Instruction Selection makes use of the topological ordering of nodes by node id (a node's operands have smaller node id than it) when doing cycle detection. During selection we may violate this property as a selection of multiple nodes may induce a use dependence (and thus a node id restriction) between two unrelated nodes. If a selected node has an unselected successor this may allow us to miss a cycle in detection an invalid selection. This patch fixes this by marking all unselected successors of a selected node have negated node id. We avoid pruning on such negative ids but still can reconstruct the original id for pruning. In-tree targets have been updated to replace DAG-level replacements with ISel-level ones which enforce this property. This preemptively fixes PR36312 before triggering commit r324359 relands Reviewers: craig.topper, bogner, jyknight Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43198 llvm-svn: 327170	2018-03-09 20:57:15 +00:00
Craig Topper	48d5ed265c	[X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of return. An extract_element where the result type is larger than the scalar element type is semantically an any_extend of from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32 we need to mae sure those zeroes are explicit in the DAG. For these cases the best way to accomplish this is use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1(for i32) or v8i1(for i8). Then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smarter enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register. llvm-svn: 326308	2018-02-28 08:14:28 +00:00
Chandler Carruth	a1d6107b14	[DAG, X86] Revert r324797, r324491, and r324359. Sadly, r324359 caused at least PR36312. There is a patch out for review but it seems to be taking a bit and we've already had these crashers in tree for too long. We're hitting this PR in real code now and are blocked on shipping new compilers as a consequence so I'm reverting us back to green. Sorry for the churn due to the stacked changes that I had to revert. =/ llvm-svn: 325420	2018-02-17 02:26:25 +00:00
Craig Topper	2b2d8c5eb2	[X86] Use btc/btr/bts to implement xor/and/or that affects a single bit in the upper 32-bits of a 64-bit operation. We can't fold a large immediate into a 64-bit operation. But if we know we're only operating on a single bit we can use the bit instructions. For now only do this for optsize. Differential Revision: https://reviews.llvm.org/D37418 llvm-svn: 325287	2018-02-15 19:57:35 +00:00
Craig Topper	88939fefe8	[X86] Simplify X86DAGToDAGISel::matchBEXTRFromAnd by creating an X86ISD::BEXTR node and calling Select. Add isel patterns to recognize this node. This removes a bunch of special case code for selecting the immediate and folding loads. llvm-svn: 324939	2018-02-12 21:18:11 +00:00
Craig Topper	b424fafa9f	[X86] Don't look for TEST instruction shrinking opportunities when the root node is a X86ISD::SUB. I don't believe we ever create an X86ISD::SUB with a 0 constant which is what the TEST handling needs. The ternary operator at the end of this code shows up as only going one way in the llvm-cov report from the bots. llvm-svn: 324865	2018-02-12 03:02:02 +00:00
Craig Topper	3ccbd3f32f	[X86] Remove check for X86ISD::AND with no flag users from the TEST instruction immediate shrinking code. We turn X86ISD::AND with no flag users back to ISD::AND in PreprocessISelDAG. llvm-svn: 324864	2018-02-12 03:02:01 +00:00
Nirav Dave	c8c9d4fe35	[DAG] Make early exit hasPredecessorHelper return true. NFCI. All uses conservatively assume in early exit case that it will be a predecessor. Changing default removes checking code in all uses. llvm-svn: 324797	2018-02-10 02:41:22 +00:00
Nirav Dave	27721e8617	[DAG, X86] Improve Dependency analysis when doing multi-node Instruction Selection Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles / dependencies pruning the search when topological property of NodeId allows. As part of this propogate the NodeId-based cutoffs to narrow hasPreprocessorHelper searches. Reviewers: craig.topper, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41293 llvm-svn: 324359	2018-02-06 16:14:29 +00:00
Craig Topper	57e0643160	[X86] Teach X86DAGToDAGISel::shrinkAndImmediate to preserve upper 32 zeroes of a 64 bit mask. If the upper 32 bits of a 64 bit mask are all zeros, we have special isel patterns to use a 32-bit and instead of a 64-bit and by relying on the impliciting zeroing of 32 bit ops. This patch teachs shrinkAndImmediate not to break that optimization. Differential Revision: https://reviews.llvm.org/D42899 llvm-svn: 324249	2018-02-05 16:54:07 +00:00
Craig Topper	7e910a9e85	[X86] Turn X86ISD::AND nodes that have no flag users back into ISD::AND just before isel to enable test instruction matching Summary: EmitTest sometimes creates X86ISD::AND specifically to hide the AND from DAG combine. But this prevents isel patterns that look for (cmp (and X, Y), 0) from being able to see it. So we end up with an AND and a TEST. The TEST gets removed by compare instruction optimization during the peephole pass. This patch attempts to fix this by converting X86ISD::AND with no flag users back into ISD::AND during the DAG preprocessing just before isel. In order to do this correctly I had to make the X86ISD::AND node created by EmitTest in this case really have a flag output. Which arguably it should have had anyway so that the number of operands would be consistent for the opcode in all cases. Then I had to modify the ReplaceAllUsesWith to understand that we might be looking at an instruction with 2 outputs. Though in this case there are no uses to replace since we just created the node, but that's what the code did before so I just made it keep working. Reviewers: spatel, RKSimon, niravd, deadalnix Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42764 llvm-svn: 323982	2018-02-01 17:08:39 +00:00
Amaury Sechet	f9a9e9a251	[X86] Generate testl instruction through truncates. Summary: This was introduced in D42646 but ended up being reverted because the original implementation was buggy. Depends on D42646 Reviewers: craig.topper, niravd, spatel, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42741 llvm-svn: 323899	2018-01-31 19:20:06 +00:00
Amaury Sechet	f89f188ddb	[X86] Avoid using high register trick for test instruction Summary: It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger. Reviewers: craig.topper, niravd, spatel, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42646 llvm-svn: 323888	2018-01-31 16:48:54 +00:00
Eric Liu	0b69b5ed85	Revert "[X86] Avoid using high register trick for test instruction" This reverts commit r323690. This causes crash in llc. See the original commit thread for details. llvm-svn: 323761	2018-01-30 14:18:33 +00:00
Amaury Sechet	015184b79e	[X86] Avoid using high register trick for test instruction Summary: It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger. The main noteworthy regression I was able to observe was pattern of the type (setcc (trunc (and X, C)), 0) where C is such as it would benefit from the hi register trick. To prevent this, a new pattern is added to materialize such pattern using a 32 bits test. This has the added benefit of working with any constant that is materializable as a 32bits immediate, not just the ones that can leverage the high register trick, as demonstrated by the test case in test-shrink.ll using the constant 2049 . Reviewers: craig.topper, niravd, spatel, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42646 llvm-svn: 323690	2018-01-29 20:54:33 +00:00
Craig Topper	15d69739e2	[X86] Remove VPTESTM/VPTESTNM ISD opcodes. Use isel patterns matching cmpm eq/ne with immallzeros. llvm-svn: 323612	2018-01-28 00:56:30 +00:00
Craig Topper	513d3fa674	[X86] Remove X86ISD::PCMPGTM/PCMPEQM and instead just use X86ISD::PCMPM and pattern match the immediate value during isel. Legalization is still biased to turn LT compares in to GT by swapping operands to avoid needing extra isel patterns to commute. I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar. llvm-svn: 323604	2018-01-27 20:19:02 +00:00
Craig Topper	8f324bb1a4	[SelectionDAGISel] Add a debug print before call to Select. Adjust where blank lines are printed during isel process to make things more sensibly grouped. Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it means there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node SelectNodeCommon would reprint the root node before walking the isel table. It seems better to just print the message before the call to Select so all targets behave the same. And then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search. There were also some oddities in blank line behavior. Usually due to a \n after a call to SelectionDAGNode::dump which already inserted a new line. llvm-svn: 323551	2018-01-26 19:34:20 +00:00
Chandler Carruth	c58f2166ab	Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155	2018-01-22 22:05:25 +00:00
Sanjay Patel	74a1eef7c4	[x86] shrink 'and' immediate values by setting the high bits (PR35907) Try to reverse the constant-shrinking that happens in SimplifyDemandedBits() for 'and' masks when it results in a smaller sign-extended immediate. We are also able to detect dead 'and' ops here (the mask is all ones). In that case, we replace and return without selecting the 'and'. Other targets might want to share some of this logic by enabling this under a target hook, but I didn't see diffs for simple cases with PowerPC or AArch64, so they may already have some specialized logic for this kind of thing or have different needs. This should solve PR35907: https://bugs.llvm.org/show_bug.cgi?id=35907 Differential Revision: https://reviews.llvm.org/D42088 llvm-svn: 322957	2018-01-19 16:37:25 +00:00
Nirav Dave	72d32f24f5	[X86] Extend load-op-store fusion merge to ADC/SBB. Summary: Add handling of EFLAG input to X86 Load-op-store fusion checking. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D42128 llvm-svn: 322952	2018-01-19 15:37:57 +00:00
Craig Topper	af4eb17223	[SelectionDAG][X86] Explicitly store the scale in the gather/scatter ISD nodes Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend. Most of this patch is just making sure we copy the scale around everywhere. Differential Revision: https://reviews.llvm.org/D40055 llvm-svn: 322210	2018-01-10 19:16:05 +00:00
Craig Topper	d58c165545	[X86] Make v2i1 and v4i1 legal types without VLX Summary: There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type. It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway. This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added. I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all. There's definitely room for improvement with some follow up patches. Reviewers: RKSimon, zvi, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41560 llvm-svn: 321967	2018-01-07 18:20:37 +00:00
Craig Topper	eff84ed204	[X86] Improve the printing of address mode during isel matching. Fix some inconsistent new line behavior and only print the FrameIndex when the address mode is a FrameIndexBase addressing mode. llvm-svn: 321368	2017-12-22 17:18:10 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Michael Zolotukhin	ad24af7f58	Remove redundant includes from lib/Target/X86. llvm-svn: 320636	2017-12-13 21:31:19 +00:00
Matt Morehouse	9e658c974b	Revert "[X86] Improvement in CodeGen instruction selection for LEAs." This reverts r319543, due to ASan bot breakage. llvm-svn: 319591	2017-12-01 22:20:26 +00:00
Jatin Bhateja	328199ec26	[X86] Improvement in CodeGen instruction selection for LEAs. Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will now look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. 4/ Simplify LEA converts (lea (BASE,1,INDEX,0) --> add (BASE, INDEX) which offers better through put. PR32755 will be taken care of by this pathc. Previous patch revisions : r313343 , r314886 Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja Reviewed By: lsaba, RKSimon, jbhateja Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 319543	2017-12-01 14:07:38 +00:00
Craig Topper	f31b0b850b	[X86] Teach isel that X86ISD::CMPM_RND zeros the upper bits of the mask register. llvm-svn: 318933	2017-11-23 18:41:21 +00:00
Craig Topper	ee74044f93	[X86] Add an X86ISD::MSCATTER node for consistency with the X86ISD::MGATHER. This makes the fact that X86 needs an explicit mask output not part of the type constraint for the ISD::MSCATTER. This also gives the X86ISD::MGATHER/MSCATTER nodes a common base class simplifying the address selection code in X86ISelDAGToDAG.cpp llvm-svn: 318823	2017-11-22 08:10:54 +00:00
Craig Topper	c314f461dd	[X86] Allow X86ISD::Wrapper to be folded into the base of gather/scatter address If the base of our gather corresponds to something contained in X86ISD::Wrapper we should be able to fold it into the address. This patch refactors some of the address matching to more fully use the X86ISelAddressMode struct and the getAddressOperands helper. A new helper function matchVectorAddress is added to call matchWrapper or fall back to matchAddressBase. We should also be able to support constant offsets from a wrapper, but I'll look into that in a future patch. We may even be able to completely reuse matchAddress here, but I wanted to start simple and work up to it. Differential Revision: https://reviews.llvm.org/D39927 llvm-svn: 318057	2017-11-13 17:53:59 +00:00
Craig Topper	bb001c6ddc	[X86] Merge the template method selectAddrOfGatherScatterNode into selectVectorAddr. NFCI Just need to initialize a couple variables differently based on the node type. No need for a whole separate template method. llvm-svn: 317915	2017-11-10 19:26:04 +00:00
Craig Topper	61f81f9637	[X86] Preserve memory refs when folding loads into divides. This is similar to what we already do for multiplies. Without this we can't unfold and hoist an invariant load. llvm-svn: 317732	2017-11-08 22:26:39 +00:00
Craig Topper	55029d811f	[X86] Remove an if check on the result of a cast. NFC cast takes a non-null input and produces a non-null output. So this if can never fail. llvm-svn: 317731	2017-11-08 22:26:37 +00:00
Craig Topper	78a770402a	[X86] Correct the implementation of BEXTR load folding to use the shift as the parent node and pass a separate root. We were calling tryFoldLoad with the 'and' node was the root and parent node of the load. But the parent of the load should be the shift that proceeds the and. While the and node is correctly the root node. To fix this I had to make tryFoldLoad take a separate use and root input. I've added a convenience version with the old signature to avoid updating the other call sites. llvm-svn: 317720	2017-11-08 20:17:33 +00:00
Uriel Korach	bb86686a8b	[X86][AVX512] Improve lowering of AVX512 test intrinsics Added TESTM and TESTNM to the list of instructions that already zeroing unused upper bits and does not need the redundant shift left and shift right instructions afterwards. Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) goes folds into TESTM and icmp(eq,and(X,Y), 0) goes folds into TESTNM This commit is a preparation for lowering the test and testn X86 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38732 llvm-svn: 317465	2017-11-06 09:22:38 +00:00
Uriel Korach	eb47d95d52	[X86] Replace duplicate function call with variable. NFC Change from: if (N->getOperand(0).getValueType() == MVT::v8i32 \|\| N->getOperand(0).getValueType() == MVT::v8f32) to: EVT OpVT = N->getOperand(0).getValueType(); if (OpVT == MVT::v8i32 \|\| OpVT == MVT::v8f32) Change-Id: I5a105f8710b73a828e6cfcd55fac2eae6153ce25 llvm-svn: 317464	2017-11-06 08:32:45 +00:00
Craig Topper	40f0584f08	[X86] Fix a mistake in the X86ISelDAGToDAG.cpp code for MUL8r/IMUL8r. I think this code is unreachable due to some promotions that occur elsewhere. I'll look into that to be sure, but for now I thought I should at least fix the obvious typo. llvm-svn: 316840	2017-10-28 19:56:57 +00:00
Craig Topper	b8d7d4d683	[X86] Improve handling of UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG to support 64-bit extensions. If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/UDIVREM8_SEXT_HREG operation. This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case. This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG. Differential Revision: https://reviews.llvm.org/D38275 llvm-svn: 316702	2017-10-26 21:12:03 +00:00
Aaron Ballman	615eb47035	Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people. Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1 llvm-svn: 315854	2017-10-15 14:32:27 +00:00
Don Hinton	3e0199f7eb	[dump] Remove NDEBUG from test to enable dump methods [NFC] Summary: Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP. Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods. Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so it'll be picked up by public headers. Differential Revision: https://reviews.llvm.org/D38406 llvm-svn: 315590	2017-10-12 16:16:06 +00:00
Craig Topper	9563cab961	[X86] Simplify some code in getInsertVINSERTImmediate and getExtractVEXTRACTImmediate. NFC Replace one of the divides with a multiply. llvm-svn: 315162	2017-10-08 01:33:42 +00:00
Hans Wennborg	2a6c9adb2f	Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)" It broke the Chromium / SQLite build; see PR34830. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy > > Reviewed By: lsaba > > Subscribers: jmolloy, spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314919	2017-10-04 17:54:06 +00:00
Jatin Bhateja	3c29bacd43	[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.) Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy Reviewed By: lsaba Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314886	2017-10-04 09:02:10 +00:00
Craig Topper	6255c7b675	[X86] Don't select (cmp (and, imm), 0) to testw Summary: X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters. I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs. Reviewers: RKSimon, zvi, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38273 llvm-svn: 314474	2017-09-28 23:35:36 +00:00
Craig Topper	fd6b8a67fb	[X86] Remove dead code from X86ISelDAGToDAG.cpp multiply handling Summary: Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash. DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. Similar is true for i16 though without the bug. Reviewers: RKSimon, spatel, zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38276 llvm-svn: 314430	2017-09-28 16:56:36 +00:00
Craig Topper	ba3cc2e0da	[AVX-512] Replace large number of explicit patterns that check for insert_subvector with zero after masked compares with fewer patterns with predicate This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer. This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends. This shrinks the isel table from ~590k to ~531k. This is a roughly 10% reduction in size. Differential Revision: https://reviews.llvm.org/D38217 llvm-svn: 314133	2017-09-25 18:43:13 +00:00
Craig Topper	092c2f4357	[X86] Move the getInsertVINSERTImmediate and getExtractVEXTRACTImmediate helper functions over to X86ISelDAGToDAG.cpp Redefine them to call getI8Imm and return that directly. llvm-svn: 314059	2017-09-23 05:34:07 +00:00
Craig Topper	75370b9b49	[X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns Similar to what we do for X86ISD::SHRUNKBLEND just turn X86ISD::SELECT into ISD::VSELECT. This allows us to remove the duplicated TRUNC patterns. Differential Revision: https://reviews.llvm.org/D38022 llvm-svn: 313644	2017-09-19 17:19:45 +00:00
Craig Topper	e92327e236	[X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit This is similar to D37843, but for sub_8bit. This fixes all of the patterns except for the 2 that emit only an EXTRACT_SUBREG. That causes a verifier error with global isel because global isel doesn't know to issue the ABCD when doing this extract on 32-bits targets. Differential Revision: https://reviews.llvm.org/D37890 llvm-svn: 313558	2017-09-18 19:21:21 +00:00
Craig Topper	b2155159a8	[X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bit_hi I'm pretty sure that InstrEmitter::EmitSubregNode will take care of this itself by calling ConstrainForSubReg which in turn calls TRI->getSubClassWithSubReg. I think Jakob Stoklund Olesen alluded to this in his commit message for r141207 which added the code to EmitSubregNode. Differential Revision: https://reviews.llvm.org/D37843 llvm-svn: 313557	2017-09-18 19:21:19 +00:00
Hans Wennborg	534bfbd3ba	Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs." This caused PR34629: asserts firing when building Chromium. It also broke some buildbots building test-suite as reported on the commit thread. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet > > Reviewed By: lsaba > > Subscribers: spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313376	2017-09-15 18:40:26 +00:00
Jatin Bhateja	908c8b37c2	[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet Reviewed By: lsaba Subscribers: spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313343	2017-09-15 05:29:51 +00:00
Craig Topper	2b6bfda561	[X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a BEXTR64rr instruction from a shift/and pair. Fixes PR34589. llvm-svn: 313126	2017-09-13 07:53:21 +00:00
Craig Topper	0a3bcebcc2	[X86] Use isUInt<32> to simplify some code. NFC llvm-svn: 313112	2017-09-13 02:29:59 +00:00
Craig Topper	958106d0f1	[X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so its as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp. This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we dont' recognize yet. We already had this problem for several other TBM patterns so I think this fine and we can address of them together. I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before got to isel. I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch. Differential Revision: https://reviews.llvm.org/D37592 llvm-svn: 313054	2017-09-12 17:40:25 +00:00
Craig Topper	6bed9de3d5	[X86] Call removeDeadNode when we're done doing custom isel for mul, div and test Summary: Once we've done our custom isel for these nodes, I think we should be calling removeDeadNode to prune them out of the DAG. Table driven isel ultimately either calls morphNodeTo which modifies a node and doesn't leave dead nodes. Or it emits new nodes and then calls removeDeadNode as part of Opc_CompleteMatch. If you run a simple multiply test case like this through llc with -debug you'll see a umul_lohi node get printed as part of the dump for Instruction Selection ends. ``` define i64 @foo(i64 %a, i64 %b) local_unnamed_addr #0 { entry: %conv = zext i64 %a to i128 %conv1 = zext i64 %b to i128 %mul = mul nuw nsw i128 %conv1, %conv %shr = lshr i128 %mul, 64 %conv2 = trunc i128 %shr to i64 ret i64 %conv2 } ``` Reviewers: RKSimon, spatel, zvi, guyblank, niravd Reviewed By: niravd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37547 llvm-svn: 312857	2017-09-09 05:57:20 +00:00
Craig Topper	63c5047a4e	[X86] Use ReplaceNode instead of ReplaceUses when converting X86ISD::SHRUNKBLEND to ISD::VSELECT during isel. This ensures that the SHRUNKBLEND node gets erased immediately. llvm-svn: 312856	2017-09-09 05:57:19 +00:00
Chandler Carruth	38e2b506db	[x86] Fix GCC pedantic warnings about default arguments for lambdas. llvm-svn: 312809	2017-09-08 18:23:42 +00:00
Chandler Carruth	acbcf06f03	[x86] Flesh out the custom ISel for RMW aritmetic ops with used flags to cover the bitwise operators. Nothing really exciting here, this just stamps out the rest of the core operations that can RMW memory and set flags. Still not implemented here: ADC, SBB. Those will require more interesting logic to channel the flags in, and I'm not currently planning to try to tackle that. It might be interesting for someone who wants to improve our code generation for bignum implementations. Differential Revision: https://reviews.llvm.org/D37141 llvm-svn: 312768	2017-09-08 00:17:12 +00:00

1 2 3 4 5 ...

818 Commits