llvm-project

Commit Graph

Author	SHA1	Message	Date
Philip Reames	1f1bbac8da	[peephole] Enhance folding logic to work for STATEPOINTs The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are: Support for folding multiple operands at once which reference the same load Support for folding multiple loads into a single instruction Walk all the operands of the instruction for variadic instructions (this is a bug fix!) Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here. Differential Revision: https://reviews.llvm.org/D24103 llvm-svn: 289510	2016-12-13 01:38:41 +00:00
Sanjay Patel	62104ee6d9	[x86] fix formatting; NFC llvm-svn: 289476	2016-12-12 22:31:01 +00:00
Eugene Zelenko	6a9226d9b8	[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 289475	2016-12-12 22:23:53 +00:00
Guozhi Wei	1fd553c934	[PPC] Prefer direct move on power8 if load 1 or 2 bytes to VSR Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so moving 1 or 2 bytes to VSR through MTVSRWZ is much faster than storing the extended value into the stack and loading it with LXSIWZX. This patch fixes pr31144. Differential Revision: https://reviews.llvm.org/D27287 llvm-svn: 289473	2016-12-12 22:09:02 +00:00
Simon Atanasyan	5048514c20	[mips] For PIC code convert unconditional jump to unconditional branch Unconditional branch uses relative addressing which is the right choice in case of position independent code. This is a fix for the bug: https://dmz-portal.mips.com/bugz/show_bug.cgi?id=2445 Differential revision: https://reviews.llvm.org/D27483 llvm-svn: 289448	2016-12-12 17:40:26 +00:00
Nicolai Haehnle	f45ea4bbc5	AMDGPU: llvm.amdgcn.interp.mov is a source of divergence Summary: While the result is constant across a single primitive, each pixel shader wave can have pixels from multiple primitives. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27572 llvm-svn: 289447	2016-12-12 16:52:19 +00:00
Simon Pilgrim	4cbe1834e4	Update inline argument comment. NFCI. combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time. llvm-svn: 289430	2016-12-12 13:43:15 +00:00
Simon Pilgrim	5ebd2b542b	[X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts. Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift. llvm-svn: 289429	2016-12-12 13:33:58 +00:00
Simon Pilgrim	369cd349b9	[X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ PMULDQ returns the 64-bit result of the signed multiplication of the lower 32-bits of vXi64 vector inputs, we can lower with this if the sign bits stretch that far. Differential Revision: https://reviews.llvm.org/D27657 llvm-svn: 289426	2016-12-12 10:49:15 +00:00
Craig Topper	36ecce9bed	[X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the X86ISD::VZEXT_LOAD opcode. Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics. llvm-svn: 289424	2016-12-12 07:57:24 +00:00
Craig Topper	f2c6f7abf3	[X86] Change CMPSS/CMPSD intrinsic instructions to use sse_load_f32/f64 as its memory pattern instead of full vector load. These intrinsics only load a single element. We should use sse_loadf32/f64 to give more options of what loads it can match. Currently these instructions are often only getting their load folded thanks to the load folding in the peephole pass. I plan to add more types of loads to sse_load_f32/64 so we can match without the peephole. llvm-svn: 289423	2016-12-12 07:57:21 +00:00
Craig Topper	081c0e2864	[X86] Remove some intrinsic instructions from hasPartialRegUpdate Summary: These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits. As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded. Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too. Reviewers: spatel, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27611 llvm-svn: 289419	2016-12-12 05:07:17 +00:00
Simon Pilgrim	831435cb14	[X86][SSE] Add support for combining target shuffles to SHUFPD. llvm-svn: 289407	2016-12-11 21:26:25 +00:00
Ayman Musa	7ec4ed55d3	[X86][AVX512] Add missing patterns for broadcast fallback in case load node has multiple uses (for v4i64 and v4f64). When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded. A fallback pattern is added to catch these cases and provide another solution. Differential Revision: https://reviews.llvm.org/D27661 llvm-svn: 289404	2016-12-11 20:11:17 +00:00
Oren Ben Simhon	9683ecbff6	[X86] Regcall - Adding support for mask types Regcall calling convention passes mask types arguments in x86 GPR registers. The review includes the changes required in order to support v32i1, v16i1 and v8i1. Differential Revision: https://reviews.llvm.org/D27148 llvm-svn: 289383	2016-12-11 14:10:52 +00:00
Craig Topper	e7166ce237	[X86] Fix a comment to say 'an FMA' instead of 'a FMA'. NFC llvm-svn: 289352	2016-12-11 01:28:08 +00:00
Craig Topper	1f1b441267	[X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for being able to constant fold them in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289350	2016-12-11 01:26:44 +00:00
Dylan McKay	139c0c7c37	[AVR] Fix a signed vs unsigned compiler warning llvm-svn: 289349	2016-12-11 00:24:13 +00:00
Dylan McKay	658bb0964a	[AVR] Remove incorrect comment This should've been removed in r289323. llvm-svn: 289346	2016-12-10 23:50:30 +00:00
Craig Topper	edab02b50b	[X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being able to constant fold it in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289344	2016-12-10 23:09:43 +00:00
Simon Pilgrim	a03e350e69	[X86][SSE] Ensure UNPCK inputs are a consistent value type in LowerHorizontalByteSum llvm-svn: 289341	2016-12-10 21:16:45 +00:00
Craig Topper	abe7c5b5e9	[AVX-512] Remove 128/256 masked vpermil intrinsics and autoupgrade to a select around the unmasked avx1 intrinsics. llvm-svn: 289340	2016-12-10 21:15:52 +00:00
Matt Arsenault	fbc728853f	AMDGPU: Fix asan errors when folding operands This was failing when trying to fold immediates into operand 1 of a phi, which only has one statically known operand. llvm-svn: 289337	2016-12-10 19:58:00 +00:00
Simon Pilgrim	fb58550d73	[X86][SSE] Move ZeroVector creation into the shuffle pattern case where it's actually used. Also fix the ZeroVector's type - I've no idea how this hasn't caused problems........ llvm-svn: 289336	2016-12-10 19:49:55 +00:00
Craig Topper	18b57da491	[AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available. llvm-svn: 289335	2016-12-10 19:35:39 +00:00
Craig Topper	8e288e0b68	[X86] Clarify indentation. NFC llvm-svn: 289334	2016-12-10 19:35:36 +00:00
Craig Topper	85f0e57c33	[X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a single boolean flag passed to a helper function. Just check the opcode and create the flag. llvm-svn: 289333	2016-12-10 19:35:33 +00:00
Simon Atanasyan	edd7a7bb40	[mips] Eliminate else-after-return. NFC llvm-svn: 289331	2016-12-10 17:30:09 +00:00
Dylan McKay	41258cf07d	[AVR] Add a stub README file llvm-svn: 289326	2016-12-10 12:08:19 +00:00
Dylan McKay	d8a603c23b	[AVR] Fix and clean up the inline assembly tests There was a bug where we would hit an assertion if 'Q' was used as a constraint. I also removed hardcoded register names to prefer regexes so the tests don't break when the register allocator changes. llvm-svn: 289325	2016-12-10 11:49:07 +00:00
Dylan McKay	801a4bd4ed	[AVR] Fix an inline asm assertion which would always trigger It looks like some time in the past, constraint codes were changed from chars being passed around to enums. llvm-svn: 289323	2016-12-10 11:18:37 +00:00
Dylan McKay	5c90b8cb4f	[AVR] Use the register scavenger when expanding 'LDDW' instructions Summary: This gets rid of the hardcoded 'r0' that was used previously. Reviewers: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27567 llvm-svn: 289322	2016-12-10 10:51:55 +00:00
Dylan McKay	5d0233bea2	[AVR] Support stores to undefined pointers This would previously trigger an assertion error in AVRISelDAGToDAG. llvm-svn: 289321	2016-12-10 10:16:13 +00:00
Craig Topper	a39b650d72	[X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements. Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements. Similar things have already been done for other convert intrinsics. llvm-svn: 289316	2016-12-10 06:02:48 +00:00
Dylan McKay	f368509543	[AVR] Fix a bunch of incorrect assertion messages These should've been checking whether the immediate is a 6-bit unsigned integer. If the immediate was '63', this would cause an assertion error which shouldn't have occurred. llvm-svn: 289315	2016-12-10 05:48:48 +00:00
Matt Arsenault	2402b95db0	AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces. llvm-svn: 289307	2016-12-10 00:52:50 +00:00
Matt Arsenault	4bd7236193	AMDGPU: Fix handling of 16-bit immediates Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306	2016-12-10 00:39:12 +00:00
Matt Arsenault	f0c862594b	AMDGPU: Fix vintrp disassembly llvm-svn: 289292	2016-12-10 00:29:55 +00:00
Matt Arsenault	618b330dd0	AMDGPU: Change vintrp printing to better match sc Some of the immediates need to be printed differently eventually. llvm-svn: 289291	2016-12-10 00:23:12 +00:00
Eugene Zelenko	2bc2f33ba2	[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 289282	2016-12-09 22:06:55 +00:00
Marek Olsak	23ae31cca0	AMDGPU/SI: Remove XNACK feature from CI Summary: CI doesn't have XNACK. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27175 llvm-svn: 289263	2016-12-09 19:49:58 +00:00
Marek Olsak	0f55fbae6c	AMDGPU/SI: Don't reserve XNACK when it's disabled Summary: This frees 2 additional scalar registers. These are results from all of my 3 patches combined: Polaris: Spilled SGPRs: 2231 -> 1517 (-32.00 %) Tonga: Spilled SGPRs: 3829 -> 2608 (-31.89 %) Spilled VGPRs: 100 -> 84 (-16.00 %) Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader limited to 64 VGPRs. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27151 llvm-svn: 289262	2016-12-09 19:49:54 +00:00
Marek Olsak	693e9be918	AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects Summary: This frees 2 scalar registers. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27150 llvm-svn: 289261	2016-12-09 19:49:48 +00:00
Marek Olsak	91f22fbf4f	AMDGPU/SI: Allow using SGPRs 96-101 on VI Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27149 llvm-svn: 289260	2016-12-09 19:49:40 +00:00
Matt Arsenault	7b00cf4706	AMDGPU: Fix isTypeDesirableForOp for i16 This should do nothing for targets without i16. llvm-svn: 289235	2016-12-09 17:57:43 +00:00
Simon Pilgrim	017b7a71d8	[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED) Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type. llvm-svn: 289232	2016-12-09 17:53:11 +00:00
Matt Arsenault	38d8ed2b75	AMDGPU: Fix i128 mul llvm-svn: 289231	2016-12-09 17:49:14 +00:00
Matt Arsenault	52facf0195	AMDGPU: Allow TBA, TMA, TTMP* registers with SMEM instructions Fixes assembler regressions. llvm-svn: 289230	2016-12-09 17:49:11 +00:00
Matt Arsenault	eb4a55e066	AMDGPU: Clean up instruction bits Sort the instruction bits by type and make sure there is one for each format. Also cleanup namespaces. llvm-svn: 289229	2016-12-09 17:49:08 +00:00
Sean Fertile	1c4109b4c2	[PPC] Add intrinsics for vector extract word and vector insert word. Revision: https://reviews.llvm.org/D26547 llvm-svn: 289227	2016-12-09 17:21:42 +00:00
Nirav Dave	bedb5d906c	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226	2016-12-09 17:18:24 +00:00
Nirav Dave	fd51ff4fd8	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after fixing overly aggressive load-store forwarding optimization. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? 
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289221	2016-12-09 16:15:12 +00:00
Tom Stellard	2a48433fcf	AMDGPU/SI: Don't mark VINTRP instructions as mayLoad Summary: These instructions technically do read from memory, but the memory is considered to be out of bounds for normal load/store instructions. shader-db stats: SGPRS: 1416075 -> 1413323 (-0.19 %) VGPRS: 867413 -> 863935 (-0.40 %) Spilled SGPRs: 1409 -> 1354 (-3.90 %) Spilled VGPRs: 63 -> 63 (0.00 %) Private memory VGPRs: 880 -> 880 (0.00 %) Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread Code Size: 37889052 -> 37897340 (0.02 %) bytes LDS: 2147 -> 2147 (0.00 %) blocks Max Waves: 279243 -> 280369 (0.40 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, mareko, arsenm Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27593 llvm-svn: 289219	2016-12-09 15:57:15 +00:00
Craig Topper	38b1b5d44f	[X86] Modify patterns from memory form of RCP/RSQRT/SQRT intrinsics to only allow (scalar_to_vector (loadf32/load64)) instead of anything that sse_load_f32/f64 can match. sse_load_f32/f64 can also match loads that are zero extended to vectors. We shouldn't match that because we wouldn't be able to get the instruction to zero the upper bits like the intrinsic semantics would require for such a case. There is a test case that does depend on this behavior. llvm-svn: 289193	2016-12-09 07:57:21 +00:00
Dylan McKay	18ae0f68f8	[AVR] Use a more appropriate integer type for wide IN/OUT instructions We could previously select an integer which would hit an assertion error in pseudo expansion. The new type will also generate the appropriate fixups if needed, which wasn't done beforehand. llvm-svn: 289192	2016-12-09 07:49:14 +00:00
Dylan McKay	a5d49dfbb3	[AVR] Add tests for a large number of pseudo instructions This adds MIR tests for 24 pseudo instructions. llvm-svn: 289191	2016-12-09 07:49:04 +00:00
Craig Topper	a55b483bb5	[AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics Summary: Scalar intrinsics have specific semantics about which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection. This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1 (one of the multiply inputs) or operand 3 (the addition/subtraction input) should pass thru its upper bits. We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version. We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that. This fixes PR30913. Reviewers: delena, zvi, v_klochkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27144 llvm-svn: 289190	2016-12-09 06:42:28 +00:00
Matt Arsenault	27c062932a	AMDGPU: Select i16 instructions to VOP3 forms These were selecting directly to the VOP2 form instead of VOP3 like the i32 instructions. Fixes regressions in future commits where an immediate isn't folded because it was initially used for the second operand. Because uniform 16-bit operations are promoted to i32, it's difficult to get a simple testcase where this matters. Fold failures in SIFoldOperands here tend to be hidden by commute and fold in SIShrinkInstructions. llvm-svn: 289189	2016-12-09 06:19:12 +00:00
Craig Topper	c4f2b0996d	[X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables. llvm-svn: 289186	2016-12-09 05:20:11 +00:00
Craig Topper	2aeb456425	[AVX-512] Add vpermilps/pd to load folding tables. llvm-svn: 289173	2016-12-09 02:18:11 +00:00
Krzysztof Parzyszek	77a45576ef	[RDF] Fix incorrect lane mask calculation This was exposed by some code that used more than one level of sub- registers. There is no testcase, because there is no such code in the Hexagon backend. llvm-svn: 289099	2016-12-08 20:33:45 +00:00
Matt Arsenault	e96d03745d	AMDGPU: Make f16 ConstantFP legal Not having this legal led to combine failures, resulting in dumb things like bitcasts of constants not being folded away. The only reason I'm leaving the v_mov_b32 hack that f32 already uses is to avoid madak formation test regressions. PeepholeOptimizer has an ordering issue where the immediate fold attempt is into the sgpr->vgpr copy instead of the actual use. Running it twice avoids that problem. llvm-svn: 289096	2016-12-08 20:14:46 +00:00
Stanislav Mekhanoshin	73b54f4134	[AMDGPU] Fix number of reserved SGPRs on CI to reflect flat scratch use Differential Revision: https://reviews.llvm.org/D27225 llvm-svn: 289095	2016-12-08 20:07:23 +00:00
Matt Arsenault	6c06a6f48a	AMDGPU: Fix commuting v_sub_u16 The correct commutable opcode was set to itself, so this was simply swapping the operands to commute instead of also changing the opcode to v_subrev_u16. llvm-svn: 289093	2016-12-08 19:52:38 +00:00
Stanislav Mekhanoshin	50ea93a2bd	[AMDGPU] Add amdgpu-unify-metadata pass Multiple metadata values for records such as opencl.ocl.version, llvm.ident and similar are created after linking several modules. For some of them, notably opencl.ocl.version, this creates a semantic problem because we cannot tell which version of OpenCL the composite module conforms to. Moreover, such repetitions of identical values often create a huge list of unneeded metadata, which grows bitcode size both in memory and stored on disk. It can go up to several Mb when linked against our OpenCL library. Lastly, such long lists obscure reading of dumped IR. The pass unifies metadata after linking. Differential Revision: https://reviews.llvm.org/D25381 llvm-svn: 289092	2016-12-08 19:46:04 +00:00
Peter Collingbourne	235c275b20	IR, X86: Understand !absolute_symbol metadata on global variables. Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087	2016-12-08 19:01:00 +00:00
Alexander Timofeev	18009560c5	[AMDGPU] Scalarization of global uniform loads. Summary: LC can currently select scalar load for uniform memory access based on readonly memory address space only. This restriction originated from the fact that in HW prior to VI vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along all paths from the function entry. Reviewers: rampitec, tstellarAMD, arsenm Subscribers: wdng, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D26917 llvm-svn: 289076	2016-12-08 17:28:47 +00:00
NAKAMURA Takumi	9ccd966612	LanaiInstPrinter: Prune unused libdeps. llvm-svn: 289054	2016-12-08 14:26:30 +00:00
Nicolai Haehnle	2857dc3893	AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg Summary: Without the fix to isFrameOffsetLegal to consider the instruction's immediate offset, the new test case hits the corresponding assertion in resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a different base register. With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of places because frame base registers are added where they're not needed. This is addressed by properly implementing needsFrameBaseReg, which also helps to avoid unnecessary zero frame indices in a bunch of other places. Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D27344 llvm-svn: 289048	2016-12-08 14:08:02 +00:00
Dylan McKay	fac9ce5413	[AVR] Add an assertion to ensure we don't emit LPM when it's unsupported llvm-svn: 289030	2016-12-08 08:34:13 +00:00
Matthias Braun	0c989a893b	LivePhysReg: Use reference instead of pointer in init(); NFC llvm-svn: 289002	2016-12-08 00:15:51 +00:00
Tim Northover	05cc4859ad	GlobalISel: simplify MachineIRBuilder interface. MachineIRBuilder had weird before/after and beginning/end flags for the insert point. Unfortunately the non-default means that instructions will be inserted in reverse order which is almost never what anyone wants. Really, I think we just want (like IRBuilder has) the ability to insert at any C++ iterator-style point (i.e. before any instruction or before MBB.end()). So this fixes MIRBuilders to behave like IRBuilders in this respect. llvm-svn: 288980	2016-12-07 21:05:38 +00:00
Michael Kuperstein	5842b20633	[X86] Skip over DEBUG_VALUE while looking for start of call sequence If we don't skip over DEBUG_VALUEs, we get differences between -g and non-g code. This fixes PR31242. Differential Revision: https://reviews.llvm.org/D27485 llvm-svn: 288965	2016-12-07 19:31:08 +00:00
Michael Kuperstein	18092cf2c3	[X86] Do not assume "ri" instructions always have an immediate operand The second operand of an "ri" instruction may be an immediate, but it may also be a globalvariable, so we should not make any assumptions. This fixes PR31271. Differential Revision: https://reviews.llvm.org/D27481 llvm-svn: 288964	2016-12-07 19:29:18 +00:00
Simon Pilgrim	c3c6463ce0	[X86][SSE] Remove AND -> VZEXT combine This is now performed more generally by the target shuffle combine code. Already covered by tests that were originally added in D7666/rL229480 to support combineVectorZext (or VectorZextCombine as it was known then....). Differential Revision: https://reviews.llvm.org/D27510 llvm-svn: 288918	2016-12-07 17:02:41 +00:00
Dylan McKay	99b756eb40	[AVR] Expand 'SELECT_CC' nodes wherever possible llvm-svn: 288905	2016-12-07 12:34:47 +00:00
Simon Pilgrim	8893bd95f0	[X86][SSE] Consistently set MOVD/MOVQ load/store/move instructions to integer domain We are being inconsistent with these instructions (and all their variants.....) with a random mix of them using the default float domain. Differential Revision: https://reviews.llvm.org/D27419 llvm-svn: 288902	2016-12-07 12:10:49 +00:00
Simon Pilgrim	d5bc5c16b2	[X86][XOP] Fix VPERMIL2 non-constant pool shuffle decoding (PR31296) The non-constant pool version of DecodeVPERMIL2PMask was not offsetting correctly for the second input. I've updated the code to match the implementation in the constant-pool version. Annoyingly this bug was hidden for so long as it's tricky to combine to useful variable shuffle masks that don't become constant-pool entries. llvm-svn: 288898	2016-12-07 11:19:00 +00:00
Dylan McKay	8cec7eb6dd	[AVR] Allow loading from stack slots where src and dest registers are identical Fixes PR 31256 llvm-svn: 288897	2016-12-07 11:08:56 +00:00
Tom Stellard	8485fa096e	AMDGPU : Add S_SETREG instructions to fix fdiv precision issues. Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879	2016-12-07 02:42:15 +00:00
Haicheng Wu	f8b834049a	[AArch64] Correct the check of signed 9-bit imm in isLegalAddressingMode() In the addressing mode, signed 9-bit imm is [-256, 255], not [-512, 511]. Differential Revision: https://reviews.llvm.org/D27480 llvm-svn: 288876	2016-12-07 01:45:04 +00:00
Tom Stellard	2187bb8a89	AMDGPU: Add llvm.amdgcn.interp.mov intrinsic Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26725 llvm-svn: 288865	2016-12-06 23:52:13 +00:00
Matt Arsenault	269ffdac4e	AMDGPU: Fix crash on i16 constant expression llvm-svn: 288861	2016-12-06 23:18:06 +00:00
Matt Arsenault	ac066f354a	AMDGPU: Fix operand name for v_interp_* Other VOP instructions call the output vdst llvm-svn: 288856	2016-12-06 22:29:43 +00:00
Tom Stellard	175959e350	AMDGPU/SI: Set correct value for amd_kernel_code_t::kernarg_segment_alignment Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27416 llvm-svn: 288852	2016-12-06 21:53:10 +00:00
Tom Stellard	00cfa74715	AMDGPU/SI: Don't move copies of immediates to the VALU Summary: If we write an immediate to a VGPR and then copy the VGPR to an SGPR, we can replace the copy with a S_MOV_B32 sgpr, imm, rather than moving the copy to the SALU. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27272 llvm-svn: 288849	2016-12-06 21:13:30 +00:00
Zvi Rackover	8bc7e4da51	[X86] Prefer reduced width multiplication over pmulld on Silvermont Summary: Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld. On Silvermont [source: Optimization Reference Manual]: PMULLD has a throughput of 1/11 [instruction/cycles]. PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles]. Fixes pr31202. Analysis of this issue was done by Fahana Aleen. Reviewers: wmi, delena, mkuper Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27203 llvm-svn: 288844	2016-12-06 19:35:20 +00:00
Tim Northover	c1a23854f3	GlobalISel: handle G_SEQUENCE fallbacks gracefully. There were two problems: + AArch64 was reusing random data from its binary op tables, which is complete nonsense for G_SEQUENCE. + Even when AArch64 gave up and said it couldn't handle G_SEQUENCE, the generic code asserted. llvm-svn: 288836	2016-12-06 18:38:38 +00:00
Daniel Sanders	4fd1e7c628	[globalisel][aarch64] Fix unintended assumptions about PartialMappingIdx. NFC. Summary: This is NFC but prevents assertions when PartialMappingIdx is tablegen-erated. The assumptions were: 1) FirstGPR is 0 2) FirstGPR is the first of the First* enumerators. GPR32 is changed to 1 to demonstrate that assumption #1 is fixed. #2 will be covered by a subsequent patch that tablegen-erates information and swaps the order of GPR and FPR as a side effect. Depends on D27336 Reviewers: ab, t.p.northover, qcolombet Subscribers: aemerson, rengolin, vkalintiris, dberris, rovka, llvm-commits Differential Revision: https://reviews.llvm.org/D27337 llvm-svn: 288812	2016-12-06 14:39:57 +00:00
Daniel Sanders	21765cb15e	[globalisel][aarch64] Replace magic numbers with corresponding enumerators in ValMappings. NFC Reviewers: ab, t.p.northover, qcolombet Subscribers: aemerson, rengolin, vkalintiris, dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D27336 llvm-svn: 288810	2016-12-06 13:55:01 +00:00
Daniel Sanders	605f8cd30d	[globalisel][aarch64] Correct argument names in comments. llvm-svn: 288809	2016-12-06 13:48:58 +00:00
Oliver Stannard	870b5cad45	[ARM] Better error message for invalid flag-preserving Thumb1 insts When we see a non flag-setting instruction for which only the flag-setting version is available in Thumb1, we should give a better error message than "invalid instruction". Differential Revision: https://reviews.llvm.org/D27414 llvm-svn: 288805	2016-12-06 12:59:08 +00:00
Ayman Musa	86c00b799f	[X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting. Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern. For example: "build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>" Differential Revision: https://reviews.llvm.org/D26802 llvm-svn: 288804	2016-12-06 12:24:14 +00:00
Nemanja Ivanovic	15748f4921	[PowerPC] Improvements for BUILD_VECTOR Vol. 4 This is the final patch in the series of patches that improves BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations to remove redundant instructions. It also adds a large test case which encompasses a large set of code patterns that build vectors - this test case was the motivator for this series of patches. Differential Revision: https://reviews.llvm.org/D26066 llvm-svn: 288800	2016-12-06 11:47:14 +00:00
Daniel Sanders	bfd5ff155a	[globalisel][aarch64] Prefix PartialMappingIdx enumerators with 'PMI_' to fit coding standards. This also stops things like 'None' polluting the llvm::AArch64 namespace. llvm-svn: 288799	2016-12-06 11:33:04 +00:00
Florian Hahn	7582c669bd	[framelowering] Improve tracking of first CS pop instruction. Summary: This patch makes sure FirstCSPop and MBBI never point to DBG_VALUE instructions, which affected the code generated. Reviewers: mkuper, aprantl, MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27343 llvm-svn: 288794	2016-12-06 10:24:55 +00:00
Craig Topper	b34eef7b41	[X86] Remove another weird scalar sqrt/rcp/rsqrt pattern. This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended since that's what the operation asked for. In particular, when the upper bits are 0, we need to calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper bits. This implies we should be using the packed instruction still. The only test case for this pattern is one I just added so there was no coverage of this. llvm-svn: 288784	2016-12-06 08:08:12 +00:00
Craig Topper	683470bf1b	[X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic. The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to pass to the instruction since the memory form of the instruction only reads 32 or 64 bits. llvm-svn: 288781	2016-12-06 08:08:04 +00:00
Craig Topper	5fc7bc91f9	[X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined. The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands. I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded. llvm-svn: 288779	2016-12-06 08:07:58 +00:00
Craig Topper	6413f8a8f2	[X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead Summary: This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently. I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel. I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available. Reviewers: spatel, delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27401 llvm-svn: 288771	2016-12-06 04:58:39 +00:00
Chris Bieneman	1b5f563a61	[CMake] Cleanup TableGen include flags It is kinda crazy to have llvm/include and llvm/lib/Target in the include path for every tablegen invocation for every tablegen-like tool. This patch removes those flags from the tablegen function that is called everywhere by instead creating a variable LLVM_TABLEGEN_FLAGS which is set up in the LLVM source directories. This removes TableGen.cmake's dependency on LLVM_MAIN_SRC_DIR, and LLVM_MAIN_INCLUDE_DIR. llvm-svn: 288770	2016-12-06 04:45:11 +00:00
Matt Arsenault	ad55ee5869	AMDGPU: Don't require structured CFG The structured CFG is just an aid to inserting exec mask modification instructions; once that is done we don't really need it anymore. We also do not analyze blocks with terminators that modify exec, so this should only be impacting true branches. llvm-svn: 288744	2016-12-06 01:02:51 +00:00
Matt Arsenault	26faed3960	AMDGPU: Consolidate inline immediate predicate functions llvm-svn: 288718	2016-12-05 22:26:17 +00:00
Matt Arsenault	c7f28a5d95	AMDGPU: Minor assembler refactoring Fix return before else, check types for selecting fltSemantics, refactor immediate checks. llvm-svn: 288715	2016-12-05 22:07:21 +00:00
Tim Northover	9267ac5d47	GlobalISel: make G_CONSTANT take a ConstantInt rather than int64_t. This makes it more similar to the floating-point constant, and also allows for larger constants to be translated later. There's no real functional change in this patch though, just syntax updates. llvm-svn: 288712	2016-12-05 21:47:07 +00:00
Tim Northover	d1fd383b28	GlobalISel: handle 1-element aggregates during ABI lowering. llvm-svn: 288706	2016-12-05 21:25:33 +00:00
Michael Kuperstein	e3036abcf9	[X86] Fix non-intrinsic roundss/roundsd to not read the destination register This changes the scalar non-intrinsic non-avx roundss/sd instruction definitions not to read their destination register - allowing partial dependency breaking. This fixes PR31143. Differential Revision: https://reviews.llvm.org/D27323 llvm-svn: 288703	2016-12-05 20:57:37 +00:00
Matt Arsenault	bf6bdac1ad	AMDGPU: Assembler support for exp compr is not currently parsed (or printed) correctly, but that should probably be fixed along with intrinsic changes. llvm-svn: 288698	2016-12-05 20:42:41 +00:00
Matt Arsenault	8a63cb9044	AMDGPU: Change how exp is printed This is an improvement over a long list of unreadable numbers. A follow up patch will try to match how sc formats these. llvm-svn: 288697	2016-12-05 20:31:49 +00:00
Matt Arsenault	7bee6ac798	AMDGPU: Refactor exp instructions Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in the shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695	2016-12-05 20:23:10 +00:00
Quentin Colombet	0e6cccfb53	[AArch64][RegisterBankInfo] Fix typo in the logic used in assert. Thanks to David Binderman <dcb314@hotmail.com> for bringing it to my attention. llvm-svn: 288688	2016-12-05 19:02:37 +00:00
Sanjay Patel	f807f6a05f	[x86] fold fand (fxor X, -1) Y --> fandn X, Y I noticed this gap in the scalar FP-logic matching with: D26712 and: rL287171 Differential Revision: https://reviews.llvm.org/D27385 llvm-svn: 288675	2016-12-05 15:45:27 +00:00
Simon Pilgrim	5e922eb0a3	Use range based for loop. NFCI. llvm-svn: 288671	2016-12-05 14:25:04 +00:00
Nirav Dave	d6642c1163	[PPC] Slightly Improve Assembly Parsing errors and add EOL comment parsing tests. NFC intended. llvm-svn: 288667	2016-12-05 14:11:03 +00:00
Simon Dardis	8fe36cd77c	[mips][ias] N32/N64 must not sort the relocation table. Doing so changes the evaluation order for relocation composition. Patch By: Daniel Sanders Reviewers: vkalintiris, atanasyan Differential Revision: https://reviews.llvm.org/D26401 llvm-svn: 288666	2016-12-05 12:55:19 +00:00
Simon Pilgrim	b08c98f125	[X86][SSE] Add support for combining target shuffles to UNPCKL/UNPCKH. llvm-svn: 288663	2016-12-05 11:25:13 +00:00
Simon Pilgrim	20b1409f35	[X86][SSE] Add helper function to create UNPCKL/UNPCKH shuffle masks. NFCI. llvm-svn: 288659	2016-12-05 11:00:25 +00:00
Diana Picus	f11f042ecb	[GlobalISel] Extract handleAssignments out of AArch64CallLowering This function seems target-independent so far: all the target-specific behaviour is isolated in the CCAssignFn and the ValueHandler (which we're also extracting into the generic CallLowering). The intention is to use this in the ARM backend. Differential Revision: https://reviews.llvm.org/D27045 llvm-svn: 288658	2016-12-05 10:40:33 +00:00
Sam Kolton	83102d99ce	[AMDGPU] Disassembler: fix s_buffer_store_dword instructions Summary: s_buffer_store_dword instructions sdata operand was called sdst in encoding. This caused disassembler to fail. Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, nhaehnle, rampitec Differential Revision: https://reviews.llvm.org/D27100 llvm-svn: 288657	2016-12-05 09:58:51 +00:00
Craig Topper	088ba17f88	[X86] Remove unnecessary explicit uses of .SimpleTy just to do an equality comparison. MVT's operator== already takes care of this. NFCI llvm-svn: 288646	2016-12-05 06:09:55 +00:00
Craig Topper	db8467ae26	[AVX-512] Teach fast isel to handle 512-bit vector bitcasts. llvm-svn: 288641	2016-12-05 05:50:51 +00:00
Colin LeMahieu	5d19862b22	[Hexagon] Adding additional tokenization characters in preparation for removing spacing from syntax. llvm-svn: 288637	2016-12-05 04:52:28 +00:00
Craig Topper	7ef6ea324a	[AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel. llvm-svn: 288636	2016-12-05 04:51:31 +00:00
Colin LeMahieu	8170754919	[Hexagon] Changing from literal numeric value to argument since #-1 will not parse when '-' is converted to a token. llvm-svn: 288634	2016-12-05 04:29:00 +00:00
Dan Gohman	66caac5735	[WebAssembly] Eliminate an ad-hoc command-line argument. Use the target triple to determine whether to run the explicit-locals pass, rather than using a separate command-line argument. llvm-svn: 288602	2016-12-03 23:00:12 +00:00
Saleem Abdulrasool	9c89ba7fa7	AMDGPU: remove a couple of unused variables lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::spillSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger) const': lib/Target/AMDGPU/SIRegisterInfo.cpp:572:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable] const TargetRegisterClass SubRC = nullptr; ^ lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::restoreSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger) const': lib/Target/AMDGPU/SIRegisterInfo.cpp:723:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable] const TargetRegisterClass SubRC = nullptr; ^ The variable was assigned to, but never used. The functions called did not mutate state. Simplify the logic and remove the variable. Identified by gcc 5.4.0. llvm-svn: 288601	2016-12-03 22:25:21 +00:00
Craig Topper	9d16bfa0f5	[AVX-512] Add many of the VPERM instructions to the load folding table. Move VPERMPDZri to the correct table. llvm-svn: 288591	2016-12-03 19:37:39 +00:00
Matt Arsenault	b55f620ebc	AMDGPU: Clean up struct initializers llvm-svn: 288590	2016-12-03 18:22:49 +00:00
Craig Topper	c210827b53	[AVX-512] Add EVEX VPMADDUBSW and VPMADDWD to the load folding tables. llvm-svn: 288587	2016-12-03 17:19:15 +00:00
Craig Topper	8e7498976a	[X86] Fix VEX encoded VPMADDUBSW to not be marked commutable. This was accidentally broken in r285515 when we started lowering the intrinsic to an ISD node. Should fix PR31241. llvm-svn: 288578	2016-12-03 05:35:44 +00:00
Matthias Braun	1fbb0f6dd9	AArch64CollectLOH: Rewrite as block-local analysis. Previously this pass was using up to 5% compile time in some cases which is a bit much for what it is doing. The pass featured a full blown data-flow analysis which in the default configuration was restricted to a single block. This rewrites the pass under the assumption that we only ever work on a single block. This is done in a single pass maintaining a state machine per general purpose register to catch LOH patterns. Differential Revision: https://reviews.llvm.org/D27329 llvm-svn: 288561	2016-12-03 00:52:56 +00:00
Guozhi Wei	835de1f3ab	[ppc] Correctly compute the cost of loading 32/64 bit memory into VSR VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial. Differential Revision: https://reviews.llvm.org/D26713 llvm-svn: 288560	2016-12-03 00:41:43 +00:00
Jacques Pienaar	3bec3ef6cd	[lanai] Custom lowering of SHL_PARTS Summary: Implement custom lowering of SHL_PARTS to enable lowering of left shift with larger than 32-bit shifts. Reviewers: eliben, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27232 llvm-svn: 288541	2016-12-02 22:01:28 +00:00
Dan Gohman	f295cc8fb5	[WebAssembly] Fix a compiler warning. NFC. Fix a warning about a comparison between signed and unsigned integer expressions. llvm-svn: 288532	2016-12-02 20:13:05 +00:00
Ulrich Weigand	612d24badf	[SystemZ] Support remaining atomic instructions Add assembler support for all atomic instructions that weren't already supported. Some of those could be used to implement codegen for 128-bit atomic operations, but this isn't done here yet. llvm-svn: 288526	2016-12-02 18:24:16 +00:00
Ulrich Weigand	1c5a5c42de	[SystemZ] Support floating-point control register instructions Add assembler support for instructions manipulating the FPC. Also add codegen support via the GCC compatibility builtins: __builtin_s390_sfpc __builtin_s390_efpc llvm-svn: 288525	2016-12-02 18:21:53 +00:00
Ulrich Weigand	da951d3bdc	[SystemZ] Refactor hasSideEffects setting Move setting of hasSideEffects out of SystemZInstrFormats.td, to allow use of the format classes for instructions where this flag shouldn't be set. NFC. llvm-svn: 288524	2016-12-02 18:19:22 +00:00
Matt Arsenault	d4da0edd98	AMDGPU: Implement isCheapAddrSpaceCast llvm-svn: 288523	2016-12-02 18:12:53 +00:00
Simon Pilgrim	9cb74267ac	Tidyup code with indentation and clang-format. NFCI. llvm-svn: 288505	2016-12-02 15:44:30 +00:00
Daniel Cederman	ef62c59dd6	[Sparc] Fix parsing of double-precision %f18, %f20, and %f22 Summary: They are currently being parsed as %f14, %f16, and %f18. Reviewers: venkatra, jyknight Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27342 llvm-svn: 288503	2016-12-02 15:05:26 +00:00
Simon Pilgrim	cbf5f97018	[X86][SSE] Add support for extracting constant bit data from broadcasted constants llvm-svn: 288499	2016-12-02 13:16:08 +00:00
Simon Pilgrim	b3ae416839	[X86] Refactored getTargetConstantBitsFromNode to allow for expansion. NFCI. getTargetConstantBitsFromNode currently only extracts constant pool vector data, but it will need to be generalized to support broadcast and scalar constant pool data as well. Converted Constant bit extraction and Bitset splitting to helper lambda functions. llvm-svn: 288496	2016-12-02 11:58:05 +00:00
Craig Topper	4961fa9bba	[AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables. llvm-svn: 288484	2016-12-02 07:57:11 +00:00
Craig Topper	17ddb521ef	[AVX-512] Add EVEX PSHUFB instructions to load folding tables. llvm-svn: 288482	2016-12-02 07:06:30 +00:00
Craig Topper	f7866fad54	[AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables. llvm-svn: 288481	2016-12-02 06:24:38 +00:00
Peter Collingbourne	4568158c4d	IR: Change PointerType to derive from Type rather than SequentialType. As proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html This is for a couple of reasons: - Values of type PointerType are unlike the other SequentialTypes (arrays and vectors) in that they do not hold values of the element type. By moving PointerType we can unify certain aspects of how the other SequentialTypes are handled. - PointerType will have no place in the SequentialType hierarchy once pointee types are removed, so this is a necessary step towards removing pointee types. Differential Revision: https://reviews.llvm.org/D26595 llvm-svn: 288462	2016-12-02 03:05:41 +00:00
Peter Collingbourne	ab85225be4	IR: Change the gep_type_iterator API to avoid always exposing the "current" type. Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458	2016-12-02 02:24:42 +00:00
Matt Arsenault	c47701c0e9	AMDGPU: Use wider scalar spills for SGPR spilling Since the spill is for the whole wave, these don't have the swizzling problems that vector stores do and a single 4-byte allocation is enough to spill a 64 element register. This should reduce the number of spill instructions and put all the spills for a register in the same cacheline. This should save allocated private size, but for now it doesn't. The extra slots are allocated for each component, but never used because the frame layout is essentially finalized before frame indices are replaced. For always using the scalar store path, this should probably be moved into processFunctionBeforeFrameFinalized. llvm-svn: 288445	2016-12-02 00:54:45 +00:00
Geoff Berry	7ffce7be0c	[AArch64] Fold more spilled/refilled COPYs. Summary: Make AArch64InstrInfo::foldMemoryOperandImpl more general by folding all full COPYs between register classes of the same size that are either spilled or refilled. Reviewers: MatzeB, qcolombet Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27271 llvm-svn: 288439	2016-12-01 23:43:55 +00:00
Dan Gohman	734c59d501	[MC] Refactor emitELFSize to make usage more consistent. NFC. Move the cast<MCSymbolELF> inside emitELFSize, so that: - it's done in one place instead of at each call - it's more consistent with similar functions like EmitCOFFSafeSEH - ambiguity between cast<> and dyn_cast<> is avoided (which also eliminates an unnecessary dyn_cast call) This also makes it easier to experiment with using ".size" directives on non-ELF targets. llvm-svn: 288437	2016-12-01 23:39:08 +00:00
Oleg Ranevskyy	e2ae41519f	[ARM] Fix for 64-bit CAS expansion on ARM32 with -O0 Summary: This patch fixes comparison of 64-bit atomic with its expected value in CMP_SWAP_64 expansion. Currently, the low words are compared with CMP, while the high words are compared with SBC. SBC expects the carry flag to be set if CMP detects a difference. CMP might leave the carry unset for unequal arguments though if the first one is >= the second. This might cause the comparison logic to detect false equality. Example of the broken C++ code: ``` std::atomic<long long> at(2); long long ll = 1; std::atomic_compare_exchange_strong(&at, &ll, 3); ``` Even though the atomic `at` and the expected value `ll` are not equal and `atomic_compare_exchange_strong` returns `false`, `at` is changed to 3. The patch replaces SBC with CMPEQ. Reviewers: t.p.northover Subscribers: aemerson, rengolin, llvm-commits, asl Differential Revision: https://reviews.llvm.org/D27315 llvm-svn: 288433	2016-12-01 22:58:35 +00:00
Tim Northover	5bb87b6769	AArch64: fix 128-bit cmpxchg at -O0 (again, again). This time the issue is fortunately just a simple mistake rather than a horrible design spectre. I thought SUBS/SBCS provided sufficient NZCV flags for comparing two 64-bit values, but they don't. The fix is slightly clunkier in AArch64 because we can't use conditional execution to emit a pair of CMPs. Traditionally an "icmp ne i128" would map to an EOR/EOR/ORR/CBNZ, but that uses more registers so it's easier to go with a CSET/CINC/CBNZ combination. Slightly less efficient, but this is -O0 anyway. Thanks to Anton Korobeynikov for pointing out the issue. llvm-svn: 288418	2016-12-01 21:31:59 +00:00
Benjamin Kramer	215b22e612	Fix unused variable warning in Release builds. NFC. llvm-svn: 288416	2016-12-01 20:49:34 +00:00
David L Kreitzer	0e3ae305b6	Refactored X86InterleavedAccess into a class. NFCI. Patch by Farhana Aleen Differential Revision: https://reviews.llvm.org/D25986 llvm-svn: 288410	2016-12-01 19:56:39 +00:00
Matthias Braun	d0ee66c2e9	Move most EH from MachineModuleInfo to MachineFunction Recommitting r288293 with some extra fixes for GlobalISel code. Most of the exception handling members in MachineModuleInfo are actually per function data (they talk about the "current function") so it is better to keep them at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288405	2016-12-01 19:32:15 +00:00
Simon Pilgrim	17d5b6b493	[X86][SSE] Moved shuffle mask widening/narrowing helper functions earlier in the file. Will be necessary for a future patch. llvm-svn: 288395	2016-12-01 18:27:19 +00:00
Ulrich Weigand	d36b31d03f	[SystemZ] Fix fallout from r288374 Avoid undefined behavior due to too-large shift count. llvm-svn: 288391	2016-12-01 18:00:50 +00:00
Ulrich Weigand	55082cddef	[SystemZ] Fix applyFixup for 12-bit fixups Now that we have fixups that only fill parts of a byte, it turns out we have to mask off the bits outside the fixup area when applying them. Failing to do so caused invalid object code to be emitted for bprp with a negative 12-bit displacement. llvm-svn: 288374	2016-12-01 17:10:27 +00:00
Simon Pilgrim	5fe6236035	[X86][SSE] Classify AND bitmasks as variable shuffle masks They are loading the bitmasks from the constant pool so the cost is similar to loading a shuffle mask. llvm-svn: 288367	2016-12-01 16:00:14 +00:00
Simon Pilgrim	1e4d870999	[X86][SSE] Add support for combining AND bitmasks to shuffles. llvm-svn: 288365	2016-12-01 15:41:40 +00:00
Asaf Badouh	7f6968ed0a	[LMT] Restrict nop length to one Not all Lakemont MCUs support long nops, so we can't assume we can generate long nops by default for MCU. Differential Revision: https://reviews.llvm.org/D26895 llvm-svn: 288363	2016-12-01 15:19:10 +00:00
Daniel Jasper	19b9284f1d	Silence GCC's -Wenum-compare after r288335 in the same way it is done in X86FastISel.cpp. llvm-svn: 288337	2016-12-01 14:33:50 +00:00
Simon Pilgrim	55066e5622	[X86][SSE] Add support for combining target shuffles to AND bitmasks. llvm-svn: 288335	2016-12-01 13:47:02 +00:00
Simon Pilgrim	947650e99d	[X86][SSE] Add support for combining ISD::AND with shuffles. Attempts to convert an AND with a vector of 255 or 0 values into a shuffle (blend) mask. llvm-svn: 288333	2016-12-01 11:52:37 +00:00
Eric Christopher	e70b7c3dfb	Temporarily Revert "Move most EH from MachineModuleInfo to MachineFunction" This appears to have broken the global isel bot: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel_build/5174/console This reverts commit r288293. llvm-svn: 288322	2016-12-01 07:50:12 +00:00
Derek Schuff	7747d703e3	[WebAssembly] Emit .import_global assembler directives Support a new assembler directive, .import_global, to declare imported global variables (i.e. those with external linkage and no initializer). The linker turns these into wasm imports. Patch by Jacob Gravelle Differential Revision: https://reviews.llvm.org/D26875 llvm-svn: 288296	2016-12-01 00:11:15 +00:00
Matthias Braun	ed14cb0604	Move most EH from MachineModuleInfo to MachineFunction Most of the exception handling members in MachineModuleInfo are actually per function data (they talk about the "current function") so it is better to keep them at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288293	2016-11-30 23:49:01 +00:00
Matthias Braun	f23ef437cc	Move FrameInstructions from MachineModuleInfo to MachineFunction This is per function data so it is better kept at the function instead of the module. This is a necessary step to have machine module passes work properly. Differential Revision: https://reviews.llvm.org/D27185 llvm-svn: 288291	2016-11-30 23:48:42 +00:00
Paul Robinson	78a695321e	[PS4] Tighten up a triple check. llvm-svn: 288286	2016-11-30 23:14:27 +00:00
Joel Jones	75818bc8f7	[AArch64] Refactor LSE support as feature separate from V8.1a support. Summary: This is preparation for ThunderX processors that have Large System Extension (LSE) atomic instructions, but not the other instructions introduced by V8.1a. This will mimic changes to GCC as described here: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00388.html LSE instructions are: LD/ST<op>, CAS*, SWP Reviewers: t.p.northover, echristo, jmolloy, rengolin Subscribers: aemerson, mehdi_amini Differential Revision: https://reviews.llvm.org/D26621 llvm-svn: 288279	2016-11-30 22:25:24 +00:00
Matthias Braun	c52fe2961c	Clarify rules for reserved regs, fix aarch64 ones. No test case necessary as the problematic condition is checked with the newly introduced assertAllSuperRegsMarked() function. Differential Revision: https://reviews.llvm.org/D26648 llvm-svn: 288277	2016-11-30 22:17:10 +00:00
Silviu Baranga	aab65b155e	[AArch64] Fix useful bits detection for BFM instructions Summary: When computing useful bits for a BFM instruction, we need to take into consideration the case where both operands of the BFM are equal and provide data that we need to track. Not doing this can cause us to miss useful bits. Fixes PR31138 (https://llvm.org/bugs/show_bug.cgi?id=31138) Reviewers: t.p.northover, jmolloy Subscribers: evandro, gberry, srhines, pirama, mcrosier, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D27130 llvm-svn: 288253	2016-11-30 17:04:22 +00:00
Simon Pilgrim	288c088c17	[X86][SSE] Add support for target shuffle constant folding Initial support for target shuffle constant folding in cases where all shuffle inputs are constant. We may be able to relax this and merge shuffles with only some constant inputs in the future. I've added the helper function getTargetConstantBitsFromNode (based off a similar function in X86ShuffleDecodeConstantPool.cpp) that could be reused for other cases requiring constant vector extraction. Differential Revision: https://reviews.llvm.org/D27220 llvm-svn: 288250	2016-11-30 16:33:46 +00:00
Krzysztof Parzyszek	31095d2ff5	[PowerPC] Preserve machine dominator tree in PPCVSXFMAMutate It is needed by LiveIntervalAnalysis. llvm-svn: 288243	2016-11-30 13:31:09 +00:00
Nemanja Ivanovic	f9b191f135	[PowerPC] Improvements for BUILD_VECTOR Vol. 2 This patch corresponds to review: https://reviews.llvm.org/D26023 This patch adds support for converting a vector of loads into a single load if the loads are consecutive (in either direction). llvm-svn: 288219	2016-11-29 23:57:54 +00:00
Nemanja Ivanovic	8c11e79b17	[PowerPC] Improvements for BUILD_VECTOR Vol. 2 This patch corresponds to review: https://reviews.llvm.org/D25980 This is the 2nd patch in a series of 4 that improve the lowering and combining for BUILD_VECTOR nodes on PowerPC. This particular patch combines a build vector of fp-to-int conversions into an fp-to-int conversion of a build vector of fp values. For example: Converts (build_vector (fp_to_[su]i $A), (fp_to_[su]i $B), ...) Into (fp_to_[su]i (build_vector $A, $B, ...))). Which is a natural match for much cleaner code. llvm-svn: 288218	2016-11-29 23:36:03 +00:00
Jacques Pienaar	fc13bdd2db	[lanai] Manually match 0/-1 with R0/R1. Summary: Previously 0 and -1 were matched via tablegen rules. But this could cause problems where a physical register was being used where a virtual register was expected (seen in optimizeSelect and TwoAddressInstructionPass). Instead follow AArch64 and match in DAGToDAGISel. Reviewers: eliben, majnemer Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D27171 llvm-svn: 288215	2016-11-29 23:01:09 +00:00
Nemanja Ivanovic	f57f150b1b	Revert https://reviews.llvm.org/rL287679 This commit caused some miscompiles that did not show up on any of the bots. Reverting until we can investigate the cause of those failures. llvm-svn: 288214	2016-11-29 23:00:33 +00:00
Sanjay Patel	47f7f30df9	[AArch64] allow and-not-compare transform to form 'bics' This target hook was added with D19087: https://reviews.llvm.org/D19087 Differential Revision: https://reviews.llvm.org/D27221 llvm-svn: 288206	2016-11-29 22:28:58 +00:00
Chad Rosier	d34c26eb08	[AArch64] Add a basic SchedMachineModel for Falkor. Differential Revision: https://reviews.llvm.org/D26972 llvm-svn: 288194	2016-11-29 20:00:27 +00:00
Matt Arsenault	640c44b893	AMDGPU: Disallow exec as SMEM instruction operand This is not in the list of valid inputs for the encoding. When spilling, copies from exec can be folded directly into the spill instruction which results in broken stores. This only fixes the operand constraints, more codegen work is required to avoid emitting the invalid spills. This sort of breaks the dbg.value test. Because the register class of the s_load_dwordx2 changes, there is a copy to SReg_64, and the copy is the operand of dbg_value. The copy is later dead, and removed from the dbg_value. llvm-svn: 288191	2016-11-29 19:39:53 +00:00
Matt Arsenault	cdad316cc2	AMDGPU: Use SGPR_64 for argument lowerings llvm-svn: 288190	2016-11-29 19:39:48 +00:00
Matt Arsenault	97279a8ca3	AMDGPU: Rename flat operands to match mubuf Use vaddr/vdst for the same purposes. This also fixes a bug in SIInsertWaits for the operand check. The stored value operand is currently called data0 in the single offset case, not data. llvm-svn: 288188	2016-11-29 19:30:44 +00:00
Matt Arsenault	437fd71f5b	AMDGPU: Use else if llvm-svn: 288187	2016-11-29 19:30:41 +00:00
Matt Arsenault	f96eeec005	AMDGPU: Materialize frame index before add It isn't generally safe to fold the frame index directly into the operand since it will possibly not be an inline immediate after it is expanded. This surprisingly seems to produce better code, since the FI doesn't prevent folding other immediate operands. llvm-svn: 288185	2016-11-29 19:20:48 +00:00
Matt Arsenault	ff8bb49bf4	AMDGPU: Refactor immediate folding logic Change the logic for when to fold immediates to consider the destination operand rather than the source of the materializing mov instruction. No change yet, but this will allow for correctly handling i16/f16 operands. Since 32-bit moves are used to materialize constants for these, the same bitvalue will not be in the register. llvm-svn: 288184	2016-11-29 19:20:42 +00:00
Geoff Berry	7c078fc035	[AArch64] Fold spills of COPY of WZR/XZR Summary: In AArch64InstrInfo::foldMemoryOperandImpl, catch more cases where the COPY being spilled is copying from WZR/XZR, but the source register is not in the COPY destination register's regclass. For example, when spilling: %vreg0 = COPY %XZR ; %vreg0:GPR64common without this change, the code in TargetInstrInfo::foldMemoryOperand() and canFoldCopy() that normally handles cases like this would fail to optimize since %XZR is not in GPR64common. So the spill code generated would be: %vreg0 = COPY %XZR STR %vreg instead of the new code generated: STR %XZR Reviewers: qcolombet, MatzeB Subscribers: mcrosier, aemerson, t.p.northover, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D26976 llvm-svn: 288176	2016-11-29 18:28:32 +00:00
Simon Pilgrim	edccc1254b	Avoid repeated calls to MVT getSizeInBits and getScalarSizeInBits(). NFCI. llvm-svn: 288170	2016-11-29 17:57:48 +00:00
Nemanja Ivanovic	df1cb520df	[PowerPC] Improvements for BUILD_VECTOR Vol. 1 This patch corresponds to review: https://reviews.llvm.org/D25912 This is the first patch in a series of 4 that improve the lowering and combining for BUILD_VECTOR nodes on PowerPC. llvm-svn: 288152	2016-11-29 16:11:34 +00:00
Simon Pilgrim	001368abc8	[X86] Moved getTargetConstantFromNode function so a future patch is more understandable. NFCI. llvm-svn: 288147	2016-11-29 15:32:58 +00:00
Simon Pilgrim	35c47c494d	[X86][SSE] Add initial support for combining target shuffles to (V)PMOVZX. We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output. llvm-svn: 288140	2016-11-29 14:18:51 +00:00
Simon Pilgrim	923020a652	Avoid repeated calls to MVT::getScalarSizeInBits(). NFCI. llvm-svn: 288138	2016-11-29 13:43:08 +00:00
Tom Stellard	0bc688116c	AMDGPU/SI: Avoid moving PHIs to VALU when phi values are defined in scalar branches Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23417 llvm-svn: 288095	2016-11-29 00:46:46 +00:00
Matthias Braun	115efcd3d1	MachineScheduler: Export function to construct "default" scheduler. This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations. This also shrinks the list of default DAG mutations: {Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores(). Differential Revision: https://reviews.llvm.org/D26986 llvm-svn: 288057	2016-11-28 20:11:54 +00:00
Stanislav Mekhanoshin	0ee250eee8	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32 and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp is implemented. Additional side effect of this is that we may consume fewer VGPRs at a cost of more SGPRs if holding multiple conditions is needed, and that is a clear win in most cases. Differential Revision: https://reviews.llvm.org/D26114 llvm-svn: 288053	2016-11-28 18:58:49 +00:00
Simon Pilgrim	2228f70a85	[X86][SSE] Add initial support for combining (V)PMOVZX with shuffles. llvm-svn: 288049	2016-11-28 17:58:19 +00:00
Sanjay Patel	100bc01a72	[x86] fix formatting; NFC llvm-svn: 288045	2016-11-28 17:39:21 +00:00
Simon Pilgrim	3f10e66981	[X86][SSE] Added support for combining bit-shifts with shuffles. Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining. Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations. llvm-svn: 288040	2016-11-28 16:25:01 +00:00
Daniel Cederman	59168e28e0	Test commit llvm-svn: 288036	2016-11-28 15:33:03 +00:00
Ulrich Weigand	a29bf16ed5	[SystemZ] Fix build bot fallout from r288030 Remove unused variable that came in due to a copy-and-paste bug and caused build bot failures. llvm-svn: 288033	2016-11-28 14:24:14 +00:00
Ulrich Weigand	84404f30b3	[SystemZ] Support execution hint instructions This adds assembler support for the instructions provided by the execution-hint facility (NIAI and BP(R)P). This required adding support for the new relocation types for 12-bit and 24-bit PC- relative offsets used by the BP(R)P instructions. llvm-svn: 288031	2016-11-28 14:01:51 +00:00
Ulrich Weigand	2d9e3d9d3b	[SystemZ] Support load-and-trap instructions This adds support for the instructions provided with the load-and-trap facility. llvm-svn: 288030	2016-11-28 13:59:22 +00:00
Ulrich Weigand	758399131a	[SystemZ] Add remaining branch instructions This patch adds assembler support for the remaining branch instructions: the non-relative branch on count variants, and all variants of branch on index. The only one of those that can be readily exploited for code generation is BRCTH (branch on count using a high 32-bit register as count). To use it, however, it is necessary to also introduce a new CHIMux pseudo to allow comparisons of a 32-bit value against a short immediate to go into a high register as well (implemented via CHI/CIH). This causes a bit of codegen changes overall, but those have proven to be neutral (or even beneficial) in performance measurements. llvm-svn: 288029	2016-11-28 13:40:08 +00:00
Ulrich Weigand	524f276c74	[SystemZ] Improve use of conditional instructions This patch moves formation of LOC-type instructions from (late) IfConversion to the early if-conversion pass, and in some cases additionally creates them directly from select instructions during DAG instruction selection. To make early if-conversion work, the patch implements the canInsertSelect / insertSelect callbacks. It also implements the commuteInstructionImpl and FoldImmediate callbacks to enable generation of the full range of LOC instructions. Finally, the patch adds support for all instructions of the load-store-on-condition-2 facility, which allows using LOC instructions also for high registers. Due to the use of the GRX32 register class to enable high registers, we now also have to handle the cases where there are still no single hardware instructions (conditional move from a low register to a high register or vice versa). These are converted back to a branch sequence after register allocation. Since the expandRAPseudos callback is not allowed to create new basic blocks, this requires a simple new pass, modelled after the ARM/AArch64 ExpandPseudos pass. Overall, this patch causes significantly more LOC-type instructions to be used, and results in a measurable performance improvement. llvm-svn: 288028	2016-11-28 13:34:08 +00:00
Craig Topper	17786f77f0	[X86][FMA4] Remove isCommutable from FMA4 scalar intrinsics. They aren't commutable as operand 0 should pass its upper bits through to the output. llvm-svn: 288011	2016-11-27 21:37:04 +00:00
Craig Topper	13b27a2748	[X86][FMA] Add missing Predicates qualifier around scalar FMA intrinsic patterns. llvm-svn: 288010	2016-11-27 21:37:02 +00:00
Craig Topper	ff9d45875a	[X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions. llvm-svn: 288009	2016-11-27 21:37:00 +00:00
Craig Topper	3674f44e40	[X86] Add SHL by 1 to the load folding tables. I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships. llvm-svn: 288007	2016-11-27 21:36:54 +00:00
Simon Pilgrim	91d6f5fbc1	[X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts llvm-svn: 288006	2016-11-27 21:08:19 +00:00
Craig Topper	4fab487265	[AVX-512] Add integer and fp unpck instructions to load folding tables. llvm-svn: 288004	2016-11-27 19:51:41 +00:00
Simon Pilgrim	cdb2ce661d	[X86][SSE] Split lowerVectorShuffleAsShift ready for combines. NFCI. Moved most of matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit). llvm-svn: 288003	2016-11-27 19:28:39 +00:00
Craig Topper	7ad961cc70	[X86] Add TB_NO_REVERSE to entries in the load folding table where the instruction's load size is smaller than the register size. If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist. I probably missed some instructions, but this should be a large portion of them. llvm-svn: 288001	2016-11-27 18:51:13 +00:00
Craig Topper	c3b3926f8b	[AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables. llvm-svn: 287995	2016-11-27 08:55:31 +00:00
Craig Topper	fb64a25ba1	[X86] Remove alignment restrictions from load folding table for some instructions that don't have a restriction. Most of these are the SSE4.1 PMOVZX/PMOVSX instructions which all read less than 128-bits. The only other was PMOVUPD which by definition is an unaligned load. llvm-svn: 287991	2016-11-27 01:52:51 +00:00
Craig Topper	837ff25da1	[X86] Remove hasOneUse check that is redundant with the one in IsProfitableToFold. llvm-svn: 287987	2016-11-26 18:43:26 +00:00
Craig Topper	e266e126ff	[X86] Fix the zero extending load detection in X86DAGToDAGISel::selectScalarSSELoad to pass the load node to IsProfitableToFold and IsLegalToFold. Previously we were passing the SCALAR_TO_VECTOR node. llvm-svn: 287986	2016-11-26 18:43:24 +00:00
Craig Topper	d3ab1a3905	[X86] Simplify control flow. NFCI llvm-svn: 287985	2016-11-26 18:43:21 +00:00
Craig Topper	991d1ca3ba	[X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times. Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied. Reviewers: RKSimon, zvi, delena, spatel Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D26790 llvm-svn: 287983	2016-11-26 17:29:25 +00:00
Craig Topper	10d5eec1a1	[AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables. llvm-svn: 287975	2016-11-26 08:21:52 +00:00
Craig Topper	97169ea5f9	[AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables. llvm-svn: 287974	2016-11-26 08:21:48 +00:00
Craig Topper	53b33de1e3	[AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables. llvm-svn: 287972	2016-11-26 07:21:00 +00:00
Craig Topper	6677bb4e50	[AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change. llvm-svn: 287971	2016-11-26 07:20:57 +00:00
Craig Topper	39265bb1ce	[AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables. llvm-svn: 287970	2016-11-26 07:20:53 +00:00
Tom Stellard	1473f07ceb	AMDGPU/SI: Use float as the operand type for amdgcn.interp intrinsics Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26724 llvm-svn: 287962	2016-11-26 02:26:04 +00:00
Craig Topper	7f76c23781	[X86][XOP] Add a reversed reg/reg form for VPROT instructions. The W bit distinguishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions. llvm-svn: 287961	2016-11-26 02:14:00 +00:00
Craig Topper	516fd7abfe	[X86] Add SSE, AVX, and AVX2 version of MOVDQU to the load/store folding tables for consistency. Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete. llvm-svn: 287960	2016-11-26 02:13:58 +00:00
Craig Topper	a363d42973	[AVX-512] Put the AVX-512 sections of the load folding tables into mostly alphabetical order. This is consistent with the older sections of the table. NFC llvm-svn: 287956	2016-11-25 23:21:34 +00:00
Marek Olsak	79c05871a2	AMDGPU/SI: Add back reverted SGPR spilling code, but disable it suggested as a better solution by Matt llvm-svn: 287942	2016-11-25 17:37:09 +00:00
Simon Pilgrim	8e8ae7219f	Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI llvm-svn: 287940	2016-11-25 17:19:53 +00:00
Craig Topper	88071b37ab	[AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when it's feeding a vselect with 32-bit element size. Summary: Shuffle lowering may have widened the element size of an i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations. This patch detects this and changes the element size to match the vselect. I don't handle changing integer to floating point or vice versa as it's not clear if it's better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now. Reviewers: delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27087 llvm-svn: 287939	2016-11-25 16:48:05 +00:00
Craig Topper	1e48829747	[AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables. llvm-svn: 287937	2016-11-25 16:33:53 +00:00
Marek Olsak	e3895bfb47	Revert "AMDGPU: Implement SGPR spilling with scalar stores" This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e. llvm-svn: 287936	2016-11-25 16:03:34 +00:00
Marek Olsak	dad553a5cf	Revert "AMDGPU: Fix MMO when splitting spill" This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1. llvm-svn: 287935	2016-11-25 16:03:27 +00:00
Marek Olsak	8cbbf65361	Revert "AMDGPU: Fix adding extra implicit def of register" This reverts commit e834ce5976567575621901fb967b8018b9916d71. llvm-svn: 287934	2016-11-25 16:03:22 +00:00
Marek Olsak	713e6fc531	Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling" This reverts commit 057bbbe4ae170247ba37f08f2e70ef185267d1bb. llvm-svn: 287933	2016-11-25 16:03:19 +00:00
Marek Olsak	a45dae458d	Revert "AMDGPU: Make m0 unallocatable" This reverts commit 124ad83dae04514f943902446520c859adee0e96. llvm-svn: 287932	2016-11-25 16:03:15 +00:00
Marek Olsak	ea848df84c	Revert "AMDGPU: Remove m0 spilling code" This reverts commit f18de36554eb22416f8ba58e094e0272523a4301. llvm-svn: 287931	2016-11-25 16:03:06 +00:00
Marek Olsak	18a95bcb3c	Revert "AMDGPU: Preserve m0 value when spilling" This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce. llvm-svn: 287930	2016-11-25 16:03:02 +00:00
Simon Dardis	c08af6db5b	[mips] Correct jal expansion for local symbols in .local directives. This patch corrects the behaviour of code such as: .local foo jal foo foo: to use the correct jal expansion when writing ELF files. Patch by: Daniel Sanders Reviewers: zoran.jovanovic, seanbruno, vkalintiris Differential Revision: https://reviews.llvm.org/D24722 llvm-svn: 287918	2016-11-25 11:06:43 +00:00
Craig Topper	d4091494d3	[X86] Invert an 'if' and early out to fix a weird indentation. NFCI llvm-svn: 287909	2016-11-25 02:29:24 +00:00
Craig Topper	a46936185a	[X86] Size a SmallVector to the worst case mask size for a 512-bit shuffle. NFCI llvm-svn: 287908	2016-11-25 02:29:21 +00:00
Simon Pilgrim	f1ee930db0	Fix unused variable warning llvm-svn: 287889	2016-11-24 15:24:47 +00:00
Benjamin Kramer	fc54e35d94	[X86] Don't round trip a unique_ptr through a raw pointer for assignment. No functional change. llvm-svn: 287888	2016-11-24 15:17:39 +00:00
Simon Pilgrim	9c71e07276	[X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64 Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit). The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32. Differential Revision: https://reviews.llvm.org/D26938 llvm-svn: 287886	2016-11-24 15:12:56 +00:00
Simon Pilgrim	841d7ca463	[X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882	2016-11-24 14:46:55 +00:00
Simon Pilgrim	7c26a6f9ef	[X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result llvm-svn: 287878	2016-11-24 14:02:30 +00:00
Simon Pilgrim	ab323ec411	[X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering llvm-svn: 287877	2016-11-24 13:38:59 +00:00
Nikolai Bozhenov	3a8d108b2b	[x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B The bug arises during register allocation on i686 for CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address, plus ESI is reserved as the base pointer. With such constraints the only way register allocator would do its job successfully is when the addressing mode of the instruction requires only one register. If that is not the case - we are emitting additional LEA instruction to compute the address. It fixes PR28755. Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D25088 llvm-svn: 287875	2016-11-24 13:23:35 +00:00
Nikolai Bozhenov	bb64aa14a3	[x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter Move the definitions of three variables out of the switch. Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D25192 llvm-svn: 287874	2016-11-24 13:15:49 +00:00
Nikolai Bozhenov	a2dabed3b6	[x86] Rewrite getAddressFromInstr helper function - It does not modify the input instruction - Second operand of any address is always an Index Register, make sure we actually check for that, instead of a check for an immediate value Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D24938 llvm-svn: 287873	2016-11-24 13:05:43 +00:00
Simon Pilgrim	a3af79678e	[X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions. This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types. Differential Revision: https://reviews.llvm.org/D27072 llvm-svn: 287870	2016-11-24 12:13:46 +00:00
Matt Arsenault	7b54dd039e	AMDGPU: Preserve m0 value when spilling llvm-svn: 287844	2016-11-24 00:26:50 +00:00
Matt Arsenault	94b32ffe8e	TRI: Add hook to pass scavenger during frame elimination The scavenger was not passed if requiresFrameIndexScavenging was enabled. I need to be able to test for the availability of an unallocatable register here, so I can't create a virtual register for it. It might be better to just always use the scavenger and stop creating virtual registers. llvm-svn: 287843	2016-11-24 00:26:47 +00:00
Matt Arsenault	5ee3325358	AMDGPU: Remove m0 spilling code Since m0 isn't allocatable it should never be spilled anymore. llvm-svn: 287842	2016-11-24 00:26:44 +00:00
Matt Arsenault	9e5c7b1031	AMDGPU: Make m0 unallocatable m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it. This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0. llvm-svn: 287841	2016-11-24 00:26:40 +00:00
Simon Pilgrim	3ce6a545c7	[X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq llvm-svn: 287835	2016-11-23 22:35:06 +00:00
Matt Arsenault	a24d84beb9	AMDGPU: Cleanup immediate folding code Move code down to use, reorder to avoid hard to follow immediate folding logic. llvm-svn: 287818	2016-11-23 21:51:07 +00:00
Matt Arsenault	391c3ea9bc	AMDGPU: Fix debug printing The uint8_t was printed as a char which didn't really work. llvm-svn: 287817	2016-11-23 21:51:05 +00:00
Matt Arsenault	997a9abf4c	AMDGPU: Fix not setting kill flag on temp reg when spilling llvm-svn: 287808	2016-11-23 21:00:12 +00:00
Matt Arsenault	dd0cb2a3e5	AMDGPU: Fix adding extra implicit def of register In the scalar case, there's no reason to add an additional def of the same register. llvm-svn: 287807	2016-11-23 21:00:10 +00:00
Matt Arsenault	2669a76f01	AMDGPU: Fix MMO when splitting spill The size and offset were wrong. The size of the object was being used for the size of the access, when here it is really being split into 4-byte accesses. The underlying object size is set in the MachinePointerInfo, which also didn't have the offset set. llvm-svn: 287806	2016-11-23 20:52:53 +00:00
Michael Kuperstein	47eb85a003	[X86] Allow folding of stack reloads when loading a subreg of the spilled reg We did not support subregs in InlineSpiller::foldMemoryOperand() because targets may not deal with them correctly. This adds a target hook to let the spiller know that a target can handle subregs, and actually enables it for x86 for the case of stack slot reloads. This fixes PR30832. Differential Revision: https://reviews.llvm.org/D26521 llvm-svn: 287792	2016-11-23 18:33:49 +00:00
Nemanja Ivanovic	10fc3cfc63	[PowerPC] Remove InstAlias definitions that cause incorrect assembly In rL283190, I added some InstAlias definitions to generate extended mnemonics for some uses of the XXPERMDI instruction. However, when the assembler matches these extended mnemonics, it matches the new instruction in situations where it should match the old one. This patch removes these definitions and accomplishes that by defining these mnemonics with additional instructions that are isCodeGenOnly. Fixes PR31127. llvm-svn: 287765	2016-11-23 15:51:52 +00:00
Simon Pilgrim	4e9b9cbee9	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762	2016-11-23 14:01:18 +00:00
Simon Pilgrim	03cd8f887c	[CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs llvm-svn: 287760	2016-11-23 13:42:09 +00:00
Craig Topper	f57e17def0	[AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles. llvm-svn: 287744	2016-11-23 06:54:55 +00:00
Zvi Rackover	14aba43ea9	[X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VT's Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors. Reviewers: RKSimon, craig.topper, delena, andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26985 llvm-svn: 287743	2016-11-23 06:45:25 +00:00
Kuba Mracek	06995e866b	[xray] Add XRay support for Mach-O in CodeGen Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support. Differential Revision: https://reviews.llvm.org/D26983 llvm-svn: 287734	2016-11-23 02:07:04 +00:00
Matthias Braun	7f423442d1	TargetSubtargetInfo: Move implementation to lib/CodeGen; NFC TargetSubtargetInfo is filled with CodeGen specific interfaces nowadays (getInstrInfo(), getFrameLowering(), getSelectionDAGInfo()) most of the tuning flags like enablePostRAScheduler(), getAntiDepBreakMode(), enableRALocalReassignment(), ... also do not seem to be universal enough to make sense outside of CodeGen. Differential Revision: https://reviews.llvm.org/D26948 llvm-svn: 287708	2016-11-22 22:09:03 +00:00
Simon Dardis	6efb8dd2e3	[mips] seb, seh instruction aliases Add the single operand form. Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D26961 llvm-svn: 287681	2016-11-22 19:17:23 +00:00
Nemanja Ivanovic	b8e30d6db6	[PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE This patch corresponds to review: https://reviews.llvm.org/D26861 It also fixes PR30730. Committing on behalf of Lei Huang. llvm-svn: 287679	2016-11-22 19:02:07 +00:00
Simon Pilgrim	4aa876ca7c	[X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles. This occurs during UINT_TO_FP v2f64 lowering. We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle llvm-svn: 287676	2016-11-22 17:50:06 +00:00
Vasileios Kalintiris	04dc211e6a	[mips] Add support for unaligned load/store macros. Add missing unaligned store macros (ush/usw) and fix the existing implementation of the unaligned load macros in order to generate identical expansions with the GNU assembler. llvm-svn: 287646	2016-11-22 16:43:49 +00:00
Tim Northover	b64fb453ea	CodeGen: simplify TargetMachine::getSymbol interface. NFC. No-one actually had a mangler handy when calling this function, and getSymbol itself went most of the way towards getting its own mangler (with a local TLOF variable) so forcing all callers to supply one was just extra complication. llvm-svn: 287645	2016-11-22 16:17:20 +00:00
Zvi Rackover	9a355219d1	[X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC. llvm-svn: 287644	2016-11-22 15:33:28 +00:00
Zvi Rackover	0aa1c32d14	[X86] Remove dead code from LowerVectorBroadcast Summary: Splat vectors are canonicalized to BUILD_VECTOR's so the code can be simplified. NFC-ish. Reviewers: craig.topper, delena, RKSimon, andreadb Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D26678 llvm-svn: 287643	2016-11-22 15:17:52 +00:00
Chad Rosier	ecc77273a0	[AArch64] Set the max interleave factor for Falkor. llvm-svn: 287642	2016-11-22 14:25:02 +00:00
Chad Rosier	2abc29c593	[AArch64] Maximize 80-column. NFC. llvm-svn: 287640	2016-11-22 14:12:09 +00:00
Coby Tayree	49b3733d57	[AVX512][inline-asm] Fix AVX512 inline assembly instruction resolution when the size qualifier of a memory operand is not specified explicitly. This commit handles cases where the size qualifier of an indirect memory reference operand in Intel syntax is missing (e.g. "vaddps xmm1, xmm2, [a]"). GCC will deduce the size qualifier for AVX512 vector and broadcast memory operands based on the possible matches: "vaddps xmm1, xmm2, [a]" matches only “XMMWORD PTR” qualifier. "vaddps xmm1, xmm2, [a]{1to4}" matches only “DWORD PTR” qualifier. This is different from the current behavior of LLVM, which deduces the size qualifier based on the size of the memory operand. For "vaddps xmm1, xmm2, [a]" "char a;" will imply "BYTE PTR" qualifier "short a;" will imply "WORD PTR" qualifier. This commit aligns LLVM to GCC’s behavior. This is the LLVM part of the review. The Clang part of the review: https://reviews.llvm.org/D26587 Differential Revision: https://reviews.llvm.org/D26586 llvm-svn: 287630	2016-11-22 09:30:29 +00:00
Craig Topper	3dcf45f08d	[X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to use the i64mem version. I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX. There are two different reg/reg versions of movq, one from a GPR and one from the lower 64-bits of an XMM register. This changes the load folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table. llvm-svn: 287622	2016-11-22 05:31:43 +00:00
Craig Topper	cada9f2275	[AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD). Summary: The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities. We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too. Reviewers: igorb, delena, Ayal, Farhana, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25652 llvm-svn: 287621	2016-11-22 04:57:34 +00:00
Craig Topper	da22267055	[AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type Summary: Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes helpful because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking. This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018. I plan to add support for more operations in future patches. Reviewers: RKSimon, zvi, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26902 llvm-svn: 287612	2016-11-22 03:51:53 +00:00
Stanislav Mekhanoshin	ae0f6620e4	[AMDGPU] Fix multiple vreg definitions in si-lower-control-flow Differential Revision: https://reviews.llvm.org/D26939 llvm-svn: 287608	2016-11-22 01:42:34 +00:00
Geoff Berry	e0bf52f394	[AArch64LoadStoreOptimizer] Don't treat write to XZR/WZR as a clobber. Summary: When searching for load/store instructions to pair/merge don't treat writes to WZR/XZR as clobbers since they don't change the value read from WZR/XZR (which is always 0). Reviewers: mcrosier, junbuml, jmolloy, t.p.northover Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D26921 llvm-svn: 287592	2016-11-21 22:51:10 +00:00
Simon Dardis	43115a1ce4	[mips] seq macro support This patch adds the seq macro. This partially resolves PR/30381. Thanks to Sean Bruno for reporting the issue! Reviewers: zoran.jovanovic, vkalintiris, seanbruno Differential Revision: https://reviews.llvm.org/D24607 llvm-svn: 287573	2016-11-21 20:30:41 +00:00
Coby Tayree	94ddbb4a04	small fixup which enables the issuing of the aforementioned instruction (w/o operands), on MS/Intel syntax. Differential Revision: https://reviews.llvm.org/D26913 llvm-svn: 287548	2016-11-21 15:50:56 +00:00
Simon Pilgrim	b7bbaa669b	[X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input). This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit. llvm-svn: 287535	2016-11-21 12:05:49 +00:00
Michael Zuckerman	8462faeaba	Fixing a small typo (A->U). This seems to fix PR30992. - HasAVX512 ? X86::VMOVAPSZ128rm_NOVLX + HasAVX512 ? X86::VMOVUPSZ128rm_NOVLX llvm-svn: 287532	2016-11-21 11:52:11 +00:00
Craig Topper	9f2d632ee7	[AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX. llvm-svn: 287523	2016-11-21 07:51:31 +00:00
Alexei Starovoitov	7ab125dbf3	[bpf] fix dwarf elf relocs and line numbers - teach RelocVisitor to recognize bpf relocations - fix AsmInfo->PointerSize to make sure dwarf is emitted correctly - add a test for the above Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 287521	2016-11-21 06:21:23 +00:00
Craig Topper	0dfc09372f	[X86] Remove duplicate instructions for (v)movq and replace with patterns on other instructions. NFC llvm-svn: 287519	2016-11-21 04:07:56 +00:00
Dean Michael Berris	31761f300d	[XRay][AArch64] Implemented a test for the compile-time sleds emitted, and fixed a bug in the jump instruction This patch adds a test for the assembly code emitted with XRay instrumentation. It also fixes a bug where the operand of a jump instruction must be not the number of bytes to jump over, but rather the number of 4-byte instructions. Author: rSerge Reviewers: dberris, rengolin Differential Revision: https://reviews.llvm.org/D26805 llvm-svn: 287516	2016-11-21 03:01:43 +00:00
Simon Dardis	1dcb911061	[mips] Restrict tail call optimization The tail call optimization was being used without proper consideration of ABI requirements for saving and restoring the GP. This patch restricts tail call optimization to functions within the same translation unit. Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D24763 llvm-svn: 287505	2016-11-20 21:23:08 +00:00
Coby Tayree	99a6639047	The 'vpmultishiftqb' instruction was implemented incorrectly; this patch amends it. More specifically - (MS dialect) broadcasting variants were implemented incorrectly. Differential Revision: https://reviews.llvm.org/D26257 llvm-svn: 287501	2016-11-20 17:19:55 +00:00
Coby Tayree	97e9cf62f4	Some instructions were missing, others implemented incorrectly. This patch aims at amending those issues. full list: vcvtps2pd vcvtudq2pd vcvtps2qq vcvttps2qq vcvtps2uqq vcvttps2uqq variants are: [Dst]XMM(zero-masked/merge-masked/unmasked) [Src]Mem64 Differential Revision: https://reviews.llvm.org/D26799 llvm-svn: 287500	2016-11-20 17:09:56 +00:00
Simon Pilgrim	5fadce4a3f	[X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible llvm-svn: 287497	2016-11-20 16:11:36 +00:00
Simon Pilgrim	5401bae523	[X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines llvm-svn: 287496	2016-11-20 15:24:38 +00:00
Simon Pilgrim	3f40412e0f	[X86][AVX512] Add support for VBMI VPERMV target shuffle combines llvm-svn: 287495	2016-11-20 15:05:45 +00:00
Simon Pilgrim	c17e1b74b8	[X86][AVX512VL] Removed duplicate operation action Basic AVX512F already declared uint_to_fp v4i32 as legal llvm-svn: 287493	2016-11-20 14:19:29 +00:00
Simon Pilgrim	3f10e9953d	Strip trailing whitespace llvm-svn: 287492	2016-11-20 14:05:23 +00:00
Simon Pilgrim	096b6d4f81	[X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64) llvm-svn: 287491	2016-11-20 14:03:23 +00:00
Simon Pilgrim	fbd2221de5	Fix comment typos. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287486	2016-11-20 13:10:51 +00:00
Oren Ben Simhon	c0f073b67f	[X86] RegCall - Handling long double arguments The change is part of RegCall calling convention support for LLVM. Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack). This review presents the change and the corresponding tests. Differential Revision: https://reviews.llvm.org/D26151 llvm-svn: 287485	2016-11-20 11:06:07 +00:00
Coby Tayree	179ff0e541	[X86][InlineAsm]Test commit. Fixing a wrong comment on X86AsmParser.cpp::ParseZ: "true" --> "false" Differential Revision: https://reviews.llvm.org/D26797 llvm-svn: 287484	2016-11-20 09:31:11 +00:00
Alexei Starovoitov	e6ddac0def	[bpf] add BPF disassembler add BPF disassembler, so tools like llvm-objdump can be used: $ llvm-objdump -d -no-show-raw-insn ./sockex1_kern.o ./sockex1_kern.o: file format ELF64-BPF Disassembly of section socket1: bpf_prog1: 0: r6 = r1 8: r0 = (u8 )skb[23] 10: (u32 )(r10 - 4) = r0 18: r1 = (u32 )(r6 + 4) 20: if r1 != 4 goto 8 28: r2 = r10 30: r2 += -4 ld_imm64 (the only 16-byte insn) and special ld_abs/ld_ind instructions had to be treated in a special way. The decoders for the rest of the insns are automatically generated. Add tests to cover new functionality. Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 287477	2016-11-20 02:25:00 +00:00
Benjamin Kramer	ffd3715d16	Give some helper classes/functions internal linkage. NFC. llvm-svn: 287462	2016-11-19 20:44:26 +00:00
Simon Pilgrim	a14e0cb852	[X86][SSE] Improve PSHUFB lowering from either input Canonicalization may leave the zeroable vector in the first input. llvm-svn: 287461	2016-11-19 20:41:48 +00:00
Simon Pilgrim	623a7c57b5	[X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets llvm-svn: 287459	2016-11-19 20:12:34 +00:00
Craig Topper	893ea9fb2c	[X86] Simplify some code a little by removing a duplicate variable and combining two if statements. NFCI llvm-svn: 287443	2016-11-19 17:33:17 +00:00
Daniel Sanders	c95590bc45	Try again to fix unused variable warning on lld-x86_64-darwin13 after r287439. The previous attempt didn't work. I assume LLVM_ATTRIBUTE_UNUSED isn't available on that machine. llvm-svn: 287442	2016-11-19 14:47:41 +00:00
Daniel Sanders	c6d1986a84	Try to fix unused variable warning on lld-x86_64-darwin13 after r287439. Whether the variable is used or not depends on NDEBUG. llvm-svn: 287440	2016-11-19 13:50:32 +00:00
Daniel Sanders	72db2a390a	Check that emitted instructions meet their predicates on all targets except ARM, Mips, and X86. Summary: * ARM is omitted from this patch because this check appears to expose bugs in this target. * Mips is omitted from this patch because this check either detects bugs or deliberate emission of instructions that don't satisfy their predicates. One deliberate use is the SYNC instruction where the version with an operand is correctly defined as requiring MIPS32 while the version without an operand is defined as an alias of 'SYNC 0' and requires MIPS2. * X86 is omitted from this patch because it doesn't use the tablegen-erated MCCodeEmitter infrastructure. Patches for ARM and Mips will follow. Depends on D25617 Reviewers: tstellarAMD, jmolloy Subscribers: wdng, jmolloy, aemerson, rengolin, arsenm, jyknight, nemanjai, nhaehnle, tstellarAMD, llvm-commits Differential Revision: https://reviews.llvm.org/D25618 llvm-svn: 287439	2016-11-19 13:05:44 +00:00
Dylan McKay	1a55f201ef	[AVR] Remove a bunch of unused variables llvm-svn: 287416	2016-11-19 01:33:42 +00:00
Dylan McKay	19270f3438	[AVR] Remove a variable which was unused in release mode In release mode where assertions are not enabled, this caused an 'unused variable' warning. llvm-svn: 287414	2016-11-19 01:14:44 +00:00
Konstantin Zhuravlyov	aefee42e0f	[AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input Differential Revision: https://reviews.llvm.org/D26862 llvm-svn: 287389	2016-11-18 22:31:08 +00:00
Matthias Braun	9f15a79e5d	Timer: Track name and description. The previously used "names" are rather descriptions (they use multiple words and contain spaces), use short programming language identifier like strings for the "names" which should be used when exporting to machine parseable formats. Also removed an unused TimerGroup from Hexagon. Differential Revision: https://reviews.llvm.org/D25583 llvm-svn: 287369	2016-11-18 19:43:18 +00:00
Matt Arsenault	afe614cb38	AMDGPU: Fix unused variable warning llvm-svn: 287362	2016-11-18 18:33:36 +00:00
Ehsan Amiri	395be572f0	[PPC] limit line width to 80 characters NFC. Forgot to fix this in the original commit. llvm-svn: 287350	2016-11-18 16:24:27 +00:00
Simon Dardis	0e2ee3b4b9	[mips][msa] Implement f16 support The MIPS MSA ASE provides instructions to convert to and from half precision floating point. This patch teaches the MIPS backend to treat f16 as a legal type and how to promote such values to f32 for the usual set of operations. As a result of this, the fexup[lr].w intrinsics no longer crash LLVM during type legalization. Reviewers: zoran.jovanvoic, vkalintiris Differential Revision: https://reviews.llvm.org/D26398 llvm-svn: 287349	2016-11-18 16:17:44 +00:00
Tom Stellard	01e65d2cfc	AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts Summary: The 32-bit instructions don't zero the high 16-bits like the 16-bit instructions do. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26828 llvm-svn: 287342	2016-11-18 13:53:34 +00:00
Simon Pilgrim	7938bd666e	Cleanup function with clang-format. NFCI. llvm-svn: 287340	2016-11-18 12:16:18 +00:00
Nicolai Haehnle	ce2b589df5	AMDGPU: Fix legalization of MUBUF instructions in shaders Summary: The addr64-based legalization is incorrect for MUBUF instructions with idxen set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects e.g. shaders that access buffer textures. Since we never actually need the addr64-legalization in shaders, this patch takes the easy route and keys off the calling convention. If this ever affects (non-OpenGL) compute, the type of legalization needs to be chosen based on some TSFlag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664 Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26747 llvm-svn: 287339	2016-11-18 11:55:52 +00:00
Simon Pilgrim	dcd8433597	Fix spelling mistakes in MIPS target comments. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287338	2016-11-18 11:53:36 +00:00
Ehsan Amiri	ff0942e6ea	[Power9] Add patterns for vnegd, vnegw Exploit new instructions by adding patterns to .td file. https://reviews.llvm.org/D26551 llvm-svn: 287334	2016-11-18 11:05:55 +00:00
Simon Pilgrim	e995a8088d	Fix spelling mistakes in AMDGPU target comments. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287333	2016-11-18 11:04:02 +00:00
Simon Pilgrim	fd8bf984f4	Fix typo in comment. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287331	2016-11-18 10:52:12 +00:00
Ehsan Amiri	85818684c6	[PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended When we see a SETCC whose only users are zero extend operations, we can replace it with a subtraction. This results in doing all calculations in GPRs and avoids CR use. Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are ways that this can be extended. For example for signed condition codes. In that case we will be introducing additional sign extend instructions, so more careful profitability analysis may be required. Another direction to extend this is for equal, not equal conditions. Also when users of SETCC are any_ext or sign_ext, we might be able to do something similar. llvm-svn: 287329	2016-11-18 10:41:44 +00:00
Craig Topper	02b5a1b50f	[AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects. The same thing was done to 32-bit and 64-bit element sizes previously. This will allow us to support these shifts in InstCombineCalls along with the other variable shift intrinsics. llvm-svn: 287312	2016-11-18 05:04:44 +00:00
Matt Arsenault	eff1ad8d8e	AMDGPU: Move redundant setting of inst properties llvm-svn: 287311	2016-11-18 04:42:59 +00:00
Matt Arsenault	742deb2495	AMDGPU: Fix crash on illegal type for inlineasm There are still crashes on non-MVT types in other places. llvm-svn: 287310	2016-11-18 04:42:57 +00:00
Alexei Starovoitov	8f9f8210c1	convert bpf assembler to look like kernel verifier output since bpf instruction set was introduced people learned to read and understand kernel verifier output whereas llvm asm output stayed obscure and unknown. Convert llvm to emit assembler text similar to kernel to avoid this discrepancy Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 287300	2016-11-18 02:32:35 +00:00
Craig Topper	07f1c15995	[AVX-512] Support FCOPYSIGN for v16f32 and v8f64 Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads. Reviewers: delena, zvi, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298	2016-11-18 02:25:34 +00:00
Simon Pilgrim	6ba672e542	Fix spelling mistakes in Hexagon target comments. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287248	2016-11-17 19:21:20 +00:00
Simon Pilgrim	9d15fb3c10	Fix spelling mistakes in X86 target comments. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287247	2016-11-17 19:03:05 +00:00
Konstantin Zhuravlyov	0a1a7b6b23	Revert "AMDGPU: Enable ConstrainCopy DAG mutation" This reverts commit r287146. This breaks a few conformance tests. llvm-svn: 287233	2016-11-17 16:41:49 +00:00
Simon Pilgrim	67ef3b984a	Wdocumentation fix llvm-svn: 287224	2016-11-17 12:21:45 +00:00
Simon Pilgrim	8eca5520dc	[X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves. If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer). This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases). Partial fix for PR30845 Differential Revision: https://reviews.llvm.org/D26590 llvm-svn: 287223	2016-11-17 12:14:49 +00:00
Pablo Barrio	c41e856f53	[ARM] Relax restriction on variadic functions for tailcall optimization Summary: Variadic functions can be treated in the same way as normal functions with respect to the number and types of parameters. Reviewers: grosbach, olista01, t.p.northover, rengolin Subscribers: javed.absar, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26748 llvm-svn: 287219	2016-11-17 10:56:58 +00:00
Oren Ben Simhon	489d6eff4f	[X86] RegCall - Handling v64i1 in 32/64 bit target Register Calling Convention defines a new behavior for v64i1 types. This type should be saved in GPR. However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit). Differential Revision: https://reviews.llvm.org/D26181 llvm-svn: 287217	2016-11-17 09:59:40 +00:00
Craig Topper	05b0fcd168	[X86] Fix formatting. NFC llvm-svn: 287211	2016-11-17 05:59:55 +00:00
Dean Michael Berris	3234d3a4bd	[XRay] Support AArch64 in LLVM This patch adds XRay support in LLVM for AArch64 targets. This patch is one of a series: Clang: https://reviews.llvm.org/D26415 compiler-rt: https://reviews.llvm.org/D26413 Author: rSerge Reviewers: rengolin, dberris Subscribers: amehsan, aemerson, llvm-commits, iid_iunknown Differential Revision: https://reviews.llvm.org/D26412 llvm-svn: 287209	2016-11-17 05:15:37 +00:00
Chris Bieneman	05c279fc4b	[CMake] NFC. Updating CMake dependency specifications This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system. llvm-svn: 287206	2016-11-17 04:36:50 +00:00
Konstantin Zhuravlyov	d709efb0da	[AMDGPU] Custom lower f16 = fp_round f64 llvm-svn: 287203	2016-11-17 04:28:37 +00:00
Konstantin Zhuravlyov	3f0cdc7a11	[AMDGPU] Promote f16/i16 conversions to f32/i32 llvm-svn: 287201	2016-11-17 04:00:46 +00:00
Konstantin Zhuravlyov	662e01dfbe	[AMDGPU] Expand `br_cc` for f16 Differential Revision: https://reviews.llvm.org/D26732 llvm-svn: 287199	2016-11-17 03:49:01 +00:00
Dylan McKay	017a55b092	[AVR] Wrap all methods in the pseudo expansion pass in an anon namespace The '-fpermissive' compiler flag complains if the template specializations used in the class are used in a different namespace. llvm-svn: 287176	2016-11-16 23:06:14 +00:00
Dylan McKay	5810c7ee6e	[AVR] Remove unused method from AVRTargetMachine llvm-svn: 287173	2016-11-16 22:48:30 +00:00
Sanjay Patel	066139a3ec	[x86] allow FP-logic ops when one operand is FP and result is FP We save an inter-register file move this way. If there's any CPU where the FP logic is slower, we could transform this back to int-logic in MachineCombiner. This helps, but doesn't solve, PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 The 'andn' test shows that we're missing a pattern match to recognize the xor with -1 constant as a 'not' op. llvm-svn: 287171	2016-11-16 22:34:05 +00:00
Dylan McKay	a789f40002	[AVR] Add the pseudo instruction expansion pass Summary: A lot of the pseudo instructions are required because LLVM assumes that all integers of the same size as the pointer size are legal. This means that it will not currently expand 16-bit instructions to their 8-bit variants because it thinks 16-bit types are legal for the operations. This also adds all of the CodeGen tests that required the pass to run. Reviewers: arsenm, kparzysz Subscribers: wdng, mgorny, modocache, llvm-commits Differential Revision: https://reviews.llvm.org/D26577 llvm-svn: 287162	2016-11-16 21:58:04 +00:00
Peter Collingbourne	7d0c869b86	X86: Simplify X86ISD::Wrapper operand checks. NFCI. We only ever create TargetConstantPool, TargetJumpTable, TargetExternalSymbol, TargetGlobalAddress, TargetGlobalTLSAddress, MCSymbol and TargetBlockAddress nodes as operands of X86ISD::Wrapper nodes, so we can remove one check and invert the other. Also update the documentation comment for X86ISD::Wrapper. Differential Revision: https://reviews.llvm.org/D26731 llvm-svn: 287160	2016-11-16 21:48:59 +00:00

40780 Commits