llvm-project

Commit Graph

Author	SHA1	Message	Date
Dan Gohman	26c6765bd6	[WebAssembly] Define WebAssembly-specific relocation codes. Currently WebAssembly has two kinds of relocations; data addresses and function addresses. This adds ELF relocations for them, as well as an MC symbol kind to indicate which type of relocation is needed. llvm-svn: 257416	2016-01-11 23:38:05 +00:00
Sanjay Patel	e896ede7f1	[LibCallSimplifier] use instruction-level fast-math-flags to transform log calls Also, add tests to verify that we're checking 'fast' on both calls of each transform pair, tighten the CHECK lines, and give the tests more meaningful names. This is a continuation of: http://reviews.llvm.org/rL255555 http://reviews.llvm.org/rL256871 http://reviews.llvm.org/rL256964 http://reviews.llvm.org/rL257400 http://reviews.llvm.org/rL257404 llvm-svn: 257414	2016-01-11 23:31:48 +00:00
Rafael Espindola	36a425b618	Remove a bogus assert. There is no reason the value being printed has to be positive. Fixes pr25802. llvm-svn: 257412	2016-01-11 23:21:45 +00:00
Sanjay Patel	6c1ddbb7b6	[LibCallSimplifier] don't allow sqrt transform unless all ops are unsafe Fix the FIXME added with: http://reviews.llvm.org/rL257400 llvm-svn: 257404	2016-01-11 22:50:36 +00:00
Justin Bogner	0fb7ed5726	LoopUnroll: Use the optsize threshold for minsize as well Currently we're unrolling loops more in minsize than in optsize, which means -Oz will have a larger code size than -Os. That doesn't make any sense. This resolves the FIXME about this in LoopUnrollPass and extends the optsize test to make sure we use the smaller threshold for minsize as well. llvm-svn: 257402	2016-01-11 22:39:43 +00:00
Sanjay Patel	683f29735f	[LibCallSimplifier] use instruction-level fast-math-flags to transform sqrt calls This is a continuation of adding FMF to call instructions: http://reviews.llvm.org/rL255555 The intent of the patch is to preserve the current behavior of the transform except that we use the sqrt instruction's 'fast' attribute as a trigger rather than the function-level attribute. But this raises a bug noted by the new FIXME comment. In order to do this transform: sqrt((x * x) * y) ---> fabs(x) * sqrt(y) ...we need all of the sqrt, the first fmul, and the second fmul to be 'fast'. If any of those ops is strict, we should bail out. Differential Revision: http://reviews.llvm.org/D15937 llvm-svn: 257400	2016-01-11 22:34:19 +00:00
Rafael Espindola	5a7756044f	Add a missing error handling to llvm-lto. llvm-svn: 257395	2016-01-11 22:08:22 +00:00
Matt Arsenault	5e0bdb8b95	AMDGPU: Implement {{s|u}}int_to_fp i64 -> f32 The old lowering for uint_to_fp failed opencl conformance. It might be OK for fast math mode, but I'm not sure. llvm-svn: 257393	2016-01-11 22:01:48 +00:00
Lang Hames	4b6e021fad	XFAIL the LLI remote JIT tests on Win32. llvm-svn: 257391	2016-01-11 21:41:34 +00:00
Matt Arsenault	9dc82567c9	AMDGPU: Cleanup udiv test llvm-svn: 257387	2016-01-11 21:18:40 +00:00
Matt Arsenault	800fecf9de	AMDGPU: Fix crash with dispatch.ptr intrinsic with non-HSA target It might be better to let this be a select failure instead. llvm-svn: 257386	2016-01-11 21:18:33 +00:00
Ahmed Bougacha	30bd60785b	[X86] Add AVX512 testcase for r248965/PR24512. llvm-svn: 257385	2016-01-11 21:16:21 +00:00
Adhemerval Zanella	e600c99a4e	[sanitizer] [msan] Fix origin store of array types This patch fixes the memory sanitizer origin store instrumentation for array types. This can be triggered by cases where the frontend lowers a function return to an array type instead of an aggregate. For instance, the C code: -- struct mypair { int64_t x; int y; }; mypair my_make_pair(int64_t x, int y) { mypair p; p.x = x; p.y = y; return p; } int foo (int p) { mypair z = my_make_pair(p, 0); return z.y + z.x; } -- It will be lowered with target set to aarch64-linux and -O0 to: -- [...] define i32 @_Z3fooi(i32 %p) #0 { [...] %call = call [2 x i64] @_Z12my_make_pairxi(i64 %conv, i32 0) %1 = bitcast %struct.mypair* %z to [2 x i64]* store [2 x i64] %call, [2 x i64]* %1, align 8 [...] -- The origin store will emit an 'icmp' to test each store value against the TLS origin array. However, since 'icmp' does not support ArrayType, the memory instrumentation phase will bail out with an error. This patch changes this by using the same strategy for arrays that is used for struct types. It fixes the 'test/msan/insertvalue_origin.cc' for aarch64 (the -O0 case). llvm-svn: 257375	2016-01-11 19:55:27 +00:00
Lang Hames	4073e6f7e2	Remove the remote-JIT small code model tests for now. They're causing intermittent XPASSes on some builders. These can be reinstated when we have proper support for small-code model in the JIT. llvm-svn: 257359	2016-01-11 17:38:25 +00:00
Lang Hames	a692ec43a8	XFAIL the remote small code model tests on x86. Small code model is not properly supported, and only worked previously because we weren't really running them out-of-process. llvm-svn: 257355	2016-01-11 17:09:58 +00:00
Matt Arsenault	fe453a7da9	AMDGPU: int_to_fp test cleanups llvm-svn: 257354	2016-01-11 17:02:10 +00:00
Matt Arsenault	5319b0add5	AMDGPU: Fix ctlz combine for sub 32-bit types llvm-svn: 257353	2016-01-11 17:02:06 +00:00
Matt Arsenault	de5fbe9c60	AMDGPU: Pattern match ffbh pattern to instruction. The hardware instruction's output on 0 is -1 rather than 32. Eliminate a test and select to -1. This removes an extra instruction from the compatibility function with HSAIL's firstbit instruction. llvm-svn: 257352	2016-01-11 17:02:00 +00:00
Matt Arsenault	f058d67643	AMDGPU: Custom lower i64 ctlz llvm-svn: 257348	2016-01-11 16:50:29 +00:00
Matt Arsenault	5ca3c72c5a	LegalizeDAG: Expand ctlz with ctlz_zero_undef if legal llvm-svn: 257345	2016-01-11 16:37:46 +00:00
Lang Hames	9d7a269f47	[LLI] Replace the LLI remote-JIT support with the new ORC remote-JIT components. The new ORC remote-JITing support provides a superset of the old code's functionality, so we can replace the old stuff. As a bonus, a couple of previously XFAILed tests have started passing. llvm-svn: 257343	2016-01-11 16:35:55 +00:00
Silviu Baranga	603954ef0e	Revert r257164 - it has caused spec2k6 failures in LTO mode llvm-svn: 257340	2016-01-11 16:19:38 +00:00
Daniel Sanders	4d32300cfd	[mips] Never select JAL for calls to an absolute immediate address. Summary: It actually takes an offset into the current PC-region. This fixes the 'expr' command in lldb. Reviewers: vkalintiris, jaydeep, bhushan Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D16054 llvm-svn: 257339	2016-01-11 15:57:46 +00:00
Junmo Park	7ceec0b82f	[BranchFolding] Set correct mem refs (2nd try) This is a recommit of r257253 which was reverted in r257270. The previous testcase could fail on some targets because it ran opt with the O3 option. Original Summary: Merge MBBICommon and MBBI's MMOs. Differential Revision: http://reviews.llvm.org/D15990 llvm-svn: 257317	2016-01-11 07:15:38 +00:00
Craig Topper	9d2cab7742	[AVX-512] Remove another extra space from the Intel syntax asm strings. llvm-svn: 257304	2016-01-11 01:03:40 +00:00
Craig Topper	686d4c79cb	[AVX-512] Fix test case update missed in r257299. llvm-svn: 257303	2016-01-11 00:56:48 +00:00
Craig Topper	156622ad9d	[AVX-512] Remove unused Round and Itinerary from the maskable_cmp multiclasses. They weren't used and there were extra spaces in the asm string to prepare for the concatenations of the round string that wasn't ever used. llvm-svn: 257300	2016-01-11 00:44:56 +00:00
Craig Topper	bfe13ff6ca	[AVX-512] Make spacing between comma and {sae} operand consistent in asm strings. llvm-svn: 257299	2016-01-11 00:44:52 +00:00
Elena Demikhovsky	542dfcf44c	Optimized instruction sequence for sitofp operation on X86-32 Optimized sitofp i64 %x to double. The current sequence movl %ecx, 8(%esp) movl %edx, 12(%esp) fildll 8(%esp) is replaced with: movd %ecx, %xmm0 movd %edx, %xmm1 punpckldq %xmm1, %xmm0 movq %xmm0, 8(%esp) Differential Revision: http://reviews.llvm.org/D15946 llvm-svn: 257285	2016-01-10 09:41:22 +00:00
Michael Zuckerman	885f61c534	[AVX512] add PRORVQ and PRORVD Intrinsic Differential Revision: http://reviews.llvm.org/D15955 llvm-svn: 257283	2016-01-10 09:16:41 +00:00
David Majnemer	7a99dd4d3f	Add test for r257279. llvm-svn: 257280	2016-01-10 07:13:33 +00:00
Chen Li	1689c2f54b	[SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad. Summary: This is a fix of D13718. D13718 was committed but then reverted because of the following bug: https://llvm.org/bugs/show_bug.cgi?id=25299 This patch fixes the issue shown in the bug. Reviewers: majnemer, reames Subscribers: jevinskie, llvm-commits Differential Revision: http://reviews.llvm.org/D14308 llvm-svn: 257277	2016-01-10 05:48:01 +00:00
Joseph Tremoulet	a9a05cbcf9	[WinEH] Fix catchpad pred verification Summary: The code was simply ensuring that the catchpad's pred is its catchswitch, which was letting cases slip through where the flow edge was the unwind edge of the catchswitch rather than one of its catch clauses. Reviewers: andrew.w.kaylor, rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16011 llvm-svn: 257275	2016-01-10 04:32:03 +00:00
Joseph Tremoulet	8ea8086322	[WinEH] Disallow cyclic unwinds Summary: Funclet-based EH personalities/tables likely can't handle these, and they can't be generated at source, so make them officially illegal in IR as well. Reviewers: andrew.w.kaylor, rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15963 llvm-svn: 257274	2016-01-10 04:31:05 +00:00
Joseph Tremoulet	81e81960e3	[WinEH] Verify consistent funclet unwind exits Summary: A funclet EH pad may be exited by an unwind edge, which may be a cleanupret exiting its cleanuppad, an invoke exiting a funclet, or an unwind out of a nested funclet transitively exiting its parent. Funclet EH personalities require all such exceptional exits from a given funclet to have the same unwind destination, and EH preparation / state numbering / table generation implicitly depends on this. Formalize it as a rule of the IR in the LangRef and verifier. Reviewers: rnk, majnemer, andrew.w.kaylor Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15962 llvm-svn: 257273	2016-01-10 04:30:02 +00:00
Joseph Tremoulet	e28885e693	[WinEH] Verify unwind edges against EH pad tree Summary: Funclet EH personalities require a tree-like nesting among funclets (enforced by the ParentPad linkage in the IR), and also require that unwind edges conform to certain rules with respect to the tree: - An unwind edge may exit 0 or more ancestor pads - An unwind edge must enter exactly one EH pad, which must be distinct from any exited pads - A cleanupret's edge must exit its cleanuppad Describe these rules in the LangRef, and enforce them in the verifier. Reviewers: rnk, majnemer, andrew.w.kaylor Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15961 llvm-svn: 257272	2016-01-10 04:28:38 +00:00
Michael Zolotukhin	0fc89c67cc	Revert "[BranchFolding] Set correct mem refs" This reverts commit 1ff11017d2669b933b29fcbb6451cfcda34ad693. llvm-svn: 257270	2016-01-09 23:53:16 +00:00
Simon Pilgrim	c7bebcbfd8	[X86][AVX] Match broadcast loads through a bitcast AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur. This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars. llvm-svn: 257266	2016-01-09 20:59:39 +00:00
Simon Pilgrim	2e7a1849c9	[X86][AVX] Add support for i64 broadcast loads on 32-bit targets Added 32-bit AVX1/AVX2 broadcast tests. llvm-svn: 257264	2016-01-09 19:59:27 +00:00
Junmo Park	e1582cec34	[BranchFolding] Set correct mem refs Merge MBBICommon and MBBI's MMOs. Differential Revision: http://reviews.llvm.org/D15990 llvm-svn: 257253	2016-01-09 07:30:13 +00:00
Manuel Jacob	734e73342d	[RS4GC] Update and simplify handling of Constants in findBaseDefiningValueOfVector(). Summary: This is analogous to r256079, which removed an overly strong assertion, and r256812, which simplified the code by replacing three conditionals by one. Reviewers: reames Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D16019 llvm-svn: 257250	2016-01-09 04:02:16 +00:00
Philip Reames	5715f576ea	[rs4gc] Optionally directly relocate vector of pointers This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates. The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order in which to visit instructions in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications. Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR. At the moment, the new functionality is hidden behind a flag. To use this code, you need to pass "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested. Differential Revision: http://reviews.llvm.org/D15982 llvm-svn: 257244	2016-01-09 01:31:13 +00:00
Mike Aizatsky	17dbc2831e	[llvm-symbolizer] -print-source-context-lines option to print source code around the line. Differential Revision: http://reviews.llvm.org/D15909 llvm-svn: 257236	2016-01-09 00:14:35 +00:00
Sanjay Patel	1dc7dfb9d9	[DAGCombiner] don't dereference an operand that doesn't exist (PR26070) The bug was introduced with changes for x86-64 fp128: http://reviews.llvm.org/rL254653 I don't know why an x86 change is here, so I'll follow up in: http://reviews.llvm.org/D15134 Should fix: https://llvm.org/bugs/show_bug.cgi?id=26070 llvm-svn: 257200	2016-01-08 19:53:24 +00:00
Haicheng Wu	a6a3279bd3	[JumpThreading] Split select that has constant conditions coming from the PHI node Look for PHI/Select in the same BB of the form bb: %p = phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4], ... %s = select p, trueval, falseval And expand the select into a branch structure. This later enables jump-threading over bb in this pass. Using the similar approach of SimplifyCFG::FoldCondBranchOnPHI(), unfold select if the associated PHI has at least one constant. If the unfolded select is not jump-threaded, it will be folded again in the later optimizations. llvm-svn: 257198	2016-01-08 19:39:39 +00:00
Justin Bogner	e9fb228d59	LoopInfo: Simplify ownership of Loop objects It's strange that LoopInfo mostly owns the Loop objects, but that it defers deleting them to the loop pass manager. Instead, change the oddly named "updateUnloop" to "markAsRemoved" and have it queue the Loop object for deletion. We can't delete the Loop immediately when we remove it, since we need its pointer identity still, so we'll mark the object as "invalid" so that clients can see what's going on. llvm-svn: 257191	2016-01-08 19:08:53 +00:00
Weiming Zhao	4b3b13d3bc	RBIT Instruction only available for ARMv6t2 and above. Summary: r255334 matches bit-reverse pattern in InstCombine and generates calls to Intrinsic::bitreverse. The RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5. Patch by Z. Zheng <zhaoshiz@codeaurora.org> Reviewers: apazos, jmolloy, weimingz Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D15932 llvm-svn: 257188	2016-01-08 18:43:41 +00:00
Pirama Arumuga Nainar	bf5ccdccb2	Do not ASSERTZEXT for i16 result of bitcast from f16 operand Summary: During legalization of i16, do not ASSERTZEXT the result of FP_TO_FP16. Directly return an FP_TO_FP16 node with return type as the promote-to-type of i16. This patch also removes an extraneous length check. This legalization should be valid even if integer and float types are of different lengths. This patch breaks a hard-float test for fp16 args. The test is changed to allow a vmov to zero-out the top bits, and also ensure that the return value is in an FP register. Reviewers: ab, jmolloy Subscribers: srhines, llvm-commits Differential Revision: http://reviews.llvm.org/D15438 llvm-svn: 257184	2016-01-08 17:46:05 +00:00
David Majnemer	2a6368f609	[WinEH] CatchHandlers which don't have catch objects in StackColoring StackColoring rewrites the frame indices of operations involving allocas if it can find that the lifetimes of two objects do not overlap. MSVC EH needs to be kept aware of this in the event that a catch object has moved around. However, we represent the nonexistence of a catch object with a sentinel frame index (INT_MAX). This sentinel also happens to be the EmptyKey of the SlotRemap DenseMap. Testing for whether or not we need to translate the frame index fails in this case because we call the count method on the DenseMap with the EmptyKey, leading to assertions. Instead, check if it is our sentinel value before trying to look into the DenseMap. This fixes PR26073. llvm-svn: 257182	2016-01-08 17:24:47 +00:00
Tom Stellard	4c4c72db48	AMDGPU/SI: Emit global variable sizes when targeting HSA Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15952 llvm-svn: 257173	2016-01-08 14:50:28 +00:00
Tom Stellard	ad8f5e8111	AMDGPU: Emit functions sizes Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15951 llvm-svn: 257172	2016-01-08 14:50:23 +00:00
Teresa Johnson	a1080ee6f0	[ThinLTO] Delay metadata materialization in function importer The function importer was still materializing metadata when modules were loaded for function importing. We only want to materialize it when we are going to invoke the metadata linking postpass. Materializing it before function importing is not only unnecessary, but also causes metadata referenced by imported functions to be mapped in early, and then not connected to the rest of the module level metadata when it is ultimately linked in. Augmented the test case to specifically check for the metadata being properly connected, which it wasn't before this fix. llvm-svn: 257171	2016-01-08 14:17:41 +00:00
Silviu Baranga	9e007efad2	Re-commit r257064, this time with a fixed assert In setInsertionPoint if the value is not a PHI, Instruction or Argument it should be a Constant, not a ConstantExpr. Original commit message: [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use for the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparison entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc.) the value of the original base pointer will be hidden from the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257164	2016-01-08 11:11:04 +00:00
Chandler Carruth	1926b70e37	[attrs] Split the late-revisit pattern for deducing norecurse in a top-down manner into a true top-down or RPO pass over the call graph. There are specific patterns of function attributes, notably the norecurse attribute, which are most effectively propagated top-down because all they use is caller information. The walk in RPO over the call graph SCCs takes the form of a module pass run immediately after the CGSCC pass manager's postorder walk of the SCCs, trying again to deduce norecurse for each singular SCC in the call graph. This removes a very legacy pass manager specific trick of using a lazy revisit list traversed during finalization of the CGSCC pass. There is no analogous finalization step in the new pass manager, and a lazy revisit list is just trying to produce an RPO iteration of the call graph. We can do that more directly if more expensively. It seems unlikely that this will be the expensive part of any compilation though as we never examine the function bodies here. Even in an LTO run over a very large module, this should be a reasonably fast set of operations over a reasonably small working set -- the function call graph itself. In the future, if this really is a compile time performance issue, we can look at building support for both post order and RPO traversals directly into a pass manager that builds and maintains the PO list of SCCs. Differential Revision: http://reviews.llvm.org/D15785 llvm-svn: 257163	2016-01-08 10:55:52 +00:00
David Majnemer	086fec23ec	[WinEH] Update WinEHFuncInfo if StackColoring merges allocas Windows EH keeps track of which frame index corresponds to a catchpad in order to inform the runtime where the catch parameter should be initialized. LLVM's optimizations are able to prove that the memory used by the catch parameter can be reused with another memory optimization, changing its frame index. We need to keep WinEHFuncInfo up to date with respect to this or we will miscompile/assert. This fixes PR26069. llvm-svn: 257158	2016-01-08 08:03:55 +00:00
Craig Topper	04493fda81	[X86] Don't print the aliased version of CVTSD2SI64rm. This appears to be a mistake I made years ago. llvm-svn: 257149	2016-01-08 06:09:18 +00:00
Xinliang David Li	062cde9cc3	[PGO] Ensure vp data in indexed profile always sorted Done in InstrProfWriter to eliminate the need for client code to do the sorting. The operation is done once and reused many times so it is more efficient. Update unit test to remove sorting. Also update expected output of affected tests. llvm-svn: 257145	2016-01-08 05:45:21 +00:00
Kyle Butt	bfcff3856a	Add call sequence start and end for __tls_get_addr This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839. For a PIC TLS variable access in a function, prologue (mflr followed by std and stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but no one saves/restores it. Also added a test for save/restore clobbered registers during calling __tls_get_addr. Patch by Tim Shen llvm-svn: 257137	2016-01-08 02:06:19 +00:00
Kyle Butt	a02ce98bd4	[Vectorization] Actually return from error case in isStridedPtr The early return seems to be missed. This causes a radical and wrong loop optimization on powerpc. It isn't reproducible on x86_64, because "UseInterleaved" is false. Patch by Tim Shen. llvm-svn: 257134	2016-01-08 01:55:13 +00:00
Sanjay Patel	d72a458d28	[InstCombine] insert a new shuffle in a safe place (PR25999) Limit this transform to a basic block and guard against PHIs. Hopefully, this fixes the remaining failures in PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999 llvm-svn: 257133	2016-01-08 01:39:16 +00:00
Eric Christopher	b793230797	Add some testing for thumb1 and thumb2 inline asm immediate constraints and fix a couple of bugs on inspection. Also fixes PR26061. llvm-svn: 257122	2016-01-08 00:34:44 +00:00
Mike Aizatsky	54a7c69a34	[llvm-symbolizer] Print out non-address lines verbatim. Differential Revision: http://reviews.llvm.org/D15876 llvm-svn: 257115	2016-01-07 23:57:41 +00:00
Aditya Nandakumar	f94c149f7f	Instructions to be redone only if from the same BB While adding instructions (possible roots) to be redone, make sure they are from the same basic block. llvm-svn: 257112	2016-01-07 23:22:55 +00:00
JF Bastien	b9ec4c6cea	WebAssembly: use .skip instead of .zero directive .zero is confusing when used with two arguments. Documentation: This directive emits SIZE 0-valued bytes. SIZE must be an absolute expression. This directive is actually an alias for the '.skip' directive so in can take an optional second argument of the value to store in the bytes instead of zero. Using '.zero' in this way would be confusing however. Ref: https://sourceware.org/bugzilla/show_bug.cgi?id=18353 Hexagon and Sparc do the same, and it's all the same to WebAssembly so let's pick the less confusing of the two. llvm-svn: 257111	2016-01-07 23:18:29 +00:00
Keno Fischer	ea33a25816	Temporarily revert r257105 "[Verifier] Check that debug values have proper size" Looks like there's a case where clang generates debug info that triggers the new verifier check. Reverting while investigating. llvm-svn: 257107	2016-01-07 22:39:11 +00:00
Keno Fischer	b3326be6ad	[Verifier] Check that debug values have proper size Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declare size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref Reviewers: aprantl Differential Revision: http://reviews.llvm.org/D14276 llvm-svn: 257105	2016-01-07 22:18:37 +00:00
Dimitry Andric	2c36421337	Turn off lldb debug tuning by default for FreeBSD Summary: In rL242338, debugger tuning was introduced, and the tuning for FreeBSD was set to lldb by default. However, for the foreseeable future we still need to default to gdb tuning, since lldb is not ready for all of FreeBSD's architectures, and some system tools (like objcopy, etc) have not yet been adapted to cope with the lldb tuned format, which has .apple sections. Therefore, let FreeBSD use gdb by default for now. Reviewers: emaste, probinson Subscribers: llvm-commits, emaste Differential Revision: http://reviews.llvm.org/D15966 llvm-svn: 257103	2016-01-07 22:09:12 +00:00
David Majnemer	f1a9c9e148	[SCCP] Don't violate the lattice invariants We marked values which are 'undef' as constant instead of undefined which violates SCCP's invariants. If we can figure out that a computation results in 'undef', leave it in the undefined state. This fixes PR16052. llvm-svn: 257102	2016-01-07 21:36:16 +00:00
David Majnemer	867bbc775f	Add test for r256912 I forgot to add this with the rest of r256912. llvm-svn: 257088	2016-01-07 19:27:16 +00:00
David Majnemer	bae945735a	[SCCP] Can't go from overdefined to constant The fix for PR23999 made us mark loads of null as producing the constant undef which upsets the lattice. Instead, keep the load as "undefined". This fixes PR26044. llvm-svn: 257087	2016-01-07 19:25:39 +00:00
Derek Schuff	9bfea27c26	[WebAssembly] Support combining GEP and FrameIndex offsets in memory operand offset field Previously we only supported putting the FI into memory operand offset fields if there was nothing there already. Now combine them. Differential Revision: http://reviews.llvm.org/D15941 llvm-svn: 257084	2016-01-07 18:55:52 +00:00
Dan Gohman	a4730cf0b4	[WebAssembly] Use the default private label prefixes. The MC assembler doesn't like using the empty string as a private label prefix because then it treats all labels as private. This commit reverts back to the default prefix, which is .L, which is common in ELF targets and consistent with the LLVM name mangler. llvm-svn: 257083	2016-01-07 18:49:53 +00:00
Nicolai Haehnle	82fc962c20	AMDGPU/SI: Fold operands with sub-registers Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074	2016-01-07 17:10:29 +00:00
Nicolai Haehnle	3c05d6d3b5	AMDGPU/SI: xnack_mask is always reserved on VI Summary: Somehow, I first interpreted the docs as saying space for xnack_mask is only reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and went back to actually test what is happening, and it turns out that xnack_mask is always reserved at least on Tonga and Carrizo, in the sense that flat_scr is always fixed below the SGPRs that are used to implement xnack_mask, whether or not they are actually used. I confirmed this by writing a shader using inline assembly to tease out the aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so xnack_mask is s[76:77] and vcc is s[78:79]). This patch changes both the calculation of the total number of SGPRs and the various register reservations to account for this. It ought to be possible to use the gap left by xnack_mask when the feature isn't used, but this patch doesn't try to do that. (Note that the same applies to vcc.) Note that previously, even before my earlier change in r256794, the SGPRs that alias to xnack_mask could end up being used as well when flat_scr was unused and the total number of SGPRs happened to fall on the right alignment (e.g. highest regular SGPR being used s29 and VCC used would lead to number of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there were some conflict due to such aliasing, we should have noticed that already. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15898 llvm-svn: 257073	2016-01-07 17:10:20 +00:00
Michael Zuckerman	5b1cad87aa	[avx512] Fix test avx512bw-intrinsics.ll Change the CHECK label to AVX512BW and fix the declare label of llvm.x86.avx512.mask.psrav32_hi llvm-svn: 257071	2016-01-07 16:25:42 +00:00
Michael Zuckerman	3aca221b31	[AVX512] add PSLLW and PSLLV Intrinsic Differential Revision: http://reviews.llvm.org/D15889 llvm-svn: 257070	2016-01-07 16:02:51 +00:00
Silviu Baranga	dd68d46ec1	Revert r257064. It caused failures in some sanitizer tests. llvm-svn: 257069	2016-01-07 15:46:43 +00:00
Nico Weber	4324b9b236	Revert r257055, it caused PR26064. llvm-svn: 257066	2016-01-07 15:01:46 +00:00
Silviu Baranga	57b1b90996	[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use of the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparison entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc.) the value of the original base pointer will be hidden from the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257064	2016-01-07 14:56:08 +00:00
Michael Zuckerman	354152d590	[AVX512] add PSRAV Intrinsic Differential Revision: http://reviews.llvm.org/D15856 llvm-svn: 257063	2016-01-07 14:42:20 +00:00
Amjad Aboud	d7cfb48485	Added support for macro emission in dwarf (supporting DWARF version 4). Differential Revision: http://reviews.llvm.org/D15495 llvm-svn: 257060	2016-01-07 14:28:20 +00:00
James Molloy	9971a6841c	[GlobalsAA] Partially back out r248576 See PR25822 for a more full summary, but we were conflating the concepts of "capture" and "escape". We were proving nocapture and using that proof to infer noescape, which is not true. Escaped-ness is a function-local property - as soon as a value is used in a call argument it escapes. Capturedness is a related but distinct property. It implies a temporally limited escape. Consider: static int a; int b; int g(int * nocapture arg); int f() { a = 2; // Even though a escapes to g, it is not captured so can be treated as non-escaping here. g(&a); // But here it must be treated as escaping. g(&b); // Now that g(&a) has returned we know it was not captured so we can treat it as non-escaping again. } The original commit did not sufficiently understand this nuance and so caused PR25822 and PR26046. r248576 included both a performance improvement (which has been backed out) and a related conformance fix (which has been kept along with its testcase). llvm-svn: 257058	2016-01-07 13:33:28 +00:00
Michael Zuckerman	a6df006b50	[AVX512] add PSHUFHW and PSHUFLW Intrinsic Differential Revision: http://reviews.llvm.org/D15925 llvm-svn: 257056	2016-01-07 12:35:43 +00:00
Simon Pilgrim	bcc11a059e	[X86][AVX] Match broadcast loads through a bitcast AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur. Follow up to D15310 llvm-svn: 257055	2016-01-07 11:34:27 +00:00
Simon Pilgrim	83e44c66ae	[X86][SSE] Add INSERTPS as a target shuffle Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS. llvm-svn: 257046	2016-01-07 10:24:19 +00:00
Michael Zuckerman	4a1566827d	[AVX512] add PSHUFD Intrinsic Differential Revision: http://reviews.llvm.org/D15934 llvm-svn: 257044	2016-01-07 09:24:12 +00:00
Tim Northover	bd41cf880c	ARM: support TLS accesses on Darwin platforms Darwin TLS accesses most closely resemble ELF's general-dynamic situation, since they have to be able to handle all possible situations. The descriptors and so on are obviously slightly different though. llvm-svn: 257039	2016-01-07 09:03:03 +00:00
NAKAMURA Takumi	7e887bd80d	llvm/test/CodeGen/X86/statepoint-vector.ll REQUIRES asserts due to a debug option. llvm-svn: 257031	2016-01-07 05:40:37 +00:00
Philip Reames	cffc628ca1	One more attempt at stabilizing a test on all platforms. llvm-svn: 257026	2016-01-07 04:20:52 +00:00
Philip Reames	afdbcc6a84	[Statepoints] Add test cases around vectors and stabilize test Contrary to what my comment in r257022 said, it turns out we do handle constant vectors in the statepoint lowering, but only because SelectionDAG doesn't actually produce constants for them. Add a couple of tests which show this working. Also, add a triple to the same test file to hopefully fix a failing bot. It turns out we do han llvm-svn: 257025	2016-01-07 04:15:31 +00:00
Haicheng Wu	08b9462540	[AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path Allow fadd/fmul to be reassociated in aarch64. llvm-svn: 257024	2016-01-07 04:01:02 +00:00
Philip Reames	3e2cf5320c	[Statepoints] Initial support for relocating vectors of pointers Currently, we try to split vectors of pointers back into their component pointer elements during rewrite-statepoints-for-gc. This is less than ideal since presumably the vectorizer chose to vectorize for a reason. :) It's also been a source of bugs - in particular, the relocation logic as currently implemented was recently discovered to be wrong. The alternate approach is to allow gc.relocates of vector-of-pointer type and update the backend to handle them. That's what this patch tries to do. This won't actually enable vector-of-pointers in practice - there are some RS4GC changes needed - but the lowering is standalone and testable so it makes sense to separate. Note that there are some known cases around vector constants which this patch does not handle. Once this is in, I'll send another patch with individual fixes and test cases. Differential Revision: http://reviews.llvm.org/D15632 llvm-svn: 257022	2016-01-07 03:32:11 +00:00
Dan Gohman	0c6f5ac50a	[WebAssembly] Add -m:e to the target triple. This enables ELF-style name mangling, which primarily means using ".L" for private symbols. llvm-svn: 257020	2016-01-07 03:19:23 +00:00
Ahmed Bougacha	a7324a2823	[Linker] Also treat a DIImportedEntity scope DISubprogram as needed. Follow-up to r257000: DIImportedEntity can reach a DISubprogram via its entity, but also via its scope. Handle the latter case as well. PR26037. llvm-svn: 257019	2016-01-07 03:14:59 +00:00
Quentin Colombet	9ed52e9a9e	[ShrinkWrapping] Give up on irreducible CFGs. We need to know whether or not a given basic block is in a loop for the analysis to be correct. Loop information may be incomplete on irreducible CFGs, therefore we may generate incorrect code if we use it in those situations. This fixes PR25988. llvm-svn: 257012	2016-01-07 01:23:49 +00:00
Teresa Johnson	b951558294	Always treat DISubprogram reached by DIImportedEntity as needed. It is illegal to have a null entity in a DIImportedEntity, so we must link in a DISubprogram metadata node referenced by one, even if the associated function is not linked in or inlined anywhere. Fixes PR26037. llvm-svn: 257000	2016-01-07 00:06:27 +00:00
Mehdi Amini	0535003bef	Fix PR26051: Memcpy optimization should introduce a call to memcpy before the store destination position This is a conservative fix, I expect Amaury to relax this. Follow-up for r256923 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 256999	2016-01-06 23:50:22 +00:00
Vedant Kumar	e83538bcf9	[Bitcode] Remove superfluous compatibility tests With r256990, bogner introduced comprehensive tests for constant arrays and vectors. We no longer need the existing ones because they are redundant. llvm-svn: 256991	2016-01-06 23:22:38 +00:00
Justin Bogner	d99e71833c	Bitcode: Move these tests into compatibility.ll I added a couple of tests in r256982, but vedantk suggested that they fit better into compatibility.ll, since they could catch format breaks later on there. llvm-svn: 256990	2016-01-06 23:16:37 +00:00
Weiming Zhao	0f1762caf9	Recommit r256952 "Filtering IR printing for print-after-all/print-before-all" Fix lit test fail due to outputting an extra line. Differential Revision: http://reviews.llvm.org/D15776 llvm-svn: 256987	2016-01-06 22:55:03 +00:00
Justin Bogner	a43eacbf9e	Bitcode: Fix reading and writing of ConstantDataVectors of halfs In r254991 I allowed ConstantDataVectors to contain elements of HalfTy, but I missed updating the bitcode reader and writer to handle this, so now we crash if we try to emit bitcode on programs that have constant vectors of half. This fixes the issue and adds test coverage for reading and writing constant sequences in bitcode. llvm-svn: 256982	2016-01-06 22:31:32 +00:00
Nicolai Haehnle	a61e5a8d4e	AMDGPU/SI: Fix crash when inline assembly is used in a graphics shader Summary: This is admittedly something that you could only run into by manually playing around with shader assembly because the SITypeWriter pass is skipped for compute. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15902 llvm-svn: 256980	2016-01-06 22:01:04 +00:00
Chen Li	78bde83003	[SplitLandingPadPredecessors] Create a PHINode for the original landingpad only if it has some uses Summary: This patch adds a check in SplitLandingPadPredecessors to see if the original landingpad instruction has any uses. If not, we don't need to create a PHINode for it in the joint block since it would be dead code anyway. The motivation for this patch is that we found a bug where SplitLandingPadPredecessors created a PHINode of token type landingpad, which failed the verifier since a PHINode cannot be of token type. However, the created PHINode will never be used in our code pattern. This patch works around this bug, and we might add support in SplitLandingPadPredecessors to handle token type landingpads with uses in the future. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15835 llvm-svn: 256972	2016-01-06 20:32:05 +00:00
Amaury Sechet	3235c08253	Promote aggregate store to memset when possible Summary: As per title. This will allow the optimizer to pick up on it. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15923 llvm-svn: 256969	2016-01-06 19:47:24 +00:00
Sanjay Patel	cddcd7256c	[LibCallSimplifier] use instruction-level fast-math-flags for tan/atan transform llvm-svn: 256964	2016-01-06 19:23:35 +00:00
Quentin Colombet	eb61e8e6b0	[X86] Correctly model TLS calls w.r.t. frame requirements. TLS calls need the stack frame to be properly set up and this implies that such calls need ADJUSTSTACK_xxx markers. Fixes PR25820. llvm-svn: 256959	2016-01-06 19:09:26 +00:00
Nico Weber	891419adc2	Make WinCOFFObjectWriter.cpp's timestamp writing not use ENABLE_TIMESTAMPS LLVM_ENABLE_TIMESTAMPS controls if timestamps are embedded into llvm's binaries. Turning it off is useful for deterministic builds. r246905 made it so that the define suddenly also controls if the binaries that the llvm binaries _create_ embed timestamps or not – but this shouldn't be a configure-time option. r256203/r256204 added a driver option to toggle this on and off, so this patch now passes this driver option in LLVM_ENABLE_TIMESTAMPS builds so that if LLVM_ENABLE_TIMESTAMPS is set, the build of LLVM is deterministic – but the built clang can still write timestamps into other executables when requested. This also allows removing some of the test machinery added in r292012 to work around this problem. See PR24740 for background. http://reviews.llvm.org/D15783 llvm-svn: 256958	2016-01-06 19:05:19 +00:00
Michael Kuperstein	037c9984db	[ShrinkWrap] Fix FindIDom to only have one kind of failure. FindIDom() can fail in two different ways - it can either return nullptr or the block itself, depending on the circumstances. Some users of FindIDom() check one error condition, while others check the other. Change it to always return nullptr on failure. This fixes PR26004. Differential Revision: http://reviews.llvm.org/D15847 llvm-svn: 256955	2016-01-06 18:40:11 +00:00
Weiming Zhao	b243c95c6a	Revert r256952 due to lit test fails. llvm-svn: 256954	2016-01-06 18:31:44 +00:00
Dan Gohman	8f59cf756f	[WebAssembly] Don't use range-based loop for a list that's being modified The first instruction in a block is what the rend() iterator points to, so if it moves, we need to re-evaluate rend() so that we continue to iterate through the rest of the instructions. llvm-svn: 256953	2016-01-06 18:29:35 +00:00
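The iterator problem in the WebAssembly commit can be reproduced with a plain `std::list`. This is a hypothetical model, not the WebAssembly pass or LLVM's instruction list. `walkReverse` walks a four-element list backwards. When it visits `X`, it splices the first element `A` to just before `X`, roughly like a pass that moves a def next to its use. A range-based `for` loop evaluates `rend()` only once. Its cached end sentinel wraps an iterator to `A`, so it goes stale as soon as `A` moves.

```cpp
#include <iterator>
#include <list>
#include <string>

// Hypothetical model: a "block" is a std::list of one-letter "instructions".
// Returns the order in which elements were visited during a reverse walk.
std::string walkReverse(bool reEvaluateRend) {
  std::list<char> block = {'A', 'B', 'C', 'X'};
  std::string visited;
  // What a range-based for loop would hold on to: rend() wraps begin(),
  // i.e. an iterator to 'A'.
  auto cachedRend = block.rend();
  for (auto it = block.rbegin();
       it != (reEvaluateRend ? block.rend() : cachedRend); ++it) {
    visited += *it;
    if (*it == 'X') {
      auto first = block.begin();          // 'A' is currently first
      auto x = std::prev(it.base());       // forward iterator to 'X'
      block.splice(x, block, first);       // move 'A' to just before 'X'
    }
  }
  return visited;
}
```

With the cached sentinel, the walk stops early and returns `"XA"`, skipping `B` and `C`. Re-evaluating `rend()` on every iteration visits every element and returns `"XACB"`.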
Weiming Zhao	eac0636805	Filtering IR printing for print-after-all/print-before-all Summary: This patch implements "-print-funcs" option to support function filtering for IR printing like -print-after-all, -print-before etc. Examples: -print-after-all -print-funcs=foo,bar Reviewers: mcrosier, joker.eph Subscribers: tejohnson, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D15776 llvm-svn: 256952	2016-01-06 18:20:25 +00:00
Geoff Berry	12fe2279f3	ScheduleDAGInstrs: Bug fix for missed memory dependency. Summary: In buildSchedGraph(), when adding memory dependencies for loads, move the call to adjustChainDeps() after the call to addChainDependency(AliasChain) to handle the case where addChainDependency(AliasChain) ends up not adding a dependency and instead putting the SU on the RejectMemNodes list. The call to adjustChainDeps() must be done after the call to addChainDependency() in order to process the SU added to the RejectMemNodes list to create memory dependencies for it. Reviewers: hfinkel, atrick, jonpa, resistor Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D15927 llvm-svn: 256950	2016-01-06 18:14:26 +00:00
Dan Gohman	c04ccb66eb	[WebAssembly] Add -asm-verbose=false to llc tests. In general, disabling comments in the output reduces the chances of a CHECK line accidentally matching a comment instead of its intended text. llvm-svn: 256946	2016-01-06 16:45:05 +00:00
Amaury Sechet	457cc4db9e	Revert "GlobalsAA: Take advantage of ArgMemOnly, InaccessibleMemOnly and InaccessibleMemOrArgMemOnly attributes" Summary: This reverts commit 5a9e526f29cf8510ab5c3d566fbdcf47ac24e1e9. As per discussion in D15665. This also adds a test case so that the regressions introduced by that diff are not reintroduced. Reviewers: vaivaswatha, jmolloy, hfinkel, reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15919 llvm-svn: 256932	2016-01-06 13:23:52 +00:00
Artyom Skrobov	51f2d11be9	PR25754: avoid generating UDIVREM8_ZEXT_HREG nodes with i64 result Reviewers: spatel, srking Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15331 llvm-svn: 256924	2016-01-06 09:41:10 +00:00
Amaury Sechet	d3b2c0fd94	Improve load/store to memcpy for aggregate Summary: It turns out that if we don't try to do it at the store location, we can do it before any operation that aliases the load, as long as no operation aliases the store. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15903 llvm-svn: 256923	2016-01-06 09:30:39 +00:00
Simon Pilgrim	267163e713	[X86][SSE] There is no zmm addsubpd/addsubps instruction. Replace the assert in combineShuffleToAddSub with an early out. llvm-svn: 256922	2016-01-06 09:08:49 +00:00
Philip Reames	ae050a5703	[BasicAA] Remove special casing of memset_pattern16 in favor of generic attribute inference Most of the properties of memset_pattern16 can be now covered by the generic attributes and inferred by InferFunctionAttrs. The only exceptions are: - We don't yet have a writeonly attribute for the first argument. - We don't have an attribute for modeling the access size facts encoded in MemoryLocation.cpp. Differential Revision: http://reviews.llvm.org/D15879 llvm-svn: 256911	2016-01-06 04:53:16 +00:00
Dan Gohman	797f639e79	[SelectionDAGBuilder] Set NoUnsignedWrap for inbounds gep and load/store offsets. In an inbounds getelementptr, when an index produces a constant non-negative offset to add to the base, the add can be assumed to not have unsigned overflow. This relies on the assumption that addresses can't occupy more than half the address space, which isn't possible in C because it wouldn't be possible to represent the difference between the start of the object and one-past-the-end in a ptrdiff_t. Setting the NoUnsignedWrap flag is theoretically useful in general, and is specifically useful to the WebAssembly backend, since it permits stronger constant offset folding. Differential Revision: http://reviews.llvm.org/D15544 llvm-svn: 256890	2016-01-06 00:43:06 +00:00
Manuel Jacob	3eedd11329	[Statepoints] Check for the "gc-leaf-function" attribute on call sites as well. Reviewers: sanjoy, reames Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15900 llvm-svn: 256875	2016-01-05 23:59:08 +00:00
Sanjay Patel	29095ea1b0	[LibCallSimplifier] use instruction-level fast-math-flags for fmin/fmax transforms llvm-svn: 256871	2016-01-05 20:46:19 +00:00
Nicolai Haehnle	6035504ab3	AMDGPU/SI: Do not move scratch resource register on Tonga & Iceland Due to the SGPR init bug, every program claims to use the same number of SGPRs anyway, so there's no point in trying to shift those registers down from their initial spot of reservation. Add a test that uses VGPR spilling and blocks most SGPRs from being used for the scratch resource register. Previously, this would run into an assertion. Differential Revision: http://reviews.llvm.org/D15724 llvm-svn: 256870	2016-01-05 20:42:49 +00:00
Amaury Sechet	a0c242cdfd	Implement load to store => memcpy in MemCpyOpt for aggregates Summary: Most of the toolchain is able to optimize scalar and memcpy-like operations efficiently, while it isn't as good with aggregates. In order to improve the support of aggregates, we try to change aggregate manipulation into either scalar or memcpy-like operations whenever possible without losing information. This is one such opportunity. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15894 llvm-svn: 256868	2016-01-05 20:17:48 +00:00
Manuel Jacob	68b753a4fb	Correct my last commit (revision 256860). I forgot to save a small wording improvement before committing. llvm-svn: 256862	2016-01-05 19:45:54 +00:00
Manuel Jacob	b8060cd88a	[PlaceSafepoints] Add a test. Calls of functions with the "gc-leaf-function" attribute shouldn't be turned into a safepoint. llvm-svn: 256860	2016-01-05 19:40:58 +00:00
Sanjay Patel	a1c5347982	[InstCombine] insert a new shuffle before its uses (PR26015) Although this solves the test case in PR26015: https://llvm.org/bugs/show_bug.cgi?id=26015 And may solve PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999 ...I suspect this is not the best solution. I think we want to insert the new shuffle just ahead of the earliest ExtractElementInst that we're replacing, but I don't know how that should be implemented. Differential Revision: http://reviews.llvm.org/D15878 llvm-svn: 256857	2016-01-05 19:09:47 +00:00
Michael Zuckerman	5cbae95916	[AVX512] add PSLLD and PSLLQ Intrinsic Differential Revision: http://reviews.llvm.org/D15885 llvm-svn: 256840	2016-01-05 15:17:39 +00:00
MinSeong Kim	a7385ebf78	[AArch64] Add support for Samsung Exynos-M1 Adds core tuning support for new Samsung Exynos-M1 core (ARMv8-A). Differential Revision: http://reviews.llvm.org/D15663 llvm-svn: 256828	2016-01-05 12:51:59 +00:00
David Majnemer	59eb733af1	[SimplifyCFG] Further improve our ability to remove redundant catchpads In r256814, we managed to remove catchpads which were trivially redundant because they were the same SSA value. We can do better using the same algorithm but with a smarter data structure by hashing the SSA values within the catchpad and comparing them structurally. llvm-svn: 256815	2016-01-05 07:42:17 +00:00
David Majnemer	2fa8651a8f	[SimplifyCFG] Remove redundant catchpads Remove duplicate catchpad handlers from a catchswitch. llvm-svn: 256814	2016-01-05 06:27:50 +00:00
Tom Stellard	5cd09ade38	AMDGPU/SI: Select non-uniform constant addrspace loads to flat instructions for HSA Summary: This fixes a regression caused by r256282. Reviewers: arsenm, cfang Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15736 llvm-svn: 256810	2016-01-05 03:40:16 +00:00
Joseph Tremoulet	0d808888c1	[WinEH] Simplify unreachable catchpads Summary: At least for CoreCLR, a catchpad which immediately executes an `unreachable` instruction indicates that the exception can never have a matching type, and so such catchpads can be removed, and so can their catchswitches if the catchswitch becomes empty. Reviewers: rnk, andrew.w.kaylor, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15846 llvm-svn: 256809	2016-01-05 02:37:41 +00:00
David Majnemer	869be0a4a6	Revert "[X86] Use push-pop for materializing small constants under 'minsize'" The red zone consists of the 128 bytes below the stack pointer, so that the allocation of objects in leaf functions doesn't require decrementing rsp. In r255656, we introduced an optimization that would cheaply materialize certain constants via push/pop. Push decrements the stack pointer and stores its operand at what is now the top of the stack. However, this means that using push/pop would encroach on the red zone. PR26023 gives an example where this corrupts an object in the red zone. llvm-svn: 256808	2016-01-05 02:32:06 +00:00
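The red-zone conflict in the push/pop revert can be checked with simple address arithmetic. This is a simplified model. It assumes the x86-64 SysV red zone of 128 bytes immediately below rsp and 8-byte pushes. `inRedZone` and `pushStoreAddress` are hypothetical helpers, not LLVM code.

```cpp
#include <cstdint>

// Assumption: the red zone is the 128 bytes immediately below the stack
// pointer, i.e. the address range [rsp - 128, rsp).
constexpr uint64_t kRedZoneSize = 128;

// True if a byte address falls inside the red zone for the given rsp.
bool inRedZone(uint64_t rsp, uint64_t addr) {
  return addr < rsp && rsp - addr <= kRedZoneSize;
}

// A 64-bit push decrements rsp by 8 and stores to the new top of stack.
uint64_t pushStoreAddress(uint64_t rsp) { return rsp - 8; }
```

Any push therefore writes to rsp - 8, the highest slot of the red zone. A leaf function may already be keeping a local object in that slot without having adjusted rsp.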
Matthias Braun	d9fe082ba7	X86: Add a testcase for PR25951 llvm-svn: 256801	2016-01-05 00:48:16 +00:00
Matthias Braun	7e762e4f9c	MachineInstrBundle: Fix reversed isSuperRegisterEq() call Unfortunately this fix had the effect of exposing the -verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for which I disabled it for now. Two testcases also have additional pushq/popq where the corrected code cannot prove that %rax is dead any longer. Looking at the examples, this could potentially be fixed by improving computeRegisterLiveness() to check the live-in lists of the successor blocks when reaching the end of a block. This fixes http://llvm.org/PR25951. llvm-svn: 256799	2016-01-05 00:45:35 +00:00
Nicolai Haehnle	5b50497617	AMDGPU: add +xnack feature Summary: Enabling this feature will account for the two SGPRs used by the hardware to store the XNACK_MASK physically. The hardware only requires this reservation when the XNACK feature is explicitly enabled. At some point, HSA will probably want to do that, but it does increase SGPR register pressure, so leave it disabled by default for now (but do add a small test). Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15869 llvm-svn: 256794	2016-01-04 23:35:53 +00:00
Chen Li	c6021038f6	[InstructionCombining] prepareICWorklistFromFunction hangs in an infinite loop with instructions of token type Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is of token type, and if so it updates EndInst with the current instruction. Reviewers: reames, majnemer Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15859 llvm-svn: 256792	2016-01-04 23:28:57 +00:00
David Majnemer	b33f3a239a	[LICM] Fix a small oversight introduced in r256763 r256763 had promoteLoopAccessesToScalars check for the existence of a catchswitch when the exit blocks were populated but promoteLoopAccessesToScalars may be called with a prepopulated set of exit blocks which would also need to be checked. This fixes PR26019. llvm-svn: 256788	2016-01-04 23:16:22 +00:00
Philip Reames	2466719e44	[MemoryBuiltins] Remove isOperatorNewLike by consolidating non-null inference handling This patch removes the isOperatorNewLike predicate since it was only being used to establish a non-null return value and we have attributes specifically for that purpose with generic handling. To keep approximately the same behaviour for existing frontends, I added the various operator new-like functions (i.e. instances of operator new) to InferFunctionAttrs. It's not really clear to me why this isn't handled in Clang, but I didn't want to break existing code and any subtle assumptions it might have. Once this patch is in, I'm going to start separating the isAllocLike family of predicates. These appear to be used for a mixture of things which should be more clearly separated and documented. Today, they're being used to indicate (at least) aliasing facts, CSE-ability, and default values from an allocation site. Differential Revision: http://reviews.llvm.org/D15820 llvm-svn: 256787	2016-01-04 22:49:23 +00:00
Simon Pilgrim	e6955f3211	[X86][SSE] Ensure BLENDPD/BLENDPS/PBLEND inputs are both of the correct input type llvm-svn: 256782	2016-01-04 21:41:11 +00:00
Aditya Nandakumar	12d060481a	Remove dead instructions before Redoing Before reevaluating instructions, iterate over all instructions to be reevaluated and remove trivially dead instructions; if any of their operands become trivially dead, mark those for deletion as well, until all trivially dead instructions have been removed. llvm-svn: 256773	2016-01-04 19:48:14 +00:00
Geoff Berry	9e934b0cc2	[AArch64] Optimize some simple TBZ/TBNZ cases. Summary: Add some AArch64 dag combines to optimize some simple TBZ/TBNZ cases: (tbz (and x, m), b) -> (tbz x, b) (tbz (shl x, c), b) -> (tbz x, b-c) (tbz (shr x, c), b) -> (tbz x, b+c) (tbz (xor x, -1), b) -> (tbnz x, b) Reviewers: jmolloy, mcrosier, t.p.northover Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D15702 llvm-svn: 256765	2016-01-04 18:55:47 +00:00
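The four TBZ/TBNZ combines can be sanity-checked by modelling TBZ and TBNZ as bit tests on a 64-bit value. This is a hedged sketch, not the DAG combine itself. The helper names are hypothetical, and the shift-right fold is modelled as a logical shift. The preconditions in the comments are assumptions about when each rewrite is sound, because the commit message does not state them: bit `b` must be set in the mask `m`, the left shift needs `c <= b`, and the right shift needs `b + c < 64`.

```cpp
#include <cstdint>

// Model: TBZ x, b branches when bit b of x is zero; TBNZ when it is one.
constexpr bool tbz(uint64_t x, unsigned b) { return ((x >> b) & 1u) == 0; }
constexpr bool tbnz(uint64_t x, unsigned b) { return !tbz(x, b); }

// Each helper returns true when the rewritten test agrees with the original.

// (tbz (and x, m), b) -> (tbz x, b): sound only if bit b of m is set.
constexpr bool andFold(uint64_t x, uint64_t m, unsigned b) {
  return tbz(x & m, b) == tbz(x, b);
}

// (tbz (shl x, c), b) -> (tbz x, b-c): assumes c <= b.
constexpr bool shlFold(uint64_t x, unsigned c, unsigned b) {
  return tbz(x << c, b) == tbz(x, b - c);
}

// (tbz (shr x, c), b) -> (tbz x, b+c): logical shift, assumes b + c < 64.
constexpr bool shrFold(uint64_t x, unsigned c, unsigned b) {
  return tbz(x >> c, b) == tbz(x, b + c);
}

// (tbz (xor x, -1), b) -> (tbnz x, b): inverting x flips every bit.
constexpr bool xorFold(uint64_t x, unsigned b) {
  return tbz(x ^ ~uint64_t{0}, b) == tbnz(x, b);
}
```

The mask precondition matters. For example, `andFold(0x10, 0x0, 4)` is false: the AND clears bit 4, so dropping it would change the branch.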
David Majnemer	219055f9df	[LICM] Don't insert instructions after a catchswitch when performing loop promotion Inserting after a catchswitch results in verifier errors, bail out on promotion if a catchswitch is a loop exit. llvm-svn: 256763	2016-01-04 17:42:19 +00:00
Joseph Tremoulet	52f729a613	[WinEH] Update CoreCLR EH state numbering Summary: Fix the CLR state numbering to generate correct tables, and update the lit test to verify them. The CLR numbering assigns one state number to each catchpad and cleanuppad. It also computes two tree-like relations over states: 1) Each state has a "HandlerParentState", which is the state of the next outer handler enclosing this state's handler (same as nearest ancestor per the ParentPad linkage on EH pads, but skipping over catchswitches). 2) Each state has a "TryParentState", which: a) for a catchpad that's not the last handler on its catchswitch, is the state of the next catchpad on that catchswitch. b) for all other pads, is the state of the pad whose try region is the next outer try region enclosing this state's try region. The "try regions" are not present as such in the IR, but will be inferred based on the placement of invokes and pads which reach each other by exceptional exits. Catchswitches do not get their own states, but each gets mapped to the state of its first catchpad. Table generation requires each state's "unwind dest" state to have a lower state number than the given state. Since HandlerParentState can be computed as a function of a pad's ParentPad, and TryParentState can be computed as a function of its unwind dest and the TryParentStates of its children, the CLR state numbering algorithm first computes HandlerParentState in a top-down pass, then computes TryParentState in a bottom-up pass. Also reword some comments/names in the CLR EH table generation to make the distinction between the different kinds of "parent" clear. Reviewers: rnk, andrew.w.kaylor, majnemer Subscribers: AndyAyers, llvm-commits Differential Revision: http://reviews.llvm.org/D15325 llvm-svn: 256760	2016-01-04 16:16:01 +00:00
Michael Zuckerman	cf0b6db9ef	[AVX512] add PSRAD and PSRAQ Intrinsic Differential Revision: http://reviews.llvm.org/D15851 llvm-svn: 256754	2016-01-04 13:45:45 +00:00
Michael Zuckerman	000fca44a8	[AVX512] add PSRAW Intrinsic Differential Revision: http://reviews.llvm.org/D15850 llvm-svn: 256751	2016-01-04 12:50:36 +00:00
Michael Zuckerman	068bc2f219	[AVX512] add PSRLV Intrinsic Differential Revision: http://reviews.llvm.org/D15838 llvm-svn: 256747	2016-01-04 11:39:06 +00:00
David Majnemer	42a0730c42	[LICM] Make instruction sinking funclet-aware We had two bugs here: - We might try to sink into a catchswitch, causing verifier failures. - We will succeed in sinking into a cleanuppad but we didn't update the funclet operand bundle. This fixes PR26000. llvm-svn: 256728	2016-01-04 03:37:39 +00:00
Dimitry Andric	17f3b0957b	Fix one file that I didn't convert properly in r256707. llvm-svn: 256720	2016-01-03 22:33:32 +00:00
Simon Pilgrim	6e69cbe342	[X86][MMX] Regenerated vector insertion test. Shows the true horror of what is going on.... llvm-svn: 256713	2016-01-03 19:17:37 +00:00
Simon Pilgrim	3ac854931f	[X86][SSE] Added tests for insertion of zero elements into vectors Many of these could be much better if we just lowered them all as shuffles - especially for the 256-bit vectors. llvm-svn: 256708	2016-01-03 17:33:32 +00:00
Dimitry Andric	227b928abc	Fix several accidental DOS line endings in source files Summary: There are a number of files in the tree which have been accidentally checked in with DOS line endings. Convert these to native line endings. There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those. Reviewers: joerg, aaron.ballman Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D15848 llvm-svn: 256707	2016-01-03 17:22:03 +00:00
Simon Pilgrim	569106fe99	[X86][SSE41] Added test cases for improving insertps shuffles As mentioned on D14261, an upcoming patch will improve combines of insertps instructions. llvm-svn: 256706	2016-01-03 17:14:15 +00:00
Simon Pilgrim	d17a1df783	[X86][SSE] Added v4f32 shuffle with zero tests This is mainly test cases for improvements to insertps matching, but pre-SSE41 shuffles could be improved as well llvm-svn: 256705	2016-01-03 17:02:56 +00:00
Joseph Tremoulet	131a462690	[WinEH] Verify catchswitch handlers Summary: The handler list must be nonempty and consist solely of CatchPads. Reviewers: rnk, andrew.w.kaylor, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15842 llvm-svn: 256691	2016-01-02 15:25:25 +00:00
Joseph Tremoulet	06125e52a7	[WinEH] Tighten parentPad verifier checks Summary: A catchswitch cannot be a parent of a cleanuppad or another catchswitch. Reviewers: rnk, andrew.w.kaylor, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15841 llvm-svn: 256690	2016-01-02 15:24:24 +00:00
Joseph Tremoulet	71e5676de4	[WinEH] Update catchrets with cloned successors Summary: Add a pass to update catchrets when their successors get cloned; the existing pass doesn't catch these because it walks the funclet whose blocks are being cloned but the catchret is in a child funclet. Also update the test for removing incoming PHI values; when the predecessor is a catchret, the relevant color is the catchret's parentPad, not its block's color. Reviewers: andrew.w.kaylor, rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15840 llvm-svn: 256689	2016-01-02 15:22:36 +00:00
David Majnemer	011980cd50	[X86] Add intrinsics for reading and writing to the flags register LLVM's targets need to know if stack pointer adjustments occur after the prologue. This is needed to correctly determine if the red-zone is appropriate to use or if a frame pointer is required. Normally, LLVM can figure this out very precisely by reasoning about the contents of the MachineFunction. There is an interesting corner case: inline assembly. The vast majority of inline assembly which will perform a push or pop is done so to pair up with pushf or popf as appropriate. Unfortunately, this inline assembly doesn't mark the stack pointer as clobbered because, well, it isn't. The stack pointer is decremented and then immediately incremented. Because of this, LLVM was changed in r256456 to conservatively assume that inline assembly contains a sequence of stack operations. This is unfortunate because the vast majority of inline assembly will not end up manipulating the stack pointer in any way at all. Instead, let's provide a more principled solution: an intrinsic. FWIW, other compilers (MSVC and GCC among them) also provide this functionality as an intrinsic. llvm-svn: 256685	2016-01-01 06:50:01 +00:00
Sanjay Patel	bee05caa6b	[LibCallSimplifier] propagate FMF when shrinking binary calls llvm-svn: 256682	2015-12-31 23:40:59 +00:00
Sanjay Patel	aa23114cb4	[LibCallSimplifier] propagate FMF when shrinking unary calls llvm-svn: 256679	2015-12-31 21:52:31 +00:00
Sanjay Patel	f6f32bcaa4	change function names to avoid accidentally matching the substring llvm-svn: 256678	2015-12-31 21:25:25 +00:00
Sanjay Patel	4e8b300400	add 'fast' attribute to calls to show that the flag isn't being propagated llvm-svn: 256677	2015-12-31 21:12:19 +00:00
Michael Zuckerman	0dc468880d	[AVX512] add PSRLQ and PSRLD Intrinsic Differential Revision: http://reviews.llvm.org/D15770 llvm-svn: 256673	2015-12-31 15:22:04 +00:00
Michael Kuperstein	d36e24a166	[X86] Avoid folding scalar loads into unary sse intrinsics Not folding these cases tends to avoid partial register updates: sqrtss (%eax), %xmm0 Has a partial update of %xmm0, while movss (%eax), %xmm0 sqrtss %xmm0, %xmm0 Has a clobber of the high lanes immediately before the partial update, avoiding a potential stall. Given this, we only want to fold when optimizing for size. This is consistent with the patterns we already have for some of the fp/int converts, and in X86InstrInfo::foldMemoryOperandImpl() Differential Revision: http://reviews.llvm.org/D15741 llvm-svn: 256671	2015-12-31 09:45:16 +00:00
Asaf Badouh	af6569afd2	[X86][PKU] Add {RD,WR}PKRU intrinsics Differential Revision: http://reviews.llvm.org/D15808 llvm-svn: 256670	2015-12-31 08:31:13 +00:00
Sanjay Patel	41160c2094	[ValueTracking] fix bug computing isKnownToBeAPowerOfTwo() with arithmetic shift right (PR25900) This is a fix for: https://llvm.org/bugs/show_bug.cgi?id=25900 If we think that an arithmetic right shift of a power of two is always a power of two, an sdiv gets wrongly converted to udiv. Differential Revision: http://reviews.llvm.org/D15827 llvm-svn: 256655	2015-12-30 22:40:52 +00:00
Geoff Berry	43dc285915	[JumpThreading] Fix opcode bonus in getJumpThreadDuplicationCost() The code that was meant to adjust the duplication cost based on the terminator opcode was not being executed in cases where the initial threshold was hit inside the loop. Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D15536 llvm-svn: 256568	2015-12-29 18:10:16 +00:00
Michael Zuckerman	80821ee77c	[AVX512] add PSRLW Intrinsic Differential Revision: http://reviews.llvm.org/D15751 llvm-svn: 256558	2015-12-29 13:04:35 +00:00
James Y Knight	992904a0af	Fix gold test after r256465. That commit added a new pass, and this test is sensitive to what the first pass after verify is called. llvm-svn: 256532	2015-12-29 03:48:37 +00:00
Eric Christopher	2ae7180b24	Accept dwarf version 5 for CIE versions. llvm-svn: 256527	2015-12-28 23:02:42 +00:00
Artyom Skrobov	2aca0c622a	[Thumb] Fix assembler error 'cannot honor width suffix pop {lr}' Summary: * avoid generating POP {LR} in Thumb1 epilogues * combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass * combine POP {LR} + B + BX LR -> POP {PC} on v5T+ Test cases by Ana Pazos Differential Revision: http://reviews.llvm.org/D15707 llvm-svn: 256523	2015-12-28 21:40:45 +00:00
Sanjay Patel	b3c53e512f	[x86] lower calls to fmin and llvm.minnum.* using minss/minsd/minps/minpd (PR24475) This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 http://reviews.llvm.org/rL256510 llvm-svn: 256522	2015-12-28 21:16:55 +00:00
Manuel Jacob	9db5b93ffc	[RS4GC] Fix rematerialization of bitcast of bitcast. Summary: Previously, only the outer (last) bitcast was rematerialized, resulting in a use of the unrelocated inner (first) bitcast after the statepoint. See the test case for an example. Reviewers: igor-laevsky, reames Subscribers: reames, alex, llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15789 llvm-svn: 256520	2015-12-28 20:14:05 +00:00
Elena Demikhovsky	5494698828	Implemented cost model for masked gather and scatter operations The cost is calculated for all X86 targets. When gather/scatter instruction is not supported we calculate the cost of scalar sequence. Differential revision: http://reviews.llvm.org/D15677 llvm-svn: 256519	2015-12-28 20:10:59 +00:00
Sanjay Patel	9da2b647c7	[x86] lower calls to fmax and llvm.maxnum.* using maxps/maxpd (PR24475) This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 llvm-svn: 256510	2015-12-28 19:20:19 +00:00
Sanjay Patel	e0696ef613	Specify triple so 'make check' passes on darwin x86-64 The check lines were added with: http://reviews.llvm.org/rL256458 http://reviews.llvm.org/rL256460 but on a darwin target, the output looks like: ## InlineAsm Start rorq %rdi ## InlineAsm End ## InlineAsm Start rorq %rsi ## InlineAsm End leaq (%rsi,%rdi), %rax retq llvm-svn: 256507	2015-12-28 18:28:44 +00:00
Roman Divacky	73fc84761f	Support clrex instruction on ARMv6k. Patch by Andrew Turner. llvm-svn: 256505	2015-12-28 17:47:23 +00:00
Michael Kuperstein	2ea81baf3a	[X86] Better support for the MCU psABI (LLVM part) This adds support for the MCU psABI in a way different from r251223 and r251224, basically reverting most of these two patches. The problem with the approach taken in r251223/4 is that it only handled libcalls that originated from the backend. However, the mid-end also inserts quite a few libcalls and assumes these use the platform's default calling convention. The previous patch tried to insert inregs when necessary both in the FE and, somewhat hackily, in the CG. Instead, we now define a new default calling convention for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does. Differential Revision: http://reviews.llvm.org/D15054 llvm-svn: 256494	2015-12-28 14:39:21 +00:00
Asaf Badouh	fba562004b	[X86][AVX512] Lower broadcast sub vector to vector intrinsics Lower broadcast<type>x<vector> to shuffles. There are two cases: 1. src is 128 bits and dest is 512 bits: in this case we lower it to a shuffle with imm = 0. 2. src is 256 bits and dest is 512 bits: in this case we lower it to a shuffle with imm = 01000100b (0x44), which broadcasts the 256-bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3]. The result is then masked with the passthru value (in case it's a mask op). Differential Revision: http://reviews.llvm.org/D15790 llvm-svn: 256490	2015-12-28 08:26:26 +00:00
Asaf Badouh	5546f51011	[X86][AVX512] add fp scalar broadcast intrinsics Differential Revision: http://reviews.llvm.org/D15790 llvm-svn: 256489	2015-12-28 08:09:25 +00:00
Craig Topper	c648c9b92d	[AVX512] Bring vmovq instructions names into alignment with the AVX and SSE names. Add a missing encoding to disassembler and assembler. I believe this also fixes a case where a 64-bit memory form that is documented as being unsupported in 32-bit mode was able to be selected there. llvm-svn: 256483	2015-12-28 06:11:42 +00:00
Igor Breger	756c289dd8	AVX512: Change VPMOVB2M DAG lowering , use CVT2MASK node instead TRUNCATE. Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB. Implement VPMOVB/W/D/Q2M intrinsic. Differential Revision: http://reviews.llvm.org/D15675 llvm-svn: 256470	2015-12-27 13:56:16 +00:00
Chandler Carruth	3a040e6d47	[attrs] Extract the pure inference of function attributes into a standalone pass. There is no call graph or even interesting analysis for this part of function attributes -- it is literally inferring attributes based on the target library identification. As such, we can do it using a much simpler module pass that just walks the declarations. This can also happen much earlier in the pass pipeline which has benefits for any number of other passes. In the process, I've cleaned up one particular aspect of the logic which was necessary in order to separate the two passes cleanly. It now counts inferred attributes independently rather than just counting all the inferred attributes as one, and the counts are more clearly explained. The two test cases we had for this code path are both ... woefully inadequate and copies of each other. I've kept the superset test and updated it. We need more testing here, but I had to pick somewhere to stop fixing everything broken I saw here. Differential Revision: http://reviews.llvm.org/D15676 llvm-svn: 256466	2015-12-27 08:41:34 +00:00
Chandler Carruth	f49f1a87ef	[attrs] Split off the forced attributes utility into its own pass that is (by default) run much earlier than FunctionAttrs proper. This allows forcing optnone or other widely impactful attributes. It is also a bit simpler as the force attribute behavior needs no specific iteration order. I've added the pass into the default module pass pipeline and LTO pass pipeline which mirrors where function attrs itself was being run. Differential Revision: http://reviews.llvm.org/D15668 llvm-svn: 256465	2015-12-27 08:13:45 +00:00
David Majnemer	2f2625c056	Make the test properly constrained llvm-svn: 256460	2015-12-27 06:26:41 +00:00
David Majnemer	0b3661b7ce	Try to passify buildbot llvm-svn: 256458	2015-12-27 06:18:48 +00:00
NAKAMURA Takumi	c86e9b79b8	Prune the feature "tls". No one is using it since TLS is enabled for Cygwin. llvm-svn: 256457	2015-12-27 06:14:33 +00:00
David Majnemer	334676355a	[X86, Win64] Use a frame pointer if pushf is emitted A frame pointer must be used if stack pointer is modified after the prologue. LLVM will emit pushf/popf if we need to save/restore the FLAGS register, requiring us to have a frame pointer for the function. There is a small twist: this sequence might exist in user code via inline-assembly. For now, conservatively assume that such functions require a frame pointer. For real world justification, please see clang's implementation of __readeflags. This fixes PR25945. llvm-svn: 256456	2015-12-27 06:07:26 +00:00
David Majnemer	081e8fe4c0	[WinEH] Add comments explaining the EH tables This aids in debugging WinEH; similar functionality is present for DWARF EH. llvm-svn: 256455	2015-12-27 06:07:12 +00:00
Sanjay Patel	bcff3f7d92	[x86] lower calls to llvm.maxnum.v4f32 using maxps This is a follow-on to: http://reviews.llvm.org/rL255700 llvm-svn: 256454	2015-12-26 21:44:55 +00:00
Benjamin Kramer	13dfb7df32	Fix safepoint intrinsic signatures in test. Should bring back the bots after r256443. llvm-svn: 256450	2015-12-26 11:40:48 +00:00
Chen Li	d71999ef1b	[gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types can prevent LLVM from merging different gc.statepoint nodes into PHI nodes and causing further problems with gc relocations. The patch also changes how gc.relocate and gc.result look for their corresponding gc.statepoint on the unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob Subscribers: reames, mjacob, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15662 llvm-svn: 256443	2015-12-26 07:54:32 +00:00
Craig Topper	afefb1f13b	Add test case for r256433. "[X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct." llvm-svn: 256435	2015-12-26 04:58:05 +00:00
Craig Topper	ff04fb0487	Revert r256432 "Test" This is the test case for r256433, but it got committed incorrectly in my local repo. llvm-svn: 256434	2015-12-26 04:56:51 +00:00
Craig Topper	23561f8215	Test llvm-svn: 256432	2015-12-26 04:50:01 +00:00
Dan Gohman	8887d1faed	[WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify. Move RegStackify after coalescing and teach it to use LiveIntervals instead of depending on SSA form. This avoids a problem where a register in a COPY instruction is stackified and then subsequently coalesced with a register that is not stackified. This also puts it after the scheduler, which allows us to simplify the EXPR_STACK constraint, as we no longer have instructions being reordered after stackification and before coloring. llvm-svn: 256402	2015-12-25 00:31:02 +00:00
Sanjay Patel	ae945e7927	[InstCombine] transform more extract/insert pairs into shuffles (PR2109) This is an extension of the shuffle combining from r203229: http://reviews.llvm.org/rL203229 The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in. The motivation is to finally solve PR2109: https://llvm.org/bugs/show_bug.cgi?id=2109 For that example, the IR becomes: %1 = bitcast <2 x i32>* %P to <2 x float>* %ld1 = load <2 x float>, <2 x float>* %1, align 8 %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5> ret <4 x float> %i2 And x86 SSE output improves from: movq (%rdi), %xmm1 ## xmm1 = mem[0],zero movdqa %xmm1, %xmm2 shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3] shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0] shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2] shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0] shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0] retq To the almost optimal: movhpd (%rdi), %xmm0 Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples. Differential Revision: http://reviews.llvm.org/D15096 llvm-svn: 256394	2015-12-24 21:17:56 +00:00
Asaf Badouh	9a5a83a518	[X86][PKU] Add {RD,WR}PKRU encoding Differential Revision: http://reviews.llvm.org/D15711 llvm-svn: 256366	2015-12-24 08:25:00 +00:00
Elena Demikhovsky	9e225a2f52	AVX-512: Kreg set 0/1 optimization The patterns that set a mask register to 0/1 KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn are replaced with KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization. KNL does not recognize dependency-breaking idioms for mask registers, so kxnor %k1, %k1, %k2 has a RAW dependence on %k1. Using %k0 as the undef input register is a performance heuristic based on the assumption that %k0 is used less frequently than the other mask registers, since it is not usable as a write mask. Differential Revision: http://reviews.llvm.org/D15739 llvm-svn: 256365	2015-12-24 08:12:22 +00:00
Igor Breger	268f6f53c5	AVX512: VPMOVM2B/W/D/Q intrinsic implementation. Differential Revision: http://reviews.llvm.org//D15747 llvm-svn: 256364	2015-12-24 07:11:53 +00:00


33992 Commits