llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjoy Das	021de058df	Introduce a @llvm.experimental.guard intrinsic Summary: As discussed on llvm-dev[1]. This change adds the basic boilerplate code around having this intrinsic in LLVM: - Changes in Intrinsics.td, and the IR Verifier - A lowering pass to lower @llvm.experimental.guard to normal control flow - Inliner support [1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18527 llvm-svn: 264976	2016-03-31 00:18:46 +00:00
Matt Arsenault	2fe4fbc184	AMDGPU: Add frexp_exp intrinsic llvm-svn: 264944	2016-03-30 22:28:52 +00:00
Matt Arsenault	5cd4f8f89f	AMDGPU: Constant folding for frexp_mant llvm-svn: 264943	2016-03-30 22:28:26 +00:00
Peter Collingbourne	2bc252acd5	Cloning: Reduce complexity of debug info cloning and fix correctness issue. Commit r260791 contained an error in that it would introduce a cross-module reference in the old module. It also introduced O(N^2) complexity in the module cloner by requiring the entire module to be visited for each function. Fix both of these problems by avoiding use of the CloneDebugInfoMetadata function (which is only designed to do intra-module cloning) and cloning function-attached metadata in the same way that we clone all other metadata. Differential Revision: http://reviews.llvm.org/D18583 llvm-svn: 264935	2016-03-30 22:05:13 +00:00
Aaron Ballman	ef0fe1eed8	Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929	2016-03-30 21:30:00 +00:00
David Majnemer	5d518386b6	[IndVarSimplify] Don't insert after a catchswitch Widening a PHI requires us to insert a trunc. The logical place for this trunc is in the same BB as the PHI. This is not possible if the BB is terminated by a catchswitch. This fixes PR27133. llvm-svn: 264926	2016-03-30 21:12:06 +00:00
Justin Lebar	2fe1323112	[PassManager] Make PassManagerBuilder::addExtension take an std::function, rather than a function pointer. Summary: This gives callers flexibility to pass lambdas with captures, which lets callers avoid the C-style void*-ptr closure style. (Currently, callers in clang store state in the PassManagerBuilderBase arg.) No functional change, and the new API is backwards-compatible. Reviewers: chandlerc Subscribers: joker.eph, cfe-commits Differential Revision: http://reviews.llvm.org/D18613 llvm-svn: 264918	2016-03-30 20:39:29 +00:00
Hal Finkel	2e0ff2b244	[LoopVectorize] Don't vectorize loops when everything will be scalarized This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their cost (aside from the uniform ones) are indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There's aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. Differential Revision: http://reviews.llvm.org/D18537 llvm-svn: 264904	2016-03-30 19:37:08 +00:00
Rong Xu	b534166fd4	[PGO] PGOFuncName in LTO optimizations PGOFuncNames are used as the key to retrieve the Function definition from the MD5 stored in the profile. For internal linkage function, we prefix the source file name to the PGOFuncNames. LTO's internalization privatizes many global linkage symbols. This happens after value profile annotation, but those internal linkage functions should not have a source prefix. To differentiate compiler generated internal symbols from original ones, PGOFuncName meta data are created and attached to the original internal symbols in the value profile annotation step. If a symbol does not have the meta data, its original linkage must be non-internal. Also add a new map that maps PGOFuncName's MD5 value to the function definition. Differential Revision: http://reviews.llvm.org/D17895 llvm-svn: 264902	2016-03-30 18:37:52 +00:00
Nirav Dave	8dd66e5753	Remove HasFnAttribute guards to getFnAttribute calls These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872	2016-03-30 15:41:12 +00:00
George Burgess IV	49cad7d70b	[MemorySSA] Make the visitor more careful with calls. Prior to this patch, the MemorySSA caching visitor would cache all calls that it visited. When paired with phi optimization, this can be problematic. Consider: define void @foo() { ; 1 = MemoryDef(liveOnEntry) call void @clobberFunction() br i1 undef, label %if.end, label %if.then if.then: ; MemoryUse(??) call void @readOnlyFunction() ; 2 = MemoryDef(1) call void @clobberFunction() br label %if.end if.end: ; 3 = MemoryPhi(...) ; MemoryUse(?) call void @readOnlyFunction() ret void } When optimizing MemoryUse(?), we visit defs 1 and 2, so we note to cache them later. We ultimately end up not being able to optimize passed the Phi, so we set MemoryUse(?) to point to the Phi. We then cache the clobbering call for def 1 to be the Phi. This commit changes this behavior so that we wipe out any calls added to VisistedCalls while visiting the defs of a phi we couldn't optimize. Aside: With this patch, we now can bootstrap clang/LLVM without a single MemorySSA verifier failure. Woohoo. :) llvm-svn: 264820	2016-03-30 03:12:08 +00:00
Xinliang David Li	a55fd1a9dc	[PGO] Handle invoke inst in IR based icall instrumentation Differential Revision: http://reviews.llvm.org/D18580 llvm-svn: 264818	2016-03-30 02:16:07 +00:00
George Burgess IV	82ee942a8c	[MemorySSA] Change how the walker views/walks visited phis. This patch teaches the caching MemorySSA walker a few things: 1. Not to walk Phis we've walked before. It seems that we tried to do this before, but it didn't work so well in cases like: define void @foo() { %1 = alloca i8 %2 = alloca i8 br label %begin begin: ; 3 = MemoryPhi({%0,liveOnEntry},{%end,2}) ; 1 = MemoryDef(3) store i8 0, i8* %2 br label %end end: ; MemoryUse(?) load i8, i8* %1 ; 2 = MemoryDef(1) store i8 0, i8* %2 br label %begin } Because we wouldn't put Phis in Q.Visited until we tried to visit them. So, when trying to optimize MemoryUse(?): - We would visit 3 above - ...Which would make us put {%0,liveOnEntry} in Q.Visited - ...Which would make us visit {%0,liveOnEntry} - ...Which would make us put {%end,2} in Q.Visited - ...Which would make us visit {%end,2} - ...Which would make us visit 3 - ...Which would realize we've already visited everything in 3 - ...Which would make us conservatively return 3. In the added test-case, (@looped_visitedonlyonce) this behavior would cause us to give incorrect results. Specifically, we'd visit 4 twice in the same query, but on the second visit, we'd skip while.cond because it had been visited, visit if.then/if.then2, and cache "1" as the clobbering def on the way back. 2. If we try to walk the defs of a {Phi,MemLoc} and see it has been visited before, just hand back the Phi we're trying to optimize. I promise this isn't as terrible as it seems. :) We now insert {Phi,MemLoc} pairs just before walking the Phi's upward defs. So, we check the cache for the {Phi,MemLoc} pair before checking if we've already walked the Phi. The {Phi,MemLoc} pair is (almost?) always guaranteed to have a cache entry if we've already fully walked it, because we cache as we go. So, if the {Phi,MemLoc} pair isn't in cache, either: (a) we must be in the process of visiting it (in which case, we can't give a better answer in a cache-as-we-go DFS walker) (b) we visited it, but didn't cache it on the way back (...which seems to require `ModifyingAccess` to not dominate `StartingAccess`, so I'm 99% sure that would be an error. If it's not an error, I haven't been able to get it to happen locally, so I suspect it's rare.) - - - - - As a consequence of this change, we no longer skip upward defs of phis, so we can kill the `VisitedOnlyOne` check. This gives us better accuracy than we had before, at the cost of potentially doing a bit more work when we have a loop. llvm-svn: 264814	2016-03-30 00:26:26 +00:00
Adam Nemet	1428d41f9a	[LoopDataPrefetch] Centralize the tuning cl::opts under the pass This is effectively NFC, minus the renaming of the options (-cyclone-prefetch-distance -> -prefetch-distance). The change was requested by Tim in D17943. llvm-svn: 264806	2016-03-29 23:45:52 +00:00
Anna Zaks	1a470b6f7c	[tsan] Do not instrument reads/writes to instruction profile counters. We have known races on profile counters, which can be reproduced by enabling -fsanitize=thread and -fprofile-instr-generate simultaneously on a multi-threaded program. This patch avoids reporting those races by not instrumenting the reads and writes coming from the instruction profiler. llvm-svn: 264805	2016-03-29 23:19:40 +00:00
Duncan P. N. Exon Smith	e8eb94a9a5	ADCE: Remove debug info intrinsics in dead scopes During ADCE, track which debug info scopes still have live references from the code, and delete debug info intrinsics for the dead ones. These intrinsics describe the locations of variables (in registers or stack slots). If there's no code left corresponding to a variable's scope, then there's no way to reference the variable in the debugger and it doesn't matter what its value is. I add a DEBUG printout when the described location in an SSA register, in case it helps some trying to track down why locations get lost. However, we still delete these; the scope itself isn't attached to any real code, so the ship has already sailed. llvm-svn: 264800	2016-03-29 22:57:12 +00:00
Adam Nemet	85fba39390	[LoopDataPrefetch] Make more member functions private, NFC. llvm-svn: 264798	2016-03-29 22:40:02 +00:00
Teresa Johnson	b703c77b03	[ThinLTO] Remove post-pass metadata linking support Since we have moved to a model where functions are imported in bulk from each source module after making summary-based importing decisions, there is no longer a need to link metadata as a postpass, and all users have been removed. This essentially reverts r255909 and follow-on fixes. llvm-svn: 264763	2016-03-29 18:24:19 +00:00
Justin Lebar	3db0b85fc8	Make InlineSimple's one-arg constructor explicit. NFC llvm-svn: 264744	2016-03-29 16:26:06 +00:00
Justin Lebar	bd145b3cb2	Reformat a comment in InlineSimple.cpp. NFC llvm-svn: 264743	2016-03-29 16:26:03 +00:00
Teresa Johnson	efeae0e210	[ThinLTO] Use new GlobalValue::getGUID helper (NFC) This was already being used for functions and aliases, was missed when handling global variables. llvm-svn: 264734	2016-03-29 14:49:26 +00:00
Hyojin Sung	4673f10568	[SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes is currently used to recognize potential loops of which the block is the header and keep the block. However, the current algorithm fails if the loops' exit condition is evaluated only with volatile values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested loop, the loop is collapsed into a single loop which prevent later optimizations from being applied (e.g., transforming nested loops into simplified forms and loop vectorization). The patch augments the existing PHI node-based check by adding a pre-test if the BB actually belongs to a set of loop headers and not eliminating it if yes. llvm-svn: 264697	2016-03-29 04:08:57 +00:00
Evgeniy Stepanov	f575b2687c	Remove personality for declarations in CloneModule. Personality is copied as part of copyFunctionAttributes, but it is invalid on a declaration. Remove the personality attribute it the function body is not cloned. Also add a verifier run over output modules in the llvm-split tool. llvm-svn: 264667	2016-03-28 21:37:02 +00:00
Ryan Govostes	653f9d0273	[asan] Support dead code stripping on Mach-O platforms On OS X El Capitan and iOS 9, the linker supports a new section attribute, live_support, which allows dead stripping to remove dead globals along with the ASAN metadata about them. With this change __asan_global structures are emitted in a new __DATA,__asan_globals section on Darwin. Additionally, there is a __DATA,__asan_liveness section with the live_support attribute. Each entry in this section is simply a tuple that binds together the liveness of a global variable and its ASAN metadata structure. Thus the metadata structure will be alive if and only if the global it references is also alive. Review: http://reviews.llvm.org/D16737 llvm-svn: 264645	2016-03-28 20:28:57 +00:00
Reid Kleckner	ba85781f58	Revert "[SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops" This reverts commit r264596. It does not compile. llvm-svn: 264604	2016-03-28 18:07:40 +00:00
Hyojin Sung	0ada5b0d14	[SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes is currently used to recognize potential loops of which the block is the header and keep the block. However, the current algorithm fails if the loops' exit condition is evaluated only with volatile values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested loop, the loop is collapsed into a single loop which prevent later optimizations from being applied (e.g., transforming nested loops into simplified forms and loop vectorization). The patch augments the existing PHI node-based check by adding a pre-test if the BB actually belongs to a set of loop headers and not eliminating it if yes. llvm-svn: 264596	2016-03-28 17:22:25 +00:00
Rong Xu	6090afd744	[PGO] Don't set the function hotness attribute when populating counters Don't set the function hotness attribute on the fly. This changes the CFG branch probability of the caller function, which leads to inconsistent BB ordering. This patch moves the attribute setting to a separated loop after the counts in all functions are populated. Fixes PR27024 - PGO instrumentation profile data is not reflected in correct basic blocks. Differential Revision: http://reviews.llvm.org/D18491 llvm-svn: 264594	2016-03-28 17:08:56 +00:00
Davide Italiano	6db1dcbf6b	[SimplifyLibCalls] Transform printf("%s", "a") -> putchar('a'). llvm-svn: 264588	2016-03-28 15:54:01 +00:00
Hal Finkel	5c83a090bc	[SROA] Fix typo in comment llvm-svn: 264573	2016-03-28 11:23:21 +00:00
Hal Finkel	29f5131daf	C++11 is required, remove some preprocessor checks for it We require C++11 to build, so remove a few remaining preprocessor checks for '__cplusplus >= 201103L'. This should always be true. llvm-svn: 264572	2016-03-28 11:13:03 +00:00
Teresa Johnson	d29478f70e	[ThinLTO] Add optional import message and statistics Summary: Add a statistic to count the number of imported functions. Also, add a new -print-imports option to emit a trace of imported functions, that works even for an NDEBUG build. Note that emitOptimizationRemark does not work for the above printing as it expects a Function object and DebugLoc, neither of which we have with summary-based importing. This is part 2 of D18487, the first part was committed separately as r264536. Reviewers: joker.eph Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18487 llvm-svn: 264537	2016-03-27 15:27:30 +00:00
Teresa Johnson	9aae395fa8	[ThinLTO] Don't try to import alias unless aliasee can be imported With r264503, aliases are now being added to the GlobalsToImport set even when their aliasees can't be imported due to their linkage type. While the importing worked correctly (the aliases imported as declarations) due to the logic in doImportAsDefinition, there is no point to adding them to the GlobalsToImport set. Additionally, with D18487 it was resulting in incorrectly printing a message indicating that the alias was imported. To avoid this, delay adding aliases to the GlobalsToImport set until after the linkage type of the aliasee is checked. This patch is part of D18487. llvm-svn: 264536	2016-03-27 15:01:11 +00:00
Sanjay Patel	796db35f62	[SimplifyCFG] propagate branch metadata when creating select (PR26636) llvm-svn: 264527	2016-03-26 23:30:50 +00:00
Mehdi Amini	01e321306b	ThinLTO: use the callgraph from the combined index to drive the FunctionImporter Summary: Now that the summary contains the full reference/call graph, we can replace the existing function importer that loads and inspect the IR to iteratively walk the call graph by a traversal based purely on the summary information. Decouple the actual importing decision from any IR manipulation. Reviewers: tejohnson Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18343 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 264503	2016-03-26 05:40:34 +00:00
Sanjoy Das	d4c783335b	[RS4GC] Lower calls to @llvm.experimental.deoptimize This changes RS4GC to lower calls to ``@llvm.experimental.deoptimize`` to gc.statepoints wrapping ``__llvm_deoptimize``, and changes ``callsGCLeafFunction`` to recognize ``@llvm.experimental.deoptimize`` as a non GC leaf function. I've had to hard code the ``"__llvm_deoptimize"`` name in RewriteStatepointsForGC; since ``TargetLibraryInfo`` is available only during codegen. This isn't without precedent in the codebase, so I'm not overtly concerned. llvm-svn: 264456	2016-03-25 20:12:13 +00:00
David L Kreitzer	8d441eb936	Enable non-power-of-2 #pragma unroll counts. Patch by Evgeny Stupachenko. Differential Revision: http://reviews.llvm.org/D18202 llvm-svn: 264407	2016-03-25 14:24:52 +00:00
David Majnemer	e09d035dad	[LoopStrengthReduce] Don't hoist into a catchswitch We try to hoist the insertion point as high as possible to encourage sharing. However, we must be careful not to hoist into a catchswitch as it is both an EHPad and a terminator. llvm-svn: 264344	2016-03-24 21:40:22 +00:00
Adam Nemet	7aba60c853	[LLE] Check for mismatching types between the store and the load earlier isDependenceDistanceOfOne asserts that the store and the load access through the same type. This function is also used by removeDependencesFromMultipleStores so we need to make sure we filter out mismatching types before reaching this point. Now we do this when the initial candidates are gathered. This is a refinement of the fix made in r262267. Fixes PR27048. llvm-svn: 264313	2016-03-24 17:59:26 +00:00
Mike Aizatsky	9987f43ffa	[sancov] code readability improvement. Summary: Reply to http://reviews.llvm.org/D18341 Differential Revision: http://reviews.llvm.org/D18406 llvm-svn: 264213	2016-03-23 23:15:03 +00:00
George Burgess IV	0e4898685f	Fix bugs in the MemorySSA walker. There are a few bugs in the walker that this patch addresses. Primarily: - Caching can break when we have multiple BBs without phis - We weren't optimizing some phis properly - Because of how the DFS iterator works, there were times where we wouldn't cache any results of our DFS I left the test cases with FIXMEs in, because I'm not sure how much effort it will take to get those to work (read: We'll probably ultimately have to end up redoing the walker, or we'll have to come up with some creative caching tricks), and more test coverage = better. Differential Revision: http://reviews.llvm.org/D18065 llvm-svn: 264180	2016-03-23 18:31:55 +00:00
Junmo Park	820964e9c6	Minor code cleanup. NFC. llvm-svn: 264124	2016-03-23 01:38:35 +00:00
Davide Italiano	1a911e204d	[ModuleUtils] Use range-based loop. NFC. llvm-svn: 264122	2016-03-23 00:43:35 +00:00
Adam Nemet	8b47e0d0ea	[LoopVersioning] Relax an assert for LCSSA PHIs When you have multiple LCSSA (single-operand) PHIs that are converted into two-operand PHIs due to versioning, only assert that the PHI currently being converted has a single operand. I.e. we don't want to check PHIs that were converted earlier in the loop. Fixes PR27023. Thanks to Karl-Johan Karlsson for the minimized testcase! llvm-svn: 264081	2016-03-22 18:38:15 +00:00
Zinovy Nis	07ac2bd4d0	[PATCH] Force LoopReroll to reset the loop trip count value after reroll. It's a bug fix. For rerolled loops SE trip count remains unchanged. It leads to incorrect work of the next passes. My patch just resets SE info for rerolled loop forcing SE to re-evaluate it next time it requested. I also added a verifier call in the exisitng test to be sure no invalid SE data remain. Without my fix this test would fail with -verify-scev. Differential Revision: http://reviews.llvm.org/D18316 llvm-svn: 264051	2016-03-22 13:50:57 +00:00
Mike Aizatsky	602f79275d	[sancov] do not instrument nodes that are full pre-dominators Summary: Without tree pruning clang has 2,667,552 points. Wiht only dominators pruning: 1,515,586. With both dominators & predominators pruning: 1,340,534. Resubmit of r262103. Differential Revision: http://reviews.llvm.org/D18341 llvm-svn: 264003	2016-03-21 23:08:16 +00:00
George Burgess IV	3887a41725	[MemorySSA] Consider def-only BBs for live-in calculations. If we have a BB with only MemoryDefs, live-in calculations will ignore it. This means we get results like this: define void @foo(i8* %p) { ; 1 = MemoryDef(liveOnEntry) store i8 0, i8* %p br i1 undef, label %if.then, label %if.end if.then: ; 2 = MemoryDef(1) store i8 1, i8* %p br label %if.end if.end: ; 3 = MemoryDef(1) store i8 2, i8* %p ret void } ...When there should be a MemoryPhi in the `if.end` BB. This patch fixes that behavior. llvm-svn: 263991	2016-03-21 21:25:39 +00:00
Chad Rosier	2e5c526bb1	[SLP] Remove unnecessary member variables by using container APIs. This changes the debug output, but still retains its usefulness. Differential Revision: http://reviews.llvm.org/D18324 llvm-svn: 263975	2016-03-21 19:47:44 +00:00
David Majnemer	abae6b588b	[SimplifyLibCalls] Only consider sinpi/cospi functions within the same function The sinpi/cospi can be replaced with sincospi to remove unnecessary computations. However, we need to make sure that the calls are within the same function! This fixes PR26993. llvm-svn: 263875	2016-03-19 04:53:02 +00:00
David Majnemer	cdf2873e36	[InstCombine] Don't insert instructions before a catch switch CatchSwitches are not splittable, we cannot insert casts, etc. before them. This fixes PR26992. llvm-svn: 263874	2016-03-19 04:39:52 +00:00
Mehdi Amini	8d05185a26	Rework linkInModule(), making it oblivious to ThinLTO Summary: ThinLTO is relying on linkInModule to import selected function. However a lot of "magic" was hidden in linkInModule and the IRMover, who would rename and promote global variables on the fly. This is moving to an approach where the steps are decoupled and the client is reponsible to specify the list of globals to import. As a consequence some test are changed because they were relying on the previous behavior which was importing the definition of every single global without control on the client side. Now the burden is on the client to decide if a global has to be imported or not. Reviewers: tejohnson Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D18122 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263863	2016-03-19 00:40:31 +00:00
Mike Aizatsky	759aca01ce	[sancov] clang-formatting SanitizerCoverage.cpp and fully pleasing clang-tidy. Differential Revision: http://reviews.llvm.org/D18288 llvm-svn: 263852	2016-03-18 23:29:29 +00:00
Chandler Carruth	3006115cfe	Revert "Revert "[sancov] specifying sanitizer coverage dependencies."" This reverts commit r263825, re-instating r263797. llvm-svn: 263847	2016-03-18 22:43:42 +00:00
Chandler Carruth	e2b7021a91	[sancov] Fix the sancov pass to initialize itself inside its constructor. This should fix the recent crashes on certain architectures. llvm-svn: 263845	2016-03-18 22:35:58 +00:00
Sanjoy Das	74af78e3b0	[IndVars] Make the fix for PR26973 more obvious; NFCI llvm-svn: 263828	2016-03-18 20:37:11 +00:00
Sanjoy Das	60fb899f28	[IndVars] Pass the right loop to isLoopInvariantPredicate The loop on IVOperand's incoming values assumes IVOperand to be an induction variable on the loop over which `S Pred X` is invariant; otherwise loop invariant incoming values to IVOperand are not guaranteed to dominate the comparision. This fixes PR26973. llvm-svn: 263827	2016-03-18 20:37:07 +00:00
Mike Aizatsky	075ed3eec1	Revert "[sancov] specifying sanitizer coverage dependencies." This fails on arm. This reverts commit 52c8e0f7119d1ea1050c0708565a8c92b73386d2. llvm-svn: 263825	2016-03-18 20:34:58 +00:00
Mike Aizatsky	4f7994c8cb	[sancov] specifying sanitizer coverage dependencies. Summary: These dependencies would be used in the future to reduce the number of instrumented blocks(http://reviews.llvm.org/rL262103) This is submitted as a separate CL because of previous problems with ARM. Subscribers: aemerson Differential Revision: http://reviews.llvm.org/D18227 llvm-svn: 263797	2016-03-18 17:33:21 +00:00
Adam Nemet	709e3046ee	[LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead Summary: It can hurt performance to prefetch ahead too much. Be conservative for now and don't prefetch ahead more than 3 iterations on Cyclone. Reviewers: hfinkel Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17949 llvm-svn: 263772	2016-03-18 00:27:43 +00:00
Adam Nemet	6d8beeca53	[LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses Summary: And use this TTI for Cyclone. As it was explained in the original RFC (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW prefetcher work up to 2KB strides. I am also adding tests for this and the previous change (D17943): * Cyclone prefetching accesses with a large stride * Cyclone not prefetching accesses with a small stride * Generic Aarch64 subtarget not prefetching either Reviewers: hfinkel Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17945 llvm-svn: 263771	2016-03-18 00:27:38 +00:00
Adam Nemet	b0c4eae073	[LoopVectorize] Annotate versioned loop with noalias metadata Summary: Use the new LoopVersioning facility (D16712) to add noalias metadata in the vector loop if we versioned with memchecks. This can enable some optimization opportunities further down the pipeline (see the included test or the benchmark improvement quoted in D16712). The test also covers the bug I had in the initial version in D16712. The vectorizer did not previously use LoopVersioning. The reason is that the vectorizer performs its transformations in single shot. It creates an empty single-block vector loop that it then populates with the widened, if-converted instructions. Thus creating an intermediate versioned scalar loop seems wasteful. So this patch (rather than bringing in LoopVersioning fully) adds a special interface to LoopVersioning to allow the vectorizer to add no-alias annotation while still performing its own versioning. As the vectorizer propagates metadata from the instructions in the original loop to the vector instructions we also check the pointer in the original instruction and see if LoopVersioning can add no-alias metadata based on the issued memchecks. Reviewers: hfinkel, nadav, mzolotukhin Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17191 llvm-svn: 263744	2016-03-17 20:32:37 +00:00
Adam Nemet	5eccf07df3	[LoopVersioning] Annotate versioned loop with noalias metadata Summary: If we decide to version a loop to benefit a transformation, it makes sense to record the now non-aliasing accesses in the newly versioned loop. This allows non-aliasing information to be used by subsequent passes. One example is 456.hmmer in SPECint2006 where after loop distribution, we vectorize one of the newly distributed loops. To vectorize we version this loop to fully disambiguate may-aliasing accesses. If we add the noalias markers, we can use the same information in a later DSE pass to eliminate some dead stores which amounts to ~25% of the instructions of this hot memory-pipeline-bound loop. The overall performance improves by 18% on our ARM64. The scoped noalias annotation is added in LoopVersioning. The patch then enables this for loop distribution. A follow-on patch will enable it for the vectorizer. Eventually this should be run by default when versioning the loop but first I'd like to get some feedback whether my understanding and application of scoped noalias metadata is correct. Essentially my approach was to have a separate alias domain for each versioning of the loop. For example, if we first version in loop distribution and then in vectorization of the distributed loops, we have a different set of memchecks for each versioning. By keeping the scopes in different domains they can conveniently be defined independently since different alias domains don't affect each other. As written, I also have a separate domain for each loop. This is not necessary and we could save some metadata here by using the same domain across the different loops. I don't think it's a big deal either way. Probably the best is to review the tests first to see if I mapped this problem correctly to scoped noalias markers. I have plenty of comments in the tests. Note that the interface is prepared for the vectorizer which needs the annotateInstWithNoAlias API. The vectorizer does not use LoopVersioning so we need a way to pass in the versioned instructions. This is also why the maps have to become part of the object state. Also currently, we only have an AA-aware DSE after the vectorizer if we also run the LTO pipeline. Depending how widely this triggers we may want to schedule a DSE toward the end of the regular pass pipeline. Reviewers: hfinkel, nadav, ashutosh.nema Subscribers: mssimpso, aemerson, llvm-commits, mcrosier Differential Revision: http://reviews.llvm.org/D16712 llvm-svn: 263743	2016-03-17 20:32:32 +00:00
Guozhi Wei	7b390ec4cd	[InstCombine] Combine A->B->A BitCast This patch enhances InstCombine to handle following case: A -> B bitcast PHI B -> A bitcast llvm-svn: 263734	2016-03-17 18:47:20 +00:00
Sanjoy Das	c9058ca9e0	[Statepoints] Export a magic constant into a header; NFC llvm-svn: 263733	2016-03-17 18:42:17 +00:00
Sanjay Patel	9e23fedaf0	propagate 'unpredictable' metadata on select instructions This is similar to D18133 where we allowed profile weights on select instructions. This extends that change to also allow the 'unpredictable' attribute of branches to apply to selects. A test to check that 'unpredictable' metadata is preserved when cloning instructions was checked in at: http://reviews.llvm.org/rL263648 Differential Revision: http://reviews.llvm.org/D18220 llvm-svn: 263716	2016-03-17 15:30:52 +00:00
Sanjoy Das	312038872d	[Statepoints] Separate out logic for statepoint directives; NFC This splits out the logic that maps the `"statepoint-id"` attribute into the actual statepoint ID, and the `"statepoint-num-patch-bytes"` attribute into the number of patchable bytes the statpeoint is lowered into. The new home of this logic is in IR/Statepoint.cpp, and this refactoring will support similar functionality when lowering calls with deopt operand bundles in the future. llvm-svn: 263685	2016-03-17 01:56:10 +00:00
Chad Rosier	fea398188c	[SLP] Make DataLayout a member variable. llvm-svn: 263656	2016-03-16 19:48:42 +00:00
Geoff Berry	56fabf9b55	Revert "[LSR] Create fewer redundant instructions." This reverts commit r263644. Investigating bootstrap failures. llvm-svn: 263655	2016-03-16 19:21:47 +00:00
Evgeniy Stepanov	4b96ed693a	[msan] Add a comment with a bug link. llvm-svn: 263645	2016-03-16 17:39:17 +00:00
Geoff Berry	459b750871	[LSR] Create fewer redundant instructions. Summary: Fix LSRInstance::HoistInsertPosition() to check the original insert position block first for a canonical insertion point that is dominated by all inputs. This leads to SCEV being able to reuse more instructions since it currently tracks the instructions it creates for reuse by keeping a table of <Value, insert point> pairs. Reviewers: atrick Subscribers: mcrosier, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18001 llvm-svn: 263644	2016-03-16 17:29:49 +00:00
Haicheng Wu	7873857a88	[JumpThreading] See through Cast Instructions To capture more jump-thread opportunity. llvm-svn: 263618	2016-03-16 04:52:52 +00:00
Haicheng Wu	64d9d7c3f7	Revert "[JumpThreading] Simplify Instructions first in ComputeValueKnownInPredecessors()" Not sure it handles undef properly. llvm-svn: 263605	2016-03-15 23:38:47 +00:00
Adam Nemet	c979c6e123	Turn LoopLoadElimination on again The latent bug that LLE exposed in the LoopVectorizer was resolved (PR26952). The pass can be disabled with -mllvm -enable-loop-load-elim=0 llvm-svn: 263595	2016-03-15 22:26:12 +00:00
Bjorn Steinbrink	37ca462508	Also handle the new Rust pers fn to isCatchAll() llvm-svn: 263585	2016-03-15 20:57:07 +00:00
Evgeniy Stepanov	d6e91369d8	[msan] Don't put module constructors in comdats. There is something strange going on with debug info (.eh_frame_hdr) disappearing when msan.module_ctor are placed in comdat sections. Moving this functionality under flag, disabled by default. llvm-svn: 263579	2016-03-15 20:25:47 +00:00
Adam Nemet	fdb20595a1	[LV] Preserve LoopInfo when store predication is used This was a latent bug that got exposed by the change to add LoopSimplify as a dependence to LoopLoadElimination. Since LoopInfo was corrupted after LV, LoopSimplify mis-compiled nbench in the test-suite (more details in the PR). The problem was that when we create the blocks for predicated stores we didn't add those to any loops. The original testcase for store predication provides coverage for this assuming we verify LI on the way out of LV. Fixes PR26952. llvm-svn: 263565	2016-03-15 18:06:20 +00:00
Benjamin Kramer	96f4b12880	[GlobalOpt] Don't look through aliases when sorting names of globals. If both are different aliases to the same value the sorting becomes non-deterministic as array_pod_sort is not stable. llvm-svn: 263550	2016-03-15 14:18:26 +00:00
Chad Rosier	ebe559019b	[SLP] Update comment to reflect reality. NFC. llvm-svn: 263548	2016-03-15 13:27:58 +00:00
Eric Christopher	257338ff0f	Use some braces to format this a little better. llvm-svn: 263527	2016-03-15 03:01:31 +00:00
Eric Christopher	ee00abe5e6	Fix llvm/llvm/lib/Transforms/Utils/LoopUnroll.cpp:285:53: error: suggest parentheses around '&&' within '\|\|' [-Werror=parentheses]. llvm-svn: 263525	2016-03-15 02:19:06 +00:00
Teresa Johnson	b43027d1e0	Move global ID computation from Function to GlobalValue (NFC) Since the static getGlobalIdentifier and getGUID methods are now called for global values other than functions, reflect that by moving these methods to the GlobalValue class. llvm-svn: 263524	2016-03-15 02:13:19 +00:00
Teresa Johnson	26ab5772b0	[ThinLTO] Renaming of function index to module summary index (NFC) (Resubmitting after fixing missing file issue) With the changes in r263275, there are now more than just functions in the summary. Completed the renaming of data structures (started in r263275) to reflect the wider scope. In particular, changed the FunctionIndex* data structures to ModuleIndex*, and renamed related variables and comments. Also renamed the files to reflect the changes. A companion clang patch will immediately succeed this patch to reflect this renaming. llvm-svn: 263513	2016-03-15 00:04:37 +00:00
Justin Lebar	6827de19b2	[LoopUnroll] Respect the convergent attribute. Summary: Specifically, when we perform runtime loop unrolling of a loop that contains a convergent op, we can only unroll k times, where k divides the loop trip multiple. Without this change, we'll happily unroll e.g. the following loop for (int i = 0; i < N; ++i) { if (i == 0) convergent_op(); foo(); } into int i = 0; if (N % 2 == 1) { convergent_op(); foo(); ++i; } for (; i < N - 1; i += 2) { if (i == 0) convergent_op(); foo(); foo(); }. This is unsafe, because we've just added a control-flow dependency to the convergent op in the prelude. In general, runtime unrolling loops that contain convergent ops is safe only if we don't have emit a prelude, which occurs when the unroll count divides the trip multiple. Reviewers: resistor Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17526 llvm-svn: 263509	2016-03-14 23:15:34 +00:00
Amaury Sechet	bdb261b4c0	Imporove load to store => memcpy Summary: This now try to reorder instructions in order to help create the optimizable pattern. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer Differential Revision: http://reviews.llvm.org/D16523 llvm-svn: 263503	2016-03-14 22:52:27 +00:00
Teresa Johnson	cec0cae313	Revert "[ThinLTO] Renaming of function index to module summary index (NFC)" This reverts commit r263490. Missed a file. llvm-svn: 263493	2016-03-14 21:18:10 +00:00
Teresa Johnson	892920b358	[ThinLTO] Renaming of function index to module summary index (NFC) With the changes in r263275, there are now more than just functions in the summary. Completed the renaming of data structures (started in r263275) to reflect the wider scope. In particular, changed the FunctionIndex* data structures to ModuleIndex*, and renamed related variables and comments. Also renamed the files to reflect the changes. A companion clang patch will immediately succeed this patch to reflect this renaming. llvm-svn: 263490	2016-03-14 21:05:56 +00:00
Adam Nemet	bb45810e4f	Revert "Turn LoopLoadElimination on again" This reverts commit r263472. There is an LNT failure on clang-ppc64be-linux-lnt. Turn this off, while I am investigating. llvm-svn: 263485	2016-03-14 20:38:55 +00:00
Sanjay Patel	ee52b6e77d	allow branch weight metadata on select instructions (PR26636) As noted in: https://llvm.org/bugs/show_bug.cgi?id=26636 This doesn't accomplish anything on its own. It's the first step towards preserving and using branch weights with selects. The next step would be to make sure we're propagating the info in all of the other places where we create selects (SimplifyCFG, InstCombine, etc). I don't think there's an easy fix to make this happen; we have to look at each transform individually to determine how to correctly propagate the weights. Along with that step, we need to then use the weights when making subsequent transform decisions such as discussed in http://reviews.llvm.org/D16836. The inliner test is independent but closely related. It verifies that metadata is preserved when both branches and selects are cloned. Differential Revision: http://reviews.llvm.org/D18133 llvm-svn: 263482	2016-03-14 20:18:59 +00:00
Justin Lebar	9d94397859	[attrs] Handle convergent CallSites. Summary: Previously we had a notion of convergent functions but not of convergent calls. This is insufficient to correctly analyze calls where the target is unknown, e.g. indirect calls. Now a call is convergent if it targets a known-convergent function, or if it's explicitly marked as convergent. As usual, we can remove convergent where we can prove that no convergent operations are performed in the call. Originally landed as r261544, then reverted in r261544 for (incidental) build breakage. Re-landed here with no changes. Reviewers: chandlerc, jingyue Subscribers: llvm-commits, tra, jhen, hfinkel Differential Revision: http://reviews.llvm.org/D17739 llvm-svn: 263481	2016-03-14 20:18:54 +00:00
Keno Fischer	a91ae8336b	[SLPVectorizer] Fix dependency list Summary: DemandedBits was added to the requirements of SLPVectorizer in rL261212 (and various earlier version of it), but the appropriate initialization statement was accidentally forgotten. Ref [[ https://github.com/JuliaLang/julia/issues/14998 \| JuliaLang/julia#14998 ]]. Patch by Yichao Yu. Reviewers: mssimpso Differential Revision: http://reviews.llvm.org/D18152 llvm-svn: 263476	2016-03-14 20:04:24 +00:00
Adam Nemet	5a19ae917b	Turn LoopLoadElimination on again The two issues that were discovered got fixed (r263058, r263173). The pass can be disabled with -mllvm -enable-loop-load-elim=0 llvm-svn: 263472	2016-03-14 19:40:25 +00:00
Chad Rosier	00bd82cade	[CVP] Replace nonnegative with positive, per Philip's request. NFC. llvm-svn: 263430	2016-03-14 13:48:00 +00:00
Haicheng Wu	d60ae33d29	[CVP] Convert an SDiv to a UDiv if both operands are known to be nonnegative The motivating example is this for (j = n; j > 1; j = i) { i = j / 2; } The signed division is safely to be changed to an unsigned division (j is known to be larger than 1 from the loop guard) and later turned into a single shift without considering the sign bit. llvm-svn: 263406	2016-03-14 03:24:28 +00:00
Mehdi Amini	ba9fba81d6	Remove PreserveNames template parameter from IRBuilder This reapplies r263258, which was reverted in r263321 because of issues on Clang side. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263393	2016-03-13 21:05:13 +00:00
Sanjay Patel	5658b58936	remove unnecessary cast; NFC llvm-svn: 263343	2016-03-12 18:17:41 +00:00
Sanjay Patel	2e0027706a	fix formatting; NFC llvm-svn: 263342	2016-03-12 18:05:53 +00:00
Sanjay Patel	5781d840dd	use range loops; NFCI llvm-svn: 263341	2016-03-12 16:52:17 +00:00
Sanjay Patel	c4acbae63f	[x86, InstCombine] delete x86 SSE2 masked store with zero mask This follows up on the related AVX instruction transforms, but this one is too strange to do anything more with. Intel's behavioral description of this instruction in its Software Developer's Manual is tragi-comic. llvm-svn: 263340	2016-03-12 15:16:59 +00:00
Eric Christopher	35abd051c0	Temporarily revert: commit ae14bf6488e8441f0f6d74f00455555f6f3943ac Author: Mehdi Amini <mehdi.amini@apple.com> Date: Fri Mar 11 17:15:50 2016 +0000 Remove PreserveNames template parameter from IRBuilder Summary: Following r263086, we are now relying on a flag on the Context to discard Value names in release builds. Reviewers: chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18023 From: Mehdi Amini <mehdi.amini@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258 91177308-0d34-0410-b5e6-96231b3b80d8 until we can figure out what to do about clang and Release build testing. This reverts commit 263258. llvm-svn: 263321	2016-03-12 01:47:22 +00:00
George Burgess IV	b42b762bca	[MemorySSA] Make a return type reflect reality. NFC. llvm-svn: 263286	2016-03-11 19:34:03 +00:00
Sanjoy Das	b51325dbdb	Introduce @llvm.experimental.deoptimize Summary: This intrinsic, together with deoptimization operand bundles, allow frontends to express transfer of control and frame-local state from one (typically more specialized, hence faster) version of a function into another (typically more generic, hence slower) version. In languages with a fully integrated managed runtime this intrinsic can be used to implement "uncommon trap" like functionality. In unmanaged languages like C and C++, this intrinsic can be used to represent the slow paths of specialized functions. Note: this change does not address how `@llvm.experimental_deoptimize` is lowered. That will be done in a later change. Reviewers: chandlerc, rnk, atrick, reames Subscribers: llvm-commits, kmod, mjacob, maksfb, mcrosier, JosephTremoulet Differential Revision: http://reviews.llvm.org/D17732 llvm-svn: 263281	2016-03-11 19:08:34 +00:00
Vedant Kumar	e5a9a275d3	[PGO] Skip value profile instrumentation of inline asm Value profile instrumentation treats inline asm calls like they are indirect calls. This causes problems when the 'Callee' is passed to a ptrtoint cast -- the verifier rightly claims that this is bogus and crashes opt. llvm-svn: 263278	2016-03-11 18:57:48 +00:00
Teresa Johnson	76a1c1d0ba	[ThinLTO] Support for reference graph in per-module and combined summary. Summary: This patch adds support for including a full reference graph including call graph edges and other GV references in the summary. The reference graph edges can be used to make importing decisions without materializing any source modules, can be used in the plugin to make file staging decisions for distributed build systems, and is expected to have other uses. The call graph edges are recorded in each function summary in the bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when there is PGO, where the ValueId can be mapped to the function GUID via the ValueSymbolTable. In the function index in memory, the call graph edges reference the target via the CalleeGUID instead of the CalleeValueId. The reference graph edges are recorded in each summary record with a list of referenced value IDs, which can be mapped to value GUID via the ValueSymbolTable. Addtionally, a new summary record type is added to record references from global variable initializers. A number of bitcode records and data structures have been renamed to reflect the newly expanded scope of the summary beyond functions. More cleanup will follow. Reviewers: joker.eph, davidxl Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D17212 llvm-svn: 263275	2016-03-11 18:52:24 +00:00
Mehdi Amini	99eab3dd06	Remove PreserveNames template parameter from IRBuilder Summary: Following r263086, we are now relying on a flag on the Context to discard Value names in release builds. Reviewers: chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18023 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263258	2016-03-11 17:15:50 +00:00
Mehdi Amini	1e9c925182	Do not specialize IRBuilder to strip names in SROA Summary: Following r263086, we are replacing this by a runtime check. More cleanup will follow on the IRBuilder itself, but I submitted this patch separately as SROA has a fancy "prefixInserter" class that needs extra-love. Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18022 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263256	2016-03-11 17:15:34 +00:00
Chandler Carruth	ace8c8f765	[PM] Sink the "Expression" type for GVN into the class as a private member type. Because of how this type is used by the ValueTable, it cannot actually have hidden visibility. GCC actually nicely warns about this but Clang just silently ... I don't even know. =/ We should do a better job either way though. This should resolve a bunch of the GCC warnings about visibility that the port of GVN triggered and make the visibility story a bit more correct. llvm-svn: 263250	2016-03-11 16:25:19 +00:00
Chandler Carruth	3bc9c7fb45	[PM] The order of evaluation of these analyses is actually significant, much to my horror, so use variables to fix it in place. This terrifies me. Both basic-aa and memdep will provide more precise information when the domtree and/or the loop info is available. Because of this, if your pass (like GVN) requires domtree, and then queries memdep or basic-aa, it will get more precise results. If it does this in the other order, it gets less precise results. All of the ideas I have for fixing this are, essentially, terrible. Here I've just caused us to stop having unspecified behavior as different implementations evaluate the order of these arguments differently. I'm actually rather glad that they do, or the fragility of memdep and basic-aa would have gone on unnoticed. I've left comments so we don't immediately break this again. This should fix bots whose host compilers evaluate the order of arguments differently from Clang. llvm-svn: 263231	2016-03-11 13:26:47 +00:00
Chandler Carruth	b47f8010a9	[PM] Make the AnalysisManager parameter to run methods a reference. This was originally a pointer to support pass managers which didn't use AnalysisManagers. However, that doesn't realistically come up much and the complexity of supporting it doesn't really make sense. In fact, many parts of the pass manager were just assuming the pointer was never null already. This at least makes it much more explicit and clear. llvm-svn: 263219	2016-03-11 11:05:24 +00:00
Benjamin Kramer	c126353473	[InstCombine] Use Twines to generate names. Since the names are used in a loop this does more work in debug builds. In release builds value names are generally discarded so we don't have to do the concatenation at all. It's also simpler code, no functional change intended. llvm-svn: 263215	2016-03-11 10:20:56 +00:00
Chandler Carruth	89c45a162f	[PM] Port GVN to the new pass manager, wire it up, and teach a couple of tests to run GVN in both modes. This is mostly the boring refactoring just like SROA and other complex transformation passes. There is some trickiness in that GVN's ValueNumber class requires hand holding to get to compile cleanly. I'm open to suggestions about a better pattern there, but I tried several before settling on this. I was trying to balance my desire to sink as much implementation detail into the source file as possible without introducing overly many layers of abstraction. Much like with SROA, the design of this system is made somewhat more cumbersome by the need to support both pass managers without duplicating the significant state and logic of the pass. The same compromise is struck here. I've also left a FIXME in a doxygen comment as the GVN pass seems to have pretty woeful documentation within it. I'd like to submit this with the FIXME and let those more deeply familiar backfill the information here now that we have a nice place in an interface to put that kind of documentaiton. Differential Revision: http://reviews.llvm.org/D18019 llvm-svn: 263208	2016-03-11 08:50:55 +00:00
Pete Cooper	adebb9379a	Remove llvm::getDISubprogram in favor of Function::getSubprogram llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself. Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function. Ideally this should be NFC, but in reality its possible that a function: has no !dbg (in which case there's likely a bug somewhere in an opt pass), or that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will Reviewed by Duncan Exon Smith. Differential Revision: http://reviews.llvm.org/D18074 llvm-svn: 263184	2016-03-11 02:14:16 +00:00
Adam Nemet	efb234135c	[LLE] Add missed LoopSimplify dependence The code assumed that we always had a preheader without making the pass dependent on LoopSimplify. Thanks to Mattias Eriksson V for reporting this. llvm-svn: 263173	2016-03-10 23:54:39 +00:00
Chandler Carruth	37f1f12226	[SROA] Fix PR25873, which Andrea Di Biagio analyzed the daylights out of, and I misdiagnosed for months and months. Andrea has had a patch for this forever, but I just couldn't see how it was fixing the root cause of the problem. It didn't make sense to me, even though the patch was perfectly good and the analysis of the actual failure event was fantastic. Well, I came back to it today because the patch has sat for far too long and needs attention and decided I wouldn't let it go until I really understood what was going on. After quite some time in the debugger, I finally realized that in fact I had just missed an important case with my previous attempt to fix PR22093 in r225149. Not only do we need to handle loads that won't be split, but stores-of-loads that we won't split. We do actually have enough logic in the presplitting to form new slices for split stores.... unless we decided not to split them! I'm so sorry that it took me this long to come to the realization that this is the issue. It seems so obvious in hind sight (of course). Anyways, the fix becomes much smaller and more focused. The fact that we're left doing integer smashing is related to the FIXME in my original commit: fundamentally, we're not aggressive about pre-splitting for loads and stores to the same alloca. If we want to get aggressive about this, it'll need both what Andrea had put into the proposed fix, but also a lot more logic to essentially iteratively pre-split the alloca until we can't do any more. As I said in that commit log, its really unclear that this is the right call. Instead, the integer blending and letting targets lower this to narrower stores seems slightly better. But we definitely shouldn't really go down that path just to fix this bug. Again, tons of thanks are owed to Andrea and others at Sony for working on this bug. I really should have seen what was going on here and re-directed them sooner. =//// llvm-svn: 263121	2016-03-10 15:31:17 +00:00
Chandler Carruth	d94a5962cc	[SROA] Clean up some really weird code, no functionality changed. We already have the instruction extracted into 'I', just cast that to a store the way we do for loads. Also, we don't enter the if unless SI is non-null, so don't test it again for null. I'm pretty sure the entire test there can be nuked, but this is just the trivial cleanup. llvm-svn: 263112	2016-03-10 14:16:18 +00:00
Michael Zolotukhin	b88fbe08fc	[SLP] Add -slp-min-reg-size command line option. MinVecRegSize is currently hardcoded to 128; this patch adds a cl::opt to allow changing it. I tried not to change any existing behavior for the default case. Differential revision: http://reviews.llvm.org/D13278 llvm-svn: 263089	2016-03-10 02:49:47 +00:00
Chandler Carruth	7776377e62	[gvn] Fix more indenting and formatting in regions of code that will need to be changed for porting to the new pass manager. Also sink the comment on the ValueTable class back to that class instead of it dangling on an anonymous namespace. No functionality changed. llvm-svn: 263084	2016-03-10 00:58:20 +00:00
Chandler Carruth	169c84f1cc	[gvn] Reformat a chunk of the GVN code that is strangely indented prior to restructuring it for porting to the new pass manager. No functionality changed. llvm-svn: 263083	2016-03-10 00:58:18 +00:00
Chandler Carruth	61440d225b	[PM] Port memdep to the new pass manager. This is a fairly straightforward port to the new pass manager with one exception. It removes a very questionable use of releaseMemory() in the old pass to invalidate its caches between runs on a function. I don't think this is really guaranteed to be safe. I've just used the more direct port to the new PM to address this by nuking the results object each time the pass runs. While this could cause some minor malloc traffic increase, I don't expect the compile time performance hit to be noticable, and it makes the correctness and other aspects of the pass much easier to reason about. In some cases, it may make things faster by making the sets and maps smaller with better locality. Indeed, the measurements collected by Bruno (thanks!!!) show mostly compile time improvements. There is sadly very limited testing at this point as there are only two tests of memdep, and both rely on GVN. I'll be porting GVN next and that will exercise this heavily though. Differential Revision: http://reviews.llvm.org/D17962 llvm-svn: 263082	2016-03-10 00:55:30 +00:00
Philip Reames	e0a5454df4	Fix the build I screwed up rebasing 263072. This change fixes the build and passes all make check. llvm-svn: 263073	2016-03-09 23:07:53 +00:00
Philip Reames	b54c8e6eea	[LICM] Store promotion when memory is thread local This patch teaches LICM's implementation of store promotion to exploit the fact that the memory location being accessed might be provable thread local. The fact it's thread local weakens the requirements for where we can insert stores since no other thread can observe the write. This allows us perform store promotion even in cases where the store is not guaranteed to execute in the loop. Two key assumption worth drawing out is that this assumes a) no-capture is strong enough to imply no-escape, and b) standard allocation functions like malloc, calloc, and operator new return values which can be assumed not to have previously escaped. In future work, it would be nice to generalize this so that it works without directly seeing the allocation site. I believe that the nocapture return attribute should be suitable for this purpose, but haven't investigated carefully. It's also likely that we could support unescaped allocas with similar reasoning, but since SROA and Mem2Reg should destroy those, they're less interesting than they first might seem. Differential Revision: http://reviews.llvm.org/D16783 llvm-svn: 263072	2016-03-09 22:59:30 +00:00
Philip Reames	8f12eba78d	[ValueTracking] Extract isKnownPositive [NFCI] Extract out a generic interface from a recently landed patch and document a TODO in case compile time becomes a problem. llvm-svn: 263062	2016-03-09 21:31:47 +00:00
Philip Reames	ec8a8b5437	[InstCombine] (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0) When checking whether an smin is positive, we can move the comparison to one of the inputs if the other is known positive. If the known positive one is the min, then the other can't be negative. If the other is the min, then we compute the min. Differential Revision: http://reviews.llvm.org/D17873 llvm-svn: 263059	2016-03-09 21:05:07 +00:00
Adam Nemet	660748ca8c	[LLE] Add missing check for unit stride I somehow missed this. The case in GCC (global_alloc) was similar to the new testcase except it had an array of structs rather than a two dimensional array. Fixes RP26885. llvm-svn: 263058	2016-03-09 20:47:55 +00:00
Matthias Braun	c31032d607	InstCombine: Restrict computeKnownBits() on all Values to OptLevel > 2 As part of r251146 InstCombine was extended to call computeKnownBits on every value in the function to determine whether it happens to be constant. This increases typical compiletime by 1-3% (5% in irgen+opt time) in my measurements. On the other hand this case did not trigger once in the whole llvm-testsuite. This patch introduces the notion of ExpensiveCombines which are only enabled for OptLevel > 2. I removed the check in InstructionSimplify as that is called from various places where the OptLevel is not known but given the rarity of the situation I think a check in InstCombine is enough. Differential Revision: http://reviews.llvm.org/D16835 llvm-svn: 263047	2016-03-09 18:47:11 +00:00
Petar Jovanovic	921c2b4eb3	Reland r262337 "calculate builtin_object_size if arg is a removable pointer" Original commit message: calculate builtin_object_size if argument is a removable pointer This patch fixes calculating correct value for builtin_object_size function when pointer is used only in builtin_object_size function call and never after that. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D17337 Reland the original change with a small modification (first do a null check and then do the cast) to satisfy ubsan. llvm-svn: 263011	2016-03-09 14:12:47 +00:00
Adam Nemet	34785ecff1	[LoopDataPrefetch] Add stats and debug output llvm-svn: 262998	2016-03-09 05:33:21 +00:00
Sanjoy Das	2eac48de9e	Return StringRef instead of a naked char*; NFC llvm-svn: 262989	2016-03-09 02:34:19 +00:00
Sanjoy Das	f13900f8ac	[IRCE] Reflow comments; NFC llvm-svn: 262988	2016-03-09 02:34:15 +00:00
Mehdi Amini	bd04e8fed6	FunctionIndex is not optional for renameModuleForThinLTO(), make it a reference (NFC) From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 262976	2016-03-09 01:37:14 +00:00
Sanjay Patel	f831fdb56a	fix variable name; NFC llvm-svn: 262953	2016-03-08 19:07:42 +00:00
Sanjay Patel	5c96723622	use range-based loop; NFCI llvm-svn: 262952	2016-03-08 19:06:12 +00:00
Sanjay Patel	eaf06851d0	rangify, fix function names; NFCI llvm-svn: 262940	2016-03-08 17:12:32 +00:00
Sanjay Patel	5b8d741632	don't repeat function names in documentation comments; NFC llvm-svn: 262937	2016-03-08 16:26:39 +00:00
Junmo Park	974eb0a96d	Revert "[InstCombine] Combine A->B->A BitCast" This reverts commit r262670 due to compile failure. llvm-svn: 262916	2016-03-08 07:09:46 +00:00
Peter Collingbourne	3866cc5f69	Fix evaluation order. Spotted by Alexander Riccio! llvm-svn: 262907	2016-03-08 03:50:36 +00:00
Easwaran Raman	b1bd398ceb	Revert revisions 262636, 262643, 262679, and 262682. llvm-svn: 262883	2016-03-08 00:36:35 +00:00
Anna Zaks	c1efa64c63	[tsan] Add support for pointer typed atomic stores, loads, and cmpxchg TSan instrumentation functions for atomic stores, loads, and cmpxchg work on integer value types. This patch adds casts before calling TSan instrumentation functions in cases where the value is a pointer. Differential Revision: http://reviews.llvm.org/D17833 llvm-svn: 262876	2016-03-07 23:16:23 +00:00
Adam Nemet	bb3680bd85	[LoopDataPrefetch] If prefetch distance is not set, skip pass This lets select sub-targets enable this pass. The patch implements the idea from the recent llvm-dev thread: http://thread.gmane.org/gmane.comp.compilers.llvm.devel/94925 The goal is to enable the LoopDataPrefetch pass for the Cyclone sub-target only within Aarch64. Positive and negative tests will be included in an upcoming patch that enables selective prefetching of large-strided accesses on Cyclone. llvm-svn: 262844	2016-03-07 18:35:42 +00:00
Adam Nemet	81113ef68c	Revert "Enable LoopLoadElimination by default" This reverts commit r262250. It causes SPEC2006/gcc to generate wrong result (166.s) in AArch64 when running with ref data set. The error happens with "-Ofast -flto -fuse-ld=gold" or "-O3 -fno-strict-aliasing". llvm-svn: 262839	2016-03-07 17:38:02 +00:00
Chandler Carruth	9ca96384f3	[DFSan] Remove an overly aggressive assert reported in PR26068. This code has been successfully used to bootstrap libc++ in a no-asserts mode for a very long time, so the code that follows cannot be completely incorrect. I've added a test that shows the current behavior for this kind of code with DFSan. If it is desirable for DFSan to do something special when processing an invoke of a variadic function, it can be added, but we shouldn't keep an assert that we've been ignoring due to release builds anyways. llvm-svn: 262829	2016-03-07 14:05:09 +00:00
Rong Xu	ecdc98fdae	[PGO] Add a commandline option to control number of the VP annotation metadata. llvm-svn: 262750	2016-03-04 22:08:44 +00:00
Easwaran Raman	3b7a8246c9	Fix a use-after-free bug introduced in r262636 llvm-svn: 262679	2016-03-04 00:44:01 +00:00
Guozhi Wei	92e9d0e80e	[InstCombine] Combine A->B->A BitCast This patch enhances InstCombine to handle following case: A -> B bitcast PHI B -> A bitcast llvm-svn: 262670	2016-03-03 23:21:38 +00:00
Sanjay Patel	9bba75084b	[InstCombine] transform bitcasted bitwise logic ops with constants (PR26702) Given that we're not actually reducing the instruction count in the included regression tests, I think we would call this a canonicalization step. The motivation comes from the example in PR26702: https://llvm.org/bugs/show_bug.cgi?id=26702 If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable example of: define <4 x i32> @is_negative(<4 x i32> %x) { %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> %not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1> %bc = bitcast <4 x i32> %not to <2 x i64> %notnot = xor <2 x i64> %bc, <i64 -1, i64 -1> %bc2 = bitcast <2 x i64> %notnot to <4 x i32> ret <4 x i32> %bc2 } Simplifies to the expected: define <4 x i32> @is_negative(<4 x i32> %x) { %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> ret <4 x i32> %lobit } Differential Revision: http://reviews.llvm.org/D17583 llvm-svn: 262645	2016-03-03 19:19:04 +00:00
Easwaran Raman	3035719c86	Infrastructure for PGO enhancements in inliner This patch provides the following infrastructure for PGO enhancements in inliner: Enable the use of block level profile information in inliner Incremental update of block frequency information during inlining Update the function entry counts of callees when they get inlined into callers. Differential Revision: http://reviews.llvm.org/D16381 llvm-svn: 262636	2016-03-03 18:26:33 +00:00
Dehao Chen	57d1dda558	Use LineLocation instead of CallsiteLocation to index callsite profile. Summary: With discriminator, LineLocation can uniquely identify a callsite without the need to specifying callee name. Remove Callee function name from the key, and put it in the value (FunctionSamples). Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17827 llvm-svn: 262634	2016-03-03 18:09:32 +00:00
Matthew Simpson	b840a6d6f4	[LoopUtils, LV] Fix PR26734 The vectorization of first-order recurrences (r261346) caused PR26734. When detecting these recurrences, we need to ensure that the previous value is actually defined inside the loop. This patch includes the fix and test case. llvm-svn: 262624	2016-03-03 16:12:01 +00:00
Amaury Sechet	3b8b2ea2e1	Explode store of arrays in instcombine Summary: This is the last step toward supporting aggregate memory access in instcombine. This explodes stores of arrays into a serie of stores for each element, allowing them to be optimized. Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17828 llvm-svn: 262530	2016-03-02 22:36:45 +00:00
Amaury Sechet	7cd3fe7db6	Unpack array of all sizes in InstCombine Summary: This is another step toward improving fca support. This unpack load of array in a series of load to array's elements. Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15890 llvm-svn: 262521	2016-03-02 21:28:30 +00:00
Daniel Berlin	6412002d24	Really fix ASAN leak/etc issues with MemorySSA unittests llvm-svn: 262519	2016-03-02 21:16:28 +00:00
Daniel Berlin	989e601b26	Revert "Fix ASAN detected errors in code and test" (it was not meant to be committed yet) This reverts commit 890bbccd600ba1eb050353d06a29650ad0f2eb95. llvm-svn: 262512	2016-03-02 20:36:22 +00:00
Daniel Berlin	27ed1c2eb0	Fix ASAN detected errors in code and test llvm-svn: 262511	2016-03-02 20:27:29 +00:00
Chandler Carruth	12884f7f80	[AA] Hoist the logic to reformulate various AA queries in terms of other parts of the AA interface out of the base class of every single AA result object. Because this logic reformulates the query in terms of some other aspect of the API, it would easily cause O(n^2) query patterns in alias analysis. These could in turn be magnified further based on the number of call arguments, and then further based on the number of AA queries made for a particular call. This ended up causing problems for Rust that were actually noticable enough to get a bug (PR26564) and probably other places as well. When originally re-working the AA infrastructure, the desire was to regularize the pattern of refinement without losing any generality. While I think it was successful, that is clearly proving to be too costly. And the cost is needless: we gain no actual improvement for this generality of making a direct query to tbaa actually be able to re-use some other alias analysis's refinement logic for one of the other APIs, or some such. In short, this is entirely wasted work. To the extent possible, delegation to other API surfaces should be done at the aggregation layer so that we can avoid re-walking the aggregation. In fact, this significantly simplifies the logic as we no longer need to smuggle the aggregation layer into each alias analysis (or the TargetLibraryInfo into each alias analysis just so we can form argument memory locations!). However, we also have some delegation logic inside of BasicAA and some of it even makes sense. When the delegation logic is baking in specific knowledge of aliasing properties of the LLVM IR, as opposed to simply reformulating the query to utilize a different alias analysis interface entry point, it makes a lot of sense to restrict that logic to a different layer such as BasicAA. So one aspect of the delegation that was in every AA base class is that when we don't have operand bundles, we re-use function AA results as a fallback for callsite alias results. This relies on the IR properties of calls and functions w.r.t. aliasing, and so seems a better fit to BasicAA. I've lifted the logic up to that point where it seems to be a natural fit. This still does a bit of redundant work (we query function attributes twice, once via the callsite and once via the function AA query) but it is exactly twice here, no more. The end result is that all of the delegation logic is hoisted out of the base class and into either the aggregation layer when it is a pure retargeting to a different API surface, or into BasicAA when it relies on the IR's aliasing properties. This should fix the quadratic query pattern reported in PR26564, although I don't have a stand-alone test case to reproduce it. It also seems general goodness. Now the numerous AAs that don't need target library info don't carry it around and depend on it. I think I can even rip out the general access to the aggregation layer and only expose that in BasicAA as it is the only place where we re-query in that manner. However, this is a non-trivial change to the AA infrastructure so I want to get some additional eyes on this before it lands. Sadly, it can't wait long because we should really cherry pick this into 3.8 if we're going to go this route. Differential Revision: http://reviews.llvm.org/D17329 llvm-svn: 262490	2016-03-02 15:56:53 +00:00
George Burgess IV	e0e6e48b29	Attempt to fix ASAN failure in a MemorySSA test. llvm-svn: 262452	2016-03-02 02:35:04 +00:00
Sanjay Patel	5e4c46de6d	revert r262424 because there's a clang test for AArch64 that checks -O3 asm output that is broken by this change llvm-svn: 262440	2016-03-02 01:04:09 +00:00
Sanjay Patel	147e927957	[InstCombine] convert 'isPositive' and 'isNegative' vector comparisons to shifts (PR26701) As noted in the code comment, I don't think we can do the same transform that we do for scalar integers comparisons to vector integers comparisons because it might pessimize the general case. Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT for integer vectors. But we should now recognize all the variants of this construct and produce the optimal code for the cases shown in: https://llvm.org/bugs/show_bug.cgi?id=26701 llvm-svn: 262424	2016-03-01 23:55:18 +00:00
Dehao Chen	1012be120a	Perform InstructioinCombiningPass before SampleProfile pass. Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17742 llvm-svn: 262419	2016-03-01 22:53:02 +00:00
Owen Anderson	7ea02fc787	Fix an issue where fast math flags were dropped during scalarization. Most portions of InstCombine properly propagate fast math flags, but apparently the vector scalarization section was overlooked. llvm-svn: 262376	2016-03-01 19:35:52 +00:00
Daniel Berlin	83fc77b4c0	Add the beginnings of an update API for preserving MemorySSA Summary: This adds the beginning of an update API to preserve MemorySSA. In particular, this patch adds a way to remove memory SSA accesses when instructions are deleted. It also adds relevant unit testing infrastructure for MemorySSA's API. (There is an actual user of this API, i will make that diff dependent on this one. In practice, a ton of opt passes remove memory instructions, so it's hopefully an obviously useful API :P) Reviewers: hfinkel, reames, george.burgess.iv Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17157 llvm-svn: 262362	2016-03-01 18:46:54 +00:00
Petar Jovanovic	6315f3f9b7	Revert "calculate builtin_object_size if argument is a removable pointer" Revert r262337 as "check-llvm ubsan" step failed on sanitizer-x86_64-linux-fast buildbot. llvm-svn: 262349	2016-03-01 16:50:08 +00:00
Petar Jovanovic	8aef99aa86	calculate builtin_object_size if argument is a removable pointer This patch fixes calculating correct value for builtin_object_size function when pointer is used only in builtin_object_size function call and never after that. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D17337 llvm-svn: 262337	2016-03-01 14:39:55 +00:00
Sanjay Patel	6f2c01f712	[x86, InstCombine] transform more x86 masked loads to LLVM intrinsics Continuation of: http://reviews.llvm.org/rL262269 llvm-svn: 262273	2016-02-29 23:59:00 +00:00
Adam Nemet	efc091f457	[LLE] Fix a comment llvm-svn: 262270	2016-02-29 23:21:12 +00:00
Sanjay Patel	98a71505f5	[x86, InstCombine] transform x86 AVX masked loads to LLVM intrinsics The intended effect of this patch in conjunction with: http://reviews.llvm.org/rL259392 http://reviews.llvm.org/rL260145 is that customers using the AVX intrinsics in C will benefit from combines when the load mask is constant: __m128 mload_zeros(float f) { return _mm_maskload_ps(f, _mm_set1_epi32(0)); } __m128 mload_fakeones(float f) { return _mm_maskload_ps(f, _mm_set1_epi32(1)); } __m128 mload_ones(float f) { return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000)); } __m128 mload_oneset(float f) { return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0)); } ...so none of the above will actually generate a masked load for optimized code. This is the masked load counterpart to: http://reviews.llvm.org/rL262064 llvm-svn: 262269	2016-02-29 23:16:48 +00:00
Adam Nemet	83be06e529	[LLE] Fix SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper with Polly We can actually have dependences between accesses with different underlying types. Bail in this case. A test will follow shortly. llvm-svn: 262267	2016-02-29 22:53:59 +00:00
Adam Nemet	dd9e637aca	Enable LoopLoadElimination by default Summary: I re-benchmarked this and results are similar to original results in D13259: On ARM64: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27% SingleSource/Benchmarks/Polybench/stencils/adi -19.78% On x86: SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -27.14% And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop Distribution. In terms of compile time, there is ~5% increase on both SingleSource/Benchmarks/Misc/oourafft and SingleSource/Benchmarks/Linkpack/linkpack-pc. These are both very tiny loop-intensive programs where SCEV computations dominates compile time. The reason that time spent in SCEV increases has to do with the design of the old pass manager. If a transform pass does not preserve an analysis we invalidate the analysis even if there was no modification made by the transform pass. This means that currently we don't take advantage of LLE and LV sharing the same analysis (LAA) and unfortunately we recompute LAA and SCEV for LLE. (There should be a way to work around this limitation in the case of SCEV and LAA since both compute things on demand and internally cache their result. Thus we could pretend that transform passes preserve these analyses and manually invalidate them upon actual modification. On the other hand the new pass manager is supposed to solve so I am not sure if this is worthwhile.) Reviewers: hfinkel, dberlin Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D16300 llvm-svn: 262250	2016-02-29 20:35:11 +00:00
Rong Xu	9e926e8b92	Minor code cleanup. NFC llvm-svn: 262242	2016-02-29 19:16:04 +00:00
Dehao Chen	939993ff2f	Move discriminator assignment to the right place. Summary: Now discriminator is assigned per-function instead of per-module. Reviewers: davidxl, dnovillo Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D17664 llvm-svn: 262240	2016-02-29 18:59:48 +00:00
Xinliang David Li	985ff20a9c	[PGO] Remove redundant counter copies for avail_extern functions. Differential Revision: http://reviews.llvm.org/D17654 llvm-svn: 262157	2016-02-27 23:11:30 +00:00
Renato Golin	9a5419ecf7	Revert "[sancov] do not instrument nodes that are full pre-dominators" This reverts commit r262103, as it broke all ARM and AArch64 bots. llvm-svn: 262139	2016-02-27 14:19:19 +00:00
Sean Silva	ea399f0242	[instrprof] Use __{start,stop}_SECNAME on PS4 too. Summary: The PS4 linker seems to handle this fine. Hi David, it seems that indeed most ELF linkers support __{start,stop}_SECNAME, as our proprietary linker does as well. This follows the pattern of r250679 w.r.t. the testing. Maggie, Phillip, Paul: I've tested this with the PS4 SDK 3.5 toolchain prerelease and it seems to work fine. Reviewers: davidxl Subscribers: probinson, phillip.power, MaggieYi Differential Revision: http://reviews.llvm.org/D17672 llvm-svn: 262112	2016-02-27 06:01:26 +00:00
Mike Aizatsky	9056284912	[sancov] properly initializing pass. llvm-svn: 262111	2016-02-27 05:50:40 +00:00
Kostya Serebryany	3c767db3c5	[libFuzzer] don't emit callbacks to sanitizer run-time in -fsanitize-coverage=trace-pc mode; update libFuzzer doc for previous commit llvm-svn: 262110	2016-02-27 05:45:12 +00:00
Chandler Carruth	ad8cb382fa	[LICM] Teach LICM how to handle cases where the alias set tracker was merged into a loop that was subsequently unrolled (or otherwise nuked). In this case it can't merge in the ASTs for any remaining nested loops, it needs to re-add their instructions dircetly. The fix is very isolated, but I've pulled the code for merging blocks into the AST into a single place in the process. The only behavior change is in the case which would have crashed before. This fixes a crash reported by Mikael Holmen on the list after r261316 restored much of the loop pass pipelining and allowed us to actually do this kind of nested transformation sequenc. I've taken that test case and further reduced it into the somewhat twisty maze of loops in the included test case. This does in fact trigger the bug even in this reduced form. llvm-svn: 262108	2016-02-27 04:34:07 +00:00
Mike Aizatsky	9b53ab7121	[sancov] do not instrument nodes that are full pre-dominators Summary: Without tree pruning clang has 2,667,552 points. Wiht only dominators pruning: 1,515,586. With both dominators & predominators pruning: 1,340,534. Differential Revision: http://reviews.llvm.org/D17671 llvm-svn: 262103	2016-02-27 02:10:27 +00:00
Reid Kleckner	892ae2e2b6	[InstCombine] Be more conservative about removing stackrestore We ended up removing a save/restore pair around an inalloca call, leading to a miscompile in Chromium. llvm-svn: 262095	2016-02-27 00:53:54 +00:00
Sanjay Patel	fc7e7ebf36	[x86, InstCombine] transform x86 AVX2 masked stores to LLVM intrinsics Replicate everything for integers...because x86. Continuation of: http://reviews.llvm.org/rL262064 llvm-svn: 262077	2016-02-26 21:51:44 +00:00
Sanjay Patel	1ace99351f	[x86, InstCombine] transform x86 AVX masked stores to LLVM intrinsics The intended effect of this patch in conjunction with: http://reviews.llvm.org/rL259392 http://reviews.llvm.org/rL260145 is that customers using the AVX intrinsics in C will benefit from combines when the store mask is constant: void mstore_zero_mask(float f, __m128 v) { _mm_maskstore_ps(f, _mm_set1_epi32(0), v); } void mstore_fake_ones_mask(float f, __m128 v) { _mm_maskstore_ps(f, _mm_set1_epi32(1), v); } void mstore_ones_mask(float f, __m128 v) { _mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v); } void mstore_one_set_elt_mask(float f, __m128 v) { _mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v); } ...so none of the above will actually generate a masked store for optimized code. Differential Revision: http://reviews.llvm.org/D17485 llvm-svn: 262064	2016-02-26 21:04:14 +00:00
Haicheng Wu	5539f852ae	[JumpThreading] Simplify Instructions first in ComputeValueKnownInPredecessors() This change tries to find more opportunities to thread over basic blocks. llvm-svn: 261981	2016-02-26 06:06:04 +00:00
Michael Zolotukhin	9f520ebc54	[LoopUnrollAnalyzer] Check that we're using SCEV for the same loop we're simulating. Summary: Check that we're using SCEV for the same loop we're simulating. Otherwise, we might try to use the iteration number of the current loop in SCEV expressions for inner/outer loops IVs, which is clearly incorrect. Reviewers: chandlerc, hfinkel Subscribers: sanjoy, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17632 llvm-svn: 261958	2016-02-26 02:57:05 +00:00
Mike Aizatsky	5971f18133	[sancov] Pruning full dominator blocks from instrumentation. Summary: This is the first simple attempt to reduce number of coverage- instrumented blocks. If a basic block dominates all its successors, then its coverage information is useless to us. Ingore such blocks if santizer-coverage-prune-tree option is set. Differential Revision: http://reviews.llvm.org/D17626 llvm-svn: 261949	2016-02-26 01:17:22 +00:00
Anna Zaks	40148f1716	[asan] Do not instrument globals in the special "LLVM" sections llvm-svn: 261794	2016-02-24 22:12:18 +00:00
David Majnemer	ee0cbbbe69	[SimplifyCFG] Use a more elegant solution than r261731 The cleanupret instruction has an invariant that it's 'from' operand be a cleanuppad. This invariant was violated when we removed a dead block which removed a cleanuppad leaving behind a cleanupret with an undef 'from' operand. This was solved in r261731 by staving off the removal of the dead block to a later pass. However, it occured to me that we do not need to do this. Instead, we can simply avoid processing the cleanupret if it has an undef 'from' operand because we know that it will be removed soon. llvm-svn: 261754	2016-02-24 17:30:48 +00:00
Sanjay Patel	dbbaca0e1b	[InstCombine] enable optimization of casted vector xor instructions This is part of the payoff for the refactoring in: http://reviews.llvm.org/rL261649 http://reviews.llvm.org/rL261707 In addition to removing a pile of duplicated code, the xor case was missing the optimization for vector types because it checked "SrcTy->isIntegerTy()" rather than "SrcTy->isIntOrIntVectorTy()" like 'and' and 'or' were already doing. This solves part of: https://llvm.org/bugs/show_bug.cgi?id=26702 llvm-svn: 261750	2016-02-24 17:00:34 +00:00
Artur Pilipenko	31bcca47d3	NFC. Move isDereferenceable to Loads.h/cpp This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated. Reviewed By: hfinkel Differential Revision: http://reviews.llvm.org/D16180 llvm-svn: 261736	2016-02-24 12:49:04 +00:00
David Majnemer	ec72e37220	[SimplifyCFG] Do not blindly remove unreachable blocks DeleteDeadBlock was called indiscriminately, leading to cleanuprets with undef cleanuppad references. Instead, try to drain the BB of most of it's instructions if it is unreachable. We can then remove the BB if it solely consists of a terminator (and maybe some phis). llvm-svn: 261731	2016-02-24 10:02:16 +00:00
Sanjay Patel	75b4ae25cb	[InstCombine] refactor visitOr() to use foldCastedBitwiseLogic() Note: The 'and' case in foldCastedBitwiseLogic() is inheriting one extra check from the nearly identical 'or' case: if ((!isa<ICmpInst>(Cast0Src) \|\| !isa<ICmpInst>(Cast1Src)) But I'm not sure how to expose that difference in a regression test. Without that check, the 'or' path will infinite loop on: test/Transforms/InstCombine/zext-or-icmp.ll because the zext-or-icmp fold is attempting a reverse transform. The refactoring should extend to the 'xor' case next to solve part of PR26702. llvm-svn: 261707	2016-02-23 23:56:23 +00:00
Sanjay Patel	713f25e0f8	[InstCombine] improve readability ; NFCI Less indenting, named local variables, more descriptive names. llvm-svn: 261659	2016-02-23 17:41:34 +00:00
David Majnemer	223538f764	[WinEH] Don't inline an 'unwinds to caller' cleanupret into funclets which locally unwind It is problematic if the inlinee has a cleanupret which unwinds to caller and we inline it into a call site which doesn't unwind. If the funclet unwinds anywhere other than to the caller, then we will give the funclet two unwind destinations. This will result in a verifier failure. Seeing as how the caller wasn't an invoke (which would locally unwind) and that the funclet cannot unwind to caller, we must conclude that an 'unwind to caller' cleanupret is dynamically unreachable. This fixes PR26698. Differential Revision: http://reviews.llvm.org/D17536 llvm-svn: 261656	2016-02-23 17:11:04 +00:00
Sanjay Patel	7d0d810ce5	[InstCombine] less indenting; NFC llvm-svn: 261652	2016-02-23 16:59:21 +00:00
Sanjay Patel	40e7ba0046	[InstCombine] add helper function to foldCastedBitwiseLogic() ; NFCI This is a straight cut and paste of the existing code and is intended to be the first step in solving part of PR26702: https://llvm.org/bugs/show_bug.cgi?id=26702 We should be able to reuse most of this and delete the nearly identical existing code in visitOr(). Then, we can enhance visitXor() to use the same code too. llvm-svn: 261649	2016-02-23 16:36:07 +00:00
Michael Zolotukhin	792a885537	Follow up for r261597: Add the * to the auto. llvm-svn: 261600	2016-02-23 00:57:48 +00:00
Michael Zolotukhin	4fdf974e3e	Follow-up for r261595: use range loop. llvm-svn: 261597	2016-02-23 00:48:44 +00:00
Michael Zolotukhin	de19ed1eb1	[LoopUnroll] Avoid unnecessary DT recomputation. Summary: When we completely unroll a loop, it's pretty easy to update DT in-place and thus avoid rebuilding it. DT recalculation is one of the most time-consuming tasks in loop-unroll, so avoiding it at least in case of full unroll should be beneficial. On some extreme (but still real-world) tests this patch improves compile time by ~2x. Reviewers: escha, jmolloy, hfinkel, sanjoy, chandlerc Subscribers: joker.eph, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D17473 llvm-svn: 261595	2016-02-23 00:30:50 +00:00
Dehao Chen	6c73b49911	Set function entry count as 0 if sample profile is not found for the function. Summary: This change makes the sample profile's behavior consistent with instr profile. Reviewers: davidxl, eraman, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17522 llvm-svn: 261587	2016-02-22 22:46:21 +00:00
Adam Nemet	fb31d580ea	[LoopDataPrefetch] Make it testable with opt Summary: Since this is an IR pass it's nice to be able to write tests without llc. This is the counterpart of the llc test under CodeGen/PowerPC/loop-data-prefetch.ll. Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17464 llvm-svn: 261578	2016-02-22 21:41:22 +00:00
Michael Zolotukhin	d734bea8ba	[LoopUnrolling] Fix a bug introduced in r259869 (PR26688). The issue was that we only required LCSSA rebuilding if the immediate parent-loop had values used outside of it. The fix is to enaable the same logic for all outer loops, not only immediate parent. llvm-svn: 261575	2016-02-22 21:21:45 +00:00
Philip Reames	ce38c2ddf6	[RS4GC] "Constant fold" the rs4gc-split-vector-values flag This flag was part of a migration to a new means of handling vectors-of-points which was described in the llvm-dev thread "FYI: Relocating vector of pointers". The old code path has been off by default for a while without complaints, so time to cleanup. llvm-svn: 261569	2016-02-22 21:01:28 +00:00
Philip Reames	79fa9b75c0	[RS4GC] Revert optimization attempt due to memory corruption This change reverts "246133 [RewriteStatepointsForGC] Reduce the number of new instructions for base pointers" and a follow on bugfix 12575. As pointed out in pr25846, this code suffers from a memory corruption bug. Since I'm (empirically) not going to get back to this any time soon, simply reverting the problematic change is the right answer. llvm-svn: 261565	2016-02-22 20:45:56 +00:00
Justin Lebar	ccbd8f5a02	Revert "[attrs] Handle convergent CallSites." This reverts r261544, which was causing a test failure in Transforms/FunctionAttrs/readattrs.ll. llvm-svn: 261549	2016-02-22 18:24:43 +00:00
Justin Lebar	7bf9187abb	[attrs] Handle convergent CallSites. Summary: Previously we had a notion of convergent functions but not of convergent calls. This is insufficient to correctly analyze calls where the target is unknown, e.g. indirect calls. Now a call is convergent if it targets a known-convergent function, or if it's explicitly marked as convergent. As usual, we can remove convergent where we can prove that no convergent operations are performed in the call. Reviewers: chandlerc, jingyue Subscribers: hfinkel, jhen, tra, llvm-commits Differential Revision: http://reviews.llvm.org/D17317 llvm-svn: 261544	2016-02-22 17:51:35 +00:00
Benjamin Kramer	451f54cf62	Fix some abuse of auto flagged by clang's -Wrange-loop-analysis. llvm-svn: 261524	2016-02-22 13:11:58 +00:00
Elena Demikhovsky	9914dbd11b	Allow setting MaxRerollIterations above 16 By Ayal Zaks. Differential Revision http://reviews.llvm.org/D17258 llvm-svn: 261517	2016-02-22 09:38:28 +00:00
Duncan P. N. Exon Smith	e9bc579c37	ADT: Remove == and != comparisons between ilist iterators and pointers I missed == and != when I removed implicit conversions between iterators and pointers in r252380 since they were defined outside ilist_iterator. Since they depend on getNodePtrUnchecked(), they indirectly rely on UB. This commit removes all uses of these operators. (I'll delete the operators themselves in a separate commit so that it can be easily reverted if necessary.) There should be NFC here. llvm-svn: 261498	2016-02-21 20:39:50 +00:00
Duncan P. N. Exon Smith	ec6f7fed54	TransformUtils: Avoid getNodePtrUnchecked() in integer division, NFC Stop relying on `getNodePtrUnchecked()` being useful on invalid iterators. This function is documented to be for internal use only, and the pointer type will eventually have to change to remove UB from ilist_iterator. Instead, check the iterator before it has been invalidated. llvm-svn: 261497	2016-02-21 20:14:29 +00:00
Sanjay Patel	2440130437	fix inaccurate comment; NFC llvm-svn: 261484	2016-02-21 17:33:31 +00:00
Sanjay Patel	368ac5dbf7	[InstCombine] add getNegativeIsTrueBoolVec() helper function; NFC Originally part of: http://reviews.llvm.org/D17485 We need this when simplifying masked memory ops too. llvm-svn: 261483	2016-02-21 17:29:33 +00:00
Sanjoy Das	979a11d5b2	[LoopDeletion] Add an assert that verifies LCSSA This is inspired by PR24804 -- had this assert been there before, isolating the root cause for PR24804 would have been far easier. llvm-svn: 261481	2016-02-21 17:11:59 +00:00
Simon Pilgrim	471efd244a	[InstCombine] SSE/SSE2 (u)comiss/(u)comisd comparison intrinsics only use the lowest vector element llvm-svn: 261460	2016-02-20 23:17:35 +00:00
Benjamin Kramer	7d537ae747	[SimplifyCFG] Use pointer identity to simplify predicate. No functional change intended. llvm-svn: 261427	2016-02-20 10:40:42 +00:00
David Majnemer	1efa23ddab	[SimplifyCFG] Merge together cleanuppads Cleanuppads may be merged together if one is the only predecessor of the other in which case a simple transform can be performed: replace the a cleanupret with a branch and remove an unnecessary cleanuppad. Differential Revision: http://reviews.llvm.org/D17459 llvm-svn: 261390	2016-02-20 01:07:45 +00:00
Hans Wennborg	a0f7090563	Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions." It caused PR26509. llvm-svn: 261368	2016-02-19 21:40:12 +00:00
Matthew Simpson	29c997c1a1	[LV] Vectorize first-order recurrences This patch enables the vectorization of first-order recurrences. A first-order recurrence is a non-reduction recurrence relation in which the value of the recurrence in the current loop iteration equals a value defined in the previous iteration. The load PRE of the GVN pass often creates these recurrences by hoisting loads from within loops. In this patch, we add a new recurrence kind for first-order phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction. Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org> Differential Revision: http://reviews.llvm.org/D16197 llvm-svn: 261346	2016-02-19 17:56:08 +00:00
Silviu Baranga	ad1dafb2c3	[LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization Summary: If we don't have the first and last access of an interleaved load group, the first and last wide load in the loop can do an out of bounds access. Even though we discard results from speculative loads, this can cause problems, since it can technically generate page faults (or worse). We now discard interleaved load groups that don't have the first and load in the group. Reviewers: hfinkel, rengolin Subscribers: rengolin, llvm-commits, mzolotukhin, anemet Differential Revision: http://reviews.llvm.org/D17332 llvm-svn: 261331	2016-02-19 15:46:10 +00:00
Chandler Carruth	31088a9d58	[LPM] Factor all of the loop analysis usage updates into a common helper routine. We were getting this wrong in small ways and generally being very inconsistent about it across loop passes. Instead, let's have a common place where we do this. One minor downside is that this will require some analyses like SCEV in more places than they are strictly needed. However, this seems benign as these analyses are complete no-ops, and without this consistency we can in many cases end up with the legacy pass manager scheduling deciding to split up a loop pass pipeline in order to run the function analysis half-way through. It is very, very annoying to fix these without just being very pedantic across the board. The only loop passes I've not updated here are ones that use AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer. They seemed less relevant. With this patch, almost all of the problems in PR24804 around loop pass pipelines are fixed. The one remaining issue is that we run simplify-cfg and instcombine in the middle of the loop pass pipeline. We've recently added some loop variants of these passes that would seem substantially cleaner to use, but this at least gets us much closer to the previous state. Notably, the seven loop pass managers is down to three. I've not updated the loop passes using LoopAccessAnalysis because that analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't clear that those transforms want to support those forms anyways. They all run late anyways, so this is harmless. Similarly, LSR is left alone because it already carefully manages its forms and doesn't need to get fused into a single loop pass manager with a bunch of other loop passes. LoopReroll didn't use loop simplified form previously, and I've updated the test case to match the trivially different output. Finally, I've also factored all the pass initialization for the passes that use this technique as well, so that should be done regularly and reliably. Thanks to James for the help reviewing and thinking about this stuff, and Ben for help thinking about it as well! Differential Revision: http://reviews.llvm.org/D17435 llvm-svn: 261316	2016-02-19 10:45:18 +00:00
Chandler Carruth	ac07270828	[AA] Preserve the AA results wrapper pass as well as BasicAA in a few more places to prevent gratuitous re-"runs" of these passes. The passes themselves don't do any work when run, but we keep spending time scheduling and running these needlessly when we really don't need to do so. This is the first patch towards fixing the really horrible loop pass pipeline fragmentation pointed out by Sanjoy in PR24804. llvm-svn: 261302	2016-02-19 03:12:14 +00:00
Lawrence Hu	84e6f1dd70	Bug fix: use dyn_cast_or_null instead of dyn_cast Differential Revision: http://reviews.llvm.org/D17154 llvm-svn: 261299	2016-02-19 02:17:07 +00:00
Richard Trieu	7a08381403	Remove uses of builtin comma operator. Cleanup for upcoming Clang warning -Wcomma. No functionality change intended. llvm-svn: 261270	2016-02-18 22:09:30 +00:00
Adam Nemet	9d9cb274ea	[PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFC This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758). Obviously the pass still only used from PPC at this point. Subsequent patches will start driving this from ARM64 as well. Due to the previous patch most lines should show up as moved lines. llvm-svn: 261265	2016-02-18 21:38:19 +00:00
Matthew Simpson	92821cb4a8	Reapply commit r259357 with a fix for PR26629 Commit r259357 was reverted because it caused PR26629. We were assuming all roots of a vectorizable tree could be truncated to the same width, which is not the case in general. This commit reapplies the patch along with a fix and a new test case to ensure we don't regress because of this issue again. This should fix PR26629. llvm-svn: 261212	2016-02-18 14:14:40 +00:00
Chandler Carruth	9c4ed175c2	[PM] Port the PostOrderFunctionAttrs pass to the new pass manager and convert one test to use this. This is a particularly significant milestone because it required a working per-function AA framework which can be queried over each function from within a CGSCC transform pass (and additionally a module analysis to be accessible). This is essentially the point of the entire pass manager rewrite. A CGSCC transform is able to query for multiple different function's analysis results. It works. The whole thing appears to actually work and accomplish the original goal. While we were able to hack function attrs and basic-aa to "work" in the old pass manager, this port doesn't use any of that, it directly leverages the new fundamental functionality. For this to work, the CGSCC framework also has to support SCC-based behavior analysis, etc. The only part of the CGSCC pass infrastructure not sorted out at this point are the updates in the face of inlining and running function passes that mutate the call graph. The changes are pretty boring and boiler-plate. Most of the work was factored into more focused preperatory patches. But this is what wires it all together. llvm-svn: 261203	2016-02-18 11:03:11 +00:00
Junmo Park	80440eb804	Minor code cleanup. NFC. llvm-svn: 261200	2016-02-18 10:09:20 +00:00
Kostya Serebryany	d4590c7304	[sanitizer-coverage] implement -fsanitize-coverage=trace-pc. This is similar to trace-bb, but has a different API. We already use the equivalent flag in GCC for Linux kernel fuzzing. We may be able to use this flag with AFL too llvm-svn: 261159	2016-02-17 21:34:43 +00:00
Amaury Sechet	da71cb7b92	NFC: Fix formating llvm-svn: 261156	2016-02-17 21:21:29 +00:00
Haicheng Wu	57e1a3e6ee	[LIR] Avoid turning non-temporal stores into memset This is to fix PR26645. llvm-svn: 261149	2016-02-17 21:00:06 +00:00
Adrian Prantl	a5b2a64980	Debug Info: Teach LdStHasDebugValue() (Local.cpp) about DIExpressions. This function is used to check whether a dbg.value intrinsic has already been inserted, but without comparing the DIExpression, it would erroneously fire on split aggregates and only the first scalar would survive. Found via http://reviews.llvm.org/D16867. <rdar://problem/24456528> llvm-svn: 261145	2016-02-17 20:02:25 +00:00
Elena Demikhovsky	88e76cad16	Create masked gather and scatter intrinsics in Loop Vectorizer. Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access. The feature is enabled on AVX-512 target. Differential Revision: http://reviews.llvm.org/D15690 llvm-svn: 261140	2016-02-17 19:23:04 +00:00
Amaury Sechet	61a7d629ec	Fix load alignement when unpacking aggregates structs Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it. Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17326 llvm-svn: 261139	2016-02-17 19:21:28 +00:00
David Majnemer	f48bcb2bd9	Revert "Reapply commit r258404 with fix." This reverts commit r259357, it caused PR26629. llvm-svn: 261137	2016-02-17 19:02:36 +00:00
Frederic Riss	009d60650d	[ObjCARC] Handle ARCInstKind::ClaimRV in OptimizeIndividualCalls. When support for objc_unsafeClaimAutoreleasedReturnValue has been added to the ARC optimizer in r258970, one case was missed which would lead the optimizer to execute an llvm_unreachable. In this case, just handle ClaimRV in the same way we handle RetainRV. llvm-svn: 261134	2016-02-17 18:51:27 +00:00
Mehdi Amini	1db10ac6ce	Define the ThinLTO Pipeline (experimental) Summary: On the contrary to Full LTO, ThinLTO can afford to shift compile time from the frontend to the linker: both phases are parallel (even if it is not totally "free": projects like clang are reusing product from the "compile phase" for multiple link, think about libLLVMSupport reused for opt, llc, etc.). This pipeline is based on the proposal in D13443 for full LTO. We didn't move forward on this proposal because the LTO link was far too long after that. We believe that we can afford it with ThinLTO. The ThinLTO pipeline integrates in the regular O2/O3 flow: - The compile phase perform the inliner with a somehow lighter function simplification. (TODO: tune the inliner thresholds here) This is intendend to simplify the IR and get rid of obvious things like linkonce_odr that will be inlined. - The link phase will run the pipeline from the start, extended with some specific passes that leverage the augmented knowledge we have during LTO. Especially after the inliner is done, a sequence of globalDCE/globalOpt is performed, followed by another run of the "function simplification" passes. It is not clear if this part of the pipeline will stay as is, as the split model of ThinLTO does not allow the same benefit as FullLTO without added tricks. The measurements on the public test suite as well as on our internal suite show an overall net improvement. The binary size for the clang executable is reduced by 5%. We're still tuning it with the bringup of ThinLTO and it will evolve, but this should provide a good starting point. Reviewers: tejohnson Differential Revision: http://reviews.llvm.org/D17115 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 261029	2016-02-16 23:02:29 +00:00
Mehdi Amini	ec8bee1879	Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" (NFC) It is intended to contains the passes run over a function after the inliner is done with a function and before it moves to its callers. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 261028	2016-02-16 22:54:27 +00:00
Junmo Park	6ebdc14cf1	[SCEVExpander] Make findExistingExpansion smarter Summary: Extending findExistingExpansion can use existing value in ExprValueMap. This patch gives 0.3~0.5% performance improvements on benchmarks(test-suite, spec2000, spec2006, commercial benchmark) Reviewers: mzolotukhin, sanjoy, zzheng Differential Revision: http://reviews.llvm.org/D15559 llvm-svn: 260938	2016-02-16 06:46:58 +00:00
Silviu Baranga	ec7063ac77	[LV] Add support for insertelt/extractelt processing during type truncation Summary: While shrinking types according to the required bits, we can encounter insert/extract element instructions. This will cause us to reach an llvm_unreachable statement. This change adds support for truncating insert/extract element operations, and adds a regression test. Reviewers: jmolloy Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17078 llvm-svn: 260893	2016-02-15 15:38:17 +00:00
Roman Gareev	036c08874a	Tweak the LICM code to reuse the first sub-loop instead of creating a new one LICM starts with an empty AST, and then merges in each sub-loop. While the add code is appropriate for sub-loop 2 and up, it's utterly unnecessary for sub-loop 1. If the AST starts off empty, we can just clone/move the contents of the subloop into the containing AST. Reviewed-by: Philip Reames <listmail@philipreames.com> Differential Revision: http://reviews.llvm.org/D16753 llvm-svn: 260892	2016-02-15 14:48:50 +00:00
Benjamin Kramer	8a752e316d	Use ArrayRef to hide SmallVector details, kill a useless vector copy along the way. llvm-svn: 260824	2016-02-13 16:01:12 +00:00
Chandler Carruth	632d208c78	[attrs] Move the norecurse deduction to operate on the node set rather than the SCC object, and have it scan the instruction stream directly rather than relying on call records. This makes the behavior of this routine consistent between libc routines and LLVM intrinsics for libc routines. We can go and start teaching it about those being norecurse, but we should behave the same for the intrinsic and the libc routine rather than differently. I chatted with James Molloy and the inconsistency doesn't seem intentional and likely is due to intrinsic calls not being modelled in the call graph analyses. This also fixes a bug where we would deduce norecurse on optnone functions, when generally we try to handle optnone functions as-if they were replaceable and thus unanalyzable. llvm-svn: 260813	2016-02-13 08:47:51 +00:00
Keno Fischer	7c7c3e3591	[Cloning] Clone every Function's Debug Info Summary: Export the CloneDebugInfoMetadata utility, which clones all debug info associated with a function into the first module. Also use this function in CloneModule on each function we clone (the CloneFunction entrypoint already does this). Without this, cloning a module will lead to DI quality regressions, especially since r252219 reversed the Function <-> DISubprogram edge (before we could get lucky and have this edge preserved if the DISubprogram itself was, e.g. due to location metadata). This was verified to fix missing debug information in julia and a unittest to verify the new behavior is included. Patch by Yichao Yu! Thanks! Reviewers: loladiro, pcc Differential Revision: http://reviews.llvm.org/D17165 llvm-svn: 260791	2016-02-13 02:04:29 +00:00
Chad Rosier	81362a8599	[LIR] Allow merging of memsets in negatively strided loops. Last part of PR25166. llvm-svn: 260732	2016-02-12 21:03:23 +00:00
Justin Lebar	6086c6a387	Fix typo in comment. llvm-svn: 260731	2016-02-12 21:01:37 +00:00
Justin Lebar	db63949e8d	[SimplifyCFG] Don't fold conditional branches that contain calls to convergent functions. Summary: Performing this optimization duplicates the call to the convergent function and adds new control-flow dependencies, which is a no-no. Reviewers: jingyue Subscribers: broune, hfinkel, tra, resistor, joker.eph, arsenm, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17128 llvm-svn: 260730	2016-02-12 21:01:36 +00:00
Justin Lebar	df04d2a1f1	[LoopRotate] Don't perform loop rotation if the loop header calls a convergent function. Summary: Calls to convergent functions can be duplicated, but only if the duplicates are not control-flow dependent on any additional values. Loop rotation doesn't meet the bar. Reviewers: jingyue Subscribers: mzolotukhin, llvm-commits, arsenm, joker.eph, resistor, tra, hfinkel, broune Differential Revision: http://reviews.llvm.org/D17127 llvm-svn: 260729	2016-02-12 21:01:33 +00:00
David Majnemer	01674939b2	Remove unused variable llvm-svn: 260722	2016-02-12 20:33:51 +00:00
Philip Reames	96fccc2d09	[GVN] Common code for local and non-local load availability [NFCI] The attached patch removes all of the block local code for performing X-load forwarding by reusing the code used in the non-local case. The motivation here is to remove duplication and in the process increase our test coverage of some fairly tricky code. I have some upcoming changes I'll be proposing in this area and wanted to have the code cleaned up a bit first. Note: The review for this mostly happened in email which didn't make it to phabricator on the 258882 commit thread. Differential Revision: http://reviews.llvm.org/D16608 llvm-svn: 260711	2016-02-12 19:24:57 +00:00
Chad Rosier	4acff96646	[LIR] Partially revert r252926(NFC), which introduced a very subtle change. In short, before r252926 we were comparing an unsigned (StoreSize) against an a APInt (Stride), which is fine and well. After we were zero extending the Stride and then converting to an unsigned, which is not the same thing. Obviously, Stides can also be negative. This commit just restores the original behavior. AFAICT, it's not possible to write a test case to expose the issue because the code already has checks to make sure the StoreSize can't overflow an unsigned (which prevents the Stride from overflowing an unsigned as well). llvm-svn: 260706	2016-02-12 19:05:27 +00:00
David Majnemer	0f0abc7bc2	[InstCombine] Don't aggressively replace xor with icmp For some cases, InstCombine replaces the sequence of xor/sub instruction followed by cmp instruction into a single cmp instruction. However, this replacement may result suboptimal result especially when the xor/sub has more than one use, as discussed in bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465). This patch make the replacement happen only when xor/sub has only one use. Differential Revision: http://reviews.llvm.org/D16915 Patch by Taewook Oh! llvm-svn: 260695	2016-02-12 18:12:38 +00:00
Chandler Carruth	3937bc70f9	[attrs] Simplify the convergent removal to directly use the pre-built node set rather than walking the SCC directly. This directly exposes the functions and has already had null entries filtered out. We also don't need need to handle optnone as it has already been handled in the caller -- we never try to remove convergent when there are optnone functions in the SCC. With this change, the code for removing convergent should work with the new pass manager and a different SCC analysis. llvm-svn: 260668	2016-02-12 09:47:49 +00:00
Chandler Carruth	057df3d423	[attrs] Consolidate the test for a non-SCC, non-convergent function call with the test for a non-convergent intrinsic call. While it is possible to use the call records to search for function calls, we're going to do an instruction scan anyways to find the intrinsics, we can handle both cases while scanning instructions. This will also make the logic more amenable to the new pass manager which doesn't use the same call graph structure. My next patch will remove use of CallGraphNode entirely and allow this code to work with both the old and new pass manager. Fortunately, it should also get strictly simpler without changing functionality. llvm-svn: 260666	2016-02-12 09:23:53 +00:00
Chandler Carruth	bbbbec0b54	[attrs] Run clang-format over a newly added routine in function-attrs before I update it to be friendly with the new pass manager. llvm-svn: 260653	2016-02-12 03:07:50 +00:00
Evgeniy Stepanov	ba6ca87ffb	[msan] Put msan constructor in a comdat. MSan adds a constructor to each translation unit that calls __msan_init, and does nothing else. The idea is to run __msan_init before any instrumented code. This results in multiple constructors and multiple .init_array entries in the final binary, one per translation unit. This is absolutely unnecessary; one would be enough. This change moves the constructors to a comdat group in order to drop the extra ones. llvm-svn: 260632	2016-02-12 00:37:52 +00:00
Matthew Simpson	a4e43c5b51	[SLP] Add debug output for extract cost (NFC) llvm-svn: 260614	2016-02-11 23:06:40 +00:00
Quentin Colombet	490cfbe2a2	Re-apply r238452, the bug was in clang and was fixed in r260567. Original commit message: [InstCombine] Fold IntToPtr and PtrToInt into preceding loads. Currently we only fold a BitCast into a Load when the BitCast is its only user. Do the same for any no-op cast. Patch by Philip Pfaffe! Differential Revision: http://reviews.llvm.org/D9152 llvm-svn: 260612	2016-02-11 22:30:41 +00:00
Mehdi Amini	9c1c3ac627	Revert "Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()"" This reverts commit r260603. I didn't intend to push it :( From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260607	2016-02-11 22:09:11 +00:00
Mehdi Amini	c5bf5ecc1b	Revert "Define the ThinLTO Pipeline" This reverts commit r260604. I didn't intend to push this now. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260606	2016-02-11 22:09:07 +00:00
Mehdi Amini	484470d605	Define the ThinLTO Pipeline Summary: On the contrary to Full LTO, ThinLTO can afford to shift compile time from the frontend to the linker: both phases are parallel. This pipeline is based on the proposal in D13443 for full LTO. We ] didn't move forward on this proposal because the link was far too long after that. This patch refactor the "function simplification" passes that are part of the inliner loop in a helper function (this part is NFC and can be commited separately to simplify the diff). The ThinLTO pipeline integrates in the regular O2/O3 flow: - The compile phase perform the inliner with a somehow lighter function simplification. (TODO: tune the inliner thresholds here) This is intendend to simplify the IR and get rid of obvious things like linkonce_odr that will be inlined. - The link phase will run the pipeline from the start, extended with some specific passes that leverage the augmented knowledge we have during LTO. Especially after the inliner is done, a sequence of globalDCE/globalOpt is performed, followed by another run of the "function simplification" passes. The measurements on the public test suite as well as on our internal suite show an overall net improvement. The binary size for the clang executable is reduced by 5%. We're still tuning it with the bringup of ThinLTO but this should provide a good starting point. Reviewers: tejohnson Subscribers: joker.eph, llvm-commits, dexonsmith Differential Revision: http://reviews.llvm.org/D17115 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260604	2016-02-11 22:00:31 +00:00
Mehdi Amini	f9a3718c5a	Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" It is intended to contains the passes run over a function after the inliner is done with a function and before it moves to its callers. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260603	2016-02-11 22:00:25 +00:00
Pete Cooper	5562c333b8	Set load alignment on aggregate loads. When optimizing a extractvalue(load), we generate a load from the aggregate type. This load didn't have alignment set and so would get the alignment of the type. This breaks when the type is packed and so the alignment should be lower. For example, loading { int, int } would give us alignment of 4, but the original load from this type may have an alignment of 1 if packed. Reviewed by David Majnemer Differential revision: http://reviews.llvm.org/D17158 llvm-svn: 260587	2016-02-11 21:10:40 +00:00
Jun Bum Lim	10e58e867d	Fixed typo in r260530 llvm-svn: 260541	2016-02-11 16:46:13 +00:00
Jun Bum Lim	339e9723c1	[InstCombine] Simplify a known nonzero incoming value of PHI Summary: When a PHI is used only to be compared with zero, it is possible to replace an incoming value with any non-zero constant if the incoming value can be proved as a known nonzero value. For example, in below code, we can replace the incoming value %v with any non-zero constant based on the fact that the PHI is only used to be compared with zero and %v is a known non-zero value: %v = select %cond, 1, 2 %p = phi [%v, BB] ... %c = icmp eq, %p, 0 Reviewers: mcrosier, jmolloy, sanjoy Subscribers: hfinkel, mcrosier, majnemer, llvm-commits, haicheng, bmakam, mssimpso, gberry Differential Revision: http://reviews.llvm.org/D16240 llvm-svn: 260530	2016-02-11 15:50:07 +00:00
Tamas Berghammer	d12f31535a	Fix MSVC 2013 build after rL260504 llvm-svn: 260511	2016-02-11 11:27:51 +00:00
Artur Pilipenko	44e7c51b05	Don't propagate dereferenceable attribute through gc.relocate in InstCombine Reviewed By: reames Differential Revision: http://reviews.llvm.org/D16143 llvm-svn: 260509	2016-02-11 11:22:46 +00:00
Ashutosh Nema	2260a3a046	Fixed typo in comment & coding style for LoopVersioningLICM. llvm-svn: 260504	2016-02-11 09:23:53 +00:00
Teresa Johnson	41806854cc	Fix Windows bot failure in Transforms/FunctionImport/funcimport.ll Make sure we split ":" from the end of the global function id (which is <path>:<function> for local functions) instead of the beginning to avoid splitting at the wrong place for Windows file paths that contain a ":". llvm-svn: 260469	2016-02-10 23:47:38 +00:00
Mehdi Amini	4064174892	FunctionImport: add a progressive heuristic to limit importing too deep in the callgraph The current function importer will walk the callgraph, importing transitively any callee that is below the threshold. This can lead to import very deep which is costly in compile time and not necessarily beneficial as most of the inline would happen in imported function and not necessarilly in user code. The actual factor has been carefully chosen by flipping a coin ;) Some tuning need to be done (just at the existing limiting threshold). Reviewers: tejohnson Differential Revision: http://reviews.llvm.org/D17082 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260466	2016-02-10 23:31:45 +00:00
Mehdi Amini	c87d7d02e1	Use a StringSet in Internalize, and allow to create the pass from an existing one (NFC) There is not reason to pass an array of "char *" to rebuild a set if the client already has one. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 260462	2016-02-10 23:24:31 +00:00
Philip Reames	d59638e14f	Follow up to 260439, Speculative fix to clang builders It looks like clang has a couple of test cases which caught the fact LVI was not slightly more precise after 260439. When looking at the failures, it struck me as wasteful to be querying nullness of a constant via LVI, so instead of tweaking the clang tests, let's just stop querying constants from this source. llvm-svn: 260451	2016-02-10 22:22:41 +00:00
Teresa Johnson	e1164de5d0	Restore "[ThinLTO] Use MD5 hash in function index." with fix This restores commit r260408, along with a fix for a bot failure. The bot failure was caused by dereferencing a unique_ptr in the same call instruction parameter list where it was passed via std::move. Apparently due to luck this was not exposed when I built the compiler with clang, only with gcc. llvm-svn: 260442	2016-02-10 21:55:02 +00:00
Teresa Johnson	89f38fb5cc	Revert "[ThinLTO] Use MD5 hash in function index." due to bot failure This reverts commit r260408. Bot failure that I need to investigate. llvm-svn: 260412	2016-02-10 19:11:15 +00:00
Teresa Johnson	0919a84071	[ThinLTO] Use MD5 hash in function index. Summary: This patch uses the lower 64-bits of the MD5 hash of a function name as a GUID in the function index, instead of storing function names. Any local functions are first given a global name by prepending the original source file name. This is the same naming scheme and GUID used by PGO in the indexed profile format. This change has a couple of benefits. The primary benefit is size reduction in the combined index file, for example 483.xalancbmk's combined index file was reduced by around 70%. It should also result in memory savings for the index file in memory, as the in-memory map is also indexed by the hash instead of the string. Second, this enables integration with indirect call promotion, since the indirect call profile targets are recorded using the same global naming convention and hash. This will enable the function importer to easily locate function summaries for indirect call profile targets to enable their import and subsequent promotion. The original source file name is recorded in the bitcode in a new module-level record for use in the ThinLTO backend pipeline. Reviewers: davidxl, joker.eph Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D17028 llvm-svn: 260408	2016-02-10 18:57:54 +00:00
Rong Xu	13b01dc8d9	[PGO] Indirect-call profile annotation in IR level profiling This patch reads the indirect-call value records in the profile and makes the annotation in the indirect-call instruction. This is for IR level profile instrumentation. Differential Revision: http://reviews.llvm.org/D16935 llvm-svn: 260400	2016-02-10 18:24:45 +00:00
Teresa Johnson	488a800a4c	[ThinLTO] Move global processing from Linker to TransformUtils (NFC) Summary: As discussed on IRC, move the ThinLTOGlobalProcessing code out of the linker, and into TransformUtils. The name of the class is changed to FunctionImportGlobalProcessing. Reviewers: joker.eph, rafael Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D17081 llvm-svn: 260395	2016-02-10 18:11:31 +00:00
Daniel Berlin	f6c9ae9c6d	Rename a member variable to be more accurate with how it is used llvm-svn: 260389	2016-02-10 17:41:25 +00:00
Daniel Berlin	932b4cbf5d	Constify two functions, make them accessible to unit tests llvm-svn: 260387	2016-02-10 17:39:43 +00:00
Rong Xu	33c76c0cc2	[PGO] Differentiate Clang instrumentation and IR level instrumentation profiles This patch uses one bit in profile version to differentiate Clang instrumentation and IR level instrumentation profiles. PGOInstrumenation generates a COMDAT variable __llvm_profile_raw_version so that the compiler runtime can set the right profile kind. For Maco-O platform, we generate the variable as linkonce_odr linkage as COMDAT is not supported. PGOInstrumenation now checks this bit to make sure it's an IR level instrumentation profile. The patch was submitted as r260164 but reverted due to a Darwin test breakage. Original Differential Revision: http://reviews.llvm.org/D15540 Differential Revision: http://reviews.llvm.org/D17020 llvm-svn: 260385	2016-02-10 17:18:30 +00:00
Tom Stellard	6fa4e290d7	StructurizeCFG: Initialize SkipUniformRegions in the default constructor This should fix some random bot failures caused by r260336. llvm-svn: 260342	2016-02-10 01:10:09 +00:00
Tom Stellard	755a4e6b57	StructurizeCFG: Add an option for skipping regions with only uniform branches Summary: Tests for this will be added once the AMDGPU backend enables this option. Reviewers: arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16602 llvm-svn: 260336	2016-02-10 00:39:37 +00:00
Justin Lebar	260854bfaf	Add convergent-removing bits to FunctionAttrs pass. Summary: Remove the convergent attribute on any functions which provably do not contain or invoke any convergent functions. After this change, we'll be able to modify clang to conservatively add 'convergent' to all functions when compiling CUDA. Reviewers: jingyue, joker.eph Subscribers: llvm-commits, tra, jhen, hfinkel, resistor, chandlerc, arsenm Differential Revision: http://reviews.llvm.org/D17013 llvm-svn: 260319	2016-02-09 23:03:22 +00:00
Peter Collingbourne	9b656527f1	Fix GCC build. llvm-svn: 260317	2016-02-09 23:01:38 +00:00
Peter Collingbourne	df49d1bbb2	WholeProgramDevirt: introduce. This pass implements whole program optimization of virtual calls in cases where we know (via bitset information) that the list of callees is fixed. This includes the following: - Single implementation devirtualization: if a virtual call has a single possible callee, replace all calls with a direct call to that callee. - Virtual constant propagation: if the virtual function's return type is an integer <=64 bits and all possible callees are readnone, for each class and each list of constant arguments: evaluate the function, store the return value alongside the virtual table, and rewrite each virtual call as a load from the virtual table. - Uniform return value optimization: if the conditions for virtual constant propagation hold and each function returns the same constant value, replace each virtual call with that constant. - Unique return value optimization for i1 return values: if the conditions for virtual constant propagation hold and a single vtable's function returns 0, or a single vtable's function returns 1, replace each virtual call with a comparison of the vptr against that vtable's address. Differential Revision: http://reviews.llvm.org/D16795 llvm-svn: 260312	2016-02-09 22:50:34 +00:00
Philip Reames	ea4d8e8ce9	[InstCombine][GC] Handle gc.relocations of vector type We introduced gc.relocates of vector-of-pointer types a couple of weeks back. Somehow, I missed updating the InstCombine rule to account for this. If we hit this code path with a vector-of-pointers gc.relocate, we'd crash on a cast<PointerType>. I also took the chance to do a bit of code style cleanup. llvm-svn: 260279	2016-02-09 21:09:22 +00:00
Sanjoy Das	10c8a04b80	[FunctionAttrs] Fix SCC logic around operand bundles FunctionAttrs does an "optimistic" analysis of SCCs as a unit, which means normally it is able to disregard calls from an SCC into itself. However, calls and invokes with operand bundles are allowed to have memory effects not fully described by the memory effects on the call target, so we can't be optimistic around operand-bundled calls from an SCC into itself. llvm-svn: 260244	2016-02-09 18:40:40 +00:00
Sanjoy Das	1c481f50d2	Add an "addUsedAAAnalyses" helper function Summary: Passes that call `getAnalysisIfAvailable<T>` also need to call `addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the legacy pass manager that it uses `T`. This contract was being violated by passes that used `createLegacyPMAAResults`. This change fixes this by exposing a helper in AliasAnalysis.h, `addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults and does the right thing when called from `getAnalysisUsage`. Reviewers: chandlerc Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17010 llvm-svn: 260183	2016-02-09 01:21:57 +00:00
Rong Xu	d0dfb67fe1	[PGO] Revert r260146 as it breaks Darwin platforms. r260146 \| xur \| 2016-02-08 13:07:46 -0800 (Mon, 08 Feb 2016) \| 13 lines [PGO] Differentiate Clang instrumentation and IR level instrumentation profiles llvm-svn: 260170	2016-02-08 23:11:16 +00:00
Michael Zolotukhin	1da4afdfc9	Factor out UnrollAnalyzer to Analysis, and add unit tests for it. Summary: Unrolling Analyzer is already pretty complicated, and it becomes harder and harder to exercise it with usual IR tests, as with them we can only check the final decision: whether the loop is unrolled or not. This change factors this framework out from LoopUnrollPass to analyses, which allows to use unit tests. The change itself is supposed to be NFC, except adding a couple of tests. I plan to add more tests as I add new functionality and find/fix bugs. Reviewers: chandlerc, hfinkel, sanjoy Subscribers: zzheng, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D16623 llvm-svn: 260169	2016-02-08 23:03:59 +00:00
Sanjay Patel	4d36bbaf19	rangify; NFC llvm-svn: 260151	2016-02-08 21:32:43 +00:00
Rong Xu	1288a19421	[PGO] Differentiate Clang instrumentation and IR level instrumentation profiles This patch uses one bit in profile version to differentiate Clang instrumentation and IR level instrumentation profiles. PGOInstrumenation generates a COMDAT variable __llvm_profile_raw_version so that the compiler runtime can set the right profile kind. PGOInstrumenation now checks this bit to make sure it's an IR level instrumentation profile. Differential Revision: http://reviews.llvm.org/D15540 llvm-svn: 260146	2016-02-08 21:07:46 +00:00
Sanjay Patel	e08381a529	fix typos; NFC llvm-svn: 260130	2016-02-08 19:27:33 +00:00
Xinliang David Li	a82d6c0a4b	[PGO] Enable compression in pgo instrumentation This reduces sizes of instrumented object files, final binaries, process images, and raw profile data. The format of the indexed profile data remain the same. Differential Revision: http://reviews.llvm.org/D16388 llvm-svn: 260117	2016-02-08 18:13:49 +00:00
Silviu Baranga	ea63a7f512	[SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memory sanitizer issue. The PredicatedScalarEvolution's copy constructor wasn't copying the Generation value, and was leaving it un-initialized. Original commit message: [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: (sext ({x,+,y}) -> {sext(x),+,sext(y)} (zext ({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 llvm-svn: 260112	2016-02-08 17:02:45 +00:00
Haicheng Wu	b35f772b90	[JumpThreading] Change a return of ComputeValueKnownInPredecessors() Change a return statement of ComputeValueKnownInPredecessors() to be the same as the rest return statements of the function. Otherwise, it might return true with an empty Result when the current basic block has no predecessors and trigger the first assert of JumpThreading::ProcessThreadableEdges(). llvm-svn: 260110	2016-02-08 17:00:39 +00:00
Igor Breger	1a39a34eae	[SLP] Fix placement of debug statement (NFC) By Ayal Zaks (ayal.zaks@intel.com) Differential Revision: http://reviews.llvm.org/D16976 llvm-svn: 260094	2016-02-08 14:11:39 +00:00
Silviu Baranga	41b4973329	Revert r260086 and r260085. They have broken the memory sanitizer bots. llvm-svn: 260087	2016-02-08 11:56:15 +00:00
Silviu Baranga	70a98bb9e8	[LoopVersioning] Don't assert when there are no memchecks We shouldn't assert when there are no memchecks, since we can have SCEV checks. There is already an assert covering the case where there are no SCEV checks or memchecks. This also changes the LAA pointer wrapping versioning test to use the loop versioning pass (this was how I managed to trigger the assert in the loop versioning pass). llvm-svn: 260086	2016-02-08 11:15:29 +00:00
Silviu Baranga	a35fadc7c4	[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: (sext ({x,+,y}) -> {sext(x),+,sext(y)} (zext ({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 llvm-svn: 260085	2016-02-08 10:45:50 +00:00
Maxim Ostapenko	b1e3f60fb9	[asan] Introduce new hidden -asan-use-private-alias option. As discussed in https://github.com/google/sanitizers/issues/398, with current implementation of poisoning globals we can have some CHECK failures or false positives in case of mixing instrumented and non-instrumented code due to ASan poisons innocent globals from non-sanitized binary/library. We can use private aliases to avoid such errors. In addition, to preserve ODR violation detection, we introduce new __odr_asan_gen_XXX symbol for each instrumented global that indicates if this global was already registered. To detect ODR violation in runtime, we should only check the value of indicator and report an error if it isn't equal to zero. Differential Revision: http://reviews.llvm.org/D15642 llvm-svn: 260075	2016-02-08 08:30:57 +00:00
Asaf Badouh	ad5c3fc47d	[X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode Differential Revision: http://reviews.llvm.org/D16629 llvm-svn: 260033	2016-02-07 14:59:13 +00:00
Daniel Berlin	905a646c24	Don't use module context here. It's unnecessary and makes it harder to write unittests llvm-svn: 260015	2016-02-07 02:03:39 +00:00
Daniel Berlin	1b51a2957d	Compute live-in for MemorySSA llvm-svn: 260014	2016-02-07 01:52:19 +00:00
Daniel Berlin	7898ca658e	Only insert into definingblocks once per block llvm-svn: 260013	2016-02-07 01:52:15 +00:00
Ashutosh Nema	df6763abe8	New Loop Versioning LICM Pass Summary: When alias analysis is uncertain about the aliasing between any two accesses, it will return MayAlias. This uncertainty from alias analysis restricts LICM from proceeding further. In cases where alias analysis is uncertain we might use loop versioning as an alternative. Loop Versioning will create a version of the loop with aggressive aliasing assumptions in addition to the original with conservative (default) aliasing assumptions. The version of the loop making aggressive aliasing assumptions will have all the memory accesses marked as no-alias. These two versions of loop will be preceded by a memory runtime check. This runtime check consists of bound checks for all unique memory accessed in loop, and it ensures the lack of memory aliasing. The result of the runtime check determines which of the loop versions is executed: If the runtime check detects any memory aliasing, then the original loop is executed. Otherwise, the version with aggressive aliasing assumptions is used. The pass is off by default and can be enabled with command line option -enable-loop-versioning-licm. Reviewers: hfinkel, anemet, chatur01, reames Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga, llvm-commits Differential Revision: http://reviews.llvm.org/D9151 llvm-svn: 259986	2016-02-06 07:47:48 +00:00
Michael Zolotukhin	73957179d3	[LoopUnrolling] Try harder to avoid rebuilding LCSSA when possible. In r255133 (reapplied r253126) we started to avoid redundant recomputation of LCSSA after loop-unrolling. This patch moves one step further in this direction - now we can avoid it for much wider range of loops, as we start to look at IR and try to figure out if the transformation actually breaks LCSSA phis or makes it necessary to insert new ones. Differential Revision: http://reviews.llvm.org/D16838 llvm-svn: 259869	2016-02-05 02:17:36 +00:00
Joseph Tremoulet	adc2376375	[RS4GC] Pass DenseMap by reference, NFC Summary: Passing the rematerialized values map to insertRematerializationStores by value looks to be a simple oversight; update it to pass by reference. Reviewers: reames, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16911 llvm-svn: 259867	2016-02-05 01:42:52 +00:00
Adam Nemet	9455c1d2b1	[LoopLoadElim] Don't allow versioning when optForSize This was requested in the review of D16300. llvm-svn: 259861	2016-02-05 01:14:05 +00:00
Wei Mi	a49559befb	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. The original commit triggered regressions in Polly tests. The regressions exposed two problems which have been fixed in current version. 1. Polly will generate a new function based on the old one. To generate an instruction for the new function, it builds SCEV for the old instruction, applies some tranformation on the SCEV generated, then expands the transformed SCEV and insert the expanded value into new function. Because SCEV expansion may reuse value cached in ExprValueMap, the value in old function may be inserted into new function, which is wrong. In SCEVExpander::expand, there is a logic to check the cached value to be used should dominate the insertion point. However, for the above case, the check always passes. That is because the insertion point is in a new function, which is unreachable from the old function. However for unreachable node, DominatorTreeBase::dominates thinks it will be dominated by any other node. The fix is to simply add a check that the cached value to be used in expansion should be in the same function as the insertion point instruction. 2. When the SCEV is of scConstant type, expanding it directly is cheaper than reusing a normal value cached. Although in the cached value set in ExprValueMap, there is a Constant type value, but it is not easy to find it out -- the cached Value set is not sorted according to the potential cost. Existing reuse logic in SCEVExpander::expand simply chooses the first legal element from the cached value set. The fix is that when the SCEV is of scConstant type, don't try the reuse logic. simply expand it. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259736	2016-02-04 01:27:38 +00:00
Gerolf Hoflehner	2432bd0ddd	[SimplifyCFG] Fix for "endless" loop after dead code removal (Alternative to D16251) Summary: This is a simpler fix to the problem than the dominator approach in http://reviews.llvm.org/D16251. It adds only values into the gather() while loop that have been seen before. The actual endless loop is in the constant compare gather() routine in Utils/SimplifyCFG.cpp. The same value ret.0.off0.i is pushed back into the queue: %.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i Here is what happens at the IR level: for.cond.i: ; preds = %if.end6.i, %if.end.i54 %ix.0.i = phi i32 [ 0, %if.end.i54 ], [ %inc.i55, %if.end6.i ] %ret.0.off0.i = phi i1 [false, %if.end.i54], [%.ret.0.off0.i, %if.end6.i] <<< %cmp2.i = icmp ult i32 %ix.0.i, %11 br i1 %cmp2.i, label %for.body.i, label %LBJ_TmpSimpleNeedExt.exit if.end6.i: ; preds = %for.body.i %cmp10.i = icmp ugt i32 %conv.i, %add9.i %.ret.0.off0.i = or i1 %ret.0.off0.i, %cmp10.i <<< When if.end.i54 gets eliminated which removes the definition of ret.0.off0.i. The result is the expression %.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i (Note the first ‘or’ operand is now %.ret.0.off0.i, and NOT %ret.0.off0.i). And now there is use of .ret.0.off0.i before a definition which triggers the “endless” loop in gather(): while(!DFT.empty()) { V = DFT.pop_back_val(); // V is .ret.0.off0.i if (Instruction *I = dyn_cast<Instruction>(V)) { // If it is a \|\| (or && depending on isEQ), process the operands. if (I->getOpcode() == (isEQ ? Instruction::Or : Instruction::And)) { DFT.push_back(I->getOperand(1)); // This is now .ret.0.off0.i also DFT.push_back(I->getOperand(0)); continue; // “endless loop” for .ret.0.off0.i } Reviewers: reames, ahatanak Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16839 llvm-svn: 259730	2016-02-03 23:54:25 +00:00
Vedant Kumar	2d5b5d3d3a	[InstrProfiling] Fix a comment (NFC) llvm-svn: 259727	2016-02-03 23:22:43 +00:00
Junmo Park	e90057a5f3	Minor code cleanups. NFC. llvm-svn: 259725	2016-02-03 23:16:39 +00:00
David Majnemer	a53b5bbb18	[LoopStrengthReduce] Don't rewrite PHIs with incoming values from CatchSwitches Bail out if we have a PHI on an EHPad that gets a value from a CatchSwitchInst. Because the CatchSwitchInst cannot be split, there is no good place to stick any instructions. This fixes PR26373. llvm-svn: 259702	2016-02-03 21:30:34 +00:00
Wei Mi	97de385868	Revert r259662, which caused regressions on polly tests. llvm-svn: 259675	2016-02-03 18:05:57 +00:00
Quentin Colombet	7ec03dc7f8	[InstCombine] Revert r238452: Fold IntToPtr and PtrToInt into preceding loads. According to git bisect, this is the root cause of a miscompile for Regex in libLLVMSupport. I am still working on reducing a test case. The actual bug may be elsewhere and this commit just exposed it. Anyway, at the moment, to reproduce, follow these steps: 1. Build clang and libLTO in release mode. 2. Create a new build directory <stage2> and cd into it. 3. Use clang and libLTO from #1 to build llvm-extract in Release mode + asserts using -O2 -flto 4. Run llvm-extract -ralias '.bar' -S test/Other/extract-alias.ll Result: program doesn't contain global named '.bar'! Expected result: @a0a0bar = alias void ()* @bar @a0bar = alias void ()* @bar declare void @bar() Note: In step #3, if you don't use lto or asserts, the miscompile disappears. llvm-svn: 259674	2016-02-03 18:04:13 +00:00
Wei Mi	ed133978a0	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259662	2016-02-03 17:05:12 +00:00
Peter Collingbourne	0c0d7e2d0f	LowerBitSets: Don't bother to do any work if the llvm.bitset.test intrinsic is unused. llvm-svn: 259625	2016-02-03 03:48:46 +00:00
Peter Collingbourne	83cc981c49	Add #include "llvm/Support/raw_ostream.h" to fix Windows build. llvm-svn: 259623	2016-02-03 03:16:37 +00:00
Peter Collingbourne	9f7ec14009	Transforms: Move GlobalOpt's Evaluator to Utils where it can be reused. llvm-svn: 259621	2016-02-03 02:51:00 +00:00
Adam Nemet	d52ed84160	[LoopVersioning] Expose loop versioning as a pass too Summary: LoopVersioning is a transform utility that transform passes can use to run-time disambiguate may-aliasing accesses. I'd like to also expose as pass to allow it to be unit-tested. I am planning to add support for non-aliasing annotation in LoopVersioning and I'd like to be able to write tests directly using this pass. (After that feature is done, the pass could also be used to look for optimization opportunities that are hidden behind incomplete alias information at compile time.) The pass drives LoopVersioning in its default way which is to fully disambiguate may-aliasing accesses no matter how many checks are required. Reviewers: hfinkel, ashutosh.nema, sbaranga Subscribers: zzheng, mssimpso, llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D16612 llvm-svn: 259610	2016-02-03 00:06:10 +00:00
George Burgess IV	60adac46f2	Attempt #2 to unbreak r259595. llvm-svn: 259602	2016-02-02 23:26:01 +00:00
George Burgess IV	b5a229f779	Attempt to fix builds broken by r259595. llvm-svn: 259599	2016-02-02 23:15:26 +00:00
George Burgess IV	e1100f533f	This patch adds MemorySSA to LLVM. Please see include/llvm/Transforms/Utils/MemorySSA.h for a description of MemorySSA, and what it does. Differential Revision: http://reviews.llvm.org/D7864 llvm-svn: 259595	2016-02-02 22:46:49 +00:00
Anna Zaks	3b50e70bbe	[asan] Add iOS support to AddressSanitzier Differential Revision: http://reviews.llvm.org/D15625 llvm-svn: 259586	2016-02-02 22:05:07 +00:00
Eugene Zelenko	ecefe5a81f	Fix Clang-tidy readability-redundant-control-flow warnings; other minor fixes. Differential revision: http://reviews.llvm.org/D16793 llvm-svn: 259539	2016-02-02 18:20:45 +00:00
Sanjay Patel	4b198802b3	function names start with a lowercase letter; NFC llvm-svn: 259425	2016-02-01 22:23:39 +00:00
Sanjay Patel	103ab7d571	[InstCombine] simplify masked scatter/gather intrinsics with zero masks A masked scatter with a zero mask means there's no store. A masked gather with a zero mask means the passthru arg is returned. This is a continuation of: http://reviews.llvm.org/rL259369 http://reviews.llvm.org/rL259392 llvm-svn: 259421	2016-02-01 22:10:26 +00:00
Sanjay Patel	04f792bdc9	[InstCombine] simplify masked store intrinsics with all ones or zeros masks A masked store with a zero mask means there's no store. A masked store with an allOnes mask means it's a normal vector store. This is a continuation of: http://reviews.llvm.org/rL259369 llvm-svn: 259392	2016-02-01 19:39:52 +00:00
David Majnemer	f8853ae7b3	[InstCombine] Don't transform (X+INT_MAX)>=(Y+INT_MAX) -> (X<=Y) This miscompile came about because we tried to use a transform which was only appropriate for xor operators when addition was present. This fixes PR26407. llvm-svn: 259375	2016-02-01 17:37:56 +00:00
Sanjay Patel	b695c5557c	[InstCombine] simplify masked load intrinsics with all ones or zeros masks A masked load with a zero mask means there's no load. A masked load with an allOnes mask means it's a normal vector load. Differential Revision: http://reviews.llvm.org/D16691 llvm-svn: 259369	2016-02-01 17:00:10 +00:00
Matthew Simpson	73dad62174	[LV] Rename RdxPHIsToFix to PHIsToFix (NFC) In the future, we will vectorize recurrences other than reductions. This patch renames a few variables and updates their associated comments to enable them to be reused for non-reduction PHI nodes. This change was requested in the review for D16197. llvm-svn: 259364	2016-02-01 16:07:01 +00:00
Matthew Simpson	c578d67407	Reapply commit r258404 with fix. The previous patch caused PR26364. The fix is to ensure that we don't enter a cycle when iterating over use-def chains. llvm-svn: 259357	2016-02-01 13:38:29 +00:00
Sanjay Patel	0069f56e33	add helper function for minnum/maxnum ; NFC llvm-svn: 259326	2016-01-31 16:35:23 +00:00
Sanjay Patel	8af7fbc34c	use range-based for loop; NFC llvm-svn: 259325	2016-01-31 16:34:48 +00:00
Sanjay Patel	690955fcbc	fix formatting; NFC llvm-svn: 259324	2016-01-31 16:34:11 +00:00
Sanjay Patel	24b77d11bc	simplify; NFC llvm-svn: 259323	2016-01-31 16:33:33 +00:00
Craig Topper	2ed5369424	Convert int to Twine instead of using utostr since it was already being added to a Twine. NFC llvm-svn: 259308	2016-01-31 00:15:35 +00:00
Matt Arsenault	56c079f393	InstCombine: fabs(x) * fabs(x) -> x * x llvm-svn: 259295	2016-01-30 05:02:00 +00:00
Matthias Braun	b30f2f5141	Avoid overly large SmallPtrSet/SmallSet These sets perform linear searching in small mode so it is never a good idea to use SmallSize/N bigger than 32. llvm-svn: 259283	2016-01-30 01:24:31 +00:00
Sanjay Patel	6038d3e5c6	function names start with a lower case letter ; NFC llvm-svn: 259264	2016-01-29 23:27:03 +00:00
Sanjay Patel	f9f5d3cc45	fix formatting; NFC llvm-svn: 259262	2016-01-29 23:14:58 +00:00
Fiona Glaser	36e8230db0	Fix typo in LoopSimplifyCFG llvm-svn: 259261	2016-01-29 23:12:52 +00:00
Fiona Glaser	b417d464e6	Add LoopSimplifyCFG pass Loop transformations can sometimes fail because the loop, while in valid rotated LCSSA form, is not in a canonical CFG form. This is an extremely simple pass that just merges obviously redundant blocks, which can be used to fix some known failure cases. In the future, it may be enhanced with more cases (and have code shared with SimplifyCFG). This allows us to run LoopSimplifyCFG -> LoopRotate -> LoopUnroll, so that SimplifyCFG cleans up the loop before Rotate tries to run. Not currently used in the pass manager, since this pass doesn't do anything unless you can hook it up in an LPM with other loop passes. It'll be added once Chandler cleans up things to allow this. Tested in a custom pipeline out of tree to confirm it works in practice (in addition to the included trivial test). llvm-svn: 259256	2016-01-29 22:35:36 +00:00
Sanjay Patel	66fff73c76	[InstCombine] avoid an insertelement transformation that induces the opposite extractelement fold (PR26354) We would infinite loop because we created a shufflevector that was wider than needed and then failed to combine that with the insertelement. When subsequently visiting the extractelement from that shuffle, we see that it's unnecessary, delete it, and trigger another visit to the insertelement. llvm-svn: 259236	2016-01-29 20:21:02 +00:00
David Majnemer	75f492e7f1	Fix the build llvm-svn: 259215	2016-01-29 17:46:57 +00:00
Matthew Simpson	53d00ef874	[SLP] Fix printing of debug statement (NFC) llvm-svn: 259212	2016-01-29 17:21:38 +00:00
Sanjoy Das	c816f03b70	[RS4GC] Address post-commit review on r259208 from David NFC llvm-svn: 259211	2016-01-29 17:20:49 +00:00
Sanjoy Das	565f7866ac	[RS4GC] Remove unnecessary const_cast; NFC GCRelocateInst::getDerivedPtr already returns a non-const llvm::Value pointer. llvm-svn: 259209	2016-01-29 16:54:49 +00:00
Sanjoy Das	3794eeb8bb	[RS4GC] Minor local cleanup to StabilizeOrder; NFC - Locally declare struct, and call it BaseDerivedPair - Use a lambda to compare, instead of a singleton with uninitialized fields - Add a constructor to BaseDerivedPair and use SmallVector::emplace_back llvm-svn: 259208	2016-01-29 16:50:34 +00:00
David Majnemer	b2416bd2a7	Revert "Reapply commit r258404 with fix" This reverts commit r258929, it caused PR26364. llvm-svn: 259148	2016-01-29 02:43:22 +00:00
Philip Reames	10e678d25a	[GVN] Add clarifying assert [NFCI] Just adding an assert which makes invariants between AnalyzeLoadsFromClobberingLoads and GetLoadValueForLoad slightly more clear. llvm-svn: 259145	2016-01-29 02:23:10 +00:00
Sanjoy Das	bcf27523f5	[RS4GC] Minor cleanups enabled by the previous change; NFC llvm-svn: 259133	2016-01-29 01:03:20 +00:00
Sanjoy Das	4099297856	[RS4GC] Delete code that is dead due to r259129; NFC llvm-svn: 259132	2016-01-29 01:03:17 +00:00
Sanjoy Das	0407108020	[RS4GC] Clamp UseDeoptBundles to true and update tests The full diff for the test directory may be hard to read because of the filename clash; so here's all that happened as far as the tests are concerned: ``` cd test/Transforms/RewriteStatepointsForGC git rm ll git mv deopt-bundles/ ./ rmdir deopt-bundles find . -name '*.ll' \| xargs gsed -i 's/-rs4gc-use-deopt-bundles //g' ``` llvm-svn: 259129	2016-01-29 00:28:57 +00:00
Sanjoy Das	bb04f6e28f	[PlaceSafepoints] Use DEBUG() instead of TraceLSP DEBUG() is the more idiomatic LLVM style. llvm-svn: 259121	2016-01-28 23:49:27 +00:00
Sanjoy Das	cd23fec756	[PlaceSafepoints] Misc. minor cleanups; NFC These changes are aimed at bringing PlaceSafepoints up to code with the LLVM coding guidelines: - Fix variable naming - Use DenseSet instead of std::set - Remove dead code - Minor local code simplifications llvm-svn: 259112	2016-01-28 23:03:19 +00:00

... 5 6 7 8 9 ...

14972 Commits