These are blocks that haven't been executed during training. For large
projects this could make a significant difference. For the project I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
Differential Revision: https://reviews.llvm.org/D40678
llvm-svn: 319556
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.
This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.
Differential Revision: https://reviews.llvm.org/D40674
llvm-svn: 319505
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
Fixes PR28958.
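For illustration, a minimal sketch of the pattern this enables (hypothetical IR,
written in current syntax, not from the patch's tests): the second memcpy's
dependency is in a different block, so only a non-local memdep query can find
the first memcpy and let memcpyopt forward %src directly into the second call.
  declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
  define void @forward(ptr %dst, ptr %tmp, ptr %src, i1 %c) {
  entry:
    call void @llvm.memcpy.p0.p0.i64(ptr %tmp, ptr %src, i64 32, i1 false)
    br i1 %c, label %then, label %exit
  then:
    ; depends on the memcpy in %entry -- a non-local dependency
    call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %tmp, i64 32, i1 false)
    br label %exit
  exit:
    ret void
  }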
Differential Revision: https://reviews.llvm.org/D38374
llvm-svn: 319482
Currently, we use a set of pairs to cache responses like `CompareValueComplexity(X, Y) == 0`. If we had
proved that `CompareValueComplexity(S1, S2) == 0` and `CompareValueComplexity(S2, S3) == 0`,
this cache does not allow us to prove that `CompareValueComplexity(S1, S3)` is also `0`.
This patch replaces this set with `EquivalenceClasses` that merges Values into equivalence sets so that
any two values from the same set are equal from the point of view of `CompareValueComplexity`. This, in
particular, allows us to prove the fact from the example above.
Differential Revision: https://reviews.llvm.org/D40429
llvm-svn: 319153
Currently, we use a set of pairs to cache responses like `CompareSCEVComplexity(X, Y) == 0`. If we had
proved that `CompareSCEVComplexity(S1, S2) == 0` and `CompareSCEVComplexity(S2, S3) == 0`,
this cache does not allow us to prove that `CompareSCEVComplexity(S1, S3)` is also `0`.
This patch replaces this set with `EquivalenceClasses` that merges values into equivalence sets so that
any two values from the same set are equal from the point of view of `CompareSCEVComplexity`. This, in
particular, allows us to prove the fact from the example above.
Differential Revision: https://reviews.llvm.org/D40428
llvm-svn: 319149
Summary:
For a given loop, getLoopLatch returns a non-null value
when a loop has only one latch block. In the modified
context, add an assertion to check that the two outgoing branches of
the latch's terminator instruction do not target the same header.
Plus a few minor code reorganizations.
Reviewers: jbhateja
Reviewed By: jbhateja
Subscribers: sanjoy
Differential Revision: https://reviews.llvm.org/D40460
llvm-svn: 318997
Summary:
For a given loop, getLoopLatch returns a non-null value
when a loop has only one latch block. In the modified
context, checking whether both outgoing branches of the latch's terminator instruction target the same header is redundant.
Reviewers: jbhateja
Reviewed By: jbhateja
Subscribers: sanjoy
Differential Revision: https://reviews.llvm.org/D40460
llvm-svn: 318991
Summary:
Loop-pass printing is somewhat deficient since it does not provide the
context around the loop (e.g. preheader). This context information becomes
pretty essential when analyzing transformations that move stuff out of the loop.
Extending printLoop to cover preheader and exit blocks (if any).
Reviewers: sanjoy, silvas, weimingz
Reviewed By: sanjoy
Subscribers: apilipenko, skatkov, llvm-commits
Differential Revision: https://reviews.llvm.org/D40246
llvm-svn: 318878
Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively.
When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if
`L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are
consecutive sibling loops within the parent loop. In this case, `AR2` is also varying
w.r.t. `L1`, but we don't correctly identify it.
It can lead, for example, to an attempt at incorrect folding. Consider:
AR1 = {a,+,b}<L1>
AR2 = {c,+,d}<L2>
EXAR2 = sext(AR2)
MUL = mul AR1, EXAR2
If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to
construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect
because `AR2` is not available on entrance of `L1`.
Both situations "`L1` contains `L2`" and "`L1` precedes sibling loop `L2`" can be handled
with one check: "header of `L1` dominates header of `L2`". This patch replaces the old
insufficient check with this one.
Differential Revision: https://reviews.llvm.org/D39453
llvm-svn: 318819
Summary:
First step in adding MemorySSA as dependency for loop pass manager.
Adding the dependency under a flag.
New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null.
Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default.
Reviewers: sanjoy, davide, gberry
Subscribers: mehdi_amini, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D40274
llvm-svn: 318772
The 'ord' and 'uno' predicates have a logic operation for NAN built into their definitions:
FCMP_ORD = 7, ///< 0 1 1 1 True if ordered (no nans)
FCMP_UNO = 8, ///< 1 0 0 0 True if unordered: isnan(X) | isnan(Y)
So we can simplify patterns like this:
(fcmp ord (known NNAN), X) && (fcmp ord X, Y) --> fcmp ord X, Y
(fcmp uno (known NNAN), X) || (fcmp uno X, Y) --> fcmp uno X, Y
It might be better to split this into (X uno 0) | (Y uno 0) as a canonicalization, but that
would be another patch.
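For illustration, a hypothetical instance of the first fold (%nn is known
not-NaN because of its nnan flag); the 'and' should now simplify to just %c2:
  define i1 @ord_fold(float %x, float %y) {
    %nn = fadd nnan float %x, 1.0
    %c1 = fcmp ord float %nn, %x   ; "%x is not NaN"
    %c2 = fcmp ord float %x, %y    ; "neither %x nor %y is NaN"
    %r = and i1 %c1, %c2           ; --> fcmp ord float %x, %y
    ret i1 %r
  }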
Differential Revision: https://reviews.llvm.org/D40130
llvm-svn: 318627
making it no longer even remotely simple.
The pass will now be more of a "full loop unswitching" pass rather than
anything substantively simpler than any other approach. I plan to rename
it accordingly once the dust settles.
The key ideas of the new loop unswitcher are carried over for
non-trivial unswitching:
1) Fully unswitch a branch or switch instruction from inside of a loop to
outside of it.
2) Update the CFG and IR. This avoids needing to "remember" the
unswitched branches as well as avoiding excessive cloning and
reliance on complex parts of simplify-cfg to clean up the cfg.
3) Update the analyses (where we can) rather than just blowing them away
or relying on something else updating them.
Sadly, #3 is somewhat compromised here as the dominator tree updates
were too complex for me to want to reason about. I will need to make
another attempt to do this now that we have a nice dynamic update API
for dominators. However, we do adhere to #3 w.r.t. LoopInfo.
This approach also adds an important principle specific to non-trivial
unswitching: not *all* of the loop will be duplicated when unswitching.
This fact allows us to compute the cost in terms of how much *duplicate*
code is inserted rather than just on raw size. Unswitching conditions
which essentially partition loops will work regardless of the total loop
size.
Some remaining issues that I will be addressing in subsequent commits:
- Handling unstructured control flow.
- Unswitching 'switch' cases instead of just branches.
- Moving to the dynamic update API for dominators.
Some high-level, interesting limitations that folks might want to push
on as follow-ups but that I don't have any immediate plans around:
- We could be much more clever about not cloning things that will be
deleted. In fact, we should be able to delete *nothing* and do
a minimal number of clones.
- There are many more interesting selection criteria for which branch to
unswitch that we might want to look at. One that I'm particularly
interested in is a set of conditions which all exit the loop and which
can be merged into a single unswitched test.
Differential revision: https://reviews.llvm.org/D34200
llvm-svn: 318549
The assertion was introduced in r317853, but there are cases when a call
isn't handled as either direct or indirect. In this case we add a
reference graph edge but not a call graph edge.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, inglorion, eraman, hiraditya, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D40056
llvm-svn: 318540
This function checks that:
1) It is safe to expand a SCEV;
2) It is OK to materialize it at the specified location.
For example, attempt to expand a loop's AddRec to the same loop's preheader should fail.
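As a sketch (hypothetical IR): the AddRec {0,+,1}<%loop> for %iv below is safe
to expand inside the loop, but a request to materialize it in %entry (the
preheader) must be rejected, since the recurrence only has a value inside the
loop:
  define i64 @count(i64 %n) {
  entry:
    br label %loop
  loop:
    %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
    %iv.next = add i64 %iv, 1
    %cmp = icmp ult i64 %iv.next, %n
    br i1 %cmp, label %loop, label %exit
  exit:
    ret i64 %iv
  }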
Differential Revision: https://reviews.llvm.org/D39236
llvm-svn: 318377
Summary:
This fixes PR35241.
When using byval, the data is effectively copied as part of the call
anyway, so the pointer returned by the alloca will not be leaked to the
callee and thus there is no reason to issue a warning.
Reviewers: rnk
Reviewed By: rnk
Subscribers: Ka-Ka, llvm-commits
Differential Revision: https://reviews.llvm.org/D40009
llvm-svn: 318279
I don't believe this was a problem in practice, as it's likely that the
boolean wasn't checked unless the backend condition was non-null.
llvm-svn: 318073
Summary:
If a compare instruction is the same as or the inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR34538.
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: sanjoy, junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 318050
This change doesn't fix the root cause of the miscompile PR34966 as the root
cause is in the linker ld64. This change makes the call graph more complete,
allowing better module imports/exports.
rdar://problem/35344706
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39356
llvm-svn: 317853
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].
Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.
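A minimal sketch of the intended use: an intentionally infinite loop that
passes must now preserve because of the intrinsic call in its body.
  define void @infinite_loop() {
  entry:
    br label %loop
  loop:
    call void @llvm.sideeffect()
    br label %loop
  }
  declare void @llvm.sideeffect()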
As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.
[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html
Differential Revision: https://reviews.llvm.org/D38336
llvm-svn: 317729
There are cases when we have to merge TBAA access tags with the
same base access type, but different final access types. For
example, accesses to different members of the same structure may
be vectorized into a single load or store instruction. Since we
currently assume that the tags to merge always share the same
final access type, we incorrectly return a tag that describes an
access to one of the original final access types as the generic
tag. This patch fixes that by producing generic tags for the
common type and not the final access types of the original tags.
Resolves:
PR35225: Wrong tbaa metadata after load store vectorizer due to
recent change
https://bugs.llvm.org/show_bug.cgi?id=35225
Differential Revision: https://reviews.llvm.org/D39732
llvm-svn: 317682
My fix is conservative and will make us return may-alias instead.
The test case is:
check(gep(x, 0), n, gep(x, n), -1) with n == sizeof(x)
Here, the first value accesses the whole object, but the second access
doesn't access anything. The semantics of -1 is read until the end of the
object, which in this case means read nothing.
No test case, since it isn't trivial to exploit this one, but I've proved it correct.
llvm-svn: 317680
As discussed in D39204, this is effectively a revert of rL265521 which required nnan
to vectorize sqrt libcalls based on the old LangRef definition of llvm.sqrt. Now that
the definition has been updated so the libcall and intrinsic have the same semantics
apart from potentially setting errno, we can remove the nnan requirement.
We have the right check to know that errno is not set:
if (!ICS.onlyReadsMemory())
...ahead of the switch.
This will solve https://bugs.llvm.org/show_bug.cgi?id=27435 assuming that's being
built for a target with -fno-math-errno.
Differential Revision: https://reviews.llvm.org/D39642
llvm-svn: 317519
single-iteration loop
This fixes PR34681. Avoid adding the "Stride == 1" predicate when we know that
Stride >= Trip-Count. Such a predicate will effectively optimize a single
or zero iteration loop, as Trip-Count <= Stride == 1.
Differential Revision: https://reviews.llvm.org/D38785
llvm-svn: 317438
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Originally commited as r317374, but reverted in r317395 to update some missed
tests.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317408
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317374
This patch combines the code that matches and merges TBAA access
tags. The aim is to simplify future changes and to make sure that
these operations produce consistent results.
Differential Revision: https://reviews.llvm.org/D39463
llvm-svn: 317311
Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.
The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.
This patch adds basic support for this.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39028
llvm-svn: 317278
Summary:
Compute the strongly connected components of the CFG and fall back to
use these for blocks that are in loops that are not detected by
LoopInfo when computing loop back-edge and exit branch probabilities.
Reviewers: dexonsmith, davidxl
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D39385
llvm-svn: 317094
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725
If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.
llvm-svn: 316967
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For example, this is not the
case for x86(32bit)+sse2.
Fixes PR34887.
llvm-svn: 316905
Summary:
ValueTracking was not recognizing all variations of clamp. Swapping of
the true value and false value of the select was added to fix this problem. The
first patch was reverted because it caused miscompile in NVPTX target.
Added corresponding test cases.
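For illustration, a hypothetical clamp where the smax is written with swapped
true/false select values (and the correspondingly inverted predicate), which is
the kind of variation that was previously not recognized:
  define i32 @clamp_0_255(i32 %x) {
    %lo = icmp slt i32 %x, 0
    %max = select i1 %lo, i32 0, i32 %x      ; smax(%x, 0), arms swapped
    %hi = icmp slt i32 %max, 255
    %min = select i1 %hi, i32 %max, i32 255  ; smin(smax(%x, 0), 255)
    ret i32 %min
  }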
Reviewers: spatel, majnemer, efriedma, reames
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D39240
llvm-svn: 316795
Max backedge taken count is always expected to be a constant; and this is
usually true by construction -- it is a SCEV expression with constant inputs.
However, if the max backedge expression ends up being computed to be a udiv with
a constant zero denominator[0], SCEV does not fold the result to a constant
since there is no constant it can fold it to (SCEV has no representation for
"infinity" or "undef").
However, in computeMaxBECountForLT we already know the denominator is positive,
and thus at least 1; and we can use this fact to avoid dividing by zero.
[0]: We can end up with a constant zero denominator if the signed range of the
stride is more precise than the unsigned range.
llvm-svn: 316615
This patch allows the SCEVFindUnsafe algorithm to treat division by any value known
to be non-zero as safe. Previously, it could only recognize non-zero constants.
Differential Revision: https://reviews.llvm.org/D39228
llvm-svn: 316568
Summary:
Memory dependence analysis no longer counts DbgInfoIntrinsics towards the
limit at which it aborts the analysis. Before, a bunch of calls to dbg.value
could affect the generated code, meaning that with -g we could generate
different code than without.
Reviewers: chandlerc, Prazek, davide, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39181
llvm-svn: 316551
If a particular target supports volatile memory access operations, we can
avoid AS casting to generic AS. Currently it's only enabled in NVPTX for
loads and stores that access global & shared AS.
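A sketch of my understanding of the effect (hypothetical IR): a volatile access
through a generic pointer that provably points into the global AS can now be
rewritten to use the specific AS directly.
  define float @volatile_global(ptr addrspace(1) %g) {
    %gen = addrspacecast ptr addrspace(1) %g to ptr
    ; can now be rewritten to load directly from ptr addrspace(1) %g
    %v = load volatile float, ptr %gen
    ret float %v
  }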
Differential Revision: https://reviews.llvm.org/D39026
llvm-svn: 316495
Neither of these cases really requires a temporary APInt outside the loop. For the ConstantDataSequential case the APInt will never be larger than 64 bits, so it's fine to just call getElementAsAPInt. For ConstantVector we can get the APInt by reference and only make a copy where the inversion is needed.
llvm-svn: 316265
(recommit #2 after checking for timeout issue).
The original patch was an improvement to IR ValueTracking on
non-negative integers. It has been checked in to trunk (D18777,
r284022), but was disabled by default due to performance regressions.
Perf impact has improved. The patch is now enabled by default.
Reviewers: reames, hfinkel
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 316208
Added a check that the type of CmpConst and the source type of trunc are equal
for correct matching of the case when we can set the widened C constant
equal to CmpConst.
%cond = cmp iN %x, CmpConst
%tr = trunc iN %x to iK
%narrowsel = select i1 %cond, iK %tr, iK C
Patch by: Gainullin, Artur <artur.gainullin@intel.com>
llvm-svn: 316082
Summary:
When we have the following case:
%cond = cmp iN %x, CmpConst
%tr = trunc iN %x to iK
%narrowsel = select i1 %cond, iK %tr, iK C
We could possibly match only the min/max pattern after looking through the cast.
So it is more profitable if the widened C constant is equal to CmpConst. That is
why we just set the widened C constant equal to CmpConst, because there is
a further check in this function that trunc CmpConst == C.
Also, a description for the lookThroughCast function was added.
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38536
Patch by: Artur Gainullin <artur.gainullin@intel.com>
llvm-svn: 316070
Summary:
If a compare instruction is the same as or the inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
Currently scope of evaluation is limited to SCEV computation for
PHI nodes.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR34538.
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 316054
Summary:
ValueTracking was not recognizing all variations of clamp. Swapping of
the true value and false value of the select was added to fix this problem. This
change breaks the canonical form of the cmp inside the matchMinMax function,
which is why additional checks for compare predicates are needed. Added
corresponding test cases.
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38531
Patch by: Artur Gainullin <artur.gainullin@intel.com>
llvm-svn: 315992
This reverts commit r315713. It causes PR34968.
I think I know what the problem is, but I don't think I'll have time to fix it
this week.
llvm-svn: 315962
This avoids code duplication and allows us to add the disable-unroll metadata elsewhere.
Differential Revision: https://reviews.llvm.org/D38928
llvm-svn: 315850
This patch moves some common utility functions out of IPSCCP and makes them
available globally. The functions determine if interprocedural data-flow
analyses can propagate information through function returns, arguments, and
global variables.
Differential Revision: https://reviews.llvm.org/D37638
llvm-svn: 315719
Summary:
This change uses the loop use list added in the previous change to remember the
loops that appear in the trip count expressions of other loops; and uses it in
forgetLoop. This lets us not scan every loop in the function on a forgetLoop
call.
With this change we no longer clear out backedge taken counts on
forgetValue. I think this is fine -- the contract is that SCEV users must call
forgetLoop(L) if their change to the IR could have changed the trip count of L;
solely calling forgetValue on a value feeding into the backedge condition of L
is not enough. Moreover, I don't think we can strengthen forgetValue to be
sufficient for invalidating trip counts without significantly re-architecting
SCEV. For instance, if we have the loop:
I = *Ptr;
E = I + 10;
do {
// ...
} while (++I != E);
then the backedge taken count of the loop is 9, and it has no reference to
either I or E, i.e. there is no way in SCEV today to re-discover the dependency
of the loop's trip count on E or I. So a SCEV client cannot change E to (say)
"I + 20", call forgetValue(E) and expect the loop's trip count to be updated.
Reviewers: atrick, sunfish, mkazantsev
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38435
llvm-svn: 315713
Summary:
This patch teaches SCEV to calculate the maxBECount when the end bound
of the loop can vary. Note that we cannot calculate the exactBECount.
This will only be done when both conditions are satisfied:
1. the loop termination condition is strictly LT.
2. the IV is proven to not overflow.
This provides more information to users of SCEV and can be used to
improve identification of finite loops.
Reviewers: sanjoy, mkazantsev, silviu.baranga, atrick
Reviewed by: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38825
llvm-svn: 315683
Significantly reduces performance (~30%) of gipfeli
(https://github.com/google/gipfeli)
I have not yet managed to reproduce this regression with the open-source
version of the benchmark on github, but will work with others to get a
reproducer to you later today.
llvm-svn: 315680
Summary:
Currently we do not correctly invalidate memoized results for add recurrences
that were created directly (i.e. they were not created from a `Value`). This
change fixes this by keeping loop use lists and using the loop use lists to
determine which SCEV expressions to invalidate.
Here are some statistics on the number of uses in the use lists of all loops
on a clang bootstrap (config: release, no asserts):
Count: 731310
Min: 1
Mean: 8.555150
50th %tile: 4
95th %tile: 25
99th %tile: 53
Max: 433
Reviewers: atrick, sunfish, mkazantsev
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38434
llvm-svn: 315672
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.
Differential Revision: https://reviews.llvm.org/D38406
llvm-svn: 315590
This patch fixes the bug introduced in https://reviews.llvm.org/D35907; the bug is reported by http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171002/491452.html.
Before D35907, when GetUnderlyingObjects fails to find an identifiable object, allMMOsOkay lambda in getUnderlyingObjectsForInstr returns false and Objects vector is cleared. This behavior is unintentionally changed by D35907.
This patch makes the behavior for such cases the same as the previous behavior.
Since D35907 introduced a wrapper function getUnderlyingObjectsForCodeGen around GetUnderlyingObjects, getUnderlyingObjectsForCodeGen is modified to return a boolean value to ask the caller to clear the Objects vector.
Differential Revision: https://reviews.llvm.org/D38735
llvm-svn: 315565
Summary:
This patch fixes an error in the patch to ScalarEvolution::createAddRecFromPHIWithCastsImpl
made in D37265. In that patch we handle the cases where either the start or accum value can be
zero after truncation. But, we assume that the start value must be a constant if the accum is
zero. This is clearly an erroneous assumption. This change removes that assumption.
Reviewers: sanjoy, dorit, mkazantsev
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38814
llvm-svn: 315491
parameterized emit() calls
Summary: This is a non-functional change to adopt the new emit() API added in r313691.
Reviewed By: anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38285
llvm-svn: 315476
AbstractLatticeFunction and SparseSolver are class templates parameterized by a
lattice value, so we need to move these member functions over to the header.
Differential Revision: https://reviews.llvm.org/D38561
llvm-svn: 314996
Recommitting r314517 with the fix for handling ConstantExpr.
Original commit message:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing
mode in the target. However, since it doesn't check its actual users, it will
return FREE even in cases where the GEP cannot be folded away as a part of
actual addressing mode. For example, if a user of the GEP is a call
instruction taking the GEP as a parameter, then the GEP may not be folded in
isel.
llvm-svn: 314923
Before the patch this was in Analysis. Moving it to IR and making it implicit
part of LLVMContext::diagnose allows the full opt-remark facility to be used
outside passes e.g. the pass manager. Jessica is planning to use this to
report function size after each pass. The same could be used for time
reports.
Tested with BUILD_SHARED_LIBS=On.
llvm-svn: 314909
Test needs some slight adjustment because we no longer check the existence of
BFI but rather that the actual hotness is set on the remark. If entry_count
is not set getBlockProfileCount returns None.
llvm-svn: 314874
All the buildbots are red, e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/2436/
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask' of
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
>
> Reviewed By: Ayal
>
> Subscribers: hans, mzolotukhin
>
> Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314824
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314806
The code responsible for analysis of inbounds GEPs is extracted into a separate
function: CallAnalyzer::canFoldInboundsGEP. With the patch, the SROA
enabling/disabling code is localized in one place instead of being spread
across the code of CallAnalyzer::visitGetElementPtr.
Differential Revision: https://reviews.llvm.org/D38233
llvm-svn: 314787
Summary:
When checking if a constant expression is a noop cast we fetched the
IntPtrType by doing DL->getIntPtrType(V->getType()). However, there can
be cases where V is not of pointer type, and then getIntPtrType()
triggers an assertion.
Now we pass DataLayout to isNoopCast so the method itself can determine
what the IntPtrType is.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D37894
llvm-svn: 314763
Call ConstantFoldSelectInstruction() to fold cases like the one below:
select <2 x i1> <i1 true, i1 false>, <2 x i8> <i8 0, i8 1>, <2 x i8> <i8 2, i8 3>
All operands are constants and the condition vector has a mix of true and false elements.
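A sketch of the expected fold, selecting lane-by-lane from the two constant
vectors according to the constant condition vector:
  define <2 x i8> @fold_vector_select() {
    %r = select <2 x i1> <i1 true, i1 false>, <2 x i8> <i8 0, i8 1>, <2 x i8> <i8 2, i8 3>
    ret <2 x i8> %r   ; folds to <2 x i8> <i8 0, i8 3>
  }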
Differential Revision: https://reviews.llvm.org/D38369
llvm-svn: 314741
Summary:
This avoids using void * as the type of the lattice value and ugly casts needed to make that happen.
(If folks want to use references, etc, they can use a reference_wrapper).
Reviewers: davide, mssimpso
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38476
llvm-svn: 314734
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if a user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng
Reviewed By: hfinkel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38085
llvm-svn: 314517
Summary:
This allows sharing the lattice value code between LVI and SCCP (D36656).
It also adds a `satisfiesPredicate` function, used by D36656.
Reviewers: davide, sanjoy, efriedma
Reviewed By: sanjoy
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D37591
llvm-svn: 314411
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
llvm-svn: 314375
InlineCost can understand Select IR now. This patch finds free Select IRs and
continues the propagation of SimplifiedValues, ConstantOffsetPtrs, and
SROAArgValues.
Differential Revision: https://reviews.llvm.org/D37198
llvm-svn: 314307
Usually the frontend communicates the size of wchar_t via metadata and
we can optimize wcslen (and possibly other calls in the future). In
cases without the wchar_size metadata we would previously try to guess
the correct size based on the target triple; however this is fragile to
keep up to date and may miss users manually changing the size via flags.
Better be safe and stop guessing and optimizing if the frontend didn't
communicate the size.
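For reference, a sketch of the module flag the frontend emits to communicate
the size of wchar_t (4 bytes here); without it we now conservatively skip the
wcslen optimization:
  !llvm.module.flags = !{!0}
  !0 = !{i32 1, !"wchar_size", i32 4}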
Differential Revision: https://reviews.llvm.org/D38106
llvm-svn: 314185
Summary:
Right now there are two functions with the same name, one does the work
and the other one returns true if expansion is needed. Rename
TargetTransformInfo::expandMemCmp to make it more consistent with other
members of TargetTransformInfo.
Remove the unused Instruction* parameter.
Differential Revision: https://reviews.llvm.org/D38165
llvm-svn: 314096
Summary:
A SCEV such as:
{%v2,+,((-1 * (trunc i64 (-1 * %v1) to i32)) + (-1 * (trunc i64 %v1 to i32)))}<%loop>
can be folded into, simply, {%v2,+,0}. However, the current code in ::getAddExpr()
will not try to apply the simplification m*trunc(x)+n*trunc(y) -> trunc(trunc(m)*x+trunc(n)*y)
because it only keys off having a non-multiplied trunc as the first term in the simplification.
This patch generalizes this code to try to do a more generic fold of these trunc
expressions.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37888
llvm-svn: 313988
This broke the buildbots, e.g.
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask'
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Subscribers: mzolotukhin
>
> Reviewed By: ayal
>
> Differential Revision: https://reviews.llvm.org/D36130
>
> Review comments updated accordingly
>
> Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
>
> Added a TODO for sortLoadAccesses API
>
> Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
>
> Modified the TODO for sortLoadAccesses API
>
> Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
>
> Review comment update for using OpdNum to insert the mask in respective location
>
> Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
>
> Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
>
> Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313781
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask'
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Subscribers: mzolotukhin
Reviewed By: ayal
Differential Revision: https://reviews.llvm.org/D36130
Review comments updated accordingly
Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
Added a TODO for sortLoadAccesses API
Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
Modified the TODO for sortLoadAccesses API
Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
Review comment update for using OpdNum to insert the mask in respective location
Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313771
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
Commit after rebase for patch D36130
Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab
llvm-svn: 313736
Summary:
With this change:
- Methods in LoopBase trip an assert if the receiver has been invalidated
- LoopBase::clear frees up the memory held by the LoopBase instance
This change also shuffles things around as necessary to work with this stricter invariant.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38055
llvm-svn: 313708
Summary:
See comment for why I think this is a good idea.
This change also:
- Removes an SCEV test case. The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop.
- Updates the loop pass manager test case to deal with a private ~Loop::Loop.
- Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37996
llvm-svn: 313695
This should bring signed div/rem analysis up to the same level as unsigned.
We use icmp simplification to determine when the divisor is known greater than the dividend.
Each positive test is followed by a negative test to show that we're not overstepping the boundaries of the known bits.
There are extra tests for the signed-min-value special cases.
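A hypothetical instance of the basic idea: the dividend below is known to be
in [0, 3], so the divisor is known greater than the dividend and the remainder
simplifies to the dividend itself.
  define i8 @srem_simplify(i8 %x) {
    %masked = and i8 %x, 3    ; known to be in [0, 3]
    %r = srem i8 %masked, 4   ; divisor known greater than dividend
    ret i8 %r                 ; --> simplifies to %masked
  }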
Alive proofs:
http://rise4fun.com/Alive/WI5
Differential Revision: https://reviews.llvm.org/D37713
llvm-svn: 313264
The idea to make an 'isDivZero' helper was suggested for the signed case in D37713:
https://reviews.llvm.org/D37713
This clean-up makes it clear that D37713 is just filling the gap for signed div/rem,
removes unnecessary code, and allows us to remove a bit of duplicated code from the
planned improvement in D37713.
llvm-svn: 313261
invalidated SCCs even when we do not have an updated SCC to redirect
towards.
This comes up in a fairly subtle and surprising circumstance: we need to
have a connected but internal node in the call graph which later becomes
a disconnected island, and then gets deleted. All of this needs to
happen mid-CGSCC walk. Because it is disconnected, we have no way of
computing a new "current" SCC when it gets deleted. Instead, we need to
explicitly check for a deleted "current" SCC and bail out of the current
CGSCC step. This will bubble all the way up to the post-order walk and
then resume correctly.
I've included minimal tests for this bug. The specific behavior
matches something we've seen in the wild with the new PM combined with
ThinLTO and sample PGO, but I've not yet confirmed whether this is the
only issue there.
llvm-svn: 313242
This patch fixes pr34283, which exposed that the computation of
maximum legal width for vectorization was wrong, because it relied
on MaxInterleaveFactor to obtain the maximum stride used in the loop;
however, not all strided accesses in the loop have an interleave group
associated with them.
Instead of recording the maximum stride in the loop, which can be overly
conservative (e.g. if the access with the maximum stride is not involved
in the dependence limitation), this patch tracks the actual maximum legal
width imposed by accesses that are involved in dependencies.
Differential Revision: https://reviews.llvm.org/D37507
llvm-svn: 313237
Summary:
Full inline cost is computed when -inline-cost-full is true or ORE is
non-null. This patch adds another way to compute full inline cost by
adding a field to InlineParams. This will be used by SampleProfileLoader
to check legality of inlining a callee that it wants to inline.
Reviewers: danielcdh, haicheng
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37819
llvm-svn: 313185
Summary:
Added text options to -pgo-view-counts and -pgo-view-raw-counts that dump block frequency and branch probability info in text.
This is useful when the graph is very large and complex (the dot command crashes, lines/edges too close to tell apart, hard to navigate without textual search) or simply when text is preferred.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37776
llvm-svn: 313159
Summary: References should only be on the aliasee.
Reviewers: pcc
Subscribers: llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D37814
llvm-svn: 313158
Summary:
LAA can only emit run-time alias checks for pointers with affine AddRec
SCEV expressions. However, non-AddRecExprs can now be converted to
affine AddRecExprs using SCEV predicates.
This change tries to add the minimal set of SCEV predicates in order
to enable run-time alias checking.
Reviewers: anemet, mzolotukhin, mkuper, sanjoy, hfinkel
Reviewed By: hfinkel
Subscribers: mssimpso, Ayal, dorit, roman.shirokiy, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D17080
llvm-svn: 313012
forgetLoop() has pretty bad performance because it goes over
the same instructions over and over again, in particular when
nested loops are involved.
The refactoring changes the function to a non-recursive one,
reusing the allocations for the data structures and the Visited
set.
NFCI
Differential Revision: https://reviews.llvm.org/D37659
llvm-svn: 312920
I'm trying to refactor some shared code for integer div/rem,
but I keep having to scroll through fdiv. The FP ops have
nothing in common with the integer ops, so I'm moving FP
below everything else.
While here, improve a couple of comments and fix some formatting.
llvm-svn: 312913
This removes some duplicated code and makes it easier to support signed div/rem
in a similar way if we want to do that. Note that the existing comments were not
accurate - we don't need a constant divisor to simplify; icmp simplification does
more than that. But as the added tests show, it could go even further.
llvm-svn: 312885
It now knows the tricks of both functions.
Also, fix a bug that considered allocas in a non-zero address space to be always non-null.
Differential Revision: https://reviews.llvm.org/D37628
llvm-svn: 312869
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented
as an independent pass, so there's no stretching of scope and feature creep for an existing pass.
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028
The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.
Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
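A sketch of the kind of pattern targeted (hypothetical IR): a div and a rem of
the same operands in different blocks can be placed next to each other, or the
rem can be decomposed so that only one hardware divide remains.
  define i32 @div_rem_pair(i32 %a, i32 %b, i1 %c) {
  entry:
    %div = sdiv i32 %a, %b
    br i1 %c, label %if, label %end
  if:
    ; can be rewritten as %a - (%div * %b), reusing the division above
    %rem = srem i32 %a, %b
    br label %end
  end:
    %r = phi i32 [ %div, %entry ], [ %rem, %if ]
    ret i32 %r
  }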
Differential Revision: https://reviews.llvm.org/D37121
llvm-svn: 312862
The current TargetTransformInfo can support a throughput cost model and a code size model, but sometimes we also need an instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different costs of an instruction. So I proposed the following interface:
enum TargetCostKind {
  TCK_RecipThroughput, ///< Reciprocal throughput.
  TCK_Latency,         ///< The latency of instruction.
  TCK_CodeSize         ///< Instruction code size.
};
int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const;
All clients should mainly use this function to query the cost of an instruction; the parameter <kind> specifies the desired cost model.
This patch also provides a simple default implementation of getInstructionLatency.
The default getInstructionLatency provides latency numbers for only a small number
of instruction classes; those latency numbers are only reasonable for modern OOO
processors. It can be extended in the following ways:
- Add more detail into this function.
- Add a getXXXLatency function and call it from here.
- Implement a target-specific getInstructionLatency function.
Differential Revision: https://reviews.llvm.org/D37170
llvm-svn: 312832
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.
Differential revision: https://reviews.llvm.org/D27846
llvm-svn: 312791
The current code that handles personality functions when creating a
module summary does not correctly handle the case where a function's
personality function operand refers to the function indirectly
(e.g. via a bitcast). This patch handles such cases by treating
personality function references like any other reference, i.e. by
adding them to the function's reference list. This has the minor side
benefit of allowing personality functions to participate in early
dead stripping.
We do this by calling findRefEdges on the function itself. This way
we also end up handling other function operands (specifically prefix
data and prologue data) for free.
Differential Revision: https://reviews.llvm.org/D37553
llvm-svn: 312698
Remove code that assumed that a nullptr of address space != 0 couldn't alias with a non-null pointer. This is incorrect, since nothing can be concluded about a null pointer in an address space != 0.
This code was written before address spaces were introduced.
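A sketch of why the old assumption was wrong (hypothetical IR): in
addrspace(1), null may be a perfectly valid address, so the load below may
alias the store.
  define i32 @null_may_alias(ptr addrspace(1) %p) {
    store i32 1, ptr addrspace(1) %p
    %v = load i32, ptr addrspace(1) null   ; may alias the store above
    ret i32 %v
  }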
Differential Revision: https://reviews.llvm.org/D37518
llvm-svn: 312648
This is a preliminary step towards solving the remaining part of PR27145 - IR for isfinite():
https://bugs.llvm.org/show_bug.cgi?id=27145
In order to solve that one more generally, we need to add matching for and/or of fcmp ord/uno
with a constant operand.
But while looking at those patterns, I realized we were missing a canonicalization for nonzero
constants. Rather than limiting to just folds for constants, we're adding a general value
tracking method for this based on an existing DAG helper.
By transforming everything to 0.0, we can simplify the existing code in foldLogicOfFCmps()
and pick up missing vector folds.
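For illustration, a hypothetical case of the canonicalization: an 'uno'
comparison against any constant known not to be NaN only tests the variable
operand, so the constant can be replaced by 0.0.
  define i1 @uno_canon(float %x) {
    %r = fcmp uno float %x, 42.0   ; 42.0 is not NaN, so this is isnan(%x)
    ret i1 %r                      ; --> canonicalize to fcmp uno float %x, 0.0
  }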
Differential Revision: https://reviews.llvm.org/D37427
llvm-svn: 312591
Summary:
When constructing the predicate P1 in ScalarEvolution::createAddRecFromPHIWithCastsImpl() it is possible
for the PHISCEV from which the predicate is constructed to be a SCEVConstant instead of a SCEVAddRec. If
this happens, then the cast<SCEVAddRec>(PHISCEV) in the code will assert.
Such a PHISCEV is possible if either the start value or the accumulator value is a constant value
that is not equal to its truncated value, and if the truncated value is zero.
This patch adds tests that demonstrate the cast<> assertion, and fixes this problem by checking
whether the PHISCEV is a constant before constructing the P1 predicate; if it is, then P1 is
equivalent to one of P2 or P3. Additionally, if we know that the start value or accumulator
value are constants then we check whether the P2 and/or P3 predicates are known false at compile
time; if either is, then we bail out of constructing the AddRec.
Reviewers: sanjoy, mkazantsev, silviu.baranga
Reviewed By: mkazantsev
Subscribers: mkazantsev, llvm-commits
Differential Revision: https://reviews.llvm.org/D37265
llvm-svn: 312568
This patch teaches decomposeBitTestICmp to look through truncate instructions on the input to the compare. If a truncate is found it will now return the pre-truncated Value and appropriately extend the APInt mask.
This allows some code to be removed from InstSimplify that was doing this functionality.
This allows InstCombine's bit test combining code to match a pre-truncate Value with the same Value appear with an 'and' on another icmp. Or it allows us to combine a truncate to i16 and a truncate to i8. This also required removing the type check from the beginning of getMaskedTypeForICmpPair, but I believe that's ok because we still have to find two values from the input to each icmp that are equal before we'll do any transformation. So the type check was really just serving as an early out.
There was one user of decomposeBitTestICmp that didn't want to look through truncates, so I've added a flag to prevent that behavior when necessary.
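A hypothetical example of the new look-through: the sign test of the truncated
value is really a single-bit test of the wider %x, so decomposeBitTestICmp can
return %x with the mask extended to i32.
  define i1 @trunc_sign_bit(i32 %x) {
    %t = trunc i32 %x to i8
    %c = icmp slt i8 %t, 0   ; equivalent to (%x & 128) != 0
    ret i1 %c
  }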
Differential Revision: https://reviews.llvm.org/D37158
llvm-svn: 312382
If a function contains inline asm and the module-level inline asm
contains the definition of a local symbol, prevent the function from
being imported in case the function-level inline asm refers to a
symbol in the module-level inline asm.
Differential Revision: https://reviews.llvm.org/D37370
llvm-svn: 312332
In LLVM IR the following code:
%r = urem <ty> %t, %b
is equivalent to
%q = udiv <ty> %t, %b
%s = mul nuw <ty> %q, %b
%r = sub nuw <ty> %t, %s ; (t / b) * b + (t % b) = t
As UDiv, Mul and Sub are already supported by SCEV, URem can be implemented
with minimal effort using that relation:
%r --> (-%b * (%t /u %b)) + %t
We implement two special cases:
- if %b is 1, the result is always 0
- if %b is a power-of-two, we produce a zext/trunc based expression instead
That is, the following code:
%r = urem i32 %t, 65536
Produces:
%r --> (zext i16 (trunc i32 %t to i16) to i32)
Note that while this helps get a tighter bound on the range analysis and the
known-bits analysis, it exposes a normalization shortcoming of SCEV:
%div = udiv i32 %a, 65536
%mul = mul i32 %div, 65536
%rem = urem i32 %a, 65536
%add = add i32 %mul, %rem
Will usually not be reduced.
llvm-svn: 312329
Summary:
Remove redundant explicit template instantiation.
This was reported by Andrew Kelley building release_50 with gcc7.2.0 on MacOS: duplicate symbol llvm::DominatorTreeBase.
Reviewers: kuhar, andrewrk, davide, hans
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37185
llvm-svn: 311835
Summary:
Add options -print-bfi/-print-bpi that dump block frequency and branch
probability info like -view-block-freq-propagation-dags and
-view-machine-block-freq-propagation-dags do but in text.
This is useful when the graph is very large and complex (the dot command
crashes, lines/edges too close to tell apart, hard to navigate without textual
search) or simply when text is preferred.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37165
llvm-svn: 311822
Change the early exit condition from Cost > Threshold to Cost >= Threshold
because the inline condition is Cost < Threshold.
Differential Revision: https://reviews.llvm.org/D37087
llvm-svn: 311791
Summary: We need to have accurate-sample-profile in function attribute so that it works with LTO.
Reviewers: davidxl, rsmith
Reviewed By: davidxl
Subscribers: sanjoy, mehdi_amini, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D37113
llvm-svn: 311706
Summary:
We add the precise cache sizes and associativity for the following Intel
architectures:
- Penryn
- Nehalem
- Westmere
- Sandy Bridge
- Ivy Bridge
- Haswell
- Broadwell
- Skylake
- Kabylake
Polly has used, for several months now, a performance model for BLAS computations
that derives optimal cache and register tile sizes from cache and latency
information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS" by Tze Meng Low, published at TOMS 2016).
While bootstrapping this model, these target values have been kept in Polly.
However, as our implementation is now rather mature, it seems time to teach
LLVM itself about cache sizes.
Interestingly, L1 and L2 cache sizes are pretty constant across
micro-architectures, hence a set of architecture specific default values
seems like a good start. They can be expanded to more target specific values,
in case certain newer architectures require different values. For now a set
of Intel architectures are provided.
Just as a little teaser, for a simple gemm kernel this model allows us to
improve performance from 1.2s to 0.27s. For gemm kernels with less optimal
memory layouts even larger speedups can be reported.
Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb
Reviewed By: fhahn, asb
Subscribers: lsaba, asb, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D37051
llvm-svn: 311647
Current PGO only annotates the edge weight for branch and switch instructions
with profile counts. We should also annotate the indirectbr instruction as
all the information is there. This patch enables the annotating for indirectbr
instructions. Also uses this annotation in branch probability analysis.
Differential Revision: https://reviews.llvm.org/D37074
llvm-svn: 311604
This is PR33245.
The case I am fixing is the following:
Imagine we have 2 BC files, one defines and uses personality routine,
second has only declaration and also uses it.
Previously the algorithm computing dead symbols (llvm::computeDeadSymbols) did
not know about personality routines and left them dead even if the function
that has the routine was live.
As a result, the thinLTOInternalizeAndPromoteGUID() method changed the binding
for such symbols to local. Later when LLD tried to link these objects it failed
because one object had undefined global symbol for routine and second
object contained local definition instead of global.
The patch sets the live root flag on the corresponding FunctionSummary
for personality routines when we build the per-module summaries
during the compile step.
Differential revision: https://reviews.llvm.org/D36834
llvm-svn: 311432
The function does an equality check later to terminate the recursion, but that won't work if it starts out too high. A similar assert already exists in computeKnownBits.
llvm-svn: 311400
Currently, the inline cost model will bail once the inline cost exceeds the
inline threshold in order to avoid unnecessary compile-time. However, when
debugging it is useful to compute the full cost, so this command line option
is added to override the default behavior.
I took over this work from Chad Rosier (mcrosier@codeaurora.org).
Differential Revision: https://reviews.llvm.org/D35850
llvm-svn: 311371
This adds support for non-canonical compare predicates. InstSimplify can't rely on canonicalization to have occurred.
Differential Revision: https://reviews.llvm.org/D36646
llvm-svn: 310893
This recommits r310869, with the moved files and no extra changes.
Original commit message:
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310889
localized to the code that uses those analyses.
Technically, this can change behavior as we no longer require the
existence of the ProfileSummaryInfo analysis to use local profile
information via BFI. We didn't actually require the PSI to have an
interesting profile though, so this only really impacts the behavior in
non-default pass pipelines.
IMO, this makes it substantially less surprising how everything works --
before an analysis that wasn't actually used had to exist to trigger
*any* profile aware inlining. I think the new organization makes it more
obvious where various checks for profile signals happen.
Differential Revision: https://reviews.llvm.org/D36710
llvm-svn: 310888
Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything.
llvm-svn: 310873
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310869
ValueTracking has to strike a balance when attempting to propagate information
backwards from assumes, because if the information is trivially propagated
backwards, it can appear to LLVM that the assumption is known to be true, and
therefore can be removed.
This is sound (because an assumption has no semantic effect except for causing
UB), but prevents the assume from allowing further optimizations.
The isEphemeralValueOf check exists to try and prevent this issue by not
removing the source of an assumption. This tries to make it a little bit more
general to handle the case of side-effectful instructions, such as in
%0 = call i1 @get_val()
%1 = xor i1 %0, true
call void @llvm.assume(i1 %1)
Patch by Ariel Ben-Yehuda, thanks!
Differential Revision: https://reviews.llvm.org/D36590
llvm-svn: 310859
causing compile time issues.
Moreover, the patch *deleted* the flag in addition to changing the
default, and links to a code review that doesn't even discuss the flag
and just has an update to a Clang test case.
I've followed up on the commit thread to ask for numbers on compile time
at this point, leaving the flag in place until things stabilize, and
pointing at specific code that seems to exhibit excessive compile time
with this patch.
Original commit message for r310583:
"""
[ValueTracking] Enabling ValueTracking patch by default (recommit). Part 2.
The original patch was an improvement to IR ValueTracking on
non-negative integers. It has been checked in to trunk (D18777,
r284022). But was disabled by default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
""""
llvm-svn: 310816
printing techniques with a DEBUG_TYPE controlling them.
It was a mistake to start re-purposing the pass manager `DebugLogging`
variable for generic debug printing -- those logs are intended to be
very minimal and primarily used for testing. More detailed and
comprehensive logging doesn't make sense there (it would only make for
brittle tests).
Moreover, we kept forgetting to propagate the `DebugLogging` variable to
various places making it also ineffective and/or unavailable. Switching
to `DEBUG_TYPE` makes this a non-issue.
llvm-svn: 310695
The original patch was an improvement to IR ValueTracking on non-negative
integers. It has been checked in to trunk (D18777, r284022), but was disabled by
default due to performance regressions.
The perf impact has since improved, so the patch is now enabled by default.
Reviewers: reames, hfinkel
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 310583
of the returned value.
Checking the returned value from inside of a scoped exit isn't actually
valid. It happens to work when NRVO fires and the stars align, which
they reliably do with Clang but don't, for example, on MSVC builds.
llvm-svn: 310547
Summary:
Avoid checking each operand and calling getValueFromCondition() before calling
constantFoldUser() when the instruction type isn't supported by
constantFoldUser().
This fixes a large compile time regression in an internal build.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36552
llvm-svn: 310545
must-alias(p, sz_p, p, sz_q) irrespective of access sizes sz_p, sz_q
As discussed a couple of weeks ago on the ML.
This makes the behavior consistent with that of BasicAA.
AA clients already check the obj size themselves and may not require the
obj size to match exactly the access size (e.g., in case of store forwarding)
llvm-svn: 310495
The recently improved support for `icmp` in ValueTracking
(r307304) exposes the fact that `isImplied` condition doesn't
really bail out if we hit the recursion limit (and calls
`computeKnownBits` which increases the depth and asserts).
Differential Revision: https://reviews.llvm.org/D36512
llvm-svn: 310481
isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.
The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().
The isFoldableMemAccess() hook has been removed everywhere.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933
llvm-svn: 310463
to Nodes when removing ref edges from a RefSCC.
This map based association turns out to be pretty expensive for large
RefSCCs and pointless as we already have embedded data members inside
nodes that we use to track the DFS state. We can reuse one of those and
the map becomes unnecessary.
This also fuses the update of those numbers into the scan across the
pending stack of nodes so that we don't walk the nodes twice during the
DFS.
With this I expect the new PM to be faster than the old PM for the test
case I have been optimizing. That said, it also seems simpler and more
direct in many ways. The side storage was always pretty awkward.
The last remaining hot-spot in the profile of the LCG once this is done
will be the edge iterator walk in the DFS. I'll take a look at improving
that next.
llvm-svn: 310456
that RefSCC still connected.
This is common and can be handled much more efficiently. As soon as we
know we've covered every node in the RefSCC with the DFS, we can simply
reset our state and return. This avoids numerous data structure updates
and other complexity.
On top of other changes, this appears to get new PM back to parity with
the old PM for a large protocol buffer message source code. The dense
map updates are very hot in this function.
llvm-svn: 310451
limited batch updates.
Specifically, allow removing multiple reference edges starting from
a common source node. There are a few constraints that play into
supporting this form of batching:
1) The way updates occur during the CGSCC walk, about the most we can
functionally batch together are those with a common source node. This
also makes the batching simpler to implement, so it seems
a worthwhile restriction.
2) The far and away hottest function for large C++ files I measured
(generated code for protocol buffers) showed a huge amount of time
was spent removing ref edges specifically, so it seems worth focusing
there.
3) The algorithm for removing ref edges is very amenable to this
restricted batching. There is just API and implementation
special-casing for the non-batch case that gets in the way. Once
removed, supporting batches is nearly trivial.
This does modify the API in an interesting way -- now, we only preserve
the target RefSCC when the RefSCC structure is unchanged. In the face of
any splits, we create brand new RefSCC objects. However, all of the
users I could find were OK with this. Only the unittest needed
interesting updates here.
How much does batching these updates help? I instrumented the compiler
when run over a very large generated source file for a protocol buffer
and found that the majority of updates are intrinsically updating one
function at a time. However, nearly 40% of the total ref edges removed
are removed as part of a batch of removals greater than one, so these
are the cases batching can help with.
When compiling the IR for this file with 'opt' and 'O3', this patch
reduces the total time by 8-9%.
Differential Revision: https://reviews.llvm.org/D36352
llvm-svn: 310450
Summary: Currently, ICP checks the count against a fixed value to see if it is hot enough to be promoted. This does not work for SamplePGO because the sampled count may be much smaller. This patch uses PSI to check if the count is hot enough to be promoted.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: davidxl
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36341
llvm-svn: 310416
I want to reuse this code in SimplifyDemandedBits handling of Add/Sub. This will make that easier.
I wonder if we should use it in SelectionDAG's computeKnownBits too.
Differential Revision: https://reviews.llvm.org/D36433
llvm-svn: 310378
This was just a bad oversight on my part. The code in question should
never have worked without this fix. But it turns out, there are
relatively few places that involve libfunctions that participate in
a single SCC, and unless they do, this happens to not matter.
The effect of not having this correct is that each time through this
routine, the edge from write_wrapper to write was toggled between a call
edge and a ref edge. First time through, it becomes a demoted call edge
and is turned into a ref edge. Next time it is a promoted call edge from
a ref edge. On and on it goes, forever.
I've added the asserts which should have always been here to catch silly
mistakes like this in the future as well as a test case that will
actually infloop without the fix.
The other (much scarier) infinite-inlining issue I think didn't actually
occur in practice, and I simply misdiagnosed this minor issue as that
much more scary issue. The other issue *is* still a real issue, but I'm
somewhat relieved that so far it hasn't happened in real-world code
yet...
llvm-svn: 310342
After the previous series of patches, this is now trivial and deletes
a pretty astonishing amount of complexity. This has been a long time
coming, as the move toward a PO sequence of RefSCCs started eroding the
underlying use cases for this half of the data structure.
Among the biggest advantages here is that now there aren't two
independent data structures that need to stay in sync.
Some of my profiling has also indicated that updating the parent sets
was among the most expensive parts of the lazy call graph. Eliminating
it whole sale is likely to be a nice win in terms of compile time.
Last but not least, I had discussed with some folks previously keeping
it around for asserts and other correctness checking, but once the
fundamentals of the parent and child checking were implemented without
the parent sets, their value in correctness checking was tiny and nowhere
near worth the cost of the complexity required to keep everything
up-to-date.
llvm-svn: 310171
isDescendantOf methods on RefSCCs in terms of the forward edges rather
than the parent sets.
This is technically slower, but probably not interestingly slower, and
all of these routines were already so expensive that they're guarded
behind both !NDEBUG and EXPENSIVE_CHECKS.
This removes another non-critical usage of parent sets.
I've also added some comments to try and help clarify to any potential
users the costs of these routines. They're mostly useful for debugging,
asserts, or other queries.
llvm-svn: 310170
walk over the parent set.
When removing a single function from the call graph, we previously would
walk the entire RefSCC's parent set and then walk every outgoing edge
just to find the ones to remove. In addition to this being quite high
complexity in theory, it is also the last fundamental use of the parent
sets.
With this change, when we remove a function we transform the node
containing it to be recognizably "dead" and then teach the edge
iterators to recognize edges to such nodes and skip them the same way
they skip null edges.
We can't move fully to using "dead" nodes -- when disconnecting two live
nodes we need to null out the edge. But the complexity this adds to the
edge sequence isn't too bad and the simplification of lazily handling
this seems like a significant win.
llvm-svn: 310169
The definition of 'false' here was already pretty vague and debatable,
and I'm about to add another potential 'false' that would actually make
much more sense in a bool operator. Especially given how rarely this is
used, a nicely named method seems better.
llvm-svn: 310165
structures, actually null out the graph pointers as well. We won't ever
update these, and we certainly shouldn't be calling any methods on them,
so it seems good to defensively nuke them.
llvm-svn: 310164
pointers in node objects, just walk the map from function to node.
It doesn't have stable ordering, but works just as well and is much
simpler. We don't need ordering when just updating internal pointers.
llvm-svn: 310163
merging RefSCCs.
The logic to directly use the reference edges is simpler and not
substantially slower (despite the comments to the contrary) because this
is not actually an especially hot part of LCG in practice.
llvm-svn: 310161
Pushes the sext onto the operands of a Sub if NSW is present.
Also adds support for propagating the nowrap flags of the
llvm.ssub.with.overflow intrinsic during analysis.
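The underlying arithmetic fact is that sign extension distributes over a subtraction that cannot overflow in the narrow type. A minimal standalone check of that identity (assuming an 8-bit to 32-bit extension; this is an illustration, not the patch's code):
#include <cassert>
#include <cstdint>

int main() {
  // sext(a - b) == sext(a) - sext(b) whenever the narrow subtraction
  // does not overflow -- the "nsw" condition on the Sub.
  int8_t A = 100, B = -20;                 // A - B = 120 fits in int8_t
  int32_t Narrow = int32_t(int8_t(A - B)); // sext of the narrow result
  int32_t Wide = int32_t(A) - int32_t(B);  // sub of the sext'ed operands
  assert(Narrow == Wide);
  return 0;
}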
Differential Revision: https://reviews.llvm.org/D35256
llvm-svn: 310117
Summary: We originally set the hotness threshold to 99.9% to be consistent with gcc FDO. But the inline heuristics differ between the two compilers: llvm uses a bottom-up algorithm while gcc uses a priority-based one. The LLVM algorithm tends to inline too much too early, which prevents hot callsites from being further inlined into their callers. Due to this restriction, we think it is reasonable to lower the hotness threshold to give priority to those that are really hot. Our experiments show that this change would improve performance on large applications. Note that the inline heuristic has great room for further tuning. Once the inline heuristics are refined, we could adjust this threshold to allow inlining for less hot callsites.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D36317
llvm-svn: 310065
Adds function attributes to the index: ReadNone, ReadOnly, NoRecurse, NoAlias. These attributes will be used for future ThinLTO optimizations that will propagate function attributes across modules.
llvm-svn: 310061
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.
This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNs, and the former NAN/NUM
nodes are illegal, thus selection does not happen. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.
Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D33186
llvm-svn: 310054
Summary:
Detect when the working set size of a profiled application is huge,
by comparing the number of counts required to reach the hot percentile
in the profile summary to a large threshold*.
When the working set size is determined to be huge, disable peeling
to avoid bloating the working set further.
*Note that the selected threshold (15K) is significantly larger than the
largest working set value in SPEC cpu2006 (which is gcc at around 11K).
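In spirit the check is a single comparison against a profile-summary statistic; a hand-wavy sketch (the names are illustrative, the 15K threshold comes from the note above):
#include <cstdint>

// Sketch: the working set is "huge" when the number of counts needed to
// reach the hot percentile exceeds the threshold; peeling is then skipped.
constexpr uint64_t HugeWorkingSetThreshold = 15000;

bool hasHugeWorkingSet(uint64_t NumCountsForHotPercentile) {
  return NumCountsForHotPercentile > HugeWorkingSetThreshold;
}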
Reviewers: davidxl
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36288
llvm-svn: 310005
Summary:
This increases the inlining threshold for hot callsites. Hotness is
defined in terms of block frequency of the callsite relative to the
caller's entry block's frequency. Since this requires BFI in the
inliner, this only affects the new PM pipeline. This is enabled by
default at -O3.
This improves the performance of some internal benchmarks. Notably, an
internal benchmark for Gipfeli compression
(https://github.com/google/gipfeli) improves by ~7%. Povray in SPEC2006
improves by ~2.5%. I am running more experiments and will update the
thread if other benchmarks show improvement/regression.
In terms of text size, the LLVM test-suite shows a 1.22% text size
increase. Diving into the results, 13 of the benchmarks in the
test-suite increase by > 10%. Most of these are small, but
Adobe-C++/loop_unroll (17.6% increase) and tramp3d (20.7% size increase)
have >250K text size. On a large application, the text size increases by
2%.
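The hotness test itself is a relative-frequency comparison; roughly (a sketch with illustrative names and constants, not the patch's exact values):
#include <cstdint>

// Sketch: a callsite counts as hot when its block frequency is a large
// multiple of the caller's entry frequency; hot callsites then get a
// larger inlining threshold. Multiplier and thresholds are placeholders.
bool isHotCallSite(uint64_t CallSiteFreq, uint64_t CallerEntryFreq) {
  const uint64_t HotMultiplier = 60;
  return CallSiteFreq >= HotMultiplier * CallerEntryFreq;
}

int pickThreshold(bool HotCallSite) {
  return HotCallSite ? 2000 : 225; // illustrative values only
}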
Reviewers: chandlerc, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36199
llvm-svn: 309994
Summary:
(This is a second attempt as https://reviews.llvm.org/D34822 was reverted.)
LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges.
But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur.
This patch adds logic to handle such a case in getEdgeValueLocal().
This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary.
With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36247
llvm-svn: 309986
Summary: For SamplePGO, we already record the callsite count in the call instruction itself, so we do not want to use BFI to get the profile count, as it is less accurate.
Reviewers: tejohnson, davidxl, eraman
Reviewed By: eraman
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36025
llvm-svn: 309964
The patch rL309080 was reverted because it did not clean up the cache on "forgetValue"
method call. This patch re-enables this change, adds the missing check and introduces
two new unit tests that make sure that the cache is cleaned properly.
Differential Revision: https://reviews.llvm.org/D36087
llvm-svn: 309925
This patch is update after the first patch (https://reviews.llvm.org/rL309651) based on the post-commit comments.
The stack coloring pass needs to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp
// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.
But TBAA has already been enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.
This patch makes the stack coloring pass maintain AliasAnalysis information when merging multiple stack slots.
This patch fixes PR33928.
llvm-svn: 309849
If SCEV can prove that the backedge taken count for a loop is zero, it does not
need to "understand" a recursive PHI to compute its exiting value.
This should fix PR33885.
llvm-svn: 309758
This causes assertion failures in (a somewhat old version of) SpiderMonkey.
I have already forwarded reproduction instructions to the patch author.
llvm-svn: 309659
The stack coloring pass needs to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp
// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.
But TBAA has already been enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.
This patch makes the stack coloring pass maintain AliasAnalysis information when merging multiple stack slots.
llvm-svn: 309651
Summary:
Adding part of the changes in D30369 (needed to make progress):
Current patch updates AliasAnalysis and MemoryLocation, but does _not_ clean up MemorySSA.
Original summary from D30369, by dberlin:
Currently, we have instructions which affect memory but have no memory
location. If you call, for example, MemoryLocation::get on a fence,
it asserts. This means things specifically have to avoid that. It
also means we end up with a copy of each API, one taking a memory
location, one not.
This starts to fix that.
We add MemoryLocation::getOrNone as a new call, and reimplement the
old asserting version in terms of it.
We make MemoryLocation optional in the (Instruction, MemoryLocation)
version of getModRefInfo, and kill the old one argument version in
favor of passing None (it had one caller). Now both can handle fences
because you can just use MemoryLocation::getOrNone on an instruction
and it will return a correct answer.
We use all this to clean up part of MemorySSA that had to handle this difference.
Note that literally every actual getModRefInfo interface we have could be made private and replaced with:
getModRefInfo(Instruction, Optional<MemoryLocation>)
and
getModRefInfo(Instruction, Optional<MemoryLocation>, Instruction, Optional<MemoryLocation>)
and delegating to the right ones, if we wanted to.
I have not attempted to do this yet.
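The shape of the API change, sketched with std::optional standing in for LLVM's Optional (simplified stand-in types, not the real signatures):
#include <cassert>
#include <optional>

struct MemoryLocation {};             // stand-in for llvm::MemoryLocation
struct Instruction { bool IsFence; }; // stand-in for llvm::Instruction

// New call: returns no location for instructions (like fences) that
// affect memory but have no single memory location.
std::optional<MemoryLocation> getOrNone(const Instruction &I) {
  if (I.IsFence)
    return std::nullopt;
  return MemoryLocation{};
}

// The old asserting version, reimplemented in terms of getOrNone.
MemoryLocation get(const Instruction &I) {
  auto Loc = getOrNone(I);
  assert(Loc && "instruction has no memory location");
  return *Loc;
}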
Reviewers: dberlin, davide, dblaikie
Subscribers: sanjoy, hfinkel, chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D35441
llvm-svn: 309641
Summary:
Inlining threshold is increased by application of bonuses when the
callee has a single reachable basic block or is rich in vector
instructions. Similarly, inlining cost is reduced by applying a large
bonus when the last call to a static function is considered for
inlining. This patch disables the application of these bonuses when the
callsite or the callee is cold. The intention here is to prevent a large
cold callsite from being inlined to a non-cold caller that could prevent
the caller from being inlined. This is especially important when the
cold callsite is a last call to a static since the associated bonus is
very high.
Reviewers: chandlerc, davidxl
Subscribers: danielcdh, llvm-commits
Differential Revision: https://reviews.llvm.org/D35823
llvm-svn: 309441
Summary:
LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges.
But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur.
This patch adds logic to handle such a case in getEdgeValueLocal().
This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary.
With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%.
Reviewers: wmi, dberlin, sanjoy
Reviewed By: sanjoy
Subscribers: davide, davidxl, llvm-commits
Differential Revision: https://reviews.llvm.org/D34822
llvm-svn: 309415
This patch reworks the function that searches constants in Add and Mul SCEV expression
chains so that now it does not visit a node more than once, and also renames this function
for better correspondence between its implementation and semantics.
Differential Revision: https://reviews.llvm.org/D35931
llvm-svn: 309367
This reverts commit r309080. The patch needs to clear out the
ScalarEvolution::ExitLimits cache in forgetMemoizedResults.
I've replied on the commit thread for the patch with more details.
llvm-svn: 309357
Summary: In performance tuning, we see performance benefits when enlarging the maximum number of promotion targets to 3. This is safe as long as the total percentage threshold is properly set up (https://reviews.llvm.org/D35962)
Reviewers: davidxl, tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D35966
llvm-svn: 309346
Summary: In the current implementation, isPromotionProfitable only checks if the call count to a direct target is no less than a certain percentage threshold of the remaining call counts that have not been promoted. This causes code size problems when the target count is small but greater than a large portion of remaining counts. E.g. target1 takes 99.9%, while target2 takes 0.1%. Both targets will be promoted and inlined, making the function size too large, which potentially prevents it from being further inlined into its callers. This patch adds another percentage threshold against the total indirect call count: the target count needs to be no less than both thresholds in order to be promoted speculatively.
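A sketch of the resulting profitability check (the percentages are illustrative placeholders, not necessarily the patch's defaults):
#include <cstdint>

// Sketch: a target must clear a percentage of the *remaining* (not yet
// promoted) count and, with this patch, also a percentage of the *total*
// indirect call count.
bool isPromotionProfitable(uint64_t TargetCount, uint64_t RemainingCount,
                           uint64_t TotalCount) {
  const double RemainingPercent = 0.30; // vs. the unpromoted remainder
  const double TotalPercent = 0.05;     // new: vs. the total count
  return TargetCount >= RemainingPercent * RemainingCount &&
         TargetCount >= TotalPercent * TotalCount;
}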
Reviewers: davidxl, tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D35962
llvm-svn: 309345
Currently CallAnalyzer::isGEPFree uses TTI::getGEPCost to check if a GEP is free.
TTI::getGEPCost cannot handle cases when GEPs participate in Def-Use dependencies
(see https://reviews.llvm.org/D31186 for example).
There is TTI::getUserCost which can calculate the cost more accurately by
taking dependencies into account.
Differential Revision: https://reviews.llvm.org/D33685
llvm-svn: 309268
This patch adds a cache for computeExitLimit to save compilation time. Many examples of
tests that take extensive time to compile are attached to bug 33494.
Differential Revision: https://reviews.llvm.org/D35827
llvm-svn: 309080
`SCEVUnknown::allUsesReplacedWith` does not need to call `forgetMemoizedResults`
since RAUW does a value-equivalent replacement by assumption. If this
assumption was false then the later setValPtr(New) call would be incorrect too.
This is a non-trivial performance optimization for functions with a large number
of loops since `forgetMemoizedResults` walks all loop backedge taken counts to
see if any of them use the SCEVUnknown being RAUWed. However, this improvement
is difficult to demonstrate without checking in an excessively large IR file.
llvm-svn: 309072
When SCEV calculates product of two SCEVAddRecs from the same loop, it
tries to combine them into one big AddRecExpr. If the sizes of the initial
SCEVs were `S1` and `S2`, the size of their product is `S1 + S2 - 1`, and every
operand of the resulting SCEV is combined from operands of initial SCEV and
has much higher complexity than they have.
As a result, if we try to calculate something like:
%x1 = {a,+,b}
%x2 = mul i32 %x1, %x1
%x3 = mul i32 %x2, %x1
%x4 = mul i32 %x3, %x2
...
The size of such SCEVs grows as `2^N`, and the arguments
become more and more complex as we go. This leads
to long compilation times and huge memory consumption.
This patch sets a limit after which we don't try to combine two
`SCEVAddRecExpr`s into one. By default, max allowed size of the
resulting AddRecExpr is set to 16.
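The guard itself is a simple size check before combining (a sketch; MaxAddRecSize mirrors the default of 16 mentioned above):
#include <cstddef>

// Sketch: multiplying AddRecs with S1 and S2 operands yields one with
// S1 + S2 - 1 operands; refuse to combine when that exceeds the cap.
constexpr size_t MaxAddRecSize = 16;

bool canCombineAddRecs(size_t S1, size_t S2) {
  return S1 + S2 - 1 <= MaxAddRecSize;
}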
Differential Revision: https://reviews.llvm.org/D35664
llvm-svn: 308847
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262 and https://reviews.llvm.org/D35049
llvm-svn: 308729
functions.
In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.
While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.
We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.
This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.
llvm-svn: 308417
using runtime checks
Extend the SCEVPredicateRewriter to work a bit harder when it encounters an
UnknownSCEV for a Phi node; try to build an AddRecurrence also for Phi nodes
whose update chain involves casts that can be ignored under the proper runtime
overflow test. This is one step towards addressing PR30654.
Differential revision: http://reviews.llvm.org/D30041
llvm-svn: 308299
Summary:
Previously, we counted TotalMemInst by reading certain instruction counters before and after calling visit and then finding the difference. But that wouldn't be thread safe if this same pass was being run on multiple threads.
This list of "memory instructions" doesn't make sense to me as it includes call/invoke and is missing atomics.
This patch removes the counter all together.
Reviewers: hfinkel, chandlerc, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D33608
llvm-svn: 308260
function to every defined function known to LLVM as a library function.
LLVM can introduce calls to these functions either by replacing other
library calls or by recognizing patterns (such as memset_pattern or
vector math patterns) and replacing those with calls. When these library
functions are actually defined in the module, we need to have reference
edges to them initially so that we visit them during the CGSCC walk in
the right order and can effectively rebuild the call graph afterward.
This was discovered when building code with Fortify enabled as that is
a common case of both inline definitions of library calls and
simplifications of code into calling them.
This can in extreme cases of LTO-ing with libc introduce *many* more
reference edges. I discussed a bunch of different options with folks but
all of them are unsatisfying. They either make the graph operations
substantially more complex even when there are *no* defined libfuncs, or
they introduce some other complexity into the callgraph. So this patch
goes with the simplest possible solution of actual synthetic reference
edges. If this proves to be a memory problem, I'm happy to implement one
of the clever techniques to save memory here.
llvm-svn: 308088
Currently, getUserCost() only checks the src and dst types of an EXT to decide whether
it is free or not. This change first checks the types, then calls isExtFreeImpl(), and
finally checks whether the EXT can form an ExtLoad. Currently, only AArch64 has a customized
implementation of isExtFreeImpl() to check if an EXT can be folded into its use.
Differential Revision: https://reviews.llvm.org/D34458
llvm-svn: 308076
Summary:
DominatorTreeBase used to have an IsPostDominators (bool) member to indicate if the tree is a dominator or a postdominator tree. This made it possible to switch between the two 'modes' at runtime, but it isn't used in practice anywhere.
This patch makes IsPostDominator a template argument. This way, it is easier to switch between different algorithms at compile-time based on this argument and design external utilities around it. It also makes it impossible to incidentally assign a postdominator tree to a dominator tree (and vice versa), and to further simplify template code in GenericDominatorTreeConstruction.
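The compile-time split, in miniature (a generic sketch, not the actual GenericDomTree declarations):
template <bool IsPostDom>
struct DomTreeBase {
  // The mode is now part of the type, so code can specialize on it at
  // compile time, and a postdominator tree is a distinct type from a
  // dominator tree.
  static constexpr bool isPostDominator() { return IsPostDom; }
};

using DominatorTree = DomTreeBase<false>;
using PostDominatorTree = DomTreeBase<true>;

// DominatorTree DT; PostDominatorTree PDT = DT; // now a compile error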
Reviewers: dberlin, sanjoy, davide, grosser
Reviewed By: dberlin
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35315
llvm-svn: 308040
I used the wrong variable to update. This was even covered by a unittest
I wrote, and the comments for the unittest were correct (if confusing)
but the test itself just matched the buggy behavior. =[
llvm-svn: 307764
Summary:
Solves PR33689.
If the pointer size is less than the size of the type used for the array
size in an alloca (the <ty> type below) then we could trigger the assert in
the PR. In that example we have pointer size i16 and <ty> is i32.
<result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>]
Handle the situation by allowing truncation as well as zero extension in
ObjectSizeOffsetVisitor::visitAllocaInst().
Also, we now detect overflow in visitAllocaInst(), similar to how it was
already done in visitCallSite().
Reviewers: craig.topper, rnk, george.burgess.iv
Reviewed By: george.burgess.iv
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D35003
llvm-svn: 307754
invalidation of analyses when merging SCCs.
While I've added a bunch of testing of this, it takes something much
more like the inliner to really trigger this as you need to have
partially-analyzed SCCs with updates at just the right time. So I've
added a direct test for this using the inliner and verifying the
domtree. Without the changes here, this test ends up finding a stale
dominator tree.
However, to handle this properly, we need to invalidate analyses
*before* merging the SCCs. After talking to Philip and Sanjoy about this
they convinced me this was the right approach. To do this, we need
a callback mechanism when merging SCCs so we can observe the cycle that
will be merged before the merge happens. This API update ended up being
surprisingly easy.
With this commit, the new PM passes the test-suite again. It hadn't
since MemorySSA was enabled for EarlyCSE as that also will find this bug
very quickly.
llvm-svn: 307498
dependencies between analyses.
This uncovers even more issues with the proxies and the splitting apart
of SCCs which are fixed in this patch. I discovered this while trying to
add more rigorous testing for a change I'm making to the call graph
update invalidation logic.
llvm-svn: 307497
the invalidation propagation logic from an SCC to a Function.
I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.
However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invalidation
for the *current* SCC to the CGSCC pass manager and just invalidating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecessary invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.
Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.
Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.
llvm-svn: 307487
Summary: For iterative sample-pgo, if a hot call site is inlined in the profiling binary, we should inline it before profile annotation in the backend. Before that, the compile phase first collects all GUIDs that need to be imported and creates virtual "hot" call edges in the summary. However, "hot" is not good enough to guarantee the callsites get inlined. This patch introduces "critical" call edges, and assigns a much higher importing threshold for those edges.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D35096
llvm-svn: 307439
Prior to this commit both of the added test cases were passing. However, in the
latter case (test7) we were doing a lot more work to arrive at the same answer
(i.e., we were using isImpliedCondMatchingOperands() to determine the
implication).
llvm-svn: 307400
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and match the existing behaviour
if they aren't overridden by the target.
Differential revision: https://reviews.llvm.org/D32536
llvm-svn: 307346
This patch adds support for handling some forms of ands and ors in
ValueTracking's isImpliedCondition API.
PR33611
https://reviews.llvm.org/D34901
llvm-svn: 307304
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.
llvm-svn: 307292
The dependence analysis was returning incorrect information when using the GEPs
to compute dependences. The analysis uses the GEP indices under certain
conditions, but was doing it incorrectly when the base objects of the GEP are
aliases, but pointing to different locations in the same array.
This patch adds another check for the base objects. If the base pointer SCEVs
are not equal, then the dependence analysis should fall back on the path
that uses the whole SCEV for the dependence check. This fixes PR33567.
Differential Revision: https://reviews.llvm.org/D34702
llvm-svn: 307203
This reverts commit r306907 and reapplies the patches in the title.
The patches used to make the
CodeGen/ARM/2011-02-07-AntidepClobber.ll test fail because of a
missing null check.
llvm-svn: 306919
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html
Reviewers: anemet, davidxl, hfinkel
Reviewed By: anemet
Subscribers: qcolombet, javed.absar, fhahn, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34867
llvm-svn: 306912
This reverts commit r306894.
Revert "[Dominators] Add NearestCommonDominator verification"
This reverts commit r306893.
Revert "[Dominators] Keep tree level in DomTreeNode and use it to find NCD and answer dominance queries"
This reverts commit r306892.
llvm-svn: 306907
Summary: This patch teaches IteratedDominanceFrontier to use the level information stored in DomTreeNodes instead of calculating it manually.
Reviewers: dberlin, sanjoy, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34703
llvm-svn: 306894
Summary:
To enable profile hotness information in diagnostics output, Clang takes
the option `-fdiagnostics-show-hotness` -- that's "diagnostics", with an
"s" at the end. Clang also defines `CodeGenOptions::DiagnosticsWithHotness`.
LLVM, on the other hand, defines
`LLVMContext::getDiagnosticHotnessRequested` -- that's "diagnostic", not
"diagnostics". It's a small difference, but it's confusing, typo-inducing, and
frustrating.
Add a new method with the spelling "diagnostics", and "deprecate" the
old spelling.
Reviewers: anemet, davidxl
Reviewed By: anemet
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D34864
llvm-svn: 306848
In rL300494 there was an attempt to deal with excessive compile time on
invocations of getSign/ZeroExtExpr using local caching. This approach only
helps if we request the same SCEV multiple times throughout recursion. But
in the bug PR33431 we see a case where we request different values all the time,
so caching does not help and the size of the cache grows enormously.
In this patch we remove the local cache for these methods and add a recursion
depth limit instead, as we do for arithmetic. This gives us a guarantee that the
invocation sequence is limited and reasonably short.
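The depth-limit pattern in miniature (illustrative only, not the SCEV code):
#include <cstdint>

// Sketch: thread an explicit Depth through the recursion and return a
// conservative answer once the limit is hit, bounding total work.
constexpr unsigned MaxExtDepth = 8; // illustrative cap

int64_t analyze(int64_t Expr, unsigned Depth = 0) {
  if (Depth > MaxExtDepth)
    return Expr; // bail out with a conservative (unanalyzed) result
  if (Expr == 0)
    return 0;
  return analyze(Expr / 2, Depth + 1); // placeholder recursive step
}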
Differential Revision: https://reviews.llvm.org/D34273
llvm-svn: 306785
In LLVM IR the following code:
%r = urem <ty> %t, %b
is equivalent to:
%q = udiv <ty> %t, %b
%s = mul <ty> nuw %q, %b
%r = sub <ty> nuw %t, %s ; (t / b) * b + (t % b) = t
As UDiv, Mul and Sub are already supported by SCEV, URem can be
implemented with minimal effort this way.
Note: While SRem and SDiv are also related this way, SCEV does not
provide SDiv yet.
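The identity behind the rewrite is easy to sanity-check in plain C++ (a brute-force verification of the equation in the comment above):
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t T = 0; T < 1000; ++T)
    for (uint32_t B = 1; B < 50; ++B) {
      uint32_t Q = T / B; // udiv
      uint32_t S = Q * B; // mul nuw
      uint32_t R = T - S; // sub nuw
      assert(R == T % B); // matches urem
    }
  return 0;
}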
llvm-svn: 306695
The changes are a result of discussion of https://reviews.llvm.org/D33685.
It solves the following problem:
1. We can inform getGEPCost about simplified indices to help it with
calculating the cost. But getGEPCost does not take into account the
context in which GEPs are used.
2. We have getUserCost, which can take the context into account, but which
we cannot inform about simplified indices.
With these changes getUserCost will have access to the same additional
information as getGEPCost.
A one-parameter overload of getUserCost is also provided.
Differential Revision: https://reviews.llvm.org/D34057
llvm-svn: 306674
The original patch was an improvement to IR ValueTracking on non-negative
integers. It has been checked in to trunk (D18777, r284022), but was disabled by
default due to performance regressions.
The perf impact has since improved, so the patch is now enabled by default.
Reviewers: reames
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 306528
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.
This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNs, and the former NAN/NUM
nodes are illegal, thus selection does not happen. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.
Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D33186
llvm-svn: 306525
Using Optional<> here doesn't seem to be terribly valuable, but
this is not the main point of this change. The change enables
us to merge the (now) two identical copies of parentFunctionOfValue()
that Steensgaard's and Andersen's provide.
llvm-svn: 306351
Summary:
Make sure we are comparing the unknown instructions in the alias set and the instruction we are interested in.
I believe this is clearly a bug (missed opportunity). I can also add some test cases if desired.
Reviewers: hfinkel, davide, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34597
llvm-svn: 306241
Summary:
This patch changes getRange to getRangeRef and returns a reference to the ConstantRange object stored inside the DenseMap caches. We then take advantage of that to add new helper methods that can return min/max value of a signed or unsigned ConstantRange using that reference without first copying the ConstantRange.
getRangeRef calls itself recursively and I believe the reference return is fine for those calls.
I've left getSignedRange and getUnsignedRange returning a ConstantRange object so they will make a copy now. This is to ensure safety since the reference will be invalidated if the DenseMap changes.
I'm sure there are still more places that can take advantage of the reference and I'll submit future patches as I find them.
Reviewers: sanjoy, davide
Reviewed By: sanjoy
Subscribers: zzheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D32978
llvm-svn: 306229
Summary:
m_CombineOr isn't very efficient. The code using it is also quite verbose.
This patch adds m_Shift and m_BitwiseLogic matchers to make the using code more concise and improve the match efficiency.
Reviewers: spatel, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34593
llvm-svn: 306206
Summary: visitSwitchInst should not use INT_MAX when Cost is negative. Instead of INT_MAX, we use a valid upper-bound cost when overflow occurs in Cost.
Reviewers: hans, echristo, dmgreen
Reviewed By: dmgreen
Subscribers: mcrosier, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D34436
llvm-svn: 306118
Currently JumpThreading can use LazyValueInfo to analyze an 'and' or 'or' of a compare if the compare is fed by a livein of a basic block. This can be used to prove the condition can't be met for some predecessor, and the jump from that predecessor can be moved to the false path of the condition.
But if the compare is something that InstCombine turns into an add and a single compare, it can't be analyzed because the livein is now an input to the add and not the compare.
This patch adds a new method to LVI to get a ConstantRange on an edge. Then we teach jump threading to detect the add livein feeding a compare and to get the ConstantRange and propagate it.
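Conceptually, the edge constrains the add's result, and inverting the add recovers a range for the livein. A toy version over plain closed intervals (illustrative only; the real code uses llvm::ConstantRange, which also handles wrapped ranges):
#include <cassert>
#include <cstdint>

struct Range { int64_t Lo, Hi; }; // closed interval [Lo, Hi]

// On the true edge of "br (icmp ult (add X, C), C2)" the add's result
// lies in [0, C2 - 1]; subtracting C gives a range for the livein X.
Range rangeOfAddInput(Range AddResult, int64_t C) {
  return {AddResult.Lo - C, AddResult.Hi - C};
}

int main() {
  // %add = add %x, 5; br (icmp ult %add, 10)
  // True edge: %add is in [0, 9], so %x is in [-5, 4].
  Range R = rangeOfAddInput({0, 9}, 5);
  assert(R.Lo == -5 && R.Hi == 4);
  return 0;
}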
Differential Revision: https://reviews.llvm.org/D33262
llvm-svn: 306085
Summary: LVI can reason about an AND of icmps on the true dest of a branch. I believe we can do similar for the false dest of ORs. This allows us to get the same answer for the demorganed versions of some of the AND test cases as you can see.
Reviewers: anna, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34431
llvm-svn: 306076
This matches the checks done at the beginning of isKnownNonEqual that this code is partially emulating.
Without this we can get assertion failures due to the bit widths of the KnownBits not matching.
llvm-svn: 306044
Using various methods, BasicAA tries to determine whether two
GetElementPtr memory locations alias when their base pointers are known
to be equal. When none of its heuristics are applicable, it falls back
to PartialAlias to, according to a comment, protect TBAA making a wrong
decision in case of unions and malloc. PartialAlias is not correct,
because a PartialAlias result implies that some, but not all, bytes
overlap which is not necessarily the case here.
AAResults returns the first analysis result that is not MayAlias.
BasicAA is always the first alias analysis. When it returns
PartialAlias, no other analysis is queried to give a more exact result
(which was the intention of returning PartialAlias instead of MayAlias).
For instance, ScopedAA could return a more accurate result.
The PartialAlias hack was introduced in r131781 (and re-applied in
r132632 after some reverts) to fix llvm.org/PR9971 where TBAA returns a
wrong NoAlias result due to a union. A test case for the malloc case
mentioned in the comment was not provided and I don't think it is
affected since it returns an omnipotent char anyway.
Since r303851 (https://reviews.llvm.org/D33328) clang does not emit specific
TBAA for unions anymore (but "omnipotent char" instead). Hence, the
PartialAlias workaround is not required anymore.
This patch passes the test-suite and check-llvm/check-clang of a
self-hosted build on x64.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D34318
llvm-svn: 305938
The MulOpsInlineThreshold option of SCEV defaults to 1000, which is unreasonably high.
When constructing SCEVs of expressions like:
x1 = a * a
x2 = x1 * x1
x3 = x2 * x2
...
We actually have huge SCEVs with the maximum allowed number of operands inlined.
Such expressions are easy to get from unrolling of loops looking like
x = a
for (i = 0; i < n; i++)
x = x * x
Or more tricky cases where big powers are involved. If some non-linear analysis
tries to work with a SCEV that has 1000 operands, it may lead to excessively long
compilation. The attached test does not pass within 1 minute with default threshold.
This patch decreases its default value to 32, which looks much more reasonable if we
use analyses with complexity O(N^2) or O(N^3) working with SCEV.
Differential Revision: https://reviews.llvm.org/D34397
llvm-svn: 305882
The description of this option was copy-pasted from another one and does not
correspond to reality.
Differential Revision: https://reviews.llvm.org/D34390
llvm-svn: 305782
The current implementation of SCEVExpander behaves very naively when
it deals with power calculation. For example, a SCEV for x^8 looks like
(x * x * x * x * x * x * x * x)
If we try to expand it, it generates a very straightforward sequence of muls, like:
x2 = mul x, x
x3 = mul x2, x
x4 = mul x3, x
...
x8 = mul x7, x
This is an inefficient way of doing it. A better way is to generate a sequence of
binary power calculations. In this case the expanded calculation will look like:
x2 = mul x, x
x4 = mul x2, x2
x8 = mul x4, x4
In some cases the code size reduction for such SCEVs is dramatic. If we had a loop:
x = a;
for (int i = 0; i < 3; i++)
x = x * x;
And if this loop has been fully unrolled, we have something like:
x = a;
x2 = x * x;
x4 = x2 * x2;
x8 = x4 * x4;
The SCEV for x8 is the same as in the example above, and if we for some reason
want to expand it, we will naively generate 7 multiplications instead of 3.
The BinPow expansion algorithm here allows us to keep the code size reasonable.
This patch teaches SCEV Expander to generate a sequence of BinPow multiplications
if we have repeating arguments in SCEVMulExpressions.
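The expansion is ordinary exponentiation by squaring; a standalone model of the multiply chain it produces (a sketch, not the expander's code):
#include <cassert>
#include <cstdint>

// Computes x^n with O(log n) multiplies -- the same shape of chain the
// expander emits when a SCEVMulExpr has n identical operands.
uint64_t binPow(uint64_t X, unsigned N) {
  uint64_t Result = 1;
  while (N) {
    if (N & 1)
      Result *= X; // fold in the current power of X
    X *= X;        // X, X^2, X^4, X^8, ... one mul per squaring
    N >>= 1;
  }
  return Result;
}

int main() {
  assert(binPow(3, 8) == 6561); // vs. 7 muls in the naive chain
  return 0;
}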
Differential Revision: https://reviews.llvm.org/D34025
llvm-svn: 305663
This is a fix for the test case in PR32314.
Basic Alias Analysis can ask if two nodes are known non-equal after looking through a phi node to find a GEP. isAddOfNonZero saw an add of a constant from the same phi and said that its output couldn't be equal. But Basic Alias Analysis was really asking about the value from the previous loop iteration.
This patch at least makes that case not happen anymore; I'm not sure if there are still other ways this can fail. As was discussed in the bug, it looks like fixing BasicAA would be difficult, so this patch seemed like a possible workaround.
Differential Revision: https://reviews.llvm.org/D33136
llvm-svn: 305481
This is a fix for PR33292 that shows a case of extremely long compilation
of a single .c file with clang, with most time spent within SCEV.
We have a mechanism of limiting recursion depth for getAddExpr to avoid
long analysis in SCEV. However, there are calls from getAddExpr to getMulExpr
and back that do not propagate the info about depth. As a result of this, a chain
getAddExpr -> ... -> getAddExpr -> getMulExpr -> getAddExpr -> ... -> getAddExpr
can be extremely long, with every segment of getAddExpr's being up to max depth long.
This leads either to long compilation times or to a crash by stack overflow. We face this situation while
analyzing big SCEVs in the test of PR33292.
This patch applies the same limit on max expression depth for getAddExpr and getMulExpr.
Differential Revision: https://reviews.llvm.org/D33984
llvm-svn: 305463
There's an early out that's trying to detect when we don't know any bits that make up the legal range of a shift. The code subtracts one from BitWidth, which creates a mask in the lower bits for power-of-2 bit widths. This is then ANDed with the known bits to see if any of those bits are known. If the bit width isn't a power of 2, this creates a nonsensical mask.
This patch corrects this by rounding up to a power of 2 before doing the subtract and mask.
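The fix in miniature, using std::bit_ceil from C++20 as a stand-in for LLVM's power-of-2 rounding:
#include <bit>
#include <cassert>
#include <cstdint>

int main() {
  // For bit width 24, subtracting one directly gives 23 = 0b10111, a mask
  // with a hole; rounding up to the next power of two first gives
  // 32 - 1 = 31 = 0b11111, the intended low-bit mask.
  unsigned BitWidth = 24;
  uint64_t BadMask = BitWidth - 1;                 // 0b10111
  uint64_t GoodMask = std::bit_ceil(BitWidth) - 1; // 0b11111
  assert(BadMask == 0x17 && GoodMask == 0x1f);
  return 0;
}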
Differential Revision: https://reviews.llvm.org/D34165
llvm-svn: 305400
Previously it was a non-const reference named Result, which would tend to make someone think that it was an outparam when really it's an input.
llvm-svn: 305114
Summary:
Unless I'm mistaken, the special handling for EQ/NE should cover everything and there is no reason to fall through to the more complex code. For that matter I'm not sure there's any reason to special-case EQ/NE other than avoiding creating temporary ConstantRanges.
This patch moves the complex code into an else so we only do it when we are handling a predicate other than EQ/NE.
Reviewers: anna, reames, resistor, Farhana
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34000
llvm-svn: 305086
This is to prepare to allow for dead stripping of globals in the
merged modules.
Differential Revision: https://reviews.llvm.org/D33921
llvm-svn: 305027
The zero heuristic assumes that integers are more likely positive than negative,
but this also has the effect of assuming that strcmp return values are more
likely positive than negative. Given that for nonzero strcmp return values it's
the ordering of arguments that determines the sign of the result there's no
reason to assume that's true.
Fix this by inspecting the LHS of the compare and using TargetLibraryInfo to
decide if it's strcmp-like, and if so only assume that nonzero is more likely
than zero i.e. strings are more often different than the same. This causes a
slight code generation change in the spec2006 benchmark 403.gcc, but with no
noticeable performance impact. The intent of this patch is to allow better
optimisation of dhrystone on Cortex-M cpus, but currently it won't as there are
also some changes that need to be made to if-conversion.
Differential Revision: https://reviews.llvm.org/D33934
llvm-svn: 304970
Summary:
Check that the first access before one being tested is valid.
Before this patch, if there was no definition prior to the Use being tested,
the first time Iter was dereferenced, it hit the sentinel.
Reviewers: dberlin, gbiv
Subscribers: sanjoy, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D33950
llvm-svn: 304926
Seems like at least one reasonable interpretation of optnone is that the
optimizer never "looks inside" a function. This fix is consistent with
that interpretation.
Specifically this came up in the situation:
f3 calls f2 calls f1
f2 is always_inline
f1 is optnone
The application of readnone to f1 (& thus to f2) caused the inliner to
kill the call to f2 as being trivially dead (without even checking the
cost function, as it happens - not sure if that's also a bug).
llvm-svn: 304833
Summary:
The LVIPrinter pass was previously relying on the LVI cache. We now directly call
the LVI functions, which solve the value if the LVI information is not already
available in the cache. This has 2 benefits over printing the LVI cache:
1. higher coverage (i.e. catches errors) in LVI code when cache value is
invalidated.
2. relies on the core functions, and not dependent on the LVI cache (which may
be scrapped at some point).
It would still catch any cache invalidation errors, since we first go through
the cache.
Reviewers: reames, dberlin, sanjoy
Reviewed by: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32135
llvm-svn: 304819
Summary:
Expanding the loop idiom test for memcpy to also recognize
unordered atomic memcpy. The only difference between recognizing
an unordered atomic memcpy and a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
Patch by Daniel Neilson!
Reviewers: reames, anna, skatkov
Reviewed By: reames, anna
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33243
llvm-svn: 304806
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
isKnownNonEqual is called a little earlier in this function and can handle the case that we were checking here as well as more complex cases.
llvm-svn: 304775
This will be used by another commit to remove some code from InstSimplify that is redundant for scalars, but was needed for vectors due to this issue.
llvm-svn: 304774
This is actually NFC because the next case starts with the same if statement as this case did. So the result will be the same and it will fall through to the end of the switch. But there's no reason to rely on that, so we should just break.
llvm-svn: 304680
Summary:
This is to enable the new switch inline cost heuristic (r301649) by removing the
old heuristic as well as the flag itself.
In my experiments with the LLVM test suite and spec2000/2006, a 17.82% performance
improvement and an 8% code size reduction were observed in spec2000/vortex with O3
LTO on AArch64.
No significant code size / performance regression was found in O3/O2/Os. No
significant complaints were reported on the llvm-dev thread.
Reviewers: hans, chandlerc, eraman, haicheng, mcrosier, bmakam, eastig, ddibyend, echristo
Reviewed By: echristo
Subscribers: javed.absar, kristof.beyls, echristo, aemerson, rengolin, mehdi_amini
Differential Revision: https://reviews.llvm.org/D32653
llvm-svn: 304594
Summary:
The constant folding code currently assumes that the constant expression will always be on the left and the simple null will be on the right. But that's not true at least on the path from InstSimplify.
This patch adds support to ConstantFolding to detect the reversed case.
Reviewers: spatel, dberlin, majnemer, davide, joey
Reviewed By: joey
Subscribers: joey, llvm-commits
Differential Revision: https://reviews.llvm.org/D33801
llvm-svn: 304559
Summary:
Reduce min percent required for indirect call promotion from 33% to 30%,
which matches gcc's threshold and catches the same hot opportunities.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33798
llvm-svn: 304469
Replace GVFlags::LiveRoot with GVFlags::Live and use that instead of
all the DeadSymbols sets. This is refactoring in order to make
liveness information available in the RegularLTO pipeline.
llvm-svn: 304466
This patch does an inline expansion of memcmp.
It changes the memcmp library call into an inline expansion when the size is
known at compile time and is under a target-specified threshold.
This expansion is implemented in CodeGenPrepare and expands into straight line
code. The target specifies a maximum load size and the expansion works by using
this size to load the two sources, compare, and exit early if a difference is
found. It also has a special case when the memcmp result is used in a compare
to zero equality.
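An illustrative sketch only (the real expansion is emitted as IR in
CodeGenPrepare): what the straight-line code amounts to for
memcmp(a, b, 16) == 0 with an 8-byte maximum load size:
```
#include <cstdint>
#include <cstring>

// Straight-line equivalent of memcmp(A, B, 16) == 0: load 8 bytes from each
// source at a time, compare, and exit early on the first difference.
static bool equal16(const void *A, const void *B) {
  uint64_t A0, B0, A1, B1;
  std::memcpy(&A0, A, 8);
  std::memcpy(&B0, B, 8);
  if (A0 != B0)
    return false; // early exit: difference in the first 8 bytes
  std::memcpy(&A1, static_cast<const char *>(A) + 8, 8);
  std::memcpy(&B1, static_cast<const char *>(B) + 8, 8);
  return A1 == B1;
}
```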
Differential Revision: https://reviews.llvm.org/D28637
llvm-svn: 304313
Thanks to Galina Kistanova for finding the missing break!
When trying to make a test for this, I realized our logic for handling
extractvalue/insertvalue/... is somewhat broken. This makes constructing
a test-case for this missing break nontrivial.
llvm-svn: 304275
Params DT and LI are redundant, because these values are already contained in fields anyway.
Differential Revision: https://reviews.llvm.org/D33668
llvm-svn: 304204
The optimistic delinearization implemented in LLVM detects array sizes by
looking for non-linear products between parameters and induction variables.
In OpenCL code, such products often look like:
A[get_global_id(0) * N + get_global_id(1)]
Hence, the IV is hidden in the get_global_id() call and consequently
delinearization would fail, as no induction variable is available to help
us identify N as an array size parameter.
We now use a very simple heuristic to change this. We assume that each parameter
that comes directly from a function call is a hidden induction variable. As
a result, we can delinearize the access above to:
A[get_global_id(0)][get_global_id(1)]
llvm-svn: 304073
Summary:
This fixes the introduction of an incorrect inttoptr/ptrtoint pair in
the included test case which makes use of non-integral pointers. I
suspect there are more cases like this left, but this takes care of
the one I was seeing at the moment.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33129
llvm-svn: 304058
Previously, we called simplifyPossiblyCastedAndOrOfICmps twice with the operands commuted, but the call to simplifyAndOrOfICmpsWithConstants further down already handles commuting and doesn't need to be called both ways.
This patch pushes double calls further down to just the individual routines that need to be called twice.
Differential Revision: https://reviews.llvm.org/D33603
llvm-svn: 304044
This code was replicated two additional times to handle commuted cases, but I think a commutable matcher can take care of it.
Differential Revision: https://reviews.llvm.org/D33585
llvm-svn: 304022
The tests here have operands commuted to provide more coverage. I also commuted one of the instructions in the scalar tests so the 4 tests cover the 4 commuted variations.
Differential Revision: https://reviews.llvm.org/D33599
llvm-svn: 304021
The patch rL303730 was reverted because test lsr-expand-quadratic.ll failed on
many non-X86 configs with this patch. The reason for this is that the patch
makes a correctness fix that changes the optimizer's behavior on this test.
Without the change, LSR was making an overconfident simplification based on a
wrong SCEV. Apparently it did not need the IV analysis to do this. With the
change, it chose a different way to simplify (that wasn't so confident), and
this way required the IV analysis. Now, following the right execution path,
LSR tries to make a transformation relying on IV Users analysis. This analysis
is target-dependent due to this code:
// LSR is not APInt clean, do not touch integers bigger than 64-bits.
// Also avoid creating IVs of non-native types. For example, we don't want a
// 64-bit IV in 32-bit code just because the loop has one 64-bit cast.
uint64_t Width = SE->getTypeSizeInBits(I->getType());
if (Width > 64 || !DL.isLegalInteger(Width))
return false;
To make a proper transformation in this test case, the type i32 needs to be
legal for the specified data layout. When the test runs on some non-X86
configuration (e.g. pure ARM 64), opt gets confused by the specified target
and does not use it, rejecting the specified data layout as well. Instead,
it uses some default layout that does not treat i32 as a legal type
(currently the layout that is used when it is not specified does not have
legal types at all). As a result, the transformation we expect to happen does
not happen for this test.
This re-enabling patch does not have any source code changes compared to the
original patch rL303730. The only difference is that the failing test has been
moved to the X86 directory and now requires x86 to comply
with the specified target triple and data layout.
Differential Revision: https://reviews.llvm.org/D33543
llvm-svn: 303971
having it internally allocate the loop.
This is a much more flexible API and necessary in the new loop unswitch
to reasonably support both new and old PMs in common code. It also just
seems like a cleaner separation of concerns.
NFC, this should just be a pure refactoring.
Differential Revision: https://reviews.llvm.org/D33528
llvm-svn: 303834
Summary: This code was migrated from InstCombine a few years ago. InstCombine had nearby code that would move Constants to the RHS for these, but InstSimplify doesn't have such code on this path.
Reviewers: spatel, majnemer, davide
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33473
llvm-svn: 303774
This continues the changes started when computeSignBit was replaced with this new version of computeKnownBits.
Differential Revision: https://reviews.llvm.org/D33431
llvm-svn: 303773
The loop vectorizer usually vectorizes any instruction it can and then
extracts the elements for a scalarized use. On SystemZ, all elements
containing addresses must be extracted into address registers (GRs). Since
this extraction is not free, it is better to have the address in a suitable
register to begin with. By forcing address arithmetic instructions and loads
of addresses to be scalar after vectorization, two benefits result:
* No need to extract the register
* LSR optimizations trigger (LSR isn't handling vector addresses currently)
Benchmarking shows improvements on SystemZ with this new behaviour.
Any other target could try this by returning false in the new hook
prefersVectorizedAddressing().
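A sketch of what opting in looks like; MyTTIImpl is a hypothetical target TTI
implementation, and only the hook itself is shown:
```
// Hypothetical target TTI implementation opting out of vectorized
// addressing: returning false tells the loop vectorizer to keep address
// arithmetic and loads of addresses scalar, so addresses land in
// general-purpose registers directly.
struct MyTTIImpl /* : BasicTTIImplBase<MyTTIImpl> in a real target */ {
  bool prefersVectorizedAddressing() const { return false; }
};
```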
Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand
https://reviews.llvm.org/D32422
llvm-svn: 303744
When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that
the loop of our base recurrence is the bottom-most in terms of domination. This assumption
may be broken by an expression which is treated as invariant, and which depends on a complex
Phi for which a SCEVUnknown was created. If such a Phi is a loop Phi, and this loop is lower than
the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence.
Another reason why it might be invalid to fold a SCEVUnknown into a Phi start value is that,
unlike other SCEVs, a SCEVUnknown is sometimes position-bound. For example, here:
for (...) { // loop
phi = {A,+,B}
}
X = load ...
Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not and cannot
exist while we are iterating in the loop (that memory may not even be allocated, let alone
filled, by that moment). Such a folding is only valid if X is defined before the loop; in
that case the recurrence {A+X,+,B}<loop> is well-defined.
This patch prohibits folding of SCEVUnknowns (and anything that uses them) into the start value
of an AddRecExpr, if this instruction is dominated by the loop. Merging the dominating unknown
values is still valid. Some tests that relied on the fact that some SCEVUnknowns should be folded
into AddRecs are changed so that they no longer expect such behavior.
llvm-svn: 303730
When presented with an icmp/select pair, we can end up asking what would happen
if we replaced one constant with another in an instruction. This is a mistake:
while a non-constant Value could become a constant, constants cannot change, and
trying to change one can lead to completely invalid IR (a GEP referencing a
non-existent field in the original case).
llvm-svn: 303580
This is a re-application of a r303497 that was reverted in r303498.
I thought it had broken a bot when it had not (the breakage did not
go away with the revert).
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303531
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303497
Summary: This allows pthread_self to be pulled out of a loop by LICM.
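For example (a sketch; it assumes pthread_t converts to an integer, as it does
on typical POSIX systems):
```
#include <pthread.h>

// pthread_self returns the same value on every iteration and reads no
// program-visible memory, so once it is modeled that way LICM can hoist
// the call out of the loop.
void tagAll(unsigned long *Out, int N) {
  for (int I = 0; I < N; ++I)
    Out[I] = (unsigned long)pthread_self(); // hoistable to the preheader
}
```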
Reviewers: hfinkel, arsenm, davide
Reviewed By: davide
Subscribers: davide, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D32782
llvm-svn: 303495
Refactor the strlen optimization code to work for both strlen and wcslen.
This especially helps with programs in the wild where people pass
L"string"s to const std::wstring& function parameters and the wstring
constructor gets inlined.
This also fixes a lingering API problem/bug in getConstantStringInfo()
where zeroinitializers would always give you an empty string (without a
length) back regardless of the actual length of the initializer, which
did not work well in the TrimAtNul==false case, causing the PR mentioned
below.
Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG
memcpy lowering and may lead to some cases for out-of-bounds
zeroinitializer accesses not getting optimized anymore. So some code
with UB may produce out of bound memory reads now instead of just
producing zeros.
The refactoring "accidentally" fixes http://llvm.org/PR32124
Differential Revision: https://reviews.llvm.org/D32839
llvm-svn: 303461
Summary:
Implements PR889
Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:
https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing
This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call to Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal. However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue. I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.
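A sketch of the two migration patterns, assuming V is some heap-allocated
Value*:
```
#include "llvm/IR/Value.h"

// Direct 'delete V' is now invalid; deleteValue() dispatches on the value
// ID and runs the correct derived-class destructor without any vtable.
void destroy(llvm::Value *V) {
  if (V)
    V->deleteValue(); // was: delete V;
}

// For owning pointers, llvm::unique_value bundles the matching deleter.
llvm::unique_value takeOwnership(llvm::Value *V) {
  return llvm::unique_value(V); // replaces std::unique_ptr<llvm::Value>
}
```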
I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.
Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv
Reviewed By: chandlerc
Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D31261
llvm-svn: 303362
Replace two places that duplicate the code of the isLoopInvariant method with
an invocation of that method.
Differential Revision: https://reviews.llvm.org/D33313
llvm-svn: 303336
The probability of an edge coming to an unreachable block should be as low as possible.
The change reduces the probability to the minimal value greater than zero.
The bug https://bugs.llvm.org/show_bug.cgi?id=32214 shows an example where
the probability of the edge coming to an unreachable block is greater than for the edge
coming out of the loop, and it causes incorrect loop rotation.
Please note that with this change the behavior of the unreachable heuristic is a bit different
from the others. Specifically, before this change the sum of probabilities
coming to unreachable blocks had the same weight for all branches
(it was just split over all edges of a block coming to unreachable blocks).
With this change it might be slightly different, but not by much, because the probability of
a taken branch to an unreachable block is really small.
Reviewers: chandlerc, sanjoy, vsk, congh, junbuml, davidxl, dexonsmith
Reviewed By: chandlerc, dexonsmith
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D30633
llvm-svn: 303327
Summary:
There are several places in the codebase that try to calculate a maximum value in a Statistic object. We currently do this in one of two ways:
MaxNumFoo = std::max(MaxNumFoo, NumFoo);
or
MaxNumFoo = (MaxNumFoo > NumFoo) ? MaxNumFoo : NumFoo;
The first version reads from MaxNumFoo one time and unconditionally writes to it. The second version possibly reads it twice, depending on the result of the first compare. But we have no way of knowing if the value was changed by another thread between the reads and the writes.
This patch adds a method to the Statistic object that can ensure that we only store if our value is the max and the previous max didn't change after we read it. If it changed, we'll recheck whether our value should still be the max and try again.
This spawned from an audit I'm trying to do of all places where we use the implicit conversion to unsigned on the Statistic objects. See my previous thread on llvm-dev https://groups.google.com/forum/#!topic/llvm-dev/yfvxiorKrDQ
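The idea, as a minimal sketch on a plain std::atomic (the real method lives on
llvm::Statistic):
```
#include <atomic>

// Store V only if it is still the max; if another thread raced us,
// compare_exchange_weak reloads Prev and we re-check whether V still wins.
void updateMax(std::atomic<unsigned> &Stat, unsigned V) {
  unsigned Prev = Stat.load(std::memory_order_relaxed);
  while (V > Prev &&
         !Stat.compare_exchange_weak(Prev, V, std::memory_order_relaxed))
    ; // Prev was updated on failure; loop exits once V <= Prev or CAS wins
}
```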
Reviewers: dberlin, chandlerc, hfinkel, dblaikie
Reviewed By: chandlerc
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D33301
llvm-svn: 303318
We already handled all of the new tests identically, but several
of those went through a lot of unnecessary processing before
getting folded.
Another motivation for grouping these cases together is that
InstCombine needs a similar fold. Currently, it handles the
'not' cases inefficiently which can lead to bugs as described
in the post-commit comments of:
https://reviews.llvm.org/D32143
llvm-svn: 303295
Sorting of AddRecExprs by loop nesting does not make sense, since we only invoke
CompareSCEVComplexity for AddRecExprs that are used by one SCEV. This
guarantees that there is always a dominance relationship between them. This
patch removes the sorting by nesting, which is dead code in the current usage of
this function.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D33228
llvm-svn: 303235
We would eventually catch these via demanded bits and computing known bits in InstCombine,
but I think it's better to handle the simple cases as soon as possible as a matter of efficiency.
This fold allows further simplifications based on distributed ops transforms. eg:
%a = lshr i8 %x, 7
%b = or i8 %a, 2
%c = and i8 %b, 1
InstSimplify can directly fold this now:
%a = lshr i8 %x, 7
Differential Revision: https://reviews.llvm.org/D33221
llvm-svn: 303213
Update threshold based on callee's hotness only when BFI is not available.
Otherwise use only callsite's hotness. This makes it easier to reason about
hotness related threshold updates.
Differential revision: https://reviews.llvm.org/D33157
llvm-svn: 303210
ProfileSummaryInfo already checks whether the module has sample profile
in determining profile counts. This will also be useful in inliner to
clean up threshold updates.
llvm-svn: 303204
The existing sorting order in defined CompareSCEVComplexity sorts AddRecExprs
by loop depth, but does not pay attention to dominance of loops. This can
lead us to the following buggy situation:
for (...) { // loop1
op1 = {A,+,B}
}
for (...) { // loop2
op2 = {A,+,B}
S = add op1, op2
}
In this case there is no guarantee that in the operand list of S, op2 comes
before op1 (the loop depth is the same, so they will be sorted just
lexicographically), so we can incorrectly treat S as a recurrence of loop1,
which is wrong.
This patch changes the sorting logic so that it places the dominated recs
before the dominating recs. This ensures that when we pick the first recurrence
in the operand order, it will be the bottom-most in terms of the domination tree.
The attached test set includes some tests that produce incorrect SCEV
estimations and crashes with the old logic.
Reviewers: sanjoy, reames, apilipenko, anna
Reviewed By: sanjoy
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33121
llvm-svn: 303148
This function gives the wrong answer on some non-ELF platforms in some
cases. The function that does the right thing lives in Mangler.h. To try to
discourage people from using this function, give it a different name.
Differential Revision: https://reviews.llvm.org/D33162
llvm-svn: 303134
ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
*** Performance Analysis
This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.
The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.
** SPEC
* CFP2006/482.sphinx3 -3.34%
A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%. There are also other problems with the codegen in
this loop so there is further room for improvement.
** LLVM testsuite
* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
There are multiple small SLP vectorizations outside the hot code. It's a bit
surprising that it adds up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
The opt-viewer screenshot can be seen at F3218284. We start at a colder store
but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
Noise, binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
There is an additional SLP in the cold code. The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
Differential Revision: https://reviews.llvm.org/D31965
llvm-svn: 303116
Summary:
Merge overflow computation for signed add,
appearing both in InstCombine and ValueTracking.
As part of the merge,
clean up the interface for overflow checks in InstCombine.
Patch by Yoav Ben-Shalom.
Reviewers: craig.topper, majnemer
Reviewed By: craig.topper
Subscribers: takuto.ikuta, llvm-commits
Differential Revision: https://reviews.llvm.org/D32946
llvm-svn: 303029
This patch adds min/max population count, leading/trailing zero/one bit counting methods.
The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.
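A sketch of the bounds being computed, given that KnownBits carries Zero (bits
known to be 0) and One (bits known to be 1); the function names here are
illustrative, not the exact API:
```
#include "llvm/Support/KnownBits.h"

// Minimum possible population count: only the bits known to be one count.
unsigned minPop(const llvm::KnownBits &Known) {
  return Known.One.countPopulation();
}
// Maximum possible population count: every bit not known to be zero
// could be set.
unsigned maxPop(const llvm::KnownBits &Known) {
  return Known.getBitWidth() - Known.Zero.countPopulation();
}
```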
Differential Revision: https://reviews.llvm.org/D32931
llvm-svn: 302925
This is a follow up patch for https://reviews.llvm.org/rL300440
to address a comment.
To make the implementation consistent with other cases, we just
ignore the remainder after distributing the remaining probability between
reachable edges.
If we reduced the probability of some edges coming to unreachable
blocks, we should distribute the remaining part across other edges
coming to reachable blocks to satisfy the condition that the sum of all
probabilities equals one. If this remaining part is not evenly
divisible by the number of "reachable" edges, we get this remainder.
This remainder probability should be pretty small. Other cases just ignore
it if the sum of probabilities is not equal to one, so we do the same.
Reviewers: chandlerc, sanjoy, vsk, junbuml, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D32124
llvm-svn: 302883
Summary:
Don't use the metadata on call instructions for determining hotness
unless we are in sample PGO mode, where it is needed because profile
counts are not accurate. In instrumentation mode this is not necessary
and does more harm than good when calls have VP metadata that hasn't
been properly scaled after transformations or dropped after constant
prop based devirtualization (both should be fixed, but we don't need
to do this in the first place for instrumentation PGO).
This required adjusting a number of tests to distinguish between sample
and instrumentation PGO handling, and to add in profile summary metadata
so that getProfileCount can get the summary.
Reviewers: davidxl, danielcdh
Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D32877
llvm-svn: 302844
I ran the test-suite (including SPEC 2006) in PGO mode comparing cold
thresholds of 225 and 45. Here are some stats on the text size:
Out of 904 tests that ran, 197 see a change in text size. The average
text size reduction (of all the 904 binaries) is 1.07%. Of the 197
binaries, 19 see a text size increase, as high as 18%, but most of them
are small single source benchmarks. There are 3 multisource benchmarks
with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On
the other side of the spectrum, 31 benchmarks see >10% size reduction
and 6 of them are MultiSource.
I haven't run the test-suite with other values of inlinecold-threshold.
Since we have a cold callsite threshold of 45, I picked this value.
Differential revision: https://reviews.llvm.org/D33106
llvm-svn: 302829
This fixes a ubsan bot failure after r302597, which made getProfileCount
non-static, but ended up invoking it on a null ProfileSummaryInfo object
in some cases from buildModuleSummaryIndex.
Most testing passed because the non-static getProfileCount currently
doesn't access any member variables, but I found this when testing a
follow on patch (D32877) that adds a member variable access.
llvm-svn: 302705
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shufflevector sequence.
Differential Revision: https://reviews.llvm.org/D32245
llvm-svn: 302631
This change is required because the notion of count is different for
sample profiling and getProfileCount will need to determine the
underlying profile type.
Differential revision: https://reviews.llvm.org/D33012
llvm-svn: 302597
- This change allows targets to opt-in to using them instead of the log2
shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
factored out into LoopUtils, and now have a unified interface for generating
reductions regardless of the preference of the target. LoopUtils now uses TTI
to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.
Differential Revision: https://reviews.llvm.org/D30086
llvm-svn: 302514
This patch uses KnownOnes of the input of ctlz/cttz to bound the value that can be returned from these intrinsics. This makes these intrinsics more similar to the handling for ctpop which already uses known bits to produce a similar bound.
Differential Revision: https://reviews.llvm.org/D32521
llvm-svn: 302444
This introduces a new interface for computeKnownBits that returns the KnownBits object instead of requiring it to be pre-constructed and passed in by reference.
This is a much more convenient interface as it doesn't require the caller to figure out the BitWidth to pre-construct the object. It's so convenient that I believe we can use this interface to remove the special ComputeSignBit flavor of computeKnownBits.
As a step towards that idea, this patch replaces all of the internal usages of ComputeSignBit with this new interface. As you can see from the patch there were a couple places where we called ComputeSignBit which really called computeKnownBits, and then called computeKnownBits again directly. I've reduced those places to only making one call to computeKnownBits. I bet there are probably external users that do it too.
A future patch will update the external users and remove the ComputeSignBit interface. I'll also work on moving more locations to the KnownBits-returning interface for computeKnownBits.
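A sketch of the convenience this buys at a call site (signatures abbreviated;
the trailing optional parameters are omitted):
```
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/Support/KnownBits.h"

bool signBitKnownZero(const llvm::Value *V, const llvm::DataLayout &DL) {
  // Old style: KnownBits Known(DL.getTypeSizeInBits(V->getType()));
  //            computeKnownBits(V, Known, DL);
  // New style: the bit width is derived inside the analysis.
  llvm::KnownBits Known = llvm::computeKnownBits(V, DL);
  return Known.isNonNegative(); // sign bit known to be zero
}
```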
Differential Revision: https://reviews.llvm.org/D32848
llvm-svn: 302437
Summary:
Minor refactoring of foldIdentityShuffles() which allows the removal of a
ConstantDataVector::get() in SimplifyShuffleVectorInstruction.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32955
Conflicts:
lib/Analysis/InstructionSimplify.cpp
llvm-svn: 302433
Summary:
Following up on Sanjay's suggestion in D32955, move this functionality
into ShuffleVectorInst.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32956
llvm-svn: 302420
Summary:
Re-applying r301766 with a fix to a typo and a regression test.
The log message for r301766 was:
==================================================================================
InstructionSimplify: Canonicalize shuffle operands. NFC-ish.
Summary:
Apply canonicalization rules:
1. Input vectors with no elements selected from can be replaced with undef.
2. If only one input vector is constant it shall be the second one.
This allows constant-folding to cover more ad-hoc simplifications that
were in place and avoid duplication for RHS and LHS checks.
There are more rules we may want to add in the future when we see a
justification. e.g. mask elements that select undef elements can be
replaced with undef.
==================================================================================
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32863
llvm-svn: 302373
Summary: This makes setRange take ConstantRange by rvalue reference since most callers were passing an unnamed temporary ConstantRange. We can then move that ConstantRange into the DenseMap caches. For the callers that weren't passing a temporary, I've added std::move to the local variable being passed.
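A simplified sketch of the pattern (the real setRange is on ScalarEvolution
and keys on SCEV pointers; RangeCache here is illustrative):
```
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/ConstantRange.h"

struct RangeCache {
  llvm::DenseMap<const void *, llvm::ConstantRange> Ranges;
  // Taking the range by rvalue reference lets callers hand over temporaries,
  // so the contained APInts are moved into the map rather than copied.
  void setRange(const void *Key, llvm::ConstantRange &&CR) {
    Ranges.insert({Key, std::move(CR)});
  }
};

// Usage: a temporary binds directly; a named local needs std::move.
//   Cache.setRange(S, llvm::ConstantRange(APInt(64, 0), APInt(64, 10)));
//   Cache.setRange(S, std::move(LocalCR));
```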
Reviewers: sanjoy, mzolotukhin, efriedma
Reviewed By: sanjoy
Subscribers: takuto.ikuta, llvm-commits
Differential Revision: https://reviews.llvm.org/D32943
llvm-svn: 302371
We can simplify (or (icmp X, C1), (icmp X, C2)) to 'true' or one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything seems
to check out. Eg, the deleted code in instcombine was completely ignoring predicates with
mismatched signedness.
This is a follow-up to:
https://reviews.llvm.org/rL301260
https://reviews.llvm.org/D32143
llvm-svn: 302370
This changes one parameter to be a const APInt& since we only read from it. Use std::move on local APInts once they are no longer needed so we can reuse their allocations. Lastly, use operator+=(uint64_t) instead of adding 1 to an APInt twice, which created a new APInt each time.
llvm-svn: 302335
Summary:
ConstantRange contains two APInts which can allocate memory if their width is larger than 64-bits. So we shouldn't copy it when we can avoid it.
This changes LVILatticeVal::getConstantRange() to return its internal ConstantRange by reference. This allows many places that just need a ConstantRange reference to avoid making a copy.
Several places now capture the return value of getConstantRange() by reference so they can call methods on it that don't need a new object.
Lastly, it adds std::move in one place to move a local ConstantRange into an LVILatticeVal.
Reviewers: reames, dberlin, sanjoy, anna
Reviewed By: reames
Subscribers: grandinj, llvm-commits
Differential Revision: https://reviews.llvm.org/D32884
llvm-svn: 302331
wcslen is part of the C99 and C++98 standards.
- This introduces the function to TargetLibraryInfo.
- Also set attributes for wcslen in llvm::inferLibFuncAttributes().
Differential Revision: https://reviews.llvm.org/D32837
llvm-svn: 302278
This adds routines for resetting KnownBits to unknown, or making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones, or unknown.
Differential Revision: https://reviews.llvm.org/D32637
llvm-svn: 302262
The sibling folds for 'and' with casts were added with https://reviews.llvm.org/rL273200.
This is a preliminary step for adding the 'or' variants for the folds added with https://reviews.llvm.org/rL301260.
The reason for the strange form with a constant LHS in the 1st test is that there's another missing fold in that
case for the inverted predicate. That should be fixed when we add the ConstantRange functionality for 'or-of-icmps'
that already exists for 'and-of-icmps'.
I'm hoping to share more code for the and/or cases, so we won't have these differences. This will allow us to remove
code from InstCombine. It's also possible that we can remove some code here in InstSimplify. I think we have some
duplicated folds because patterns are not matched in a general way.
Differential Revision: https://reviews.llvm.org/D32876
llvm-svn: 302189
Putting these next to each other should make it easier to see
what's missing from each side. Patch to plug one of those holes
should be posted soon.
llvm-svn: 302178
When profiling a no-op incremental link of Chromium I found that the functions
computeImportForFunction and computeDeadSymbols were consuming roughly 10% of
the profile. The goal of this change is to improve the performance of those
functions by changing the map lookups that they were previously doing into
pointer dereferences.
This is achieved by changing the ValueInfo data structure to be a pointer to
an element of the global value map owned by ModuleSummaryIndex, and changing
reference lists in the GlobalValueSummary to hold ValueInfos instead of GUIDs.
This means that a ValueInfo will take a client directly to the summary list
for a given GUID.
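A simplified sketch of the new shape of ValueInfo (the real types live in
ModuleSummaryIndex.h and differ in detail):
```
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct GlobalValueSummary;
using GUID = uint64_t;
using SummaryList = std::vector<std::unique_ptr<GlobalValueSummary>>;
using GlobalValueMap = std::map<GUID, SummaryList>;

// A ValueInfo now points directly at a map element, so once constructed it
// reaches the summary list with a dereference instead of a hash lookup.
struct ValueInfo {
  const GlobalValueMap::value_type *GV = nullptr;
  GUID getGUID() const { return GV->first; }
  const SummaryList &getSummaryList() const { return GV->second; }
};
```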
Differential Revision: https://reviews.llvm.org/D32471
llvm-svn: 302108
Summary:
The existing implementation creates a symbolic SCEV expression every
time we analyze a phi node and then has to remove it, when the analysis
is finished. This is very expensive, and in most of the cases it's also
unnecessary. According to the data I collected, ~60-70% of analyzed phi
nodes (measured on SPEC) have the following form:
PN = phi(Start, OP(Self, Constant))
Handling such cases separately significantly speeds this up.
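In source terms, any simple counted loop produces a phi of exactly this
shape, e.g.:
```
// The induction variable below becomes, in IR:
//   %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]  ; PN = phi(Start, ...)
//   %i.next = add nsw i64 %i, 1                     ; OP(Self, Constant)
void zeroFill(int *A, long N) {
  for (long I = 0; I < N; ++I)
    A[I] = 0;
}
```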
Reviewers: sanjoy, pete
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32663
llvm-svn: 302096
This patch adds isConstant and getConstant for determining if KnownBits represents a constant value and to retrieve the value. Use them to simplify code.
Differential Revision: https://reviews.llvm.org/D32785
llvm-svn: 302091
I don't believe its possible to have non-zero values here since DataLayout became required. The APInt constructor inside of the KnownBits object will assert if this ever happens.
llvm-svn: 302089
This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible.
Differential Revision: https://reviews.llvm.org/D32784
llvm-svn: 302088
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
take attribute list indices. Most of these were only used from
BuildLibCalls, and doesNotAlias was only used to test or set if the
return value is malloc-like.
I'm happy to split the patch, but I think they are probably easier to
review when taken together.
This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
0: func attrs
1: retattrs
2...: arg attrs
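A sketch of the shortened call sites (the attribute chosen here is just an
example):
```
#include "llvm/IR/Function.h"

void markParamNoCapture(llvm::Function *F, unsigned ArgNo) {
  // Before: F->addAttribute(ArgNo + llvm::AttributeList::FirstArgIndex,
  //                         llvm::Attribute::NoCapture);
  F->addParamAttr(ArgNo, llvm::Attribute::NoCapture);
}
```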
Reviewers: chandlerc, pete, javed.absar
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D32811
llvm-svn: 302060
We should always expect values to be named before running the module summary
analysis (see NameAnonGlobals pass), so it's fine if we crash in that case.
llvm-svn: 301991
Turns out this wasn't NFC-ish at all because there's a bug processing shuffles
that change the size of their input vectors (that case always seems to trip us
up).
This should fix PR32872 while we investigate how it failed and reduce a testcase:
https://bugs.llvm.org/show_bug.cgi?id=32872
llvm-svn: 301977
This change caused buildbot failures, apparently because we're not
passing around types that InstSimplify is used to seeing. I'm not overly
familiar with InstSimplify, so I'm reverting this until I can figure out
what exactly is wrong.
llvm-svn: 301885
In particular (since it wouldn't fit nicely in the summary):
(select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V)
Differential Revision: https://reviews.llvm.org/D31435
llvm-svn: 301880
Summary:
programUndefinedIfPoison makes more sense, given what the function
does; and I'm about to add a function with a name similar to
isKnownNotFullPoison (so do the rename to avoid confusion).
Reviewers: broune, majnemer, bjarke.roune
Reviewed By: broune
Subscribers: mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30444
llvm-svn: 301776
Summary:
Apply canonicalization rules:
1. Input vectors with no elements selected from can be replaced with undef.
2. If only one input vector is constant it shall be the second one.
This allows constant-folding to cover more ad-hoc simplifications that
were in place and avoid duplication for RHS and LHS checks.
There are more rules we may want to add in the future when we see a
justification. e.g. mask elements that select undef elements can be
replaced with undef.
Reviewers: spatel, RKSimon, andreadb, davide
Reviewed By: spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32338
llvm-svn: 301766
Summary: This patch adds isNegative, isNonNegative for querying whether the sign bit is known. It also adds makeNegative and makeNonNegative for controlling the sign bit.
Reviewers: RKSimon, spatel, davide
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32651
llvm-svn: 301747
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.
NFC
llvm-svn: 301666
Summary:
The motivating example is below; it has 13 cases but only 2 distinct targets
```
lor.lhs.false2: ; preds = %if.then
switch i32 %Status, label %if.then27 [
i32 -7012, label %if.end35
i32 -10008, label %if.end35
i32 -10016, label %if.end35
i32 15000, label %if.end35
i32 14013, label %if.end35
i32 10114, label %if.end35
i32 10107, label %if.end35
i32 10105, label %if.end35
i32 10013, label %if.end35
i32 10011, label %if.end35
i32 7008, label %if.end35
i32 7007, label %if.end35
i32 5002, label %if.end35
]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)
```
.LBB853_9: // %lor.lhs.false2
mov w8, #10012
cmp w19, w8
b.gt .LBB853_14
// BB#10: // %lor.lhs.false2
mov w8, #5001
cmp w19, w8
b.gt .LBB853_18
// BB#11: // %lor.lhs.false2
mov w8, #-10016
cmp w19, w8
b.eq .LBB853_23
// BB#12: // %lor.lhs.false2
mov w8, #-10008
cmp w19, w8
b.eq .LBB853_23
// BB#13: // %lor.lhs.false2
mov w8, #-7012
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_14: // %lor.lhs.false2
mov w8, #14012
cmp w19, w8
b.gt .LBB853_21
// BB#15: // %lor.lhs.false2
mov w8, #-10105
add w8, w19, w8
cmp w8, #9 // =9
b.hi .LBB853_17
// BB#16: // %lor.lhs.false2
orr w9, wzr, #0x1
lsl w8, w9, w8
mov w9, #517
and w8, w8, w9
cbnz w8, .LBB853_23
.LBB853_17: // %lor.lhs.false2
mov w8, #10013
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_18: // %lor.lhs.false2
mov w8, #-7007
add w8, w19, w8
cmp w8, #2 // =2
b.lo .LBB853_23
// BB#19: // %lor.lhs.false2
mov w8, #5002
cmp w19, w8
b.eq .LBB853_23
// BB#20: // %lor.lhs.false2
mov w8, #10011
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_21: // %lor.lhs.false2
mov w8, #14013
cmp w19, w8
b.eq .LBB853_23
// BB#22: // %lor.lhs.false2
mov w8, #15000
cmp w19, w8
b.ne .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.
This change uses the general way of switch lowering for the inline heuristic. It
estimates the number of case clusters with the suitability check for a jump table
or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now:
-inline-generic-switch-cost=false
This change was originally proposed by Haicheng in D29870.
Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier
Reviewed By: hans
Subscribers: joerg, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D31085
llvm-svn: 301649
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.
Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.
I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. One place default constructed the APInts, so I added a default constructor for those cases.
Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.
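The essence of the struct as described above (a sketch of the initial version;
the real one has since grown many helpers):
```
#include "llvm/ADT/APInt.h"

struct KnownBits {
  llvm::APInt Zero; // bits known to be zero
  llvm::APInt One;  // bits known to be one

  KnownBits() = default; // for the places that default-constructed the APInts
  // Both APInts start at the same width with no bits known.
  explicit KnownBits(unsigned BitWidth)
      : Zero(BitWidth, 0), One(BitWidth, 0) {}
  unsigned getBitWidth() const { return Zero.getBitWidth(); }
};
```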
Differential Revision: https://reviews.llvm.org/D32376
llvm-svn: 301432
Commits were:
"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"
The changes assumed pointers are 8 byte aligned on all architectures.
llvm-svn: 301429
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.
Reviewers: dblaikie, davide
Reviewed By: davide
Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D32266
llvm-svn: 301424
Summary:
Expose the internal query structure, start using it.
Note: This is the most minimal change possible I could create. I have
trivial followups, like fixing the one use of const FastMathFlags &,
the renaming of CtxI to be consistent, etc.
This should be NFC.
Reviewers: majnemer, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32448
llvm-svn: 301379
This patch uses various APInt methods to reduce temporary APInt creation.
This should be all of the unrelated cleanups that got buried in D32376 (creating a KnownBits struct), as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking.
I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time.
Differential Revision: https://reviews.llvm.org/D32495
llvm-svn: 301338
The code Sanjay Patel moved over from InstCombine doesn't work properly if the 'and' has both inputs as nots, because we used a commuted op matcher on the 'and' first, which binds to the first 'not' on the 'and' even when there could be two 'not's. InstCombine could rely on DeMorgan to ensure the 'and' wouldn't have two 'not's eventually, but InstSimplify can't rely on that.
This patch matches the xor first then checks for the ands and allows a not of either operand of the xor.
Differential Revision: https://reviews.llvm.org/D32458
llvm-svn: 301329
This is a pre-commit for a patch I'm working on to turn KnownZero/One into a struct. Once I do that the type here will be less obvious.
llvm-svn: 301324
This is a pre-commit for a patch that I'm working on to merge KnownZero/KnownOne into a KnownBits struct which would have had to touch this line.
llvm-svn: 301323
Summary:
In a previous change I changed SCEV's normalization / denormalization
to work with non-affine add recs. So the bailout in IVUsers can be
removed.
Reviewers: atrick, efriedma
Reviewed By: atrick
Subscribers: davide, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D32105
llvm-svn: 301298
Summary:
Before this change, SCEV Normalization would incorrectly normalize
non-affine add recurrences. To work around this there was (still is)
a check in place to make sure we only tried to normalize affine add
recurrences.
We recently found a bug in aforementioned check to bail out of
normalizing non-affine add recurrences. However, instead of fixing
the bailout, I have decided to teach SCEV normalization to work
correctly with non-affine add recurrences, making the bailout
unnecessary (I'll remove it in a subsequent change).
I've also added some unit tests (which would have failed before this
change).
Reviewers: atrick, sunfish, efriedma
Reviewed By: atrick
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32104
llvm-svn: 301281
We can simplify (and (icmp X, C1), (icmp X, C2)) to one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything
seems to check out. Eg, the code in instcombine was completely ignoring predicates with
mismatched signedness.
Handling or-of-icmps would be a follow-up step.
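A sketch of the range reasoning behind these folds, using ConstantRange (the
helper name is illustrative, not the actual InstSimplify code):
```
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Instructions.h"

// If every X satisfying (X pred1 C1) also satisfies (X pred2 C2), then
// (and (icmp pred1 X, C1), (icmp pred2 X, C2)) simplifies to the first icmp.
bool firstImpliesSecond(llvm::CmpInst::Predicate P1, const llvm::APInt &C1,
                        llvm::CmpInst::Predicate P2, const llvm::APInt &C2) {
  llvm::ConstantRange R1 = llvm::ConstantRange::makeExactICmpRegion(P1, C1);
  llvm::ConstantRange R2 = llvm::ConstantRange::makeExactICmpRegion(P2, C2);
  return R2.contains(R1); // R1 is a subset of R2
}
```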
Differential Revision: https://reviews.llvm.org/D32143
llvm-svn: 301260
Summary:
llvm.invariant.group.barrier returns a pointer that must alias the
pointer it takes. It can't be marked with the `returned` attribute,
because it would be removed too easily. The other reason is that
only alias analysis can know about this, because if any other
pass knew it, then the result would be replaced with its
argument, which would be invalid.
We can think of the returned pointer as something that must-aliases the
argument, but it doesn't have to be bitwise the same as the argument.
Reviewers: dberlin, chandlerc, hfinkel, sanjoy
Subscribers: reames, nlewycky, rsmith, anna, amharc
Differential Revision: https://reviews.llvm.org/D31585
llvm-svn: 301227
This is a straight cut and paste, but there's a bigger problem: if this
fold exists for simplifyOr, there should be a DeMorganized version for
simplifyAnd. But more than that, we have a patchwork of ad hoc logic
optimizations in InstCombine. There should be some structure to ensure
that we're not missing sibling folds across and/or/xor.
llvm-svn: 301213