llvm-project

Commit Graph

Author	SHA1	Message	Date
Chandler Carruth	95055d8f8b	[PM] Fix a bug where through CGSCC iteration we can get infinite-inlining across multiple runs of the inliner by keeping a tiny history of internal-to-SCC inlining decisions. This is still a bit gross, but I don't yet have any fundamentally better ideas and numerous people are blocked on this to use new PM and ThinLTO together. The core of the idea is to detect when we are about to do an inline that has a chance of re-splitting an SCC which we have split before with a similar inlining step. That is a critical component in the inlining forming a cycle and so far detects all of the various cyclic patterns I can come up with as well as the original real-world test case (which comes from a ThinLTO build of libunwind). I've added some tests that I think really demonstrate what is going on here. They are essentially state machines that march the inliner through various steps of a cycle and check that we stop when the cycle is closed and that we actually did do inlining to form that cycle. A lot of thanks go to Eric Christopher and Sanjoy Das for the help understanding this issue and improving the test cases. The biggest "yuck" here is the layering issue -- the CGSCC pass manager is providing somewhat magical state to the inliner for it to use to make itself converge. This isn't great, but I don't honestly have a lot of better ideas yet and at least seems nicely isolated. I have tested this patch, and it doesn't block any inlining on the entire LLVM test suite and SPEC, so it seems sufficiently narrowly targeted to the issue at hand. We have come up with hypothetical issues that this patch doesn't cover, but so far none of them are practical and we don't have a viable solution yet that covers the hypothetical stuff, so proceeding here in the interim. Definitely an area that we will be back and revisiting in the future. Differential Revision: https://reviews.llvm.org/D36188 llvm-svn: 309784	2017-08-02 02:09:22 +00:00
Chandler Carruth	06a86301a1	[PM/LCG] Follow-up fix to r308088 to handle deletion of library functions. In the prior commit, we provide ordering to the LCG between functions and library function definitions that they might begin to call through transformations. But we still would delete these library functions from the call graph if they became dead during inlining. While this immediately crashed, it also exposed a loss of information. We shouldn't remove definitions of library functions that can still usefully participate in the LCG-powered CGSCC optimization process. If new call edges are formed, we want to have definitions to be called. We can still remove these functions if truly dead using global-dce, etc, but removing them during the CGSCC walk is premature. This fixes a crash in the new PM when optimizing some unusual libraries that end up with "internal" lib functions such as the code in the "R" language's libraries. llvm-svn: 308417	2017-07-19 04:12:25 +00:00
Chandler Carruth	bd9c29039e	[PM] Finish implementing and fix a chain of bugs uncovered by testing the invalidation propagation logic from an SCC to a Function. I wrote the infrastructure to test this but didn't actually use it in the unit test where it was designed to be used. =[ My bad. Once I actually added it to the test case I discovered that it also hadn't been properly implemented, so I've implemented it. The logic in the FAM proxy for an SCC pass to propagate invalidation follows the same ideas as the FAM proxy for a Module pass, but the implementation is a bit different to reflect the fact that it is forwarding just for an SCC. However, implementing this correctly uncovered a surprising "bug" (it was conservatively correct but relatively very expensive) in how we handle invalidation when splitting one SCC into multiple SCCs. We did an eager invalidation when in reality we should be deferring invaliadtion for the current SCC to the CGSCC pass manager and just invaliating the newly constructed SCCs. Otherwise we end up invalidating too much too soon. This was exposed by the inliner test case that I've updated. Now, we invalidate just the split off '(test1_f)' SCC when doing the CG update, and then the inliner finishes and invalidates the '(test1_g, test1_h)' SCC's analyses. The first few attempts at fixing this hit still more bugs, but all of those are covered by existing tests. For example, the inliner should also preserve the FAM proxy to avoid unnecesasry invalidation, and this is safe because the CG update routines it uses handle any necessary adjustments to the FAM proxy. Finally, the unittests for the CGSCC pass manager needed a bunch of updates where we weren't correctly preserving the FAM proxy because it hadn't been fully implemented and failing to preserve it didn't matter. Note that this doesn't yet fix the current crasher due to MemSSA finding a stale dominator tree, but without this the fix to that crasher doesn't really make any sense when testing because it relies on the proxy behavior. llvm-svn: 307487	2017-07-09 03:59:31 +00:00
David Blaikie	6d0f39476a	Inliner: Avoid calling shouldInline until it's absolutely necessary This restores the order of evaluation (& conditionalized evaluation) of isTriviallyDeadInstruction, InlineHistoryIncludes, and shouldInline (with the addition of a shouldInline call after isTriviallyDeadInstruction) from before r305245. llvm-svn: 305267	2017-06-13 02:24:09 +00:00
David Blaikie	ae8c4af4ac	Inliner: Don't remove calls to readnone+nounwind (but not always_inline) functions in the AlwaysInliner llvm-svn: 305245	2017-06-12 23:01:17 +00:00
David Blaikie	cb9327b02d	Inliner: Don't touch indirect calls Other comments/implications are that this isn't intended behavior (nor perserved/reimplemented in the new inliner) & complicates fixing the 'inlining' of trivially dead calls without consulting the cost function first. llvm-svn: 305052	2017-06-09 03:29:20 +00:00
Easwaran Raman	f5f9160072	[ProfileSummary] Make getProfileCount a non-static member function. This change is required because the notion of count is different for sample profiling and getProfileCount will need to determine the underlying profile type. Differential revision: https://reviews.llvm.org/D33012 llvm-svn: 302597	2017-05-09 23:21:10 +00:00
Evgeny Astigeevich	7823c66e05	r286814 resulted that CallPenalty can be subtracted twice: - First time, during calculation of the cost in InlineCost.cpp - Second time, during calculation of the cost in Inliner.cpp This patches fixes this. Differential Revision: https://reviews.llvm.org/D31137 llvm-svn: 298496	2017-03-22 12:01:57 +00:00
Chandler Carruth	814e0df1c5	[PM/Inliner] Fix a bug in r297374 where we would leave stale calls in the work queue and crash when trying to visit them after deleting the function containing those calls. llvm-svn: 297940	2017-03-16 10:45:42 +00:00
Chandler Carruth	20e588e1af	[PM/Inliner] Make the new PM's inliner process call edges across an entire SCC before iterating on newly-introduced call edges resulting from any inlined function bodies. This more closely matches the behavior of the old PM's inliner. While it wasn't really clear to me initially, this behavior is actually essential to the inliner behaving reasonably in its current design. Because the inliner is fundamentally a bottom-up inliner and all of its cost modeling is designed around that it often runs into trouble within an SCC where we don't have any meaningful bottom-up ordering to use. In addition to potentially cyclic, infinite inlining that we block with the inline history mechanism, it can also take seemingly simple call graph patterns within an SCC and turn them into insanely large functions by accidentally working top-down across the SCC without any of the threshold limitations that traditional top-down inliners use. Consider this diabolical monster.cpp file that Richard Smith came up with to help demonstrate this issue: ``` template <int N> extern const char str; void g(const char ); template <bool K, int N> void f(bool B, bool E) { if (K) g(str<N>); if (B == E) return; if (B) f<true, N + 1>(B + 1, E); else f<false, N + 1>(B + 1, E); } template <> void f<false, MAX>(bool B, bool E) { return f<false, 0>(B, E); } template <> void f<true, MAX>(bool B, bool E) { return f<true, 0>(B, E); } extern bool arr, end; void test() { f<false, 0>(arr, end); } ``` When compiled with '-DMAX=N' for various values of N, this will create an SCC with a reasonably large number of functions. Previously, the inliner would try to exhaust the inlining candidates in a single function before moving on. This, unfortunately, turns it into a top-down inliner within the SCC. Because our thresholds were never built for that, we will incrementally decide that it is always worth inlining and proceed to flatten the entire SCC into that one function. What's worse, we'll then proceed to the next function, and do the exact same thing except we'll skip the first function, and so on. And at each step, we'll also make some of the constant factors larger, which is awesome. The fix in this patch is the obvious one which makes the new PM's inliner use the same technique used by the old PM: consider all the call edges across the entire SCC before beginning to process call edges introduced by inlining. The result of this is essentially to distribute the inlining across the SCC so that every function incrementally grows toward the inline thresholds rather than allowing the inliner to grow one of the functions vastly beyond the threshold. The code for this is a bit awkward, but it works out OK. We could consider in the future doing something more powerful here such as prioritized order (via lowest cost and/or profile info) and/or a code-growth budget per SCC. However, both of those would require really substantial work both to design the system in a way that wouldn't break really useful abstraction decomposition properties of the current inliner and to be tuned across a reasonably diverse set of code and workloads. It also seems really risky in many ways. I have only found a single real-world file that triggers the bad behavior here and it is generated code that has a pretty pathological pattern. I'm not worried about the inliner not doing an awesome* job here as long as it does ok. On the other hand, the cases that will be tricky to get right in a prioritized scheme with a budget will be more common and idiomatic for at least some frontends (C++ and Rust at least). So while these approaches are still really interesting, I'm not in a huge rush to go after them. Staying even closer to the existing PM's behavior, especially when this easy to do, seems like the right short to medium term approach. I don't really have a test case that makes sense yet... I'll try to find a variant of the IR produced by the monster template metaprogram that is both small enough to be sane and large enough to clearly show when we get this wrong in the future. But I'm not confident this exists. And the behavior change here should be unobservable without snooping on debug logging. So there isn't really much to test. The test case updates come from two incidental changes: 1) We now visit functions in an SCC in the opposite order. I don't think there really is a "right" order here, so I just update the test cases. 2) We no longer compute some analyses when an SCC has no call instructions that we consider for inlining. llvm-svn: 297374	2017-03-09 11:35:40 +00:00
Taewook Oh	f22fa72e4a	Do not apply redundant LastCallToStaticBonus Summary: As written in the comments above, LastCallToStaticBonus is already applied to the cost if Caller has only one user, so it is redundant to reapply the bonus here. If the only user is not a caller, TotalSecondaryCost will not be adjusted anyway because callerWillBeRemoved is false. If there's no caller at all, we don't need to care about TotalSecondaryCost because inliningPreventsSomeOuterInline is false. Reviewers: chandlerc, eraman Reviewed By: eraman Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D29169 llvm-svn: 295075	2017-02-14 17:30:05 +00:00
Chandler Carruth	aaad9f84be	[PM/LCG] Teach the LazyCallGraph how to replace a function without disturbing the graph or having to update edges. This is motivated by porting argument promotion to the new pass manager. Because of how LLVM IR Function objects work, in order to change their signature a new object needs to be created. This is efficient and straight forward in the IR but previously was very hard to implement in LCG. We could easily replace the function a node in the graph represents. The challenging part is how to handle updating the edges in the graph. LCG previously used an edge to a raw function to represent a node that had not yet been scanned for calls and references. This was the core of its laziness. However, that model causes this kind of update to be very hard: 1) The keys to lookup an edge need to be `Function`s that would all need to be updated when we update the node. 2) There will be some unknown number of edges that haven't transitioned from `Function` edges to `Node` edges. All of this complexity isn't necessary. Instead, we can always build a node around any function, always pointing edges at it and always using it as the key to lookup an edge. To maintain the laziness, we need to sink the edges* of a node into a secondary object and explicitly model transitioning a node from empty to populated by scanning the function. This design seems much cleaner in a number of ways, but importantly there is now exactly one place where the `Function` has to be updated! Some other cleanups that fall out of this include having something to model the entry* edges more accurately. Rather than hand rolling parts of the node in the graph itself, we have an explicit `EdgeSequence` object that gives us exactly the functionality needed. We also have a consistent place to define the edge iterators and can use them for both the entry edges and the internal edges of the graph. The API used to model the separation between a node and its edges is intentionally very thin as most clients are expected to deal with nodes that have populated edges. We model this exactly as an optional does with an additional method to populate the edges when that is a reasonable thing for a client to do. This is based on API design suggestions from Richard Smith and David Blaikie, credit goes to them for helping pick how to model this without it being either too explicit or too implicit. The patch is somewhat noisy due to shifting around iterator types and new syntax for walking the edges of a node, but most of the functionality change is in the `Edge`, `EdgeSequence`, and `Node` types. Differential Revision: https://reviews.llvm.org/D29577 llvm-svn: 294653	2017-02-09 23:24:13 +00:00
Peter Collingbourne	cea1e4e79a	De-duplicate some code for creating an AARGetter suitable for the legacy PM. I'm about to use this in a couple more places. Differential Revision: https://reviews.llvm.org/D29793 llvm-svn: 294648	2017-02-09 23:11:52 +00:00
Adam Nemet	e7bdf227f6	[Inliner] Fold analysis remarks into missed remarks This significantly reduces the noise level of these messages. llvm-svn: 293492	2017-01-30 16:22:45 +00:00
Haicheng Wu	f8dc2d8c8b	[Inliner] Fix a comment to match the code. NFC. TotalAltCost => TotalSecondaryCost Differential Revision: https://reviews.llvm.org/D29231 llvm-svn: 293490	2017-01-30 16:15:14 +00:00
Chandler Carruth	6acdca78a0	[PH] Replace uses of AssertingVH from members of analysis results with a lazy-asserting PoisoningVH. AssertVH is fundamentally incompatible with cache-invalidation of analysis results. The invaliadtion happens after the AssertingVH has already fired. Instead, use a PoisoningVH that will assert if the dangling handle is ever used rather than merely be assigned or destroyed. This patch also removes all of the (numerous) doomed attempts to work around this fundamental incompatibility. It is a pretty significant simplification IMO. The most interesting change is in the Inliner where we still do some clearing because we don't want to rely on the coarse grained invalidation strategy of the containing pass manager. However, I prefer the approach that contains this logic to the cleanup phase of the Inliner, and I think we could enhance the CGSCC analysis management layer to make this even better in the future if desired. The rest is straight cleanup. I've also added a test for one of the harder cases to work around: when a module analysis contains many AssertingVHes pointing at functions. Differential Revision: https://reviews.llvm.org/D29006 llvm-svn: 292928	2017-01-24 12:55:57 +00:00
Chandler Carruth	9524af4aac	[PM] Clear any analyses for a dead function after inlining it and before clearing its body. This is essential to avoid triggering asserting value handles in analyses on the function's body. I'm working on a test case for this behavior in LLVM, but Clang has a great one that managed to trigger this on all of the bots already. llvm-svn: 292770	2017-01-23 07:03:41 +00:00
Chandler Carruth	b698d5964d	[PM] Fix a really nasty bug introduced when adding PGO support to the new PM's inliner. The bug happens when we refine an SCC after having computed a proxy for the FunctionAnalysisManager, and then proceed to compute fresh analyses for functions in the new SCC using the manager provided by the old SCC's proxy. And when we manage to mutate a function in this new SCC in a way that invalidates those analyses. This can be... challenging to reproduce. I've managed to contrive a set of functions that trigger this and added a test case, but it is a bit brittle. I've directly checked that the passes run in the expected ways to help avoid the test just becoming silently irrelevant. This gets the new PM back to passing the LLVM test suite after the PGO improvements landed. llvm-svn: 292757	2017-01-22 10:34:01 +00:00
Chandler Carruth	d4be9f4b8d	[PM] Add some debug logging to the new PM inliner to make it easier to trace its behavior. llvm-svn: 292756	2017-01-22 10:33:58 +00:00
Easwaran Raman	12585b0148	Improve PGO support for the new inliner This adds the following to the new PM based inliner in PGO mode: * Use block frequency analysis to derive callsite's profile count and use that to adjust thresholds of hot and cold callsites. * Incrementally update the BFI of the caller after a callee gets inlined into it. This incremental update is only within an invocation of the run method - BFI is not preserved across calls to run. Update the function entry count of the callee after inlining it into a caller. * I've tuned the thresholds for the hot and cold callsites using a hacked up version of the old inliner that explicitly computes BFI on a set of internal benchmarks and spec. Once the new PM based pipeline stabilizes (IIRC Chandler mentioned there are known issues) I'll benchmark this again and adjust the thresholds if required. Inliner PGO support. Differential revision: https://reviews.llvm.org/D28331 llvm-svn: 292666	2017-01-20 22:44:04 +00:00
Chandler Carruth	9900d18bab	[PM] Teach the inliner's call graph update to handle inserting new edges when they are call edges at the leaf but may (transitively) be reached via ref edges. It turns out there is a simple rule: insert everything as a ref edge which is a safe conservative default. Then we let the existing update logic handle promoting some of those to call edges. Note that it would be fairly cheap to make these call edges right away if that is desirable by testing whether there is some existing call path from the source to the target. It just seemed like slightly more complexity in this code path that isn't strictly necessary. If anyone feels strongly about handling this differently I'm happy to change it. llvm-svn: 290649	2016-12-28 03:13:12 +00:00
Chandler Carruth	141bf5d14d	[PM] Add one of the features left out of the initial inliner patch: skipping indirectly recursive inline chains. To do this, we implicitly build an inline stack for each callsite and check prior to inlining that doing so would not form a cycle. This uses the exact same technique and even shares some code with the legacy PM inliner. This solution remains deeply unsatisfying to me because it means we cannot actually iterate the inliner externally. Doing so would not be able to easily detect and avoid such cycles. Some day I would very much like to have a solution that works without this internal state to detect cycles, but this is not that day. llvm-svn: 290590	2016-12-27 06:46:20 +00:00
Chandler Carruth	03130d981c	[PM] Teach the inliner in the new PM to merge attributes after inlining. Also enable the new PM in the attributes test case which caught this issue. llvm-svn: 290572	2016-12-27 03:39:54 +00:00
Chandler Carruth	6e9bb7e064	[PM] Teach the always inliner in the new pass manager to support removing fully-dead comdats without removing dead entries in comdats with live members. This factors the core logic out of the current inliner's internals to a reusable utility and leverages that in both places. The factored out code should also be (minorly) more efficient in cases where we have very few dead functions or dead comdats to consider. I've added a test case to cover this behavior of the always inliner. This is the last significant bug in the new PM's always inliner I've found (so far). llvm-svn: 290557	2016-12-26 23:43:27 +00:00
Easwaran Raman	180bd9f6b3	Pass GetAssumptionCache to InlineFunctionInfo constructor Differential revision: https://reviews.llvm.org/D28038 llvm-svn: 290295	2016-12-22 01:07:01 +00:00
Chandler Carruth	1d96311447	[PM] Provide an initial, minimal port of the inliner to the new pass manager. This doesn't implement every feature of the existing inliner, but tries to implement the most important ones for building a functional optimization pipeline and beginning to sort out bugs, regressions, and other problems. Notable, but intentional omissions: - No alloca merging support. Why? Because it isn't clear we want to do this at all. Active discussion and investigation is going on to remove it, so for simplicity I omitted it. - No support for trying to iterate on "internally" devirtualized calls. Why? Because it adds what I suspect is inappropriate coupling for little or no benefit. We will have an outer iteration system that tracks devirtualization including that from function passes and iterates already. We should improve that rather than approximate it here. - Optimization remarks. Why? Purely to make the patch smaller, no other reason at all. The last one I'll probably work on almost immediately. But I wanted to skip it in the initial patch to try to focus the change as much as possible as there is already a lot of code moving around and both of these could be skipped without really disrupting the core logic. A summary of the different things happening here: 1) Adding the usual new PM class and rigging. 2) Fixing minor underlying assumptions in the inline cost analysis or inline logic that don't generally hold in the new PM world. 3) Adding the core pass logic which is in essence a loop over the calls in the nodes in the call graph. This is a bit duplicated from the old inliner, but only a handful of lines could realistically be shared. (I tried at first, and it really didn't help anything.) All told, this is only about 100 lines of code, and most of that is the mechanics of wiring up analyses from the new PM world. 4) Updating the LazyCallGraph (in the new PM) based on the newly inlined calls and references. This is very minimal because we cannot form cycles. 5) When inlining removes the last use of a function, eagerly nuking the body of the function so that any "one use remaining" inline cost heuristics are immediately refined, and queuing these functions to be completely deleted once inlining is complete and the call graph updated to reflect that they have become dead. 6) After all the inlining for a particular function, updating the LazyCallGraph and the CGSCC pass manager to reflect the function-local simplifications that are done immediately and internally by the inline utilties. These are the exact same fundamental set of CG updates done by arbitrary function passes. 7) Adding a bunch of test cases to specifically target CGSCC and other subtle aspects in the new PM world. Many thanks to the careful review from Easwaran and Sanjoy and others! Differential Revision: https://reviews.llvm.org/D24226 llvm-svn: 290161	2016-12-20 03:15:32 +00:00
Daniel Jasper	aec2fa352f	Revert @llvm.assume with operator bundles (r289755-r289757) This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086	2016-12-19 08:22:17 +00:00
Hal Finkel	3ca4a6bcf1	Remove the AssumptionCache After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756	2016-12-15 03:02:15 +00:00
Simon Pilgrim	7d18a70dac	Fix spelling mistakes in Transforms comments. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287488	2016-11-20 13:19:49 +00:00
Xinliang David Li	f450b88191	Fix typo llvm-svn: 285978	2016-11-04 03:00:52 +00:00
Mehdi Amini	732afdd09a	Turn cl::values() (for enum) from a vararg function to using C++ variadic template The core of the change is supposed to be NFC, however it also fixes what I believe was an undefined behavior when calling: va_start(ValueArgs, Desc); with Desc being a StringRef. Differential Revision: https://reviews.llvm.org/D25342 llvm-svn: 283671	2016-10-08 19:41:06 +00:00
Dehao Chen	5461d8bdb5	Refactor the ProfileSummaryInfo to use doInitialization and doFinalization to handle Module update. Summary: This refactors the change in r282616 Reviewers: davidxl, eraman, mehdi_amini Subscribers: mehdi_amini, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D25041 llvm-svn: 282630	2016-09-28 21:00:58 +00:00
Adam Nemet	c507ac96f5	[Inliner] Port all opt remarks to new streaming API llvm-svn: 282559	2016-09-27 23:47:03 +00:00
Adam Nemet	04758ba385	Shorten DiagnosticInfoOptimizationRemark* to OptimizationRemark*. NFC With the new streaming interface, these class names need to be typed a lot and it's way too looong. llvm-svn: 282544	2016-09-27 22:19:23 +00:00
Adam Nemet	1142147e41	[Inliner] Fold the analysis remark into the missed remark There is really no reason for these to be separate. The vectorizer started this pretty bad tradition that the text of the missed remarks is pretty meaningless, i.e. vectorization failed. There, you have to query analysis to get the full picture. I think we should just explain the reason for missing the optimization in the missed remark when possible. Analysis remarks should provide information that the pass gathers regardless whether the optimization is passing or not. llvm-svn: 282542	2016-09-27 21:58:17 +00:00
Adam Nemet	a62b7e1a28	Output optimization remarks in YAML (Re-committed after moving the template specialization under the yaml namespace. GCC was complaining about this.) This allows various presentation of this data using an external tool. This was first recommended here[1]. As an example, consider this module: 1 int foo(); 2 int bar(); 3 4 int baz() { 5 return foo() + bar(); 6 } The inliner generates these missed-optimization remarks today (the hotness information is pulled from PGO): remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30) remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30) Now with -pass-remarks-output=<yaml-file>, we generate this YAML file: --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 } Function: baz Hotness: 30 Args: - Callee: foo - String: will not be inlined into - Caller: baz ... --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 } Function: baz Hotness: 30 Args: - Callee: bar - String: will not be inlined into - Caller: baz ... This is a summary of the high-level decisions: * There is a new streaming interface to emit optimization remarks. E.g. for the inliner remark above: ORE.emit(DiagnosticInfoOptimizationRemarkMissed( DEBUG_TYPE, "NotInlined", &I) << NV("Callee", Callee) << " will not be inlined into " << NV("Caller", CS.getCaller()) << setIsVerbose()); NV stands for named value and allows the YAML client to process a remark using its name (NotInlined) and the named arguments (Callee and Caller) without parsing the text of the message. Subsequent patches will update ORE users to use the new streaming API. * I am using YAML I/O for writing the YAML file. YAML I/O requires you to specify reading and writing at once but reading is highly non-trivial for some of the more complex LLVM types. Since it's not clear that we (ever) want to use LLVM to parse this YAML file, the code supports and asserts that we're writing only. On the other hand, I did experiment that the class hierarchy starting at DiagnosticInfoOptimizationBase can be mapped back from YAML generated here (see D24479). * The YAML stream is stored in the LLVM context. * In the example, we can probably further specify the IR value used, i.e. print "Function" rather than "Value". * As before hotness is computed in the analysis pass instead of DiganosticInfo. This avoids the layering problem since BFI is in Analysis while DiagnosticInfo is in IR. [1] https://reviews.llvm.org/D19678#419445 Differential Revision: https://reviews.llvm.org/D24587 llvm-svn: 282539	2016-09-27 20:55:07 +00:00
Adam Nemet	cc2a3fa8e8	Revert "Output optimization remarks in YAML" This reverts commit r282499. The GCC bots are failing llvm-svn: 282503	2016-09-27 16:39:24 +00:00
Adam Nemet	92e928c10a	Output optimization remarks in YAML This allows various presentation of this data using an external tool. This was first recommended here[1]. As an example, consider this module: 1 int foo(); 2 int bar(); 3 4 int baz() { 5 return foo() + bar(); 6 } The inliner generates these missed-optimization remarks today (the hotness information is pulled from PGO): remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30) remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30) Now with -pass-remarks-output=<yaml-file>, we generate this YAML file: --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 } Function: baz Hotness: 30 Args: - Callee: foo - String: will not be inlined into - Caller: baz ... --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 } Function: baz Hotness: 30 Args: - Callee: bar - String: will not be inlined into - Caller: baz ... This is a summary of the high-level decisions: * There is a new streaming interface to emit optimization remarks. E.g. for the inliner remark above: ORE.emit(DiagnosticInfoOptimizationRemarkMissed( DEBUG_TYPE, "NotInlined", &I) << NV("Callee", Callee) << " will not be inlined into " << NV("Caller", CS.getCaller()) << setIsVerbose()); NV stands for named value and allows the YAML client to process a remark using its name (NotInlined) and the named arguments (Callee and Caller) without parsing the text of the message. Subsequent patches will update ORE users to use the new streaming API. * I am using YAML I/O for writing the YAML file. YAML I/O requires you to specify reading and writing at once but reading is highly non-trivial for some of the more complex LLVM types. Since it's not clear that we (ever) want to use LLVM to parse this YAML file, the code supports and asserts that we're writing only. On the other hand, I did experiment that the class hierarchy starting at DiagnosticInfoOptimizationBase can be mapped back from YAML generated here (see D24479). * The YAML stream is stored in the LLVM context. * In the example, we can probably further specify the IR value used, i.e. print "Function" rather than "Value". * As before hotness is computed in the analysis pass instead of DiganosticInfo. This avoids the layering problem since BFI is in Analysis while DiagnosticInfo is in IR. [1] https://reviews.llvm.org/D19678#419445 Differential Revision: https://reviews.llvm.org/D24587 llvm-svn: 282499	2016-09-27 16:15:16 +00:00
Adam Nemet	cef3314156	[Inliner] Report when inlining fails because callee's def is unavailable Summary: This is obviously an interesting case because it may motivate code restructuring or LTO. Reporting this requires instantiation of ORE in the loop where the call sites are first gathered. I've checked compile-time overhead with -Rpass-with-hotness and the worst slow-down was 6% in mcf and quickly tailing off. As before without -Rpass-with-hotness there is no overhead. Because this could be a pretty noisy diagnostics, it is currently qualified as 'verbose'. As of this patch, 'verbose' diagnostics are only emitted with -Rpass-with-hotness, i.e. when the output is expected to be filtered. Reviewers: eraman, chandlerc, davidxl, hfinkel Subscribers: tejohnson, Prazek, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D23415 llvm-svn: 279860	2016-08-26 20:21:05 +00:00
Chandler Carruth	f702d8ecb6	[Inliner] Add a flag to disable manual alloca merging in the Inliner. This is off for now while testing can take place to make sure that in fact we do sufficient stack coloring to fully obviate the manual alloca array merging. Some context on why we should be using stack coloring rather than merging allocas in this way: LLVM relies very heavily on analyzing pointers as coming from different allocas in order to make aliasing decisions. These are some of the most powerful aliasing signals available in LLVM. So merging allocas is an extremely destructive operation on the LLVM IR -- it takes away highly valuable and hard to reconstruct information. As a consequence, inlined functions which happen to have array allocas that this pattern matches will fail to be properly interleaved unless SROA manages to hoist everything to an SSA register. Instead, the inliner will have added an unnecessary dependence that one inlined function execute after the other because they will have been rewritten to refer to the same memory. All that said, folks will reasonably want some time to experiment here and make sure there are no significant regressions. A flag should give us an easy knob to test. For more context, see the thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.html http://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html Differential Revision: https://reviews.llvm.org/D23052 llvm-svn: 278892	2016-08-17 02:40:23 +00:00
Piotr Padlewski	d89875ca39	Changed sign of LastCallToStaticBouns Summary: I think it is much better this way. When I firstly saw line: Cost += InlineConstants::LastCallToStaticBonus; I though that this is a bug, because everywhere where the cost is being reduced it is usuing -=. Reviewers: eraman, tejohnson, mehdi_amini Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23222 llvm-svn: 278290	2016-08-10 21:15:22 +00:00
Adam Nemet	896c09bd10	[Inliner,OptDiag] Add hotness attribute to opt diagnostics Summary: The inliner not being a function pass requires the work-around of generating the OptimizationRemarkEmitter and in turn BFI on demand. This will go away after the new PM is ready. BFI is only computed inside ORE if the user has requested hotness information for optimization diagnostitics (-pass-remark-with-hotness at the 'opt' level). Thus there is no additional overhead without the flag. Reviewers: hfinkel, davidxl, eraman Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22694 llvm-svn: 278185	2016-08-10 00:44:44 +00:00
Benjamin Kramer	41e66dade1	[Inliner] Use function_ref for functors which are never taken ownership of. llvm-svn: 277922	2016-08-06 12:33:46 +00:00
Chandler Carruth	8562d3a5e4	[Inliner] clang-format various parts of the inliner prior to changes here. NFC. llvm-svn: 277557	2016-08-03 01:02:31 +00:00
Piotr Padlewski	84abc74f2c	Added ThinLTO inlining statistics Summary: copypasta doc of ImportedFunctionsInliningStatistics class \brief Calculate and dump ThinLTO specific inliner stats. The main statistics are: (1) Number of inlined imported functions, (2) Number of imported functions inlined into importing module (indirect), (3) Number of non imported functions inlined into importing module (indirect). The difference between first and the second is that first stat counts all performed inlines on imported functions, but the second one only the functions that have been eventually inlined to a function in the importing module (by a chain of inlines). Because llvm uses bottom-up inliner, it is possible to e.g. import function `A`, `B` and then inline `B` to `A`, and after this `A` might be too big to be inlined into some other function that calls it. It calculates this statistic by building graph, where the nodes are functions, and edges are performed inlines and then by marking the edges starting from not imported function. If `Verbose` is set to true, then it also dumps statistics per each inlined function, sorted by the greatest inlines count like - number of performed inlines - number of performed inlines to importing module Reviewers: eraman, tejohnson, mehdi_amini Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D22491 llvm-svn: 277089	2016-07-29 00:27:16 +00:00
Sean Silva	ab6a683765	Avoid using a raw AssumptionCacheTracker in various inliner functions. This unblocks the new PM part of River's patch in https://reviews.llvm.org/D22706 Conveniently, this same change was needed for D21921 and so these changes are just spun out from there. llvm-svn: 276515	2016-07-23 04:22:50 +00:00
Easwaran Raman	71069cf67d	Use ProfileSummaryInfo in inline cost analysis. Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo. Differential revision: http://reviews.llvm.org/D21045 llvm-svn: 272321	2016-06-09 22:23:21 +00:00
Andrew Kaylor	9c81d0fdeb	Avoid including AlwaysInliner pass in opt-bisect search. Differential Revision: http://reviews.llvm.org/D19640 llvm-svn: 270495	2016-05-23 21:57:54 +00:00
Xinliang David Li	4b2fdccad9	Reapply r268107 after fixing a bug breaks debug build. Makes the new method to set data needed by debug dump. llvm-svn: 268130	2016-04-29 22:59:36 +00:00
Xinliang David Li	0552521b03	Revert r268107 -- debug build failure llvm-svn: 268116	2016-04-29 21:43:28 +00:00
Xinliang David Li	1ffa28a3f1	[inliner]: Refactor inline deferring logic into its own method /NFC The implemented heuristic has a large body of code which better sits in its own function for better readability. It also allows adding more heuristics easier in the future. llvm-svn: 268107	2016-04-29 21:21:44 +00:00
Andrew Kaylor	aa641a5171	Re-commit optimization bisect support (r267022) without new pass manager support. The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling). Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267231	2016-04-22 22:06:11 +00:00
Vedant Kumar	6013f45f92	Revert "Initial implementation of optimization bisect support." This reverts commit r267022, due to an ASan failure: http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549 llvm-svn: 267115	2016-04-22 06:51:37 +00:00
Andrew Kaylor	f0f279291c	Initial implementation of optimization bisect support. This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations. The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used. The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way. Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267022	2016-04-21 17:58:54 +00:00
Mehdi Amini	b550cb1750	[NFC] Header cleanup Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' \| xargs grep -L 'IndexedMap[<]' \| xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595	2016-04-18 09:17:29 +00:00
Easwaran Raman	b1bd398ceb	Revert revisions 262636, 262643, 262679, and 262682. llvm-svn: 262883	2016-03-08 00:36:35 +00:00
Easwaran Raman	3b7a8246c9	Fix a use-after-free bug introduced in r262636 llvm-svn: 262679	2016-03-04 00:44:01 +00:00
Easwaran Raman	3035719c86	Infrastructure for PGO enhancements in inliner This patch provides the following infrastructure for PGO enhancements in inliner: Enable the use of block level profile information in inliner Incremental update of block frequency information during inlining Update the function entry counts of callees when they get inlined into callers. Differential Revision: http://reviews.llvm.org/D16381 llvm-svn: 262636	2016-03-03 18:26:33 +00:00
Chandler Carruth	12884f7f80	[AA] Hoist the logic to reformulate various AA queries in terms of other parts of the AA interface out of the base class of every single AA result object. Because this logic reformulates the query in terms of some other aspect of the API, it would easily cause O(n^2) query patterns in alias analysis. These could in turn be magnified further based on the number of call arguments, and then further based on the number of AA queries made for a particular call. This ended up causing problems for Rust that were actually noticable enough to get a bug (PR26564) and probably other places as well. When originally re-working the AA infrastructure, the desire was to regularize the pattern of refinement without losing any generality. While I think it was successful, that is clearly proving to be too costly. And the cost is needless: we gain no actual improvement for this generality of making a direct query to tbaa actually be able to re-use some other alias analysis's refinement logic for one of the other APIs, or some such. In short, this is entirely wasted work. To the extent possible, delegation to other API surfaces should be done at the aggregation layer so that we can avoid re-walking the aggregation. In fact, this significantly simplifies the logic as we no longer need to smuggle the aggregation layer into each alias analysis (or the TargetLibraryInfo into each alias analysis just so we can form argument memory locations!). However, we also have some delegation logic inside of BasicAA and some of it even makes sense. When the delegation logic is baking in specific knowledge of aliasing properties of the LLVM IR, as opposed to simply reformulating the query to utilize a different alias analysis interface entry point, it makes a lot of sense to restrict that logic to a different layer such as BasicAA. So one aspect of the delegation that was in every AA base class is that when we don't have operand bundles, we re-use function AA results as a fallback for callsite alias results. This relies on the IR properties of calls and functions w.r.t. aliasing, and so seems a better fit to BasicAA. I've lifted the logic up to that point where it seems to be a natural fit. This still does a bit of redundant work (we query function attributes twice, once via the callsite and once via the function AA query) but it is exactly twice here, no more. The end result is that all of the delegation logic is hoisted out of the base class and into either the aggregation layer when it is a pure retargeting to a different API surface, or into BasicAA when it relies on the IR's aliasing properties. This should fix the quadratic query pattern reported in PR26564, although I don't have a stand-alone test case to reproduce it. It also seems general goodness. Now the numerous AAs that don't need target library info don't carry it around and depend on it. I think I can even rip out the general access to the aggregation layer and only expose that in BasicAA as it is the only place where we re-query in that manner. However, this is a non-trivial change to the AA infrastructure so I want to get some additional eyes on this before it lands. Sadly, it can't wait long because we should really cherry pick this into 3.8 if we're going to go this route. Differential Revision: http://reviews.llvm.org/D17329 llvm-svn: 262490	2016-03-02 15:56:53 +00:00
Sanjoy Das	1c481f50d2	Add an "addUsedAAAnalyses" helper function Summary: Passes that call `getAnalysisIfAvailable<T>` also need to call `addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the legacy pass manager that it uses `T`. This contract was being violated by passes that used `createLegacyPMAAResults`. This change fixes this by exposing a helper in AliasAnalysis.h, `addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults and does the right thing when called from `getAnalysisUsage`. Reviewers: chandlerc Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17010 llvm-svn: 260183	2016-02-09 01:21:57 +00:00
Easwaran Raman	f4bb2f0dc3	Refactor threshold computation for inline cost analysis Differential Revision: http://reviews.llvm.org/D15401 llvm-svn: 257832	2016-01-14 23:16:29 +00:00
Easwaran Raman	b9f7120e7a	Refactor inline costs analysis by removing the InlineCostAnalysis class InlineCostAnalysis is an analysis pass without any need for it to be one. Once it stops being an analysis pass, it doesn't maintain any useful state and the member functions inside can be made free functions. NFC. Differential Revision: http://reviews.llvm.org/D15701 llvm-svn: 256521	2015-12-28 20:28:19 +00:00
Akira Hatanaka	1cb242eb13	Provide a way to specify inliner's attribute compatibility and merging. This reapplies r256277 with two changes: - In emitFnAttrCompatCheck, change FuncName's type to std::string to fix a use-after-free bug. - Remove an unnecessary install-local target in lib/IR/Makefile. Original commit message for r252949: Provide a way to specify inliner's attribute compatibility and merging rules using table-gen. NFC. This commit adds new classes CompatRule and MergeRule to Attributes.td, which are used to generate code to check attribute compatibility and merge attributes of the caller and callee. rdar://problem/19836465 llvm-svn: 256304	2015-12-22 23:57:37 +00:00
Akira Hatanaka	9c05cc5670	Revert r256277 and r256279. Some of the bots failed again. llvm-svn: 256280	2015-12-22 20:29:09 +00:00
Akira Hatanaka	a61deb249b	Provide a way to specify inliner's attribute compatibility and merging. This reapplies r252990 and r252949. I've added member function getKind to the Attr classes which returns the enum or string of the attribute. Original commit message for r252949: Provide a way to specify inliner's attribute compatibility and merging rules using table-gen. NFC. This commit adds new classes CompatRule and MergeRule to Attributes.td, which are used to generate code to check attribute compatibility and merge attributes of the caller and callee. rdar://problem/19836465 llvm-svn: 256277	2015-12-22 20:00:05 +00:00
Easwaran Raman	bdb6f1dcc3	Determine callee's hotness and adjust threshold based on that. NFC. This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold callees and uses values of inlinehint-threshold and inlinecold-threshold respectively as the thresholds for such callees. Differential Revision: http://reviews.llvm.org/D15245 llvm-svn: 256222	2015-12-22 00:32:35 +00:00
Akira Hatanaka	5af7ace4ee	Revert r252990. Some of the buildbots are still failing. llvm-svn: 252999	2015-11-13 01:44:32 +00:00
Akira Hatanaka	c7dfb76fe7	Provide a way to specify inliner's attribute compatibility and merging. This reapplies r252949. I've changed the type of FuncName to be std::string instead of StringRef in emitFnAttrCompatCheck. Original commit message for r252949: Provide a way to specify inliner's attribute compatibility and merging rules using table-gen. NFC. This commit adds new classes CompatRule and MergeRule to Attributes.td, which are used to generate code to check attribute compatibility and merge attributes of the caller and callee. rdar://problem/19836465 llvm-svn: 252990	2015-11-13 01:23:11 +00:00
Akira Hatanaka	f3aa82f666	Revert r252949. It broke some of the bots including clang-x64-ninja-win7. llvm-svn: 252951	2015-11-12 21:19:18 +00:00
Akira Hatanaka	61b81a563a	Provide a way to specify inliner's attribute compatibility and merging rules using table-gen. NFC. This commit adds new classes CompatRule and MergeRule to Attributes.td, which are used to generate code to check attribute compatibility and merge attributes of the caller and callee. rdar://problem/19836465 llvm-svn: 252949	2015-11-12 20:59:43 +00:00
Evgeniy Stepanov	d8b86f7cdc	Move dbg.declare intrinsics when merging and replacing allocas. Place new and update dbg.declare calls immediately after the corresponding alloca. Current code in replaceDbgDeclareForAlloca puts the new dbg.declare at the end of the basic block. LLVM codegen has problems emitting debug info in a situation when dbg.declare appears after all uses of the variable. This usually kinda works for inlining and ASan (two users of this function) but not for SafeStack (see the pending change in http://reviews.llvm.org/D13178). llvm-svn: 248769	2015-09-29 00:30:19 +00:00
Chandler Carruth	7b560d40bd	[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible with the new pass manager, and no longer relying on analysis groups. This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows: - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are within a function. - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure. - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager. - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function. All of the references to getting alias analysis results have been updated to use the new aggregation interface. All the preservation and other pass management code has been updated accordingly. The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation. This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exception to this rule are LoopPass instances which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass. Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself. One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loop holes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state. Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller. Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first. Differential Revision: http://reviews.llvm.org/D12080 llvm-svn: 247167	2015-09-09 17:55:00 +00:00
Sanjay Patel	278004be39	Variable names should start with an upper case letter; NFC llvm-svn: 244618	2015-08-11 16:05:43 +00:00
David Blaikie	a5d7de9f08	-Wdeprecated cleanup: Make CallGraph movable by default by using unique_ptr members rather than raw pointers. The only place that tries to return a CallGraph by value (CallGraphAnalysis::run) doesn't seem to be used right now, but it's a reasonable bit of cleanup anyway. llvm-svn: 244122	2015-08-05 20:55:50 +00:00
Sanjay Patel	924879ad2c	wrap OptSize and MinSize attributes for easier and consistent access (NFCI) Create wrapper methods in the Function class for the OptimizeForSize and MinSize attributes. We want to hide the logic of "or'ing" them together when optimizing just for size (-Os). Currently, we are not consistent about this and rely on a front-end to always set OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here that should be added as follow-on patches with regression tests. This patch is NFC-intended: it just replaces existing direct accesses of the attributes by the equivalent wrapper call. Differential Revision: http://reviews.llvm.org/D11734 llvm-svn: 243994	2015-08-04 15:49:57 +00:00
Yaron Keren	c66c06b899	Narrow Callee scope, suggestion from David Blaikie. llvm-svn: 242644	2015-07-19 15:48:07 +00:00
Yaron Keren	611f614ee1	De-duplicate CS.getCalledFunction() expression. Not sure if the optimizer will save the call as getCalledFunction() is not a trivial access function but the code is clearer this way. llvm-svn: 242641	2015-07-19 11:52:02 +00:00
Yaron Keren	6967cbb319	Remove whitespace from start of line, NFC. llvm-svn: 241268	2015-07-02 14:25:09 +00:00
Yaron Keren	62064d6d38	Rangify for loop in Inliner.cpp. NFC. llvm-svn: 240678	2015-06-25 19:28:24 +00:00
Yaron Keren	4c548f2dd9	Rangify for loops in Inliner::runOnSCC(), NFC. llvm-svn: 240215	2015-06-20 07:12:33 +00:00
Peter Collingbourne	82437bf7a5	Protection against stack-based memory corruption errors using SafeStack This patch adds the safe stack instrumentation pass to LLVM, which separates the program stack into a safe stack, which stores return addresses, register spills, and local variables that are statically verified to be accessed in a safe way, and the unsafe stack, which stores everything else. Such separation makes it much harder for an attacker to corrupt objects on the safe stack, including function pointers stored in spilled registers and return addresses. You can find more information about the safe stack, as well as other parts of or control-flow hijack protection technique in our OSDI paper on code-pointer integrity (http://dslab.epfl.ch/pubs/cpi.pdf) and our project website (http://levee.epfl.ch). The overhead of our implementation of the safe stack is very close to zero (0.01% on the Phoronix benchmarks). This is lower than the overhead of stack cookies, which are supported by LLVM and are commonly used today, yet the security guarantees of the safe stack are strictly stronger than stack cookies. In some cases, the safe stack improves performance due to better cache locality. Our current implementation of the safe stack is stable and robust, we used it to recompile multiple projects on Linux including Chromium, and we also recompiled the entire FreeBSD user-space system and more than 100 packages. We ran unit tests on the FreeBSD system and many of the packages and observed no errors caused by the safe stack. The safe stack is also fully binary compatible with non-instrumented code and can be applied to parts of a program selectively. This patch is our implementation of the safe stack on top of LLVM. The patches make the following changes: - Add the safestack function attribute, similar to the ssp, sspstrong and sspreq attributes. - Add the SafeStack instrumentation pass that applies the safe stack to all functions that have the safestack attribute. This pass moves all unsafe local variables to the unsafe stack with a separate stack pointer, whereas all safe variables remain on the regular stack that is managed by LLVM as usual. - Invoke the pass as the last stage before code generation (at the same time the existing cookie-based stack protector pass is invoked). - Add unit tests for the safe stack. Original patch by Volodymyr Kuznetsov and others at the Dependable Systems Lab at EPFL; updates and upstreaming by myself. Differential Revision: http://reviews.llvm.org/D6094 llvm-svn: 239761	2015-06-15 21:07:11 +00:00
David Majnemer	ac256cfed2	[Inliner] Discard empty COMDAT groups COMDAT groups which have become rendered unused because of inline are discardable if we can prove that we've made the group empty. This fixes PR22285. llvm-svn: 236539	2015-05-05 20:14:22 +00:00
Benjamin Kramer	799003bf8c	Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used. llvm-svn: 232998	2015-03-23 19:32:43 +00:00
Sanjay Patel	f1b0db1545	remove function names from comments; NFC llvm-svn: 231801	2015-03-10 16:42:24 +00:00
Mehdi Amini	46a43556db	Make DataLayout Non-Optional in the Module Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC, the string is no longer canonicalized, you can't rely on two "equals" DataLayout having the same string returned by getStringRepresentation(). Get rid of DataLayoutPass: the DataLayout is in the Module The DataLayout is "per-module", let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module. Make DataLayout Non-Optional in the Module Module->getDataLayout() will never returns nullptr anymore. Reviewers: echristo Subscribers: resistor, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D7992 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231270	2015-03-04 18:43:29 +00:00
Duncan P. N. Exon Smith	2c79ad974c	Transforms: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229202	2015-02-14 01:11:29 +00:00
Chandler Carruth	b98f63dbdb	[PM] Separate the TargetLibraryInfo object from the immutable pass. The pass is really just a means of accessing a cached instance of the TargetLibraryInfo object, and this way we can re-use that object for the new pass manager as its result. Lots of delta, but nothing interesting happening here. This is the common pattern that is developing to allow analyses to live in both the old and new pass manager -- a wrapper pass in the old pass manager emulates the separation intrinsic to the new pass manager between the result and pass for analyses. llvm-svn: 226157	2015-01-15 10:41:28 +00:00
Chandler Carruth	62d4215baa	[PM] Move TargetLibraryInfo into the Analysis library. While the term "Target" is in the name, it doesn't really have to do with the LLVM Target library -- this isn't an abstraction which LLVM targets generally need to implement or extend. It has much more to do with modeling the various runtime libraries on different OSes and with different runtime environments. The "target" in this sense is the more general sense of a target of cross compilation. This is in preparation for porting this analysis to the new pass manager. No functionality changed, and updates inbound for Clang and Polly. llvm-svn: 226078	2015-01-15 02:16:27 +00:00
Chandler Carruth	66b3130cda	[PM] Split the AssumptionTracker immutable pass into two separate APIs: a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is two fold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the many utility functions that deal in the assumptions to live in both pass manager worlds by creating an separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boiler plate to the various passes, but this kind of boiler plate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131	2015-01-04 12:03:27 +00:00
David Blaikie	70573dcd9f	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334	2014-11-19 07:49:26 +00:00
David Majnemer	ac07703842	Inliner: Non-local functions in COMDATs shouldn't be dropped A function with discardable linkage cannot be discarded if its a member of a COMDAT group without considering all the other COMDAT members as well. This sort of thing is already handled by GlobalOpt/GlobalDCE. This fixes PR21206. llvm-svn: 219335	2014-10-08 19:32:32 +00:00
Hal Finkel	74c2f355d2	Add an Assumption-Tracking Pass This adds an immutable pass, AssumptionTracker, which keeps a cache of @llvm.assume call instructions within a module. It uses callback value handles to keep stale functions and intrinsics out of the map, and it relies on any code that creates new @llvm.assume calls to notify it of the new instructions. The benefit is that code needing to find @llvm.assume intrinsics can do so directly, without scanning the function, thus allowing the cost of @llvm.assume handling to be negligible when none are present. The current design is intended to be lightweight. We don't keep track of anything until we need a list of assumptions in some function. The first time this happens, we scan the function. After that, we add/remove @llvm.assume calls from the cache in response to registration calls and ValueHandle callbacks. There are no new direct test cases for this pass, but because it calls it validation function upon module finalization, we'll pick up detectable inconsistencies from the other tests that touch @llvm.assume calls. This pass will be used by follow-up commits that make use of @llvm.assume. llvm-svn: 217334	2014-09-07 12:44:26 +00:00
Hal Finkel	0c083024f0	Feed AA to the inliner and use AA->getModRefBehavior in AddAliasScopeMetadata This feeds AA through the IFI structure into the inliner so that AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which functions only access their arguments (instead of just hard-coding some knowledge of memory intrinsics). Most of the information is only available from BasicAA; this is important for preserving alias scoping information for target-specific intrinsics when doing the noalias parameter attribute to metadata conversion. llvm-svn: 216866	2014-09-01 09:01:39 +00:00
Rafael Espindola	3cf4af11d5	Add the missing hasLinkOnceODRLinkage predicate. llvm-svn: 214312	2014-07-30 15:57:51 +00:00
Diego Novillo	7f8af8bf91	Add support for missed and analysis optimization remarks. Summary: This adds two new diagnostics: -pass-remarks-missed and -pass-remarks-analysis. They take the same values as -pass-remarks but are intended to be triggered in different contexts. -pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed, which passes call when they tried to apply a transformation but couldn't. -pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis, which passes call when they want to inform the user about analysis results. The patch also: 1- Adds support in the inliner for the two new remarks and a test case. 2- Moves emitOptimizationRemark* functions to the llvm namespace. 3- Adds an LLVMContext argument instead of making them member functions of LLVMContext. Reviewers: qcolombet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3682 llvm-svn: 209442	2014-05-22 14:19:46 +00:00
Manman Ren	3c44067a30	[inline cold threshold] Command line argument for inline threshold will override the default cold threshold. When we use command line argument to set the inline threshold, the default cold threshold will not be used. This is in line with how we use OptSizeThreshold. When we want a higher threshold for all functions, we do not have to set both inline threshold and cold threshold. llvm-svn: 207245	2014-04-25 17:34:55 +00:00
Craig Topper	f40110f4d8	[C++] Use 'nullptr'. Transforms edition. llvm-svn: 207196	2014-04-25 05:29:35 +00:00
Chandler Carruth	964daaaf19	[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE definition below all of the header #include lines, lib/Transforms/... edition. This one is tricky for two reasons. We again have a couple of passes that define something else before the includes as well. I've sunk their name macros with the DEBUG_TYPE. Also, InstCombine contains headers that need DEBUG_TYPE, so now those headers #define and #undef DEBUG_TYPE around their code, leaving them well formed modular headers. Fixing these headers was a large motivation for all of these changes, as "leaky" macros of this form are hard on the modules implementation. llvm-svn: 206844	2014-04-22 02:55:47 +00:00
NAKAMURA Takumi	cd1fc4bc1b	Inliner::OptimizationRemark: Fix crash in clang/test/Frontend/optimization-remark.c on some hosts, including --vg. DebugLoc in Callsite would not live after Inliner. It should be copied before Inliner. llvm-svn: 206459	2014-04-17 12:22:14 +00:00
Diego Novillo	a9298b2297	Add support for optimization reports. Summary: This patch adds backend support for -Rpass=, which indicates the name of the optimization pass that should emit remarks stating when it made a transformation to the code. Pass names are taken from their DEBUG_NAME definitions. When emitting an optimization report diagnostic, the lack of debug information causes the diagnostic to use "<unknown>:0:0" as the location string. This is the back end counterpart for http://llvm-reviews.chandlerc.com/D3226 Reviewers: qcolombet CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D3227 llvm-svn: 205774	2014-04-08 16:42:34 +00:00
Chandler Carruth	cdf4788401	[C++11] Add range based accessors for the Use-Def chain of a Value. This requires a number of steps. 1) Move value_use_iterator into the Value class as an implementation detail 2) Change it to actually be a Use iterator rather than a User iterator. 3) Add an adaptor which is a User iterator that always looks through the Use to the User. 4) Wrap these in Value::use_iterator and Value::user_iterator typedefs. 5) Add the range adaptors as Value::uses() and Value::users(). 6) Update all of the callers to correctly distinguish between whether they wanted a use_iterator (and to explicitly dig out the User when needed), or a user_iterator which makes the Use itself totally opaque. Because #6 requires churning essentially everything that walked the Use-Def chains, I went ahead and added all of the range adaptors and switched them to range-based loops where appropriate. Also because the renaming requires at least churning every line of code, it didn't make any sense to split these up into multiple commits -- all of which would touch all of the same lies of code. The result is still not quite optimal. The Value::use_iterator is a nice regular iterator, but Value::user_iterator is an iterator over Users rather than over the User objects themselves. As a consequence, it fits a bit awkwardly into the range-based world and it has the weird extra-dereferencing 'operator->' that so many of our iterators have. I think this could be fixed by providing something which transforms a range of T&s into a range of Ts, but that can be separated into another patch, and it isn't yet 100% clear whether this is the right move. However, this change gets us most of the benefit and cleans up a substantial amount of code around Use and User. =] llvm-svn: 203364	2014-03-09 03:16:01 +00:00
Chandler Carruth	219b89b987	[Modules] Move CallSite into the IR library where it belogs. It is abstracting between a CallInst and an InvokeInst, both of which are IR concepts. llvm-svn: 202816	2014-03-04 11:01:28 +00:00
Rafael Espindola	935125126c	Make DataLayout a plain object, not a pass. Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM don't don't handle passes to also use DataLayout. llvm-svn: 202168	2014-02-25 17:30:31 +00:00
Rafael Espindola	612886fc8c	Rename a few more DataLayout variables. llvm-svn: 201833	2014-02-21 01:53:35 +00:00
Manman Ren	d461244972	Set default of inlinecold-threshold to 225. 225 is the default value of inline-threshold. This change will make sure we have the same inlining behavior as prior to r200886. As Chandler points out, even though we don't have code in our testing suite that uses cold attribute, there are larger applications that do use cold attribute. r200886 + this commit intend to keep the same behavior as prior to r200886. We can later on tune the inlinecold-threshold. The main purpose of r200886 is to help performance of instrumentation based PGO before we actually hook up inliner with analysis passes such as BPI and BFI. For instrumentation based PGO, we try to increase inlining of hot functions and reduce inlining of cold functions by setting inlinecold-threshold. Another option suggested by Chandler is to use a boolean flag that controls if we should use OptSizeThreshold for cold functions. The default value of the boolean flag should not change the current behavior. But it gives us less freedom in controlling inlining of cold functions. llvm-svn: 200898	2014-02-06 01:59:22 +00:00
Manman Ren	e8781b1a36	Inliner uses a smaller inline threshold for callees with cold attribute. Added command line option inlinecold-threshold to set threshold for inlining functions with cold attribute. Listen to the cold attribute when it would decrease the inline threshold. llvm-svn: 200886	2014-02-05 22:53:44 +00:00
Chandler Carruth	6378cf539f	[PM] Split the CallGraph out from the ModulePass which creates the CallGraph. This makes the CallGraph a totally generic analysis object that is the container for the graph data structure and the primary interface for querying and manipulating it. The pass logic is separated into its own class. For compatibility reasons, the pass provides wrapper methods for most of the methods on CallGraph -- they all just forward. This will allow the new pass manager infrastructure to provide its own analysis pass that constructs the same CallGraph object and makes it available. The idea is that in the new pass manager, the analysis pass's 'run' method returns a concrete analysis 'result'. Here, that result is a 'CallGraph'. The 'run' method will typically do only minimal work, deferring much of the work into the implementation of the result object in order to be lazy about computing things, but when (like DomTree) there is some up-front computation, the analysis does it prior to handing the result back to the querying pass. I know some of this is fairly ugly. I'm happy to change it around if folks can suggest a cleaner interim state, but there is going to be some amount of unavoidable ugliness during the transition period. The good thing is that this is very limited and will naturally go away when the old pass infrastructure goes away. It won't hang around to bother us later. Next up is the initial new-PM-style call graph analysis. =] llvm-svn: 195722	2013-11-26 04:19:30 +00:00
Hal Finkel	ec7cd26968	Fix comparisons of alloca alignment in inliner merging Duncan pointed out a mistake in my fix in r186425 when only one of the allocas being compared had the target-default alignment. This is essentially his suggested solution. Thanks! llvm-svn: 186510	2013-07-17 14:32:41 +00:00
Hal Finkel	9caa8f7ba7	When the inliner merges allocas, it must keep the larger alignment For safety, the inliner cannot decrease the allignment on an alloca when merging it with another. I've included two variants of the test case for this: one with DataLayout available, and one without. When DataLayout is not available, if only one of the allocas uses the default alignment (getAlignment() == 0), then they cannot be safely merged. llvm-svn: 186425	2013-07-16 17:10:55 +00:00
Bill Wendling	d154e283f2	Add the IR attribute 'sspstrong'. SSPStrong applies a heuristic to insert stack protectors in these situations: * A Protector is required for functions which contain an array, regardless of type or length. * A Protector is required for functions which contain a structure/union which contains an array, regardless of type or length. Note, there is no limit to the depth of nesting. * A protector is required when the address of a local variable (i.e., stack based variable) is exposed. (E.g., such as through a local whose address is taken as part of the RHS of an assignment or a local whose address is taken as part of a function argument.) This patch implements the SSPString attribute to be equivalent to SSPRequired. This will change in a subsequent patch. llvm-svn: 173230	2013-01-23 06:41:41 +00:00
Chandler Carruth	9fb823bbd4	Move all of the header files which are involved in modelling the LLVM IR into their new header subdirectory: include/llvm/IR. This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM. There are still more header files to move here, but I wanted to handle them in separate commits to make tracking what files make sense at each layer easier. The only really questionable files here are the target intrinsic tablegen files. But that's a battle I'd rather not fight today. I've updated both CMake and Makefile build systems (I think, and my tests think, but I may have missed something). I've also re-sorted the includes throughout the project. I'll be committing updates to Clang, DragonEgg, and Polly momentarily. llvm-svn: 171366	2013-01-02 11:36:10 +00:00
Bill Wendling	698e84fc4f	Remove the Function::getFnAttributes method in favor of using the AttributeSet directly. This is in preparation for removing the use of the 'Attribute' class as a collection of attributes. That will shift to the AttributeSet class instead. llvm-svn: 171253	2012-12-30 10:32:01 +00:00
Chandler Carruth	e40e60eed5	Make this parameter be named consistently with most other getAnalysisUsage implementations. llvm-svn: 171157	2012-12-27 11:17:15 +00:00
Bill Wendling	3d7b0b8ac7	Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future. llvm-svn: 170502	2012-12-19 07:18:57 +00:00
Quentin Colombet	c0dba2035a	Take into account minimize size attribute in the inliner. Better controls the inlining of functions when the caller function has MinSize attribute. Basically, when the caller function has this attribute, we do not "force" the inlining of callee functions carrying the InlineHint attribute (i.e., functions defined with inline keyword) llvm-svn: 170065	2012-12-13 01:05:25 +00:00
Chandler Carruth	ed0881b2a6	Use the new script to sort the includes of every file under lib. Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to a header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131	2012-12-03 16:50:05 +00:00
Bill Wendling	f319e9905f	Have 'addFnAttr' take the attribute enum value. Then have it build the attribute object and add it appropriately. No functionality change. llvm-svn: 165595	2012-10-10 03:12:49 +00:00
Bill Wendling	c9b22d735a	Create enums for the different attributes. We use the enums to query whether an Attributes object has that attribute. The opaque layer is responsible for knowing where that specific attribute is stored. llvm-svn: 165488	2012-10-09 07:45:08 +00:00
Micah Villmow	cdfe20b97f	Move TargetData to DataLayout. llvm-svn: 165402	2012-10-08 16:38:25 +00:00
Bill Wendling	863bab689a	Remove the `hasFnAttr' method from Function. The hasFnAttr method has been replaced by querying the Attributes explicitly. No intended functionality change. llvm-svn: 164725	2012-09-26 21:48:26 +00:00
Nadav Rotem	97d44349c9	Fix an 80 char line limit. llvm-svn: 163808	2012-09-13 16:27:32 +00:00
Benjamin Kramer	8bcc971174	Make MemoryBuiltins aware of TargetLibraryInfo. This disables malloc-specific optimization when -fno-builtin (or -ffreestanding) is specified. This has been a problem for a long time but became more severe with the recent memory builtin improvements. Since the memory builtin functions are used everywhere, this required passing TLI in many places. This means that functions that now have an optional TLI argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead mallocs anymore if the TLI argument is missing. I've updated most passes to do the right thing. Fixes PR13694 and probably others. llvm-svn: 162841	2012-08-29 15:32:21 +00:00
Benjamin Kramer	bde9176663	Fix typos found by http://github.com/lyda/misspell-check llvm-svn: 157885	2012-06-02 10:20:22 +00:00
Patrik Hägglund	8a1e316c15	Fix the inliner so that the optsize function attribute don't alter the inline threshold if the global inline threshold is lower (as for -Oz). Reviewed by Chandler Carruth and Bill Wendling. llvm-svn: 157323	2012-05-23 13:42:57 +00:00
Chandler Carruth	7ae90d4d2d	Add two statistics to help track how we are computing the inline cost. Yea, 'NumCallerCallersAnalyzed' isn't a great name, suggestions welcome. llvm-svn: 154492	2012-04-11 10:15:10 +00:00
Chandler Carruth	45ae88f5fc	Belatedly address some code review from Chris. As a side note, I really dislike array_pod_sort... Do we really still care about any STL implementations that get this so wrong? Does libc++? llvm-svn: 153834	2012-04-01 10:41:24 +00:00
Chandler Carruth	edd2826f3e	Remove a bunch of empty, dead, and no-op methods from all of these interfaces. These methods were used in the old inline cost system where there was a persistent cache that had to be updated, invalidated, and cleared. We're now doing more direct computations that don't require this intricate dance. Even if we resume some level of caching, it would almost certainly have a simpler and more narrow interface than this. llvm-svn: 153813	2012-03-31 12:48:08 +00:00
Chandler Carruth	0539c071ea	Initial commit for the rewrite of the inline cost analysis to operate on a per-callsite walk of the called function's instructions, in breadth-first order over the potentially reachable set of basic blocks. This is a major shift in how inline cost analysis works to improve the accuracy and rationality of inlining decisions. A brief outline of the algorithm this moves to: - Build a simplification mapping based on the callsite arguments to the function arguments. - Push the entry block onto a worklist of potentially-live basic blocks. - Pop the first block off of the front of the worklist (for breadth-first ordering) and walk its instructions using a custom InstVisitor. - For each instruction's operands, re-map them based on the simplification mappings available for the given callsite. - Compute any simplification possible of the instruction after re-mapping, and store that back int othe simplification mapping. - Compute any bonuses, costs, or other impacts of the instruction on the cost metric. - When the terminator is reached, replace any conditional value in the terminator with any simplifications from the mapping we have, and add any successors which are not proven to be dead from these simplifications to the worklist. - Pop the next block off of the front of the worklist, and repeat. - As soon as the cost of inlining exceeds the threshold for the callsite, stop analyzing the function in order to bound cost. The primary goal of this algorithm is to perfectly handle dead code paths. We do not want any code in trivially dead code paths to impact inlining decisions. The previous metric was extremely flawed here, and would always subtract the average cost of two successors of a conditional branch when it was proven to become an unconditional branch at the callsite. There was no handling of wildly different costs between the two successors, which would cause inlining when the path actually taken was too large, and no inlining when the path actually taken was trivially simple. There was also no handling of the code path, only the immediate successors. These problems vanish completely now. See the added regression tests for the shiny new features -- we skip recursive function calls, SROA-killing instructions, and high cost complex CFG structures when dead at the callsite being analyzed. Switching to this algorithm required refactoring the inline cost interface to accept the actual threshold rather than simply returning a single cost. The resulting interface is pretty bad, and I'm planning to do lots of interface cleanup after this patch. Several other refactorings fell out of this, but I've tried to minimize them for this patch. =/ There is still more cleanup that can be done here. Please point out anything that you see in review. I've worked really hard to try to mirror at least the spirit of all of the previous heuristics in the new model. It's not clear that they are all correct any more, but I wanted to minimize the change in this single patch, it's already a bit ridiculous. One heuristic that is not yet mirrored is to allow inlining of functions with a dynamic alloca if the caller has a dynamic alloca. I will add this back, but I think the most reasonable way requires changes to the inliner itself rather than just the cost metric, and so I've deferred this for a subsequent patch. The test case is XFAIL-ed until then. As mentioned in the review mail, this seems to make Clang run about 1% to 2% faster in -O0, but makes its binary size grow by just under 4%. I've looked into the 4% growth, and it can be fixed, but requires changes to other parts of the inliner. llvm-svn: 153812	2012-03-31 12:42:41 +00:00
Chandler Carruth	b9e35fbc1e	Make a seemingly tiny change to the inliner and fix the generated code size bloat. Unfortunately, I expect this to disable the majority of the benefit from r152737. I'm hopeful at least that it will fix PR12345. To explain this requires... quite a bit of backstory I'm afraid. TL;DR: The change in r152737 actually did The Wrong Thing for linkonce-odr functions. This change makes it do the right thing. The benefits we saw were simple luck, not any actual strategy. Benchmark numbers after a mini-blog-post so that I've written down my thoughts on why all of this works and doesn't work... To understand what's going on here, you have to understand how the "bottom-up" inliner actually works. There are two fundamental modes to the inliner: 1) Standard fixed-cost bottom-up inlining. This is the mode we usually think about. It walks from the bottom of the CFG up to the top, looking at callsites, taking information about the callsite and the called function and computing th expected cost of inlining into that callsite. If the cost is under a fixed threshold, it inlines. It's a touch more complicated than that due to all the bonuses, weights, etc. Inlining the last callsite to an internal function gets higher weighth, etc. But essentially, this is the mode of operation. 2) Deferred bottom-up inlining (a term I just made up). This is the interesting mode for this patch an r152737. Initially, this works just like mode #1, but once we have the cost of inlining into the callsite, we don't just compare it with a fixed threshold. First, we check something else. Let's give some names to the entities at this point, or we'll end up hopelessly confused. We're considering inlining a function 'A' into its callsite within a function 'B'. We want to check whether 'B' has any callers, and whether it might be inlined into those callers. If so, we also check whether inlining 'A' into 'B' would block any of the opportunities for inlining 'B' into its callers. We take the sum of the costs of inlining 'B' into its callers where that inlining would be blocked by inlining 'A' into 'B', and if that cost is less than the cost of inlining 'A' into 'B', then we skip inlining 'A' into 'B'. Now, in order for #2 to make sense, we have to have some confidence that we will actually have the opportunity to inline 'B' into its callers when cheaper, and that we'll be able to revisit the decision and inline 'A' into 'B' if that ever becomes the correct tradeoff. This often isn't true for external functions -- we can see very few of their callers, and we won't be able to re-consider inlining 'A' into 'B' if 'B' is external when we finally see more callers of 'B'. There are two cases where we believe this to be true for C/C++ code: functions local to a translation unit, and functions with an inline definition in every translation unit which uses them. These are represented as internal linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for linkonce-odr in r152737. Unfortunately, when I did that, I also introduced a subtle bug. There was an implicit assumption that the last caller of the function within the TU was the last caller of the function in the program. We want to bonus the last caller of the function in the program by a huge amount for inlining because inlining that callsite has very little cost. Unfortunately, the last caller in the TU of a linkonce-odr function is not the last caller in the program, and so we don't want to apply this bonus. If we do, we can apply it to one callsite per-TU. Because of the way deferred inlining works, when it sees this bonus applied to one callsite in the TU for 'B', it decides that inlining 'B' is of the utmost importance just so we can get that final bonus. It then proceeds to essentially force deferred inlining regardless of the actual cost tradeoff. The result? PR12345: code bloat, code bloat, code bloat. Another result is getting damn lucky on a few benchmarks, and the over-inlining exposing critically important optimizations. I would very much like a list of benchmarks that regress after this change goes in, with bitcode before and after. This will help me greatly understand what opportunities the current cost analysis is missing. Initial benchmark numbers look very good. WebKit files that exhibited the worst of PR12345 went from growing to shrinking compared to Clang with r152737 reverted. - Bootstrapped Clang is 3% smaller with this change. - Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4% faster with this change. Please let me know about any other performance impact you see. Thanks to Nico for reporting and urging me to actually fix, Richard Smith, Duncan Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues today. llvm-svn: 153506	2012-03-27 10:48:28 +00:00
Chandler Carruth	2121199241	Move the instruction simplification of callsite arguments in the inliner to instead rely on much more generic and powerful instruction simplification in the function cloner (and thus inliner). This teaches the pruning function cloner to use instsimplify rather than just the constant folder to fold values during cloning. This can simplify a large number of things that constant folding alone cannot begin to touch. For example, it will realize that 'or' and 'and' instructions with certain constant operands actually become constants regardless of what their other operand is. It also can thread back through the caller to perform simplifications that are only possible by looking up a few levels. In particular, GEPs and pointer testing tend to fold much more heavily with this change. This should (in some cases) have a positive impact on compile times with optimizations on because the inliner itself will simply avoid cloning a great deal of code. It already attempted to prune proven-dead code, but now it will be use the stronger simplifications to prove more code dead. llvm-svn: 153403	2012-03-25 04:03:40 +00:00
Chandler Carruth	d7a5f2adb0	Start removing the use of an ad-hoc 'never inline' set and instead directly query the function information which this set was representing. This simplifies the interface of the inline cost analysis, and makes the always-inline pass significantly more efficient. Previously, always-inline would first make a single set of every function in the module except those marked with the always-inline attribute. It would then query this set at every call site to see if the function was a member of the set, and if so, refuse to inline it. This is quite wasteful. Instead, simply check the function attribute directly when looking at the callsite. The normal inliner also had similar redundancy. It added every function in the module with the noinline attribute to its set to ignore, even though inside the cost analysis function we already tested the noinline attribute and produced the same result. The only tricky part of removing this is that we have to be able to correctly remove only the functions inlined by the always-inline pass when finalizing, which requires a bit of a hack. Still, much less of a hack than the set of all non-always-inline functions was. While I was touching this function, I switched a heavy-weight set to a vector with sort+unique. The algorithm already had a two-phase insert and removal pattern, we were just needlessly paying the uniquing cost on every insert. This probably speeds up some compiles by a small amount (-O0 compiles with lots of always-inline, so potentially heavy libc++ users), but I've not tried to measure it. I believe there is no functional change here, but yell if you spot one. None are intended. Finally, the direction this is going in is to greatly simplify the inline cost query interface so that we can replace its implementation with a much more clever one. Along the way, all the APIs get simplified, so it seems incrementally good. llvm-svn: 152903	2012-03-16 06:10:13 +00:00
Chandler Carruth	30b8416d2c	Change where we enable the heuristic that delays inlining into functions which are small enough to themselves be inlined. Delaying in this manner can be harmful if the function is inelligible for inlining in some (or many) contexts as it pessimizes the code of the function itself in the event that inlining does not eventually happen. Previously the check was written to only do this delaying of inlining for static functions in the hope that they could be entirely deleted and in the knowledge that all callers of static functions will have the opportunity to inline if it is in fact profitable. However, with C++ we get two other important sources of functions where the definition is always available for inlining: inline functions and templated functions. This patch generalizes the inliner to allow linkonce-ODR (the linkage such C++ routines receive) to also qualify for this delay-based inlining. Benchmarking across a range of large real-world applications shows roughly 2% size increase across the board, but an average speedup of about 0.5%. Some benhcmarks improved over 2%, and the 'clang' binary itself (when bootstrapped with this feature) shows a 1% -O0 performance improvement when run over all Sema, Lex, and Parse source code smashed into a single file. A clean re-build of Clang+LLVM with a bootstrapped Clang shows approximately 2% improvement, but that measurement is often noisy. llvm-svn: 152737	2012-03-14 20:16:41 +00:00
Chandler Carruth	595fda8466	When inlining a function and adding its inner call sites to the candidate set for subsequent inlining, try to simplify the arguments to the inner call site now that inlining has been performed. The goal here is to propagate and fold constants through deeply nested call chains. Without doing this, we loose the inliner bonus that should be applied because the arguments don't match the exact pattern the cost estimator uses. Reviewed on IRC by Benjamin Kramer. llvm-svn: 152556	2012-03-12 11:19:33 +00:00
Chad Rosier	07d37bc1ed	Add support for disabling llvm.lifetime intrinsics in the AlwaysInliner. These are optimization hints, but at -O0 we're not optimizing. This becomes a problem when the alwaysinline attribute is abused. rdar://10921594 llvm-svn: 151429	2012-02-25 02:56:01 +00:00
Eli Friedman	1923a330e6	Refactor code from inlining and globalopt that checks whether a function definition is unused, and enhance it so it can tell that functions which are only used by a blockaddress are in fact dead. This probably doesn't happen much on most code, but the Linux kernel's _THIS_IP_ can trigger this issue with blockaddress. (GlobalDCE can also handle the given tescase, but we only run that at -O3.) Found while looking at PR11180. llvm-svn: 142572	2011-10-20 05:23:42 +00:00
Chris Lattner	229907cd11	land David Blaikie's patch to de-constify Type, with a few tweaks. llvm-svn: 135375	2011-07-18 04:54:35 +00:00
Jay Foad	1a180156b6	Remove unused STL header includes. llvm-svn: 130068	2011-04-23 19:53:52 +00:00
Dale Johannesen	a71d2cc88d	Improve the accuracy of the inlining heuristic looking for the case where a static caller is itself inlined everywhere else, and thus may go away if it doesn't get too big due to inlining other things into it. If there are references to the caller other than calls, it will not be removed; account for this. This results in same-day completion of the case in PR8853. llvm-svn: 122821	2011-01-04 19:01:54 +00:00
Chris Lattner	fb212de06d	Fix PR8735, a really terrible problem in the inliner's "alloca merging" optimization. Consider: static void foo() { A = alloca ... } static void bar() { B = alloca ... call foo(); } void main() { bar() } The inliner proceeds bottom up, but lets pretend it decides not to inline foo into bar. When it gets to main, it inlines bar into main(), and says "hey, I just inlined an alloca "B" into main, lets remember that. Then it keeps going and finds that it now contains a call to foo. It decides to inline foo into main, and says "hey, foo has an alloca A, and I have an alloca B from another inlined call site, lets reuse it". The problem with this of course, is that the lifetime of A and B are nested, not disjoint. Unfortunately I can't create a reasonable testcase for this: the one in the PR is both huge and extremely sensitive, because you minor tweaks end up causing foo to get inlined into bar too early. We already have tests for the basic alloca merging optimization and this does not break them. llvm-svn: 120995	2010-12-06 07:52:42 +00:00
Chris Lattner	5b6a865f2e	improve -debug output and comments a little. llvm-svn: 120993	2010-12-06 07:38:40 +00:00
Jakob Stoklund Olesen	31a7eb40c1	Let the -inline-threshold command line argument take precedence over the threshold given to createFunctionInliningPass(). Both opt -O3 and clang would silently ignore the -inline-threshold option. llvm-svn: 118117	2010-11-02 23:40:26 +00:00
Owen Anderson	a7aed18624	Reapply r110396, with fixes to appease the Linux buildbot gods. llvm-svn: 110460	2010-08-06 18:33:48 +00:00
Owen Anderson	bda59bd247	Revert r110396 to fix buildbots. llvm-svn: 110410	2010-08-06 00:23:35 +00:00
Owen Anderson	755aceb5d0	Don't use PassInfo* as a type identifier for passes. Instead, use the address of the static ID member as the sole unique type identifier. Clean up APIs related to this change. llvm-svn: 110396	2010-08-05 23:42:04 +00:00
Gabor Greif	62f0aac99d	simplify by using CallSite constructors; virtually eliminates CallSite::get from the tree llvm-svn: 109687	2010-07-28 22:50:26 +00:00
Eric Christopher	ea282034b6	Grammar. llvm-svn: 108252	2010-07-13 18:27:13 +00:00
Benjamin Kramer	5ac57e3440	Avoid swap when a copy suffices. llvm-svn: 105220	2010-05-31 12:50:41 +00:00
Chris Lattner	b49a622fe9	revert r102831. We already delete dead readonly calls in other places, killing a valid transformation is not the right answer. llvm-svn: 102850	2010-05-01 17:19:38 +00:00
Owen Anderson	550986ea90	Disable the call-deletion transformation introduced in r86975. Without halting analysis, it is illegal to delete a call to a read-only function. The correct solution is almost certainly to add a "must halt" attribute and only allow deletions in its presence. XFAIL the relevant testcase for now. llvm-svn: 102831	2010-05-01 08:34:28 +00:00
Chris Lattner	c2432b9d44	rename InlineInfo.DevirtualizedCalls -> InlinedCalls to reflect that it includes all inlined calls now, not just devirtualized ones. llvm-svn: 102824	2010-05-01 01:26:13 +00:00

1 2 3 4 5 ...

361 Commits