llvm-project

Commit Graph

Author	SHA1	Message	Date
Haicheng Wu	0812c5bea3	[InlineCost] Add cl::opt to allow full inline cost to be computed for debugging purposes. Currently, the inline cost model will bail once the inline cost exceeds the inline threshold in order to avoid unnecessary compile-time. However, when debugging it is useful to compute the full cost, so this command line option is added to override the default behavior. I took over this work from Chad Rosier (mcrosier@codeaurora.org). Differential Revision: https://reviews.llvm.org/D35850 llvm-svn: 311371	2017-08-21 20:00:09 +00:00
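The early bail-out that this option overrides can be sketched in C++. The sketch assumes costs are accumulated per instruction and the final decision is `Cost < Threshold`. The names `analyzeCost`, `CostResult` and `ComputeFull` are hypothetical stand-ins, not the actual `CallAnalyzer` code or the real option name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of a hypothetical cost walk over a callee's instructions.
struct CostResult {
  int Cost;            // accumulated cost when the walk stopped
  bool Inline;         // Cost < Threshold
  std::size_t Visited; // number of instructions examined
};

// Accumulate per-instruction costs. By default, stop as soon as the running
// cost reaches the threshold; with ComputeFull set, keep going to the end.
CostResult analyzeCost(const std::vector<int> &InstCosts, int Threshold,
                       bool ComputeFull) {
  int Cost = 0;
  std::size_t Visited = 0;
  for (int C : InstCosts) {
    Cost += C;
    ++Visited;
    if (Cost >= Threshold && !ComputeFull)
      break; // default behaviour: bail out early to save compile time
  }
  return {Cost, Cost < Threshold, Visited};
}
```

With `ComputeFull` unset, the walk over `{10, 10, 10, 10}` with a threshold of 15 stops after two instructions at a cost of 20; with it set, the full cost of 40 is reported. The decision is the same either way, which is why the full computation is only worth its compile time when debugging.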
Chad Rosier	4eb18742ca	[InlineCost] Add more debug during inline cost computation. llvm-svn: 311370	2017-08-21 19:56:46 +00:00
Chandler Carruth	bba762a13f	[InlineCost] Refactor the checks for different analyses to be a bit more localized to the code that uses those analyses. Technically, this can change behavior as we no longer require the existence of the ProfileSummaryInfo analysis to use local profile information via BFI. We didn't actually require the PSI to have an interesting profile though, so this only really impacts the behavior in non-default pass pipelines. IMO, this makes it substantially less surprising how everything works -- before an analysis that wasn't actually used had to exist to trigger any profile aware inlining. I think the new organization makes it more obvious where various checks for profile signals happen. Differential Revision: https://reviews.llvm.org/D36710 llvm-svn: 310888	2017-08-14 21:25:00 +00:00
Easwaran Raman	ff77cc750c	[Inliner] Fix a typo in option description. NFC. llvm-svn: 310073	2017-08-04 17:15:17 +00:00
Easwaran Raman	974d4eea93	[Inliner] Increase threshold for hot callsites without PGO. Summary: This increases the inlining threshold for hot callsites. Hotness is defined in terms of block frequency of the callsite relative to the caller's entry block's frequency. Since this requires BFI in the inliner, this only affects the new PM pipeline. This is enabled by default at -O3. This improves the performance of some internal benchmarks. Notably, an internal benchmark for Gipfeli compression (https://github.com/google/gipfeli) improves by ~7%. Povray in SPEC2006 improves by ~2.5%. I am running more experiments and will update the thread if other benchmarks show improvement/regression. In terms of text size, LLVM test-suite shows a 1.22% text size increase. Diving into the results, 13 of the benchmarks in the test-suite increase by > 10%. Most of these are small, but Adobe-C++/loop_unroll (17.6% increase) and tramp3d (20.7% size increase) have >250K text size. On a large application, the text size increases by 2%. Reviewers: chandlerc, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36199 llvm-svn: 309994	2017-08-03 22:23:33 +00:00
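The relative-frequency hotness test described above can be sketched as a ratio check. This assumes integer block frequencies such as those BFI produces; the function name and the `RelFreqThreshold` parameter are hypothetical, and the ratio actually used at -O3 is not stated in the commit message:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: a callsite is treated as hot if its block runs at least
// RelFreqThreshold times for every execution of the caller's entry block.
bool isHotCallSiteRelative(uint64_t CallSiteFreq, uint64_t CallerEntryFreq,
                           uint64_t RelFreqThreshold) {
  if (CallerEntryFreq == 0)
    return false; // no meaningful ratio
  // Same as CallSiteFreq / CallerEntryFreq >= RelFreqThreshold in exact
  // arithmetic, but without a division; assumes the product fits in 64 bits.
  return CallSiteFreq >= RelFreqThreshold * CallerEntryFreq;
}
```

Comparing against a multiple of the entry frequency keeps the test exact in integer arithmetic, which a truncating division would not.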
Chad Rosier	5ce28f4f92	[InlineCost] Remove redundant call. NFC. llvm-svn: 309819	2017-08-02 14:50:27 +00:00
Chad Rosier	2e1c050e52	[InlineCost] Simplify more 'and' and 'or' operations. Differential Revision: https://reviews.llvm.org/D35856 llvm-svn: 309817	2017-08-02 14:40:42 +00:00
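The commit message does not list which cases are handled. As an illustration only, the standard identities `x & 0 == 0` and `x | ~0 == ~0` let an 'and' or 'or' fold to a constant even when just one operand is known. The sketch models operands as `std::optional<uint32_t>` (unknown operands are `std::nullopt`); the helper names are hypothetical, and whether D35856 covers exactly these cases is an assumption:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// An operand is either a known constant (e.g. after looking up previously
// simplified values) or unknown (std::nullopt).
using MaybeConst = std::optional<uint32_t>;

// x & 0 == 0 for any x, so a single known-zero operand is enough.
MaybeConst simplifyAnd(MaybeConst A, MaybeConst B) {
  if (A && B)
    return *A & *B;
  if ((A && *A == 0) || (B && *B == 0))
    return 0u;
  return std::nullopt;
}

// x | ~0 == ~0 for any x, so a single known all-ones operand is enough.
MaybeConst simplifyOr(MaybeConst A, MaybeConst B) {
  if (A && B)
    return *A | *B;
  if ((A && *A == UINT32_MAX) || (B && *B == UINT32_MAX))
    return UINT32_MAX;
  return std::nullopt;
}
```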
Easwaran Raman	51b809bf2f	[Inliner] Do not apply any bonus for cold callsites. Summary: Inlining threshold is increased by application of bonuses when the callee has a single reachable basic block or is rich in vector instructions. Similarly, inlining cost is reduced by applying a large bonus when the last call to a static function is considered for inlining. This patch disables the application of these bonuses when the callsite or the callee is cold. The intention here is to prevent a large cold callsite from being inlined into a non-cold caller, which could in turn prevent the caller from being inlined. This is especially important when the cold callsite is the last call to a static function, since the associated bonus is very high. Reviewers: chandlerc, davidxl Subscribers: danielcdh, llvm-commits Differential Revision: https://reviews.llvm.org/D35823 llvm-svn: 309441	2017-07-28 21:47:36 +00:00
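A sketch of the gating described above. The struct fields and the bonus amounts are hypothetical parameters, and applying all three bonuses in one function is a simplification of how the real cost analysis spreads them across its walk:

```cpp
#include <cassert>

// Hypothetical summary of the facts the cost model consults.
struct CallSiteInfo {
  bool CallSiteIsCold;
  bool CalleeIsCold;
  bool SingleReachableBlock;
  bool VectorRich;
  bool LastCallToStatic;
};

struct Decision {
  int Threshold;
  int Cost;
};

// Apply the three bonuses from the commit message, but none of them when
// either the callsite or the callee is cold.
Decision applyBonuses(const CallSiteInfo &CS, int BaseThreshold, int BaseCost,
                      int SingleBBBonus, int VectorBonus, int LastCallBonus) {
  Decision D{BaseThreshold, BaseCost};
  if (CS.CallSiteIsCold || CS.CalleeIsCold)
    return D; // cold: no threshold increase, no cost reduction
  if (CS.SingleReachableBlock)
    D.Threshold += SingleBBBonus;
  if (CS.VectorRich)
    D.Threshold += VectorBonus;
  if (CS.LastCallToStatic)
    D.Cost -= LastCallBonus; // this bonus lowers the cost, not the threshold
  return D;
}
```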
Evgeny Astigeevich	61c1bd5abc	[InlineCost, NFC] Change CallAnalyzer::isGEPFree to use TTI::getUserCost instead of TTI::getGEPCost Currently CallAnalyzer::isGEPFree uses TTI::getGEPCost to check if GEP is free. TTI::getGEPCost cannot handle cases when GEPs participate in Def-Use dependencies (see https://reviews.llvm.org/D31186 for example). There is TTI::getUserCost which can calculate the cost more accurately by taking dependencies into account. Differential Revision: https://reviews.llvm.org/D33685 llvm-svn: 309268	2017-07-27 12:49:27 +00:00
Eric Christopher	7ad02eee8a	Fix a typo. llvm-svn: 306599	2017-06-28 21:10:31 +00:00
Easwaran Raman	c5fa6358ba	[NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case Differential Revision: https://reviews.llvm.org/D34312 llvm-svn: 306484	2017-06-27 23:11:18 +00:00
Jun Bum Lim	506cfb7ab7	[InlineCost] Do not take INT_MAX when Cost is negative Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX, we also use a valid upper-bound cost when Cost overflows. Reviewers: hans, echristo, dmgreen Reviewed By: dmgreen Subscribers: mcrosier, javed.absar, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D34436 llvm-svn: 306118	2017-06-23 16:12:37 +00:00
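A sketch of the overflow handling described above, assuming the increment is non-negative and is added in 64-bit arithmetic before being clamped to a caller-supplied upper bound. The name `addCostClamped` and its parameters are hypothetical. Because the clamp is `std::min` rather than an unconditional `INT_MAX`, a sum that is still negative is returned unchanged:

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper: add a non-negative Extra to Cost without int overflow
// and clamp the result to UpperBound (which must itself fit in an int).
int addCostClamped(int Cost, int64_t Extra, int64_t UpperBound) {
  assert(Extra >= 0 && UpperBound <= INT_MAX);
  int64_t Sum = static_cast<int64_t>(Cost) + Extra;
  // Sum >= INT_MIN because Extra >= 0, so the result always fits in an int.
  return static_cast<int>(std::min(Sum, UpperBound));
}
```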
Andrew Kaylor	647025f9e1	[InstSimplify] Don't constant fold or DCE calls that are marked nobuiltin Differential Revision: https://reviews.llvm.org/D33737 llvm-svn: 305132	2017-06-09 23:18:11 +00:00
Jun Bum Lim	2960d41e68	[InlineCost] Enable the new switch cost heuristic Summary: This is to enable the new switch inline cost heuristic (r301649) by removing the old heuristic as well as the flag itself. In my experiments with the LLVM test suite and spec2000/2006, a +17.82% performance improvement and an 8% code size reduction were observed in spec2000/vertex with O3 LTO on AArch64. No significant code size / performance regression was found at O3/O2/Os. No significant complaints were reported in the llvm-dev thread. Reviewers: hans, chandlerc, eraman, haicheng, mcrosier, bmakam, eastig, ddibyend, echristo Reviewed By: echristo Subscribers: javed.absar, kristof.beyls, echristo, aemerson, rengolin, mehdi_amini Differential Revision: https://reviews.llvm.org/D32653 llvm-svn: 304594	2017-06-02 20:42:54 +00:00
Easwaran Raman	3cd1479c3f	[Inliner] Do not mix callsite and callee hotness based updates. Update threshold based on callee's hotness only when BFI is not available. Otherwise use only callsite's hotness. This makes it easier to reason about hotness related threshold updates. Differential revision: https://reviews.llvm.org/D33157 llvm-svn: 303210	2017-05-16 21:18:09 +00:00
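The selection rule in this commit reduces to a simple fallback: the two hotness signals are never combined. `Hotness` and `pickHotness` are hypothetical names; a present `CallSiteFromBFI` stands for "BFI is available":

```cpp
#include <cassert>
#include <optional>

enum class Hotness { Hot, Cold, Neutral };

// Hypothetical sketch: use the callsite's hotness when BFI provides it,
// otherwise fall back to the callee's entry hotness.
Hotness pickHotness(std::optional<Hotness> CallSiteFromBFI,
                    Hotness CalleeEntry) {
  if (CallSiteFromBFI)
    return *CallSiteFromBFI; // BFI available: callsite hotness only
  return CalleeEntry;        // no BFI: callee hotness only
}
```

Note that a neutral callsite in a hot callee stays neutral when BFI is available; that is exactly the "do not mix" behaviour.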
Easwaran Raman	c103ef89ee	Decrease inlinecold-threshold to 45 I ran the test-suite (including SPEC 2006) in PGO mode comparing cold thresholds of 225 and 45. Here are some stats on the text size: Out of 904 tests that ran, 197 see a change in text size. The average text size reduction (of all the 904 binaries) is 1.07%. Of the 197 binaries, 19 see a text size increase, as high as 18%, but most of them are small single source benchmarks. There are 3 multisource benchmarks with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On the other side of the spectrum, 31 benchmarks see >10% size reduction and 6 of them are MultiSource. I haven't run the test-suite with other values of inlinecold-threshold. Since we have a cold callsite threshold of 45, I picked this value. Differential revision: https://reviews.llvm.org/D33106 llvm-svn: 302829	2017-05-11 21:36:28 +00:00
Xinliang David Li	351d9b01b9	Refactor callsite cost computation into a helper function /NFC Makes code more readable. The function will also be used by the partial inlining's cost analysis. llvm-svn: 301899	2017-05-02 05:38:41 +00:00
Jun Bum Lim	919f9e8d65	[InlineCost] Improve the cost heuristic for Switch Summary: The motivating example below has 13 cases but only 2 distinct targets:
```
lor.lhs.false2:                                   ; preds = %if.then
  switch i32 %Status, label %if.then27 [
    i32 -7012, label %if.end35
    i32 -10008, label %if.end35
    i32 -10016, label %if.end35
    i32 15000, label %if.end35
    i32 14013, label %if.end35
    i32 10114, label %if.end35
    i32 10107, label %if.end35
    i32 10105, label %if.end35
    i32 10013, label %if.end35
    i32 10011, label %if.end35
    i32 7008, label %if.end35
    i32 7007, label %if.end35
    i32 5002, label %if.end35
  ]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86):
```
.LBB853_9:                              // %lor.lhs.false2
        mov     w8, #10012
        cmp     w19, w8
        b.gt    .LBB853_14
// BB#10:                               // %lor.lhs.false2
        mov     w8, #5001
        cmp     w19, w8
        b.gt    .LBB853_18
// BB#11:                               // %lor.lhs.false2
        mov     w8, #-10016
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#12:                               // %lor.lhs.false2
        mov     w8, #-10008
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#13:                               // %lor.lhs.false2
        mov     w8, #-7012
        cmp     w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_14:                             // %lor.lhs.false2
        mov     w8, #14012
        cmp     w19, w8
        b.gt    .LBB853_21
// BB#15:                               // %lor.lhs.false2
        mov     w8, #-10105
        add     w8, w19, w8
        cmp     w8, #9                  // =9
        b.hi    .LBB853_17
// BB#16:                               // %lor.lhs.false2
        orr     w9, wzr, #0x1
        lsl     w8, w9, w8
        mov     w9, #517
        and     w8, w8, w9
        cbnz    w8, .LBB853_23
.LBB853_17:                             // %lor.lhs.false2
        mov     w8, #10013
        cmp     w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_18:                             // %lor.lhs.false2
        mov     w8, #-7007
        add     w8, w19, w8
        cmp     w8, #2                  // =2
        b.lo    .LBB853_23
// BB#19:                               // %lor.lhs.false2
        mov     w8, #5002
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#20:                               // %lor.lhs.false2
        mov     w8, #10011
        cmp     w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_21:                             // %lor.lhs.false2
        mov     w8, #14013
        cmp     w19, w8
        b.eq    .LBB853_23
// BB#22:                               // %lor.lhs.false2
        mov     w8, #15000
        cmp     w19, w8
        b.ne    .LBB853_3
```
However, the inline cost model estimates the cost to be linear in the number of distinct targets, so the cost of the above switch is just 2 InstrCosts. The function containing this switch is then inlined about 900 times. This change uses the general approach of switch lowering for the inline heuristic: it estimates the number of case clusters using the suitability checks for a jump table or bit test. Considering the binary search tree built for the clusters, this change makes the model linear in the size of the balanced binary tree. The model is off by default for now (-inline-generic-switch-cost=false). This change was originally proposed by Haicheng in D29870. Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier Reviewed By: hans Subscribers: joerg, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D31085 llvm-svn: 301649	2017-04-28 16:04:03 +00:00
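A simplified C++ sketch of the idea. It assumes each maximal run of consecutive case values forms one cluster (a crude stand-in for the real jump-table and bit-test suitability checks) and charges one `InstrCost` per node of a balanced binary tree with N leaves, i.e. 2N - 1 nodes. The function names and the exact cost formula are hypothetical, not the formula used in D31085:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical clustering: each maximal run of consecutive case values is one
// cluster. Real lowering also forms jump tables and bit tests.
unsigned countCaseClusters(std::vector<int64_t> Cases) {
  if (Cases.empty())
    return 0;
  std::sort(Cases.begin(), Cases.end());
  unsigned Clusters = 1;
  for (std::size_t I = 1; I < Cases.size(); ++I)
    if (Cases[I] != Cases[I - 1] + 1)
      ++Clusters; // a gap starts a new cluster
  return Clusters;
}

// Cost linear in the size of a balanced binary tree over the clusters:
// N leaves plus N - 1 internal comparison nodes.
int switchCost(const std::vector<int64_t> &Cases, int InstrCost) {
  unsigned N = countCaseClusters(Cases);
  if (N == 0)
    return 0;
  return static_cast<int>(2 * N - 1) * InstrCost;
}
```

On the 13 case values of the switch above, this sketch finds 12 clusters (only 7007 and 7008 are adjacent) and, with `InstrCost = 5`, a cost of 115 instead of 2 InstrCosts. The AArch64 output also folds 10105, 10107 and 10114 into a single bit test (mask 517 sets bits 0, 2 and 9), which a real cluster analysis would count as one cluster.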
Easwaran Raman	e1bd7cceca	Remove a repeated comment line. NFC. llvm-svn: 301059	2017-04-21 23:12:16 +00:00
Eric Christopher	908ed7f20c	Tidy checking for the soft float attribute. llvm-svn: 300394	2017-04-15 06:14:52 +00:00
Eric Christopher	85be8ca881	Cache the DataLayout rather than looking it up frequently. llvm-svn: 300393	2017-04-15 06:14:50 +00:00
Reid Kleckner	fb502d2f5e	[IR] Make paramHasAttr to use arg indices instead of attr indices This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern. Previously we were testing return value attributes with index 0, so I introduced hasReturnAttr() for that use case. llvm-svn: 300367	2017-04-14 20:19:02 +00:00
Chandler Carruth	927d8e610a	[IR] Redesign the case iterator in SwitchInst to actually be an iterator and to expose a handle to represent the actual case rather than having the iterator return a reference to itself. All of this allows the iterator to be used with common STL facilities, standard algorithms, etc. Doing this exposed some missing facilities in the iterator facade that I've fixed and required some work to the actual iterator to fully support the necessary API. Differential Revision: https://reviews.llvm.org/D31548 llvm-svn: 300032	2017-04-12 07:27:28 +00:00
Vassil Vassilev	e1f12fadc0	Remove unused functions. Remove static qualifier from functions in header files. NFC. llvm-svn: 299947	2017-04-11 14:55:32 +00:00
Dehao Chen	9907e9d860	Do not inline hot callsites for samplepgo in thinlto compile phase. Summary: SamplePGO passes are invoked twice in a ThinLTO build: once in the compile phase and again in the backend. We want to make sure the IR in the second phase matches the hot part of the profile, so we do not want to inline hot callsites in the first phase. Reviewers: tejohnson, eraman Reviewed By: tejohnson Subscribers: mehdi_amini, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D31201 llvm-svn: 298428	2017-03-21 19:55:36 +00:00
Easwaran Raman	a8b9cdc9e2	[InlineCost] Move the code in isGEPOffsetConstant to a lambda. Differential revision: https://reviews.llvm.org/D30112 llvm-svn: 296208	2017-02-25 00:10:22 +00:00
Easwaran Raman	617f63640b	Refactor instruction simplification code in visitors. NFC. Several visitors check if operands to the instruction are constants, either as it is or after looking up SimplifiedValues, check if the result is a constant and update the SimplifiedValues map. This refactoring splits it into a common function that does the checking of whether the operands are constants and updating of the SimplifiedValues table, and an instruction specific part that is implemented by each instruction visitor as a lambda and passed to the common function. Differential revision: https://reviews.llvm.org/D30104 llvm-svn: 295552	2017-02-18 17:22:52 +00:00
Easwaran Raman	12585b0148	Improve PGO support for the new inliner This adds the following to the new PM based inliner in PGO mode: * Use block frequency analysis to derive callsite's profile count and use that to adjust thresholds of hot and cold callsites. * Incrementally update the BFI of the caller after a callee gets inlined into it. This incremental update is only within an invocation of the run method - BFI is not preserved across calls to run. Update the function entry count of the callee after inlining it into a caller. * I've tuned the thresholds for the hot and cold callsites using a hacked up version of the old inliner that explicitly computes BFI on a set of internal benchmarks and spec. Once the new PM based pipeline stabilizes (IIRC Chandler mentioned there are known issues) I'll benchmark this again and adjust the thresholds if required. Inliner PGO support. Differential revision: https://reviews.llvm.org/D28331 llvm-svn: 292666	2017-01-20 22:44:04 +00:00
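Deriving a callsite's profile count from block frequency can be sketched as scaling the caller's entry count by the callsite block's frequency relative to the entry block. The function name is hypothetical, and the sketch ignores the multiplication overflow that a real implementation has to guard against:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: estimated callsite count =
//   CallerEntryCount * CallSiteBlockFreq / EntryBlockFreq.
// Returns std::nullopt when there is no usable entry frequency.
std::optional<uint64_t> callSiteCount(uint64_t CallerEntryCount,
                                      uint64_t CallSiteBlockFreq,
                                      uint64_t EntryBlockFreq) {
  if (EntryBlockFreq == 0)
    return std::nullopt;
  // Multiply before dividing to keep precision; overflow is not handled.
  return CallerEntryCount * CallSiteBlockFreq / EntryBlockFreq;
}
```

For example, a caller entered 100 times with a callsite block eight times as frequent as its entry block yields an estimated count of 800.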
Haicheng Wu	201b191b82	Recommit "[InlineCost] Use TTI to check if GEP is free." #3 This is the third attempt to recommit r292526. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292633	2017-01-20 18:51:22 +00:00
Haicheng Wu	71ef5bc0ff	Revert "Recommit "[InlineCost] Use TTI to check if GEP is free." #2" This reverts commit r292616 because the test case still has a problem. llvm-svn: 292618	2017-01-20 16:52:22 +00:00
Haicheng Wu	8f34ae2aae	Recommit "[InlineCost] Use TTI to check if GEP is free." #2 This is the second attempt to recommit r292526. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292616	2017-01-20 16:36:34 +00:00
Haicheng Wu	8f2aca388b	Revert "Recommit "[InlineCost] Use TTI to check if GEP is free."" This reverts commit r292570. The test still has a problem. llvm-svn: 292572	2017-01-20 03:40:41 +00:00
Haicheng Wu	1af1f071ea	Recommit "[InlineCost] Use TTI to check if GEP is free." This recommits r292526, which was reverted in r292529, after fixing the test case. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292570	2017-01-20 03:09:11 +00:00
Haicheng Wu	e036df4723	Revert "[InlineCost] Use TTI to check if GEP is free." This reverts commit r292526. The test case has a problem. llvm-svn: 292529	2017-01-19 22:51:03 +00:00
Haicheng Wu	da556345dc	[InlineCost] Use TTI to check if GEP is free. Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI is already used for the cost of many other instructions. Differential Revision: https://reviews.llvm.org/D28693 llvm-svn: 292526	2017-01-19 22:28:34 +00:00
Easwaran Raman	e08b139d7d	Refactor inline threshold update code. Functional change: Previously, if a callee was cold, we used ColdThreshold whenever it was lower than the existing threshold. This was irrespective of whether we were optimizing for minsize (-Oz) or not. But -Oz uses a very low threshold to begin with, and inlining with -Oz is expected to be tuned for lowering code size, so there is no good reason to set an even lower threshold for cold callees. We now lower the threshold for cold callees only when -Oz is not used. For default values of -inlinethreshold and -inlinecold-threshold, this change has no effect and this simplifies the code. NFC changes: Group all threshold updates that are guarded by !Caller->optForMinSize() and, within that group, the threshold updates that require profile summary info. Differential revision: https://reviews.llvm.org/D28369 llvm-svn: 291487	2017-01-09 21:56:26 +00:00
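The functional change can be sketched as a single guarded update. `updateThreshold` and its parameters are hypothetical names; `OptForMinSize` stands in for the caller being optimized for -Oz:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of the post-change rule: a cold callee lowers the
// threshold to ColdThreshold only when the caller is not optimized for
// minimum size.
int updateThreshold(int Threshold, bool OptForMinSize, bool CalleeIsCold,
                    int ColdThreshold) {
  if (!OptForMinSize && CalleeIsCold)
    Threshold = std::min(Threshold, ColdThreshold);
  return Threshold;
}
```

For example, `updateThreshold(25, true, true, 10)` now returns 25, whereas under the previous behaviour the cold callee would have lowered it to 10 even under -Oz.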
Chandler Carruth	1d96311447	[PM] Provide an initial, minimal port of the inliner to the new pass manager. This doesn't implement every feature of the existing inliner, but tries to implement the most important ones for building a functional optimization pipeline and beginning to sort out bugs, regressions, and other problems. Notable, but intentional omissions: - No alloca merging support. Why? Because it isn't clear we want to do this at all. Active discussion and investigation is going on to remove it, so for simplicity I omitted it. - No support for trying to iterate on "internally" devirtualized calls. Why? Because it adds what I suspect is inappropriate coupling for little or no benefit. We will have an outer iteration system that tracks devirtualization including that from function passes and iterates already. We should improve that rather than approximate it here. - Optimization remarks. Why? Purely to make the patch smaller, no other reason at all. The last one I'll probably work on almost immediately. But I wanted to skip it in the initial patch to try to focus the change as much as possible as there is already a lot of code moving around and both of these could be skipped without really disrupting the core logic. A summary of the different things happening here: 1) Adding the usual new PM class and rigging. 2) Fixing minor underlying assumptions in the inline cost analysis or inline logic that don't generally hold in the new PM world. 3) Adding the core pass logic which is in essence a loop over the calls in the nodes in the call graph. This is a bit duplicated from the old inliner, but only a handful of lines could realistically be shared. (I tried at first, and it really didn't help anything.) All told, this is only about 100 lines of code, and most of that is the mechanics of wiring up analyses from the new PM world. 4) Updating the LazyCallGraph (in the new PM) based on the newly inlined calls and references. 
This is very minimal because we cannot form cycles. 5) When inlining removes the last use of a function, eagerly nuking the body of the function so that any "one use remaining" inline cost heuristics are immediately refined, and queuing these functions to be completely deleted once inlining is complete and the call graph updated to reflect that they have become dead. 6) After all the inlining for a particular function, updating the LazyCallGraph and the CGSCC pass manager to reflect the function-local simplifications that are done immediately and internally by the inline utilities. These are the exact same fundamental set of CG updates done by arbitrary function passes. 7) Adding a bunch of test cases to specifically target CGSCC and other subtle aspects in the new PM world. Many thanks to the careful review from Easwaran and Sanjoy and others! Differential Revision: https://reviews.llvm.org/D24226 llvm-svn: 290161	2016-12-20 03:15:32 +00:00
Daniel Jasper	aec2fa352f	Revert @llvm.assume with operator bundles (r289755-r289757) This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086	2016-12-19 08:22:17 +00:00
Hal Finkel	3ca4a6bcf1	Remove the AssumptionCache After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756	2016-12-15 03:02:15 +00:00
Craig Topper	107b187d2a	[Analysis] Fix typo in comment. NFC llvm-svn: 289171	2016-12-09 02:18:04 +00:00
Peter Collingbourne	ab85225be4	IR: Change the gep_type_iterator API to avoid always exposing the "current" type. Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458	2016-12-02 02:24:42 +00:00
James Molloy	6df8f27c95	[InlineCost] Remove skew when calculating call costs When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself. However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case: int g() { h(); } int f() { g(); } llvm-svn: 286814	2016-11-14 11:14:41 +00:00
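A worked sketch of the skew for the `g`/`f` example. It assumes the callee body consists only of the call to `h()` and uses hypothetical values for the per-instruction cost and the call penalty:

```cpp
#include <cassert>

// Hypothetical cost constants for illustration only.
const int InstrCost = 5;
const int CallPenalty = 25;

// Cost charged for one call instruction found in the callee's body.
int callInstCost() { return InstrCost + CallPenalty; }

// Net cost of inlining a callee whose body is just NumCalls call instructions.
// DiscountPenalty selects whether removing the inlined call site is credited
// with the full call cost (after the fix) or only InstrCost (the skew).
int netInlineCost(int NumCalls, bool DiscountPenalty) {
  int Body = NumCalls * callInstCost();
  int Removed = DiscountPenalty ? callInstCost() : InstrCost;
  return Body - Removed;
}
```

`netInlineCost(1, false)` yields 25, the skewed result. `netInlineCost(1, true)` yields 0, reflecting that inlining `g` into `f` merely swaps one call for another.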
Dehao Chen	84287abf43	Rename isHotFunction/isColdFunction to isFunctionEntryHot/isFunctionEntryCold. (NFC) This is in preparation for https://reviews.llvm.org/D25048 llvm-svn: 283805	2016-10-10 21:47:28 +00:00
Piotr Padlewski	f3d122cd02	NFC fix doxygen comments llvm-svn: 282950	2016-09-30 21:05:49 +00:00
Easwaran Raman	7060af9d22	Fix a thinko in r278189. llvm-svn: 280008	2016-08-29 20:45:51 +00:00
Easwaran Raman	0d58fcac99	Make more fields of InlineParams Optional. Differential revision: https://reviews.llvm.org/D23386 llvm-svn: 278312	2016-08-11 03:58:05 +00:00
Piotr Padlewski	d89875ca39	Changed sign of LastCallToStaticBonus Summary: I think it is much better this way. When I first saw the line `Cost += InlineConstants::LastCallToStaticBonus;` I thought that this was a bug, because everywhere else the cost is reduced using `-=`. Reviewers: eraman, tejohnson, mehdi_amini Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23222 llvm-svn: 278290	2016-08-10 21:15:22 +00:00
Easwaran Raman	1c57cc2b68	Do not directly use inline threshold cl options in cost analysis. This adds an InlineParams struct which is populated from the command line options by getInlineParams and passed to getInlineCost for the call analyzer to use. Differential revision: https://reviews.llvm.org/D22120 llvm-svn: 278189	2016-08-10 00:48:04 +00:00
Dehao Chen	e1c7c57d11	Remove cold callsite heuristic that is not necessary because of cold callee heuristic. llvm-svn: 277863	2016-08-05 20:49:04 +00:00
Dehao Chen	de39cb9384	Replace hot-callsite based heuristic to use its own threshold parameter instead of sharing the inline-hint parameter Summary: Hot callsites should have a higher threshold than inline hints. This patch uses a separate threshold parameter for hot callsites. Reviewers: davidxl, eraman Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D22368 llvm-svn: 277860	2016-08-05 20:28:41 +00:00
Sean Silva	ab6a683765	Avoid using a raw AssumptionCacheTracker in various inliner functions. This unblocks the new PM part of River's patch in https://reviews.llvm.org/D22706 Conveniently, this same change was needed for D21921 and so these changes are just spun out from there. llvm-svn: 276515	2016-07-23 04:22:50 +00:00
Dehao Chen	9232f98279	Implement callsite-hotness based inline cost for Sample-based PGO Summary: For sample-based PGO, using BFI to calculate the callsite count is sometimes not accurate. This is because, with a sampling-based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to the lack of samples in the cold branches. E.g. `if (A1 && A2 && A3 && ... && A10) { for (i = 0; i < 100000000; i++) { callsite(); } }` Assume that A1 to A10 are all 100% taken, and callsite has 1000 samples and thus is considered hot. Because the loop's trip count is huge, it is normal that all branches outside the loop have no samples at all. As a result, we can only use static branch probability to derive the frequency of the loop header. Assuming that the static heuristic thinks each branch is 50% taken, the count calculated from BFI will be 1/(2^10) of the actual value. In order to get a more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness. Note that this mechanism can also be shared by instrumentation-based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI, as the call count is embedded in the IR. Reviewers: davidxl, eraman, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22118 llvm-svn: 275073	2016-07-11 16:48:54 +00:00
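The arithmetic in this example can be checked with a short sketch. It assumes each of the guarding branches is statically predicted 50% taken, so every branch halves the BFI-derived count. The function name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimated count when NumBranches guards are each
// assumed 50% taken, i.e. ActualCount / 2^NumBranches.
uint64_t staticEstimate(uint64_t ActualCount, unsigned NumBranches) {
  if (NumBranches >= 64)
    return 0; // shifting a 64-bit value by 64 or more is undefined
  return ActualCount >> NumBranches;
}
```

`staticEstimate(1024000, 10)` returns 1000, i.e. 1/1024 of the true count. Annotating the sampled weight directly on the call instruction sidesteps this loss entirely.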
Easwaran Raman	22eb80a114	Fix size computation of array allocation in inline cost analysis Differential revision: http://reviews.llvm.org/D21690 llvm-svn: 273952	2016-06-27 22:31:53 +00:00
Easwaran Raman	71069cf67d	Use ProfileSummaryInfo in inline cost analysis. Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo. Differential revision: http://reviews.llvm.org/D21045 llvm-svn: 272321	2016-06-09 22:23:21 +00:00
Easwaran Raman	bb578ef0dd	Allow -inline-threshold to override default threshold. Before r257832, the threshold used by SimpleInliner was explicitly specified or generated from opt levels and passed to the base class Inliner's constructor. There, it was first overridden by explicitly specified -inline-threshold. The refactoring in r257832 did not preserve this behavior for all opt levels. This change brings back the original behavior. Differential Revision: http://reviews.llvm.org/D20452 llvm-svn: 270153	2016-05-19 23:02:09 +00:00
Easwaran Raman	9b792923d0	Revert r269131 llvm-svn: 269138	2016-05-10 23:26:04 +00:00
Easwaran Raman	7eccf4ee0e	Reapply r266477 and r266488 llvm-svn: 269131	2016-05-10 22:03:23 +00:00
Sanjay Patel	0f153424a9	[Inliner] don't assume that a Constant alloca size is a ConstantInt (PR27277) Differential Revision: http://reviews.llvm.org/D20077 llvm-svn: 268980	2016-05-09 21:51:53 +00:00
Chad Rosier	567556aa9c	[Inliner] Formatting. NFC. Patch by Aditya Kumar! Differential Revision: http://reviews.llvm.org/D19047 llvm-svn: 267888	2016-04-28 14:47:23 +00:00
Peter Collingbourne	7dd8dbf486	Introduce llvm.load.relative intrinsic. This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form ``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``. LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if ``x`` is an ``unnamed_addr`` function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant. Differential Revision: http://reviews.llvm.org/D18367 llvm-svn: 267223	2016-04-22 21:18:02 +00:00
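The load semantics described above (leaving aside the constant-folding rule) can be written in C++ as a sketch. `loadRelative` is a hypothetical name, and representing `%ptr` as a byte pointer is an assumption made to keep the address arithmetic explicit:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of llvm.load.relative: read the 32-bit value stored at
// Ptr + Offset and add it to Ptr (not to Ptr + Offset).
const char *loadRelative(const char *Ptr, int64_t Offset) {
  int32_t Rel;
  // memcpy avoids alignment and strict-aliasing problems when reading the i32.
  std::memcpy(&Rel, Ptr + Offset, sizeof(Rel));
  return Ptr + Rel;
}
```

Because the stored value is relative to `%ptr` itself, a table of such 32-bit entries stays valid wherever the table is placed, which is the property the intrinsic relies on.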
Eric Liu	d09f15ea6f	Revert "Replace the use of MaxFunctionCount module flag" This reverts commit r266477. This commit introduces cyclic dependency. This commit has "Analysis" depend on "ProfileData", while "ProfileData" depends on "Object", which depends on "BitCode", which depends on "Analysis". llvm-svn: 266619	2016-04-18 15:31:11 +00:00
Easwaran Raman	f53baca686	Replace the use of MaxFunctionCount module flag Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count. Differential Revision: http://reviews.llvm.org/D18622 llvm-svn: 266477	2016-04-15 21:39:58 +00:00
Justin Lebar	8650a4da93	[TTI] Add getInliningThresholdMultiplier. Summary: InlineCost's threshold is multiplied by this value. This lets us adjust the inlining threshold up or down on a per-target basis. For example, we might want to increase the threshold on targets where calls are unusually expensive. Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18560 llvm-svn: 266405	2016-04-15 01:38:48 +00:00
Easwaran Raman	d295b00ae9	Return immediately from analyzeCall if analyzeBlock returns false. This is part of the patch reviewed at http://reviews.llvm.org/D17584 llvm-svn: 266249	2016-04-13 21:20:22 +00:00
Easwaran Raman	9a3fc17ad4	Refactor Threshold computation. NFC. This is part of changes reviewed in http://reviews.llvm.org/D17584. llvm-svn: 265852	2016-04-08 21:28:02 +00:00
Sanjoy Das	5ce3272833	Don't IPO over functions that can be de-refined Summary: Fixes PR26774. If you're aware of the issue, feel free to skip the "Motivation" section and jump directly to "This patch". Motivation: I define "refinement" as discarding behaviors from a program that the optimizer has license to discard. So transforming: ``` void f(unsigned x) { unsigned t = 5 / x; (void)t; } ``` to ``` void f(unsigned x) { } ``` is refinement, since the behavior went from "if x == 0 then undefined else nothing" to "nothing" (the optimizer has license to discard undefined behavior). Refinement is a fundamental aspect of many mid-level optimizations done by LLVM. For instance, transforming `x == (x + 1)` to `false` also involves refinement since the expression's value went from "if x is `undef` then { `true` or `false` } else { `false` }" to "`false`" (by definition, the optimizer has license to fold `undef` to any non-`undef` value). Unfortunately, refinement implies that the optimizer cannot assume that the implementation of a function it can see has all of the behavior an unoptimized or a differently optimized version of the same function can have. This is a problem for functions with comdat linkage, where a function can be replaced by an unoptimized or a differently optimized version of the same source level function. For instance, FunctionAttrs cannot assume a comdat function is actually `readnone` even if it does not have any loads or stores in it; since there may have been loads and stores in the "original function" that were refined out in the currently visible variant, and at the link step the linker may in fact choose an implementation with a load or a store. As an example, consider a function that does two atomic loads from the same memory location, and writes to memory only if the two values are not equal. 
The optimizer is allowed to refine this function by first CSE'ing the two loads, and then folding the comparison to always report that the two values are equal. Such a refined variant will look like it is `readonly`. However, the unoptimized version of the function can still write to memory (since the two loads //can// result in different values), and selecting the unoptimized version at link time will retroactively invalidate transforms we may have done under the assumption that the function does not write to memory. Note: this is not just a problem with atomics or with linking differently optimized object files. See PR26774 for more realistic examples that involved neither. This patch: This change introduces a predicate on linkage types, `GlobalValue::mayBeDerefined`, that returns true if the linkage type allows a function to be replaced by a differently optimized variant at link time. It then changes a set of IPO passes to bail out if they see such a function. Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18634 llvm-svn: 265762	2016-04-08 00:48:30 +00:00
Easwaran Raman	b1bd398ceb	Revert revisions 262636, 262643, 262679, and 262682. llvm-svn: 262883	2016-03-08 00:36:35 +00:00
Easwaran Raman	588c68a87b	Fix a memory leak. llvm-svn: 262682	2016-03-04 01:18:40 +00:00
Easwaran Raman	fd6557e368	Fix breakage caused by r262636. Use LLVM_ATTRIBUTE_UNUSED instead of __attribute__((unused)) llvm-svn: 262643	2016-03-03 18:53:20 +00:00
Easwaran Raman	3035719c86	Infrastructure for PGO enhancements in inliner This patch provides the following infrastructure for PGO enhancements in inliner: Enable the use of block level profile information in inliner Incremental update of block frequency information during inlining Update the function entry counts of callees when they get inlined into callers. Differential Revision: http://reviews.llvm.org/D16381 llvm-svn: 262636	2016-03-03 18:26:33 +00:00
Hans Wennborg	00ab73dcb0	CallAnalyzer::analyzeCall: change the condition back to "Cost < Threshold" In r252595, I inadvertently changed the condition to "Cost <= Threshold", which caused a significant size regression in Chrome. This commit rectifies that. llvm-svn: 259915	2016-02-05 20:32:42 +00:00
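A sketch of the final cost comparison, assuming it takes the form `Cost < std::max(1, Threshold)`. The clamp to 1 is an assumption about how the zero-cost case from r252595 (PR24851) and the strict comparison restored here fit together; it is not quoted from InlineCost.cpp, and `shouldInlineByCost` is a hypothetical name.

```cpp
#include <algorithm>

// Hypothetical helper. The strict "<" means a call site whose cost equals
// the threshold is NOT inlined. Clamping the threshold to at least 1 keeps
// zero-cost call sites inlinable even when the threshold has gone negative.
bool shouldInlineByCost(int Cost, int Threshold) {
  return Cost < std::max(1, Threshold);
}
```

With `<=` in place of `<`, every call site whose cost lands exactly on the threshold would also be inlined, which is consistent with the size regression the message reports.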
Jun Bum Lim	53907161cc	Avoid inlining call sites in unreachable-terminated block Summary: If the normal destination of the invoke or the parent block of the call site is unreachable-terminated, there is little point in inlining the call site unless there is literally zero cost. Unlike my previous change (D15289), this change specifically handles the call sites followed by unreachable in the same basic block for call or in the normal destination for the invoke. This change could be a reasonable first step to conservatively inline call sites leading to an unreachable-terminated block while BFI / BPI is not yet available in the inliner. Reviewers: manmanren, majnemer, hfinkel, davidxl, mcrosier, dblaikie, eraman Subscribers: dblaikie, davidxl, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16616 llvm-svn: 259403	2016-02-01 20:55:11 +00:00
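A toy model of the rule in the commit above, assuming the analysis has already determined whether the call's parent block (or, for an invoke, its normal destination) ends in `unreachable`. The function and parameter names are hypothetical; this is not the patch's code.

```cpp
// Hypothetical model. ContinuesToUnreachable is true when the call's parent
// block, or an invoke's normal destination, is terminated by `unreachable`.
bool shouldInlineCallSite(int Cost, int Threshold, bool ContinuesToUnreachable) {
  if (ContinuesToUnreachable)
    return Cost <= 0;  // only a literally zero-cost inline is worthwhile
  return Cost < Threshold;
}
```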
Yaron Keren	eb2a25467e	Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment. clang part in r259232, this is the LLVM part of the patch. llvm-svn: 259240	2016-01-29 20:50:44 +00:00
Easwaran Raman	30a93c1848	Lower inlining threshold when the caller has minsize attribute. When the caller has optsize attribute, we reduce the inlining threshold to OptSizeThreshold (=75) if it is not already lower than that. We don't do the same for minsize and I suspect it was not intentional. This also addresses a FIXME regarding checking optsize attribute explicitly instead of using the right wrapper. Differential Revision: http://reviews.llvm.org/D16493 llvm-svn: 259120	2016-01-28 23:44:41 +00:00
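A sketch of the threshold adjustment described in the commit above. `OptSizeThreshold = 75` comes from the message; the `CallerAttrs` struct and `adjustThresholdForSize` are hypothetical simplifications of the caller's function attributes.

```cpp
#include <algorithm>

// Hypothetical attribute flags for the caller.
struct CallerAttrs {
  bool OptSize;
  bool MinSize;
};

constexpr int OptSizeThreshold = 75;  // value quoted in the commit message

// minsize is treated at least as strictly as optsize: either attribute
// lowers the threshold to OptSizeThreshold unless it is already lower.
int adjustThresholdForSize(int Threshold, CallerAttrs Caller) {
  if (Caller.OptSize || Caller.MinSize)
    return std::min(Threshold, OptSizeThreshold);
  return Threshold;
}
```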
Manuel Jacob	e902459c4b	Change ConstantFoldInstOperands to take Instruction instead of opcode and type. NFC. Summary: The previous form, taking opcode and type, is moved to an internal helper and the new form, taking an instruction, is a wrapper around this helper. Although this is a slight cleanup on its own, the main motivation is to refactor the constant folding API to ease migration to opaque pointers. This will be follow-up work. Reviewers: eddyb Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D16383 llvm-svn: 258391	2016-01-21 06:33:22 +00:00
Easwaran Raman	f4bb2f0dc3	Refactor threshold computation for inline cost analysis Differential Revision: http://reviews.llvm.org/D15401 llvm-svn: 257832	2016-01-14 23:16:29 +00:00
Easwaran Raman	b9f7120e7a	Refactor inline costs analysis by removing the InlineCostAnalysis class InlineCostAnalysis is an analysis pass without any need for it to be one. Once it stops being an analysis pass, it doesn't maintain any useful state and the member functions inside can be made free functions. NFC. Differential Revision: http://reviews.llvm.org/D15701 llvm-svn: 256521	2015-12-28 20:28:19 +00:00
Akira Hatanaka	1cb242eb13	Provide a way to specify inliner's attribute compatibility and merging. This reapplies r256277 with two changes: - In emitFnAttrCompatCheck, change FuncName's type to std::string to fix a use-after-free bug. - Remove an unnecessary install-local target in lib/IR/Makefile. Original commit message for r252949: Provide a way to specify inliner's attribute compatibility and merging rules using table-gen. NFC. This commit adds new classes CompatRule and MergeRule to Attributes.td, which are used to generate code to check attribute compatibility and merge attributes of the caller and callee. rdar://problem/19836465 llvm-svn: 256304	2015-12-22 23:57:37 +00:00
Akira Hatanaka	9c05cc5670	Revert r256277 and r256279. Some of the bots failed again. llvm-svn: 256280	2015-12-22 20:29:09 +00:00
Akira Hatanaka	a61deb249b	Provide a way to specify inliner's attribute compatibility and merging. This reapplies r252990 and r252949. I've added member function getKind to the Attr classes which returns the enum or string of the attribute. Original commit message for r252949: Provide a way to specify inliner's attribute compatibility and merging rules using table-gen. NFC. This commit adds new classes CompatRule and MergeRule to Attributes.td, which are used to generate code to check attribute compatibility and merge attributes of the caller and callee. rdar://problem/19836465 llvm-svn: 256277	2015-12-22 20:00:05 +00:00
Easwaran Raman	6d90d9f102	Use updated threshold for indirect call bonus When considering foo->bar inlining, if there is an indirect call in foo which gets resolved to a direct call (say baz), then we try to inline baz into bar with a threshold T and subtract max(T - Cost(bar->baz), 0) from Cost(foo->bar). This patch uses max(Threshold(bar->baz) - Cost(bar->baz), 0) instead, where Threshold(bar->baz) could be different from T due to bonuses or subtractions. Threshold(bar->baz) - Cost(bar->baz) better represents the desirability of inlining baz into bar. Differential Revision: http://reviews.llvm.org/D14309 llvm-svn: 254945	2015-12-07 21:21:20 +00:00
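The two bonus formulas from the commit above, written out as small C++ helpers. The names `foo`, `bar`, `baz`, `T` and `Threshold(bar->baz)` follow the message; the helper functions themselves are hypothetical and only restate the arithmetic.

```cpp
#include <algorithm>

// Old bonus: max(T - Cost(bar->baz), 0), with T a fixed threshold.
int oldIndirectCallBonus(int T, int CostBarBaz) {
  return std::max(T - CostBarBaz, 0);
}

// New bonus: max(Threshold(bar->baz) - Cost(bar->baz), 0), where the
// threshold already includes any bonuses or penalties for that call site.
int newIndirectCallBonus(int ThresholdBarBaz, int CostBarBaz) {
  return std::max(ThresholdBarBaz - CostBarBaz, 0);
}

// Either bonus is subtracted from Cost(foo->bar).
int adjustedCostFooBar(int CostFooBar, int Bonus) {
  return CostFooBar - Bonus;
}
```

For example, if `Threshold(bar->baz)` is raised from 225 to 300 by a call-site bonus and `Cost(bar->baz)` is 100, the bonus grows from 125 to 200.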
Easwaran Raman	3676da4b4a	Test commit. Remove blank spaces at the end of comments llvm-svn: 254630	2015-12-03 19:03:20 +00:00
Akira Hatanaka	5af7ace4ee	Revert r252990. Some of the buildbots are still failing. llvm-svn: 252999	2015-11-13 01:44:32 +00:00
Akira Hatanaka	c7dfb76fe7	Provide a way to specify inliner's attribute compatibility and merging. This reapplies r252949. I've changed the type of FuncName to be std::string instead of StringRef in emitFnAttrCompatCheck. Original commit message for r252949: Provide a way to specify inliner's attribute compatibility and merging rules using table-gen. NFC. This commit adds new classes CompatRule and MergeRule to Attributes.td, which are used to generate code to check attribute compatibility and merge attributes of the caller and callee. rdar://problem/19836465 llvm-svn: 252990	2015-11-13 01:23:11 +00:00
Akira Hatanaka	f3aa82f666	Revert r252949. It broke some of the bots including clang-x64-ninja-win7. llvm-svn: 252951	2015-11-12 21:19:18 +00:00
Akira Hatanaka	61b81a563a	Provide a way to specify inliner's attribute compatibility and merging rules using table-gen. NFC. This commit adds new classes CompatRule and MergeRule to Attributes.td, which are used to generate code to check attribute compatibility and merge attributes of the caller and callee. rdar://problem/19836465 llvm-svn: 252949	2015-11-12 20:59:43 +00:00
Hans Wennborg	21ce8ecb09	Inliner: Do zero-cost inlines even if above a negative threshold (PR24851) Differential Revision: http://reviews.llvm.org/D14499 llvm-svn: 252595	2015-11-10 09:47:48 +00:00
Duncan P. N. Exon Smith	5a82c916b0	Analysis: Remove implicit ilist iterator conversions Remove implicit ilist iterator conversions from LLVMAnalysis. I came across something really scary in `llvm::isKnownNotFullPoison()` which relied on `Instruction::getNextNode()` being completely broken (not surprising, but scary nevertheless). This function is documented (and coded to) return `nullptr` when it gets to the sentinel, but with an `ilist_half_node` as a sentinel, the sentinel check looks into some other memory and we don't recognize we've hit the end. Rooting out these scary cases is the reason I'm removing the implicit conversions before doing anything else with `ilist`; I'm not at all surprised that clients rely on badness. I found another scary case -- this time, not relying on badness, just bad (but I guess getting lucky so far) -- in `ObjectSizeOffsetEvaluator::compute_()`. Here, we save out the insertion point, do some things, and then restore it. Previously, we let the iterator auto-convert to `Instruction*`, and then set it back using the `Instruction*` version: Instruction *PrevInsertPoint = Builder.GetInsertPoint(); /* Logic that may change insert point */ if (PrevInsertPoint) Builder.SetInsertPoint(PrevInsertPoint); The check for `PrevInsertPoint` doesn't protect correctly against bad accesses. If the insertion point has been set to the end of a basic block (i.e., `SetInsertPoint(SomeBB)`), then `GetInsertPoint()` returns an iterator pointing at the list sentinel. The version of `SetInsertPoint()` that's getting called will then call `PrevInsertPoint->getParent()`, which explodes horribly. The only reason this hasn't blown up is that it's fairly unlikely the builder is adding to the end of the block; usually, we're adding instructions somewhere before the terminator. llvm-svn: 249925	2015-10-10 00:53:03 +00:00
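A standalone analogue of the save/restore problem in the commit above, using `std::list` instead of LLVM's `ilist`. `ToyBuilder` and `emitAtFrontThenRestore` are hypothetical; the sketch only shows why saving the iterator itself is safe even when it equals `end()`, whereas converting it to an element pointer is not. (In LLVM itself, `IRBuilderBase::saveIP()` / `restoreIP()` serve this purpose.)

```cpp
#include <list>
#include <string>

// Hypothetical stand-in for an IR builder: a block is a list of instruction
// names and the insertion point is a list iterator. InsertPt may
// legitimately be Block.end(), like SetInsertPoint(SomeBB).
struct ToyBuilder {
  std::list<std::string> &Block;
  std::list<std::string>::iterator InsertPt;

  void insert(const std::string &Inst) { Block.insert(InsertPt, Inst); }
};

// Save and restore the iterator rather than an element pointer: end() has
// no element behind it, so dereferencing it (the analogue of
// PrevInsertPoint->getParent()) would be undefined behavior.
void emitAtFrontThenRestore(ToyBuilder &B) {
  auto Saved = B.InsertPt;       // may be Block.end()
  B.InsertPt = B.Block.begin();  // logic that moves the insertion point
  B.insert("x");
  B.InsertPt = Saved;            // still valid: std::list insertions do not
                                 // invalidate existing iterators
}
```

Usage: with a block `{"a", "b"}` and the insertion point at `end()`, calling `emitAtFrontThenRestore` and then inserting `"y"` yields `{"x", "a", "b", "y"}`.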
Sanjay Patel	e9434e80d1	80-cols; NFC llvm-svn: 247700	2015-09-15 15:26:25 +00:00
Joseph Tremoulet	8220bcc570	[WinEH] Require token linkage in EH pad/ret signatures Summary: WinEHPrepare is going to require that cleanuppad and catchpad produce values of token type which are consumed by any cleanupret or catchret exiting the pad. This change updates the signatures of those operators to require/enforce that the type produced by the pads is token type and that the rets have an appropriate argument. The catchpad argument of a `CatchReturnInst` must be a `CatchPadInst` (and similarly for `CleanupReturnInst`/`CleanupPadInst`). To accommodate that restriction, this change adds a notion of an operator constraint to both LLParser and BitcodeReader, allowing appropriate sentinels to be constructed for forward references and appropriate error messages to be emitted for illegal inputs. Also add a verifier rule (noted in LangRef) that a catchpad with a catchpad predecessor must have no other predecessors; this ensures that WinEHPrepare will see the expected linear relationship between sibling catches on the same try. Lastly, remove some superfluous/vestigial casts from instruction operand setters operating on BasicBlocks. Reviewers: rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12108 llvm-svn: 245797	2015-08-23 00:26:33 +00:00
Chandler Carruth	7adc3a2b0e	[PM/AA] Remove the last relics of the separate IPA library from LLVM, folding the code into the main Analysis library. There already wasn't much of a distinction between Analysis and IPA. A number of the passes in Analysis are actually IPA passes, and there doesn't seem to be any advantage to separating them. Moreover, it makes it hard to have interactions between analyses that are both local and interprocedural. In trying to make the Alias Analysis infrastructure work with the new pass manager, it becomes particularly awkward to navigate this split. I've tried to find all the places where we referenced this, but I may have missed some. I have also adjusted the C API to continue to be equivalently functional after this change. Differential Revision: http://reviews.llvm.org/D12075 llvm-svn: 245318	2015-08-18 17:51:53 +00:00
Chandler Carruth	d73bc5fbe2	Sink InlineCost.cpp into IPA -- it is now officially an interprocedural analysis. How cute that it wasn't previously. ;] Part of this confusion stems from the flattened header file tree. Thanks to Benjamin for pointing out the goof on IRC, and we're considering un-flattening the headers, so speak now if that would bug you. llvm-svn: 173033	2013-01-21 12:09:41 +00:00
Chandler Carruth	b8cf510d81	Move the inline cost analysis's primary cost query to TTI instead of the old CodeMetrics system. TTI has the specific advantage of being extensible and customizable by targets to reflect target-specific cost metrics. llvm-svn: 173032	2013-01-21 12:05:16 +00:00
Chandler Carruth	42f3dceb63	Now that the inline cost analysis is a pass, we can easily have it depend on and use other analyses (as long as they're either immutable passes or CGSCC passes of course -- nothing in the pass manager has been fixed here). Leverage this to thread TargetTransformInfo down through the inline cost analysis. No functionality changed here, this just threads things through. llvm-svn: 173031	2013-01-21 11:55:09 +00:00
Chandler Carruth	4319e2948d	Make the inline cost a proper analysis pass. This remains essentially a dynamic analysis done on each call to the routine. However, now it can use the standard pass infrastructure to reference other analyses, instead of a silly setter method. This will become more interesting as I teach it about more analysis passes. This updates the two inliner passes to use the inline cost analysis. Doing so highlights how utterly redundant these two passes are. Either we should find a cheaper way to do always inlining, or we should merge the two and just fiddle with the thresholds to get the desired behavior. I'm leaning increasingly toward the latter as it would also remove the Inliner sub-class split. llvm-svn: 173030	2013-01-21 11:39:18 +00:00
Chandler Carruth	9fb823bbd4	Move all of the header files which are involved in modelling the LLVM IR into their new header subdirectory: include/llvm/IR. This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM. There are still more header files to move here, but I wanted to handle them in separate commits to make tracking what files make sense at each layer easier. The only really questionable files here are the target intrinsic tablegen files. But that's a battle I'd rather not fight today. I've updated both CMake and Makefile build systems (I think, and my tests think, but I may have missed something). I've also re-sorted the includes throughout the project. I'll be committing updates to Clang, DragonEgg, and Polly momentarily. llvm-svn: 171366	2013-01-02 11:36:10 +00:00
Bill Wendling	698e84fc4f	Remove the Function::getFnAttributes method in favor of using the AttributeSet directly. This is in preparation for removing the use of the 'Attribute' class as a collection of attributes. That will shift to the AttributeSet class instead. llvm-svn: 171253	2012-12-30 10:32:01 +00:00
Chandler Carruth	86ed53089f	Fix a stunning oversight in the inline cost analysis. It was never propagating one of the values it simplified to a constant across a myriad of instructions. Notably, ptrtoint instructions when we had a constant pointer (say, 0) didn't propagate that, blocking a massive number of down-stream optimizations. This was uncovered when investigating why we fail to inline and delete the boilerplate in: void f() { std::vector<int> v; v.push_back(1); } It turns out most of the efforts I've made thus far to improve the analysis weren't making it far purely because of this. After this is fixed, the store-to-load forwarding patch enables LLVM to optimize the above to an empty function. We still can't nuke a second push_back, but for different reasons. There is a very real chance this will cause somewhat noticeable changes in inlining behavior, so please let me know if you see regressions (or improvements!) because of this patch. llvm-svn: 171196	2012-12-28 14:43:42 +00:00
Chandler Carruth	753e21d057	Teach the inline cost analysis about calls that can be simplified and how to propagate constants through insert and extract value instructions. With the recent improvements to instsimplify, this allows inline cost analysis to constant fold through intrinsic functions, including notably the with.overflow intrinsic math routines which often show up inside of STL abstractions. This is yet another piece in the puzzle of breaking down the code for: void f() { std::vector<int> v; v.push_back(1); } But it still isn't enough. There are a pile of bugs in inline cost still blocking this. llvm-svn: 171195	2012-12-28 14:23:32 +00:00
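A C++ model of why constant folding through the with.overflow intrinsics helps, as the commit above describes: once both operands are constants, both fields of the `{result, overflow}` pair are constants, so an `extractvalue` of either field also folds. The struct and function names are hypothetical; this models the semantics of `llvm.sadd.with.overflow.i32`, not LLVM's constant-folding code.

```cpp
#include <cstdint>

// Hypothetical model of the pair returned by llvm.sadd.with.overflow.i32.
struct SAddWithOverflow {
  int32_t Value;  // extractvalue index 0: wrapped sum
  bool Overflow;  // extractvalue index 1: signed-overflow flag
};

SAddWithOverflow foldSAddWithOverflow(int32_t A, int32_t B) {
  int64_t Wide = int64_t(A) + int64_t(B);  // exact; cannot overflow in 64 bits
  bool Overflow = Wide < INT32_MIN || Wide > INT32_MAX;
  // Wrap to 32 bits. The final uint32_t -> int32_t conversion is modular in
  // C++20 and on all mainstream two's-complement compilers before that.
  int32_t Wrapped = static_cast<int32_t>(static_cast<uint32_t>(Wide));
  return {Wrapped, Overflow};
}
```

For constant operands such as `1` and `2`, the overflow flag is a known `false`, so a branch on it in an STL-style checked addition is dead and its cost need not be charged.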
James Molloy	4f6fb953a7	Add a new attribute, 'noduplicate'. If a function contains a noduplicate call, the call cannot be duplicated - Jump threading, loop unrolling, loop unswitching, and loop rotation are inhibited if they would duplicate the call. Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage). llvm-svn: 170704	2012-12-20 16:04:27 +00:00
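A sketch of the inlining rule for `noduplicate` described in the commit above. `CalleeInfo` and `inliningAllowed` are hypothetical; the logic restates the message: inlining copies the callee body, which duplicates the call, unless the callee is internal with a single call site, in which case the original body can be deleted and the call still exists exactly once.

```cpp
// Hypothetical summary of a callee, for illustration only.
struct CalleeInfo {
  bool ContainsNoDuplicateCall;
  bool HasInternalLinkage;
  unsigned NumCallSites;
};

// Inlining is allowed for a body containing a noduplicate call only when the
// copy replaces the original: an internal function with exactly one call site.
bool inliningAllowed(const CalleeInfo &Callee) {
  if (!Callee.ContainsNoDuplicateCall)
    return true;
  return Callee.HasInternalLinkage && Callee.NumCallSites == 1;
}
```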

258 Commits