Commit Graph

258 Commits

Author SHA1 Message Date
Haicheng Wu 0812c5bea3 [InlineCost] Add cl::opt to allow full inline cost to be computed for debugging purposes.
Currently, the inline cost model will bail once the inline cost exceeds the
inline threshold in order to avoid unnecessary compile-time. However, when
debugging it is useful to compute the full cost, so this command line option
is added to override the default behavior.

I took over this work from Chad Rosier (mcrosier@codeaurora.org).

Differential Revision: https://reviews.llvm.org/D35850

llvm-svn: 311371
2017-08-21 20:00:09 +00:00
Chad Rosier 4eb18742ca [InlineCost] Add more debug during inline cost computation.
llvm-svn: 311370
2017-08-21 19:56:46 +00:00
Chandler Carruth bba762a13f [InlineCost] Refactor the checks for different analyses to be a bit more
localized to the code that uses those analyses.

Technically, this can change behavior as we no longer require the
existence of the ProfileSummaryInfo analysis to use local profile
information via BFI. We didn't actually require the PSI to have an
interesting profile though, so this only really impacts the behavior in
non-default pass pipelines.

IMO, this makes it substantially less surprising how everything works --
before an analysis that wasn't actually used had to exist to trigger
*any* profile aware inlining. I think the new organization makes it more
obvious where various checks for profile signals happen.

Differential Revision: https://reviews.llvm.org/D36710

llvm-svn: 310888
2017-08-14 21:25:00 +00:00
Easwaran Raman ff77cc750c [Inliner] Fix a typo in option description. NFC.
llvm-svn: 310073
2017-08-04 17:15:17 +00:00
Easwaran Raman 974d4eea93 [Inliner] Increase threshold for hot callsites without PGO.
Summary:
This increases the inlining threshold for hot callsites. Hotness is
defined in terms of block frequency of the callsite relative to the
caller's entry block's frequency. Since this requires BFI in the
inliner, this only affects the new PM pipeline. This is enabled by
default at -O3.

This improves the performance of some internal benchmarks. Notably, an
internal benchmark for Gipfeli compression
(https://github.com/google/gipfeli) improves by ~7%. Povray in SPEC2006
improves by ~2.5%. I am running more experiments and will update the
thread if other benchmarks show improvement/regression.

In terms of text size, LLVM test-suite shows an 1.22% text size
increase. Diving into the results, 13 of the benchmarks in the
test-suite increases by > 10%. Most of these are small, but
Adobe-C++/loop_unroll (17.6% increases) and tramp3d(20.7% size increase)
have >250K text size. On a large application, the text size increases by
2%

Reviewers: chandlerc, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36199

llvm-svn: 309994
2017-08-03 22:23:33 +00:00
Chad Rosier 5ce28f4f92 [InlineCost] Remove redundant call. NFC.
llvm-svn: 309819
2017-08-02 14:50:27 +00:00
Chad Rosier 2e1c050e52 [InlineCost] Simplify more 'and' and 'or' operations.
Differential Revision: https://reviews.llvm.org/D35856

llvm-svn: 309817
2017-08-02 14:40:42 +00:00
Easwaran Raman 51b809bf2f [Inliner] Do not apply any bonus for cold callsites.
Summary:
Inlining threshold is increased by application of bonuses when the
callee has a single reachable basic block or is rich in vector
instructions. Similarly, inlining cost is reduced by applying a large
bonus when the last call to a static function is considered for
inlining. This patch disables the application of these bonuses when the
callsite or the callee is cold. The intention here is to prevent a large
cold callsite from being inlined to a non-cold caller that could prevent
the caller from being inlined. This is especially important when the
cold callsite is a last call to a static since the associated bonus is
very high.

Reviewers: chandlerc, davidxl

Subscribers: danielcdh, llvm-commits

Differential Revision: https://reviews.llvm.org/D35823

llvm-svn: 309441
2017-07-28 21:47:36 +00:00
Evgeny Astigeevich 61c1bd5abc [InlineCost, NFC] Change CallAnalyzer::isGEPFree to use TTI::getUserCost instead of TTI::getGEPCost
Currently CallAnalyzer::isGEPFree uses TTI::getGEPCost to check if GEP is free.
TTI::getGEPCost cannot handle cases when GEPs participate in Def-Use dependencies
(see https://reviews.llvm.org/D31186 for example).
There is TTI::getUserCost which can calculate the cost more accurately by
taking dependencies into account.

Differential Revision: https://reviews.llvm.org/D33685

llvm-svn: 309268
2017-07-27 12:49:27 +00:00
Eric Christopher 7ad02eee8a Fix a typo.
llvm-svn: 306599
2017-06-28 21:10:31 +00:00
Easwaran Raman c5fa6358ba [NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case
Differential Revision: https://reviews.llvm.org/D34312

llvm-svn: 306484
2017-06-27 23:11:18 +00:00
Jun Bum Lim 506cfb7ab7 [InlineCost] Do not take INT_MAX when Cost is negative
Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX , we also use a valid upperbound cost when overflow occurs in Cost.

Reviewers: hans, echristo, dmgreen

Reviewed By: dmgreen

Subscribers: mcrosier, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D34436

llvm-svn: 306118
2017-06-23 16:12:37 +00:00
Andrew Kaylor 647025f9e1 [InstSimplify] Don't constant fold or DCE calls that are marked nobuiltin
Differential Revision: https://reviews.llvm.org/D33737

llvm-svn: 305132
2017-06-09 23:18:11 +00:00
Jun Bum Lim 2960d41e68 [InlineCost] Enable the new switch cost heuristic
Summary:
This is to enable the new switch inline cost heuristic (r301649) by removing the
old heuristic as well as the flag itself.
In my experiment for LLVM test suite and spec2000/2006, +17.82% performance and
8% code size reduce was observed in spec2000/vertex with O3 LTO in AArch64.
No significant code size / performance regression was found in O3/O2/Os. No
significant complain was reported from the llvm-dev thread.

Reviewers: hans, chandlerc, eraman, haicheng, mcrosier, bmakam, eastig, ddibyend, echristo

Reviewed By: echristo

Subscribers: javed.absar, kristof.beyls, echristo, aemerson, rengolin, mehdi_amini

Differential Revision: https://reviews.llvm.org/D32653

llvm-svn: 304594
2017-06-02 20:42:54 +00:00
Easwaran Raman 3cd1479c3f [Inliner] Do not mix callsite and callee hotness based updates.
Update threshold based on callee's hotness only when BFI is not available.
Otherwise use only callsite's hotness. This makes it easier to reason about
hotness related threshold updates.

Differential revision: https://reviews.llvm.org/D33157

llvm-svn: 303210
2017-05-16 21:18:09 +00:00
Easwaran Raman c103ef89ee Decrease inlinecold-threshold to 45
I ran the test-suite (including SPEC 2006) in PGO mode comparing cold
thresholds of 225 and 45. Here are some stats on the text size:

Out of 904 tests that ran, 197 see a change in text size. The average
text size reduction (of all the 904 binaries) is 1.07%. Of the 197
binaries, 19 see a text size increase, as high as 18%, but most of them
are small single source benchmarks. There are 3 multisource benchmarks
with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On
the other side of the spectrum, 31 benchmarks see >10% size reduction
and 6 of them are MultiSource.

I haven't run the test-suite with other values of inlinecold-threshold.
Since we have a cold callsite threshold of 45, I picked this value.

Differential revision: https://reviews.llvm.org/D33106

llvm-svn: 302829
2017-05-11 21:36:28 +00:00
Xinliang David Li 351d9b01b9 Refactor callsite cost computation into a helper function /NFC
Makes code more readable. The function will also be used
by the partial inlining's cost analysis.

llvm-svn: 301899
2017-05-02 05:38:41 +00:00
Jun Bum Lim 919f9e8d65 [InlineCost] Improve the cost heuristic for Switch
Summary:
The motivation example is like below which has 13 cases but only 2 distinct targets

```
lor.lhs.false2:                                   ; preds = %if.then
  switch i32 %Status, label %if.then27 [
    i32 -7012, label %if.end35
    i32 -10008, label %if.end35
    i32 -10016, label %if.end35
    i32 15000, label %if.end35
    i32 14013, label %if.end35
    i32 10114, label %if.end35
    i32 10107, label %if.end35
    i32 10105, label %if.end35
    i32 10013, label %if.end35
    i32 10011, label %if.end35
    i32 7008, label %if.end35
    i32 7007, label %if.end35
    i32 5002, label %if.end35
  ]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)

```
.LBB853_9:                              // %lor.lhs.false2
        mov     w8, #10012
        cmp             w19, w8
        b.gt    .LBB853_14
// BB#10:                               // %lor.lhs.false2
        mov     w8, #5001
        cmp             w19, w8
        b.gt    .LBB853_18
// BB#11:                               // %lor.lhs.false2
        mov     w8, #-10016
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#12:                               // %lor.lhs.false2
        mov     w8, #-10008
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#13:                               // %lor.lhs.false2
        mov     w8, #-7012
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_14:                             // %lor.lhs.false2
        mov     w8, #14012
        cmp             w19, w8
        b.gt    .LBB853_21
// BB#15:                               // %lor.lhs.false2
        mov     w8, #-10105
        add             w8, w19, w8
        cmp             w8, #9          // =9
        b.hi    .LBB853_17
// BB#16:                               // %lor.lhs.false2
        orr     w9, wzr, #0x1
        lsl     w8, w9, w8
        mov     w9, #517
        and             w8, w8, w9
        cbnz    w8, .LBB853_23
.LBB853_17:                             // %lor.lhs.false2
        mov     w8, #10013
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_18:                             // %lor.lhs.false2
        mov     w8, #-7007
        add             w8, w19, w8
        cmp             w8, #2          // =2
        b.lo    .LBB853_23
// BB#19:                               // %lor.lhs.false2
        mov     w8, #5002
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#20:                               // %lor.lhs.false2
        mov     w8, #10011
        cmp             w19, w8
        b.eq    .LBB853_23
        b       .LBB853_3
.LBB853_21:                             // %lor.lhs.false2
        mov     w8, #14013
        cmp             w19, w8
        b.eq    .LBB853_23
// BB#22:                               // %lor.lhs.false2
        mov     w8, #15000
        cmp             w19, w8
        b.ne    .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.

This change use the general way of switch lowering for the inline heuristic. It
etimate the number of case clusters with the suitability check for a jump table
or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now :
  -inline-generic-switch-cost=false

This change was originally proposed by Haicheng in D29870.

Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier

Reviewed By: hans

Subscribers: joerg, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D31085

llvm-svn: 301649
2017-04-28 16:04:03 +00:00
Easwaran Raman e1bd7cceca Remove a repeated comment line. NFC.
llvm-svn: 301059
2017-04-21 23:12:16 +00:00
Eric Christopher 908ed7f20c Tidy checking for the soft float attribute.
llvm-svn: 300394
2017-04-15 06:14:52 +00:00
Eric Christopher 85be8ca881 Cache the DataLayout rather than looking it up frequently.
llvm-svn: 300393
2017-04-15 06:14:50 +00:00
Reid Kleckner fb502d2f5e [IR] Make paramHasAttr to use arg indices instead of attr indices
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.

Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.

llvm-svn: 300367
2017-04-14 20:19:02 +00:00
Chandler Carruth 927d8e610a [IR] Redesign the case iterator in SwitchInst to actually be an iterator
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.

All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.

Doing this exposed some missing facilities in the iterator facade that
I've fixed and required some work to the actual iterator to fully
support the necessary API.

Differential Revision: https://reviews.llvm.org/D31548

llvm-svn: 300032
2017-04-12 07:27:28 +00:00
Vassil Vassilev e1f12fadc0 Remove unused functions. Remove static qualifier from functions in header files. NFC.
llvm-svn: 299947
2017-04-11 14:55:32 +00:00
Dehao Chen 9907e9d860 Do not inline hot callsites for samplepgo in thinlto compile phase.
Summary: Because SamplePGO passes will be invoked twice in ThinLTO build: once at compile phase, the other at backend. We want to make sure the IR at the 2nd phase matches the hot part in profile, thus we do not want to inline hot callsites in the first phase.

Reviewers: tejohnson, eraman

Reviewed By: tejohnson

Subscribers: mehdi_amini, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D31201

llvm-svn: 298428
2017-03-21 19:55:36 +00:00
Easwaran Raman a8b9cdc9e2 [InlineCost] Move the code in isGEPOffsetConstant to a lambda.
Differential revision: https://reviews.llvm.org/D30112

llvm-svn: 296208
2017-02-25 00:10:22 +00:00
Easwaran Raman 617f63640b Refactor instruction simplification code in visitors. NFC.
Several visitors check if operands to the instruction are constants,
either as it is or after looking up SimplifiedValues, check if the
result is a constant and update the SimplifiedValues map. This
refactoring splits it into a common function that does the checking of
whether the operands are constants and updating of the SimplifiedValues
table, and an instruction specific part that is implemented by each
instruction visitor as a lambda and passed to the common function.

Differential revision: https://reviews.llvm.org/D30104

llvm-svn: 295552
2017-02-18 17:22:52 +00:00
Easwaran Raman 12585b0148 Improve PGO support for the new inliner
This adds the following to the new PM based inliner in PGO mode:

* Use block frequency analysis to derive callsite's profile count and use
that to adjust thresholds of hot and cold callsites.

* Incrementally update the BFI of the caller after a callee gets inlined
into it. This incremental update is only within an invocation of the run
method - BFI is not preserved across calls to run.
Update the function entry count of the callee after inlining it into a
caller.

* I've tuned the thresholds for the hot and cold callsites using a hacked
up version of the old inliner that explicitly computes BFI on a set of
internal benchmarks and spec. Once the new PM based pipeline stabilizes
(IIRC Chandler mentioned there are known issues) I'll benchmark this
again and adjust the thresholds if required.
Inliner PGO support.

Differential revision: https://reviews.llvm.org/D28331

llvm-svn: 292666
2017-01-20 22:44:04 +00:00
Haicheng Wu 201b191b82 Recommit "[InlineCost] Use TTI to check if GEP is free." #3
This is the third attemp to recommit r292526.

The original summary:

Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give target-specific more accurate analysis. TTI is
already used for the cost of many other instructions.

llvm-svn: 292633
2017-01-20 18:51:22 +00:00
Haicheng Wu 71ef5bc0ff Revert "Recommit "[InlineCost] Use TTI to check if GEP is free." #2"
This reverts commit r292616 because the test case still has problem.

llvm-svn: 292618
2017-01-20 16:52:22 +00:00
Haicheng Wu 8f34ae2aae Recommit "[InlineCost] Use TTI to check if GEP is free." #2
This is the second attemp to recommit r292526.

The original summary:

Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give target-specific more accurate analysis. TTI is
already used for the cost of many other instructions.

llvm-svn: 292616
2017-01-20 16:36:34 +00:00
Haicheng Wu 8f2aca388b Revert "Recommit "[InlineCost] Use TTI to check if GEP is free.""
This reverts commit r292570.  The test still has problem.

llvm-svn: 292572
2017-01-20 03:40:41 +00:00
Haicheng Wu 1af1f071ea Recommit "[InlineCost] Use TTI to check if GEP is free."
This recommits r292526 which is reverted in r292529 after fixing the test case.

The original summary:

Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give target-specific more accurate analysis. TTI is
already used for the cost of many other instructions.

llvm-svn: 292570
2017-01-20 03:09:11 +00:00
Haicheng Wu e036df4723 Revert "[InlineCost] Use TTI to check if GEP is free."
This reverts commit r292526.  The test case has problem.

llvm-svn: 292529
2017-01-19 22:51:03 +00:00
Haicheng Wu da556345dc [InlineCost] Use TTI to check if GEP is free.
Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give target-specific more accurate analysis. TTI is
already used for the cost of many other instructions.

Differential Revision: https://reviews.llvm.org/D28693

llvm-svn: 292526
2017-01-19 22:28:34 +00:00
Easwaran Raman e08b139d7d Refactor inline threshold update code.
Functional change: Previously, if a callee is cold, we used ColdThreshold if it minimizes the existing threshold. This was irrespective of whether we were optimizing for minsize (-Oz) or not. But -Oz uses very low threshold to begin with and the inlining with -Oz is expected to be tuned for lowering code size, so there is no good reason to set an even lower threshold for cold callees. We now lower the threshold for cold callees only when -Oz is not used. For default values of -inlinethreshold and -inlinecold-threshold, this change has no effect and this simplifies the code.

NFC changes: Group all threshold updates that are guarded by !Caller->optForMinSize() and within that group threshold updates that require profile summary info.

Differential revision: https://reviews.llvm.org/D28369

llvm-svn: 291487
2017-01-09 21:56:26 +00:00
Chandler Carruth 1d96311447 [PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.

Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
  this at all. Active discussion and investigation is going on to remove
  it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
  Why? Because it adds what I suspect is inappropriate coupling for
  little or no benefit. We will have an outer iteration system that
  tracks devirtualization including that from function passes and
  iterates already. We should improve that rather than approximate it
  here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
  reason at all.

The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.

A summary of the different things happening here:

1) Adding the usual new PM class and rigging.

2) Fixing minor underlying assumptions in the inline cost analysis or
   inline logic that don't generally hold in the new PM world.

3) Adding the core pass logic which is in essence a loop over the calls
   in the nodes in the call graph. This is a bit duplicated from the old
   inliner, but only a handful of lines could realistically be shared.
   (I tried at first, and it really didn't help anything.) All told,
   this is only about 100 lines of code, and most of that is the
   mechanics of wiring up analyses from the new PM world.

4) Updating the LazyCallGraph (in the new PM) based on the *newly
   inlined* calls and references. This is very minimal because we cannot
   form cycles.

5) When inlining removes the last use of a function, eagerly nuking the
   body of the function so that any "one use remaining" inline cost
   heuristics are immediately refined, and queuing these functions to be
   completely deleted once inlining is complete and the call graph
   updated to reflect that they have become dead.

6) After all the inlining for a particular function, updating the
   LazyCallGraph and the CGSCC pass manager to reflect the
   function-local simplifications that are done immediately and
   internally by the inline utilties. These are the exact same
   fundamental set of CG updates done by arbitrary function passes.

7) Adding a bunch of test cases to specifically target CGSCC and other
   subtle aspects in the new PM world.

Many thanks to the careful review from Easwaran and Sanjoy and others!

Differential Revision: https://reviews.llvm.org/D24226

llvm-svn: 290161
2016-12-20 03:15:32 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Craig Topper 107b187d2a [Analysis] Fix typo in comment. NFC
llvm-svn: 289171
2016-12-09 02:18:04 +00:00
Peter Collingbourne ab85225be4 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00
James Molloy 6df8f27c95 [InlineCost] Remove skew when calculating call costs
When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself.

However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:

  int g() {
    h();
  }
  int f() {
    g();
  }

llvm-svn: 286814
2016-11-14 11:14:41 +00:00
Dehao Chen 84287abf43 Rename isHotFunction/isColdFunction to isFunctionEntryHot/isFunctionEntryCold. (NFC)
This is in preparation for https://reviews.llvm.org/D25048

llvm-svn: 283805
2016-10-10 21:47:28 +00:00
Piotr Padlewski f3d122cd02 NFC fix doxygen comments
llvm-svn: 282950
2016-09-30 21:05:49 +00:00
Easwaran Raman 7060af9d22 Fix a thinko in r278189.
llvm-svn: 280008
2016-08-29 20:45:51 +00:00
Easwaran Raman 0d58fcac99 Make more fields of InlineParams Optional.
Differential revision: https://reviews.llvm.org/D23386

llvm-svn: 278312
2016-08-11 03:58:05 +00:00
Piotr Padlewski d89875ca39 Changed sign of LastCallToStaticBouns
Summary:
I think it is much better this way.
When I firstly saw line:
  Cost += InlineConstants::LastCallToStaticBonus;
I though that this is a bug, because everywhere where the cost is being reduced
it is usuing -=.

Reviewers: eraman, tejohnson, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23222

llvm-svn: 278290
2016-08-10 21:15:22 +00:00
Easwaran Raman 1c57cc2b68 Do not directly use inline threshold cl options in cost analysis.
This adds an InlineParams struct which is populated from the command line options by getInlineParams and passed to getInlineCost for the call analyzer to use.

Differential revision: https://reviews.llvm.org/D22120

llvm-svn: 278189
2016-08-10 00:48:04 +00:00
Dehao Chen e1c7c57d11 Remove cold callsite heuristic that is not necessary because of cold callee heuristic.
llvm-svn: 277863
2016-08-05 20:49:04 +00:00
Dehao Chen de39cb9384 Replace hot-callsite based heuristic to use its own threshold parameter instead of share inline-hint parameter
Summary: Hot callsites should have higher threshold than inline hints. This patch uses separate threshold parameter for hot callsites.

Reviewers: davidxl, eraman

Subscribers: Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D22368

llvm-svn: 277860
2016-08-05 20:28:41 +00:00
Sean Silva ab6a683765 Avoid using a raw AssumptionCacheTracker in various inliner functions.
This unblocks the new PM part of River's patch in
https://reviews.llvm.org/D22706

Conveniently, this same change was needed for D21921 and so these
changes are just spun out from there.

llvm-svn: 276515
2016-07-23 04:22:50 +00:00
Dehao Chen 9232f98279 Implement callsite-hotness based inline cost for Sample-based PGO
Summary:
For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch.

E.g.

if (A1 && A2 && A3 && ..... && A10) {
  for (i=0; i < 100000000; i++) {
    callsite();
  }
}

Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value.

In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.

Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.

Reviewers: davidxl, eraman, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22118

llvm-svn: 275073
2016-07-11 16:48:54 +00:00
Easwaran Raman 22eb80a114 Fix size computation of array allocation in inline cost analysis
Differential revision: http://reviews.llvm.org/D21690

llvm-svn: 273952
2016-06-27 22:31:53 +00:00
Easwaran Raman 71069cf67d Use ProfileSummaryInfo in inline cost analysis.
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.

Differential revision: http://reviews.llvm.org/D21045

llvm-svn: 272321
2016-06-09 22:23:21 +00:00
Easwaran Raman bb578ef0dd Allow -inline-threshold to override default threshold.
Before r257832, the threshold used by SimpleInliner was explicitly specified or generated from opt levels and passed to the base class Inliner's constructor. There, it was first overridden by explicitly specified -inline-threshold. The refactoring in r257832 did not preserve this behavior for all opt levels. This change brings back the original behavior.

Differential Revision: http://reviews.llvm.org/D20452

llvm-svn: 270153
2016-05-19 23:02:09 +00:00
Easwaran Raman 9b792923d0 Revert r269131
llvm-svn: 269138
2016-05-10 23:26:04 +00:00
Easwaran Raman 7eccf4ee0e Reapply r266477 and r266488
llvm-svn: 269131
2016-05-10 22:03:23 +00:00
Sanjay Patel 0f153424a9 [Inliner] don't assume that a Constant alloca size is a ConstantInt (PR27277)
Differential Revision: http://reviews.llvm.org/D20077

llvm-svn: 268980
2016-05-09 21:51:53 +00:00
Chad Rosier 567556aa9c [Inliner] Formatting. NFC.
Patch by Aditya Kumar!
Differential Revision: http://reviews.llvm.org/D19047

llvm-svn: 267888
2016-04-28 14:47:23 +00:00
Peter Collingbourne 7dd8dbf486 Introduce llvm.load.relative intrinsic.
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.

LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.

Differential Revision: http://reviews.llvm.org/D18367

llvm-svn: 267223
2016-04-22 21:18:02 +00:00
Eric Liu d09f15ea6f Revert "Replace the use of MaxFunctionCount module flag"
This reverts commit r266477.

This commit introduces cyclic dependency. This commit has "Analysis" depend on "ProfileData",
while "ProfileData" depends on "Object", which depends on "BitCode", which
depends on "Analysis".

llvm-svn: 266619
2016-04-18 15:31:11 +00:00
Easwaran Raman f53baca686 Replace the use of MaxFunctionCount module flag
Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count.

Differential Revision: http://reviews.llvm.org/D18622

llvm-svn: 266477
2016-04-15 21:39:58 +00:00
Justin Lebar 8650a4da93 [TTI] Add getInliningThresholdMultiplier.
Summary:
InlineCost's threshold is multiplied by this value.  This lets us adjust
the inlining threshold up or down on a per-target basis.  For example,
we might want to increase the threshold on targets where calls are
unusually expensive.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18560

llvm-svn: 266405
2016-04-15 01:38:48 +00:00
Easwaran Raman d295b00ae9 Return immediately from analyzeCall if analyzeBlock returns false.
This is part of the patch reviewed at http://reviews.llvm.org/D17584

llvm-svn: 266249
2016-04-13 21:20:22 +00:00
Easwaran Raman 9a3fc17ad4 Refactor Threshold computation. NFC.
This is part of changes reviewed in http://reviews.llvm.org/D17584.

llvm-svn: 265852
2016-04-08 21:28:02 +00:00
Sanjoy Das 5ce3272833 Don't IPO over functions that can be de-refined
Summary:
Fixes PR26774.

If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".

Motivation:

I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard.  So transforming:

```
void f(unsigned x) {
  unsigned t = 5 / x;
  (void)t;
}
```

to

```
void f(unsigned x) { }
```

is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).

Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM.  For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).

Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have.  This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.

For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store.  As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal.  The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal.  Such a
refined variant will look like it is `readonly`.  However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.

Note: this is not just a problem with atomics or with linking
differently optimized object files.  See PR26774 for more realistic
examples that involved neither.

This patch:

This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time.  It then changes a set of IPO passes to bail out if they see
such a function.

Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18634

llvm-svn: 265762
2016-04-08 00:48:30 +00:00
Easwaran Raman b1bd398ceb Revert revisions 262636, 262643, 262679, and 262682.
llvm-svn: 262883
2016-03-08 00:36:35 +00:00
Easwaran Raman 588c68a87b Fix a memory leak.
llvm-svn: 262682
2016-03-04 01:18:40 +00:00
Easwaran Raman fd6557e368 Fix breakage caused by r262636.
Use LLVM_ATTRIBUTE_UNUSED instead of __attribute_((unused))

llvm-svn: 262643
2016-03-03 18:53:20 +00:00
Easwaran Raman 3035719c86 Infrastructure for PGO enhancements in inliner
This patch provides the following infrastructure for PGO enhancements in inliner:

Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.

Differential Revision: http://reviews.llvm.org/D16381

llvm-svn: 262636
2016-03-03 18:26:33 +00:00
Hans Wennborg 00ab73dcb0 CallAnalyzer::analyzeCall: change the condition back to "Cost < Threshold"
In r252595, I inadvertently changed the condition to "Cost <= Threshold",
which caused a significant size regression in Chrome. This commit rectifies
that.

llvm-svn: 259915
2016-02-05 20:32:42 +00:00
Jun Bum Lim 53907161cc Avoid inlining call sites in unreachable-terminated block
Summary:
If the normal destination of the invoke or the parent block of the call site is unreachable-terminated, there is little point in inlining the call site unless there is literally zero cost. Unlike my previous change (D15289), this change specifically handle the call sites followed by unreachable in the same basic block for call or in the normal destination for the invoke. This change could be a reasonable first step to conservatively inline call sites leading to an unreachable-terminated block while BFI / BPI is not yet available in inliner.

Reviewers: manmanren, majnemer, hfinkel, davidxl, mcrosier, dblaikie, eraman

Subscribers: dblaikie, davidxl, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16616

llvm-svn: 259403
2016-02-01 20:55:11 +00:00
Yaron Keren eb2a25467e Annotate dump() methods with LLVM_DUMP_METHOD, addressing Richard Smith r259192 post commit comment.
clang part in r259232, this is the LLVM part of the patch.

llvm-svn: 259240
2016-01-29 20:50:44 +00:00
Easwaran Raman 30a93c1848 Lower inlining threshold when the caller has minsize attribute.
When the caller has optsize attribute, we reduce the inlinining threshold
to OptSizeThreshold (=75) if it is not already lower than that. We don't do
the same for minsize and I suspect it was not intentional. This also addresses
a FIXME regarding checking optsize attribute explicitly instead of using the
right wrapper.

Differential Revision: http://reviews.llvm.org/D16493

llvm-svn: 259120
2016-01-28 23:44:41 +00:00
Manuel Jacob e902459c4b Change ConstantFoldInstOperands to take Instruction instead of opcode and type. NFC.
Summary:
The previous form, taking opcode and type, is moved to an internal
helper and the new form, taking an instruction, is a wrapper around this
helper.

Although this is a slight cleanup on its own, the main motivation is to
refactor the constant folding API to ease migration to opaque pointers.
This will be follow-up work.

Reviewers: eddyb

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D16383

llvm-svn: 258391
2016-01-21 06:33:22 +00:00
Easwaran Raman f4bb2f0dc3 Refactor threshold computation for inline cost analysis
Differential Revision: http://reviews.llvm.org/D15401

llvm-svn: 257832
2016-01-14 23:16:29 +00:00
Easwaran Raman b9f7120e7a Refactor inline costs analysis by removing the InlineCostAnalysis class
InlineCostAnalysis is an analysis pass without any need for it to be one.
Once it stops being an analysis pass, it doesn't maintain any useful state
and the member functions inside can be made free functions. NFC.

Differential Revision: http://reviews.llvm.org/D15701

llvm-svn: 256521
2015-12-28 20:28:19 +00:00
Akira Hatanaka 1cb242eb13 Provide a way to specify inliner's attribute compatibility and merging.
This reapplies r256277 with two changes:

- In emitFnAttrCompatCheck, change FuncName's type to std::string to fix
  a use-after-free bug.
- Remove an unnecessary install-local target in lib/IR/Makefile. 

Original commit message for r252949:

Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.

This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.

rdar://problem/19836465

llvm-svn: 256304
2015-12-22 23:57:37 +00:00
Akira Hatanaka 9c05cc5670 Revert r256277 and r256279.
Some of the bots failed again.

llvm-svn: 256280
2015-12-22 20:29:09 +00:00
Akira Hatanaka a61deb249b Provide a way to specify inliner's attribute compatibility and merging.
This reapplies r252990 and r252949. I've added member function getKind
to the Attr classes which returns the enum or string of the attribute.

Original commit message for r252949:

Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.

This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.

rdar://problem/19836465

llvm-svn: 256277
2015-12-22 20:00:05 +00:00
Easwaran Raman 6d90d9f102 Use updated threshold for indirect call bonus
When considering foo->bar inlining, if there is an indirect call in foo which gets resolved to a direct call (say baz), then we try to inline baz into bar with a threshold T and subtract max(T - Cost(bar->baz), 0) from Cost(foo->bar). This patch uses max(Threshold(bar->baz) - Cost(bar->baz)) instead, where Thresheld(bar->baz) could be different from T due to bonuses or subtractions. Threshold(bar->baz) - Cost(bar->baz) better represents the desirability of inlining baz into bar.

Differential Revision: http://reviews.llvm.org/D14309

llvm-svn: 254945
2015-12-07 21:21:20 +00:00
Easwaran Raman 3676da4b4a Test commit.
Remove blank spaces at the end of comments

llvm-svn: 254630
2015-12-03 19:03:20 +00:00
Akira Hatanaka 5af7ace4ee Revert r252990.
Some of the buildbots are still failing.

llvm-svn: 252999
2015-11-13 01:44:32 +00:00
Akira Hatanaka c7dfb76fe7 Provide a way to specify inliner's attribute compatibility and merging.
This reapplies r252949. I've changed the type of FuncName to be
std::string instead of StringRef in emitFnAttrCompatCheck.

Original commit message for r252949:

Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.

This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.

rdar://problem/19836465

llvm-svn: 252990
2015-11-13 01:23:11 +00:00
Akira Hatanaka f3aa82f666 Revert r252949.
It broke some of the bots including clang-x64-ninja-win7.

llvm-svn: 252951
2015-11-12 21:19:18 +00:00
Akira Hatanaka 61b81a563a Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.

This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.

rdar://problem/19836465

llvm-svn: 252949
2015-11-12 20:59:43 +00:00
Hans Wennborg 21ce8ecb09 Inliner: Do zero-cost inlines even if above a negative threshold (PR24851)
Differential Revision: http://reviews.llvm.org/D14499

llvm-svn: 252595
2015-11-10 09:47:48 +00:00
Duncan P. N. Exon Smith 5a82c916b0 Analysis: Remove implicit ilist iterator conversions
Remove implicit ilist iterator conversions from LLVMAnalysis.

I came across something really scary in `llvm::isKnownNotFullPoison()`
which relied on `Instruction::getNextNode()` being completely broken
(not surprising, but scary nevertheless).  This function is documented
(and coded to) return `nullptr` when it gets to the sentinel, but with
an `ilist_half_node` as a sentinel, the sentinel check looks into some
other memory and we don't recognize we've hit the end.

Rooting out these scary cases is the reason I'm removing the implicit
conversions before doing anything else with `ilist`; I'm not at all
surprised that clients rely on badness.

I found another scary case -- this time, not relying on badness, just
bad (but I guess getting lucky so far) -- in
`ObjectSizeOffsetEvaluator::compute_()`.  Here, we save out the
insertion point, do some things, and then restore it.  Previously, we
let the iterator auto-convert to `Instruction*`, and then set it back
using the `Instruction*` version:

    Instruction *PrevInsertPoint = Builder.GetInsertPoint();

    /* Logic that may change insert point */

    if (PrevInsertPoint)
      Builder.SetInsertPoint(PrevInsertPoint);

The check for `PrevInsertPoint` doesn't protect correctly against bad
accesses.  If the insertion point has been set to the end of a basic
block (i.e., `SetInsertPoint(SomeBB)`), then `GetInsertPoint()` returns
an iterator pointing at the list sentinel.  The version of
`SetInsertPoint()` that's getting called will then call
`PrevInsertPoint->getParent()`, which explodes horribly.  The only
reason this hasn't blown up is that it's fairly unlikely the builder is
adding to the end of the block; usually, we're adding instructions
somewhere before the terminator.

llvm-svn: 249925
2015-10-10 00:53:03 +00:00
Sanjay Patel e9434e80d1 80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC80-cols; NFC
llvm-svn: 247700
2015-09-15 15:26:25 +00:00
Joseph Tremoulet 8220bcc570 [WinEH] Require token linkage in EH pad/ret signatures
Summary:
WinEHPrepare is going to require that cleanuppad and catchpad produce values
of token type which are consumed by any cleanupret or catchret exiting the
pad.  This change updates the signatures of those operators to require/enforce
that the type produced by the pads is token type and that the rets have an
appropriate argument.

The catchpad argument of a `CatchReturnInst` must be a `CatchPadInst` (and
similarly for `CleanupReturnInst`/`CleanupPadInst`).  To accommodate that
restriction, this change adds a notion of an operator constraint to both
LLParser and BitcodeReader, allowing appropriate sentinels to be constructed
for forward references and appropriate error messages to be emitted for
illegal inputs.

Also add a verifier rule (noted in LangRef) that a catchpad with a catchpad
predecessor must have no other predecessors; this ensures that WinEHPrepare
will see the expected linear relationship between sibling catches on the
same try.

Lastly, remove some superfluous/vestigial casts from instruction operand
setters operating on BasicBlocks.

Reviewers: rnk, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12108

llvm-svn: 245797
2015-08-23 00:26:33 +00:00
Chandler Carruth 7adc3a2b0e [PM/AA] Remove the last relics of the separate IPA library from LLVM,
folding the code into the main Analysis library.

There already wasn't much of a distinction between Analysis and IPA.
A number of the passes in Analysis are actually IPA passes, and there
doesn't seem to be any advantage to separating them.

Moreover, it makes it hard to have interactions between analyses that
are both local and interprocedural. In trying to make the Alias Analysis
infrastructure work with the new pass manager, it becomes particularly
awkward to navigate this split.

I've tried to find all the places where we referenced this, but I may
have missed some. I have also adjusted the C API to continue to be
equivalently functional after this change.

Differential Revision: http://reviews.llvm.org/D12075

llvm-svn: 245318
2015-08-18 17:51:53 +00:00
Chandler Carruth d73bc5fbe2 Sink InlineCost.cpp into IPA -- it is now officially an interprocedural
analysis. How cute that it wasn't previously. ;]

Part of this confusion stems from the flattened header file tree. Thanks
to Benjamin for pointing out the goof on IRC, and we're considering
un-flattening the headers, so speak now if that would bug you.

llvm-svn: 173033
2013-01-21 12:09:41 +00:00
Chandler Carruth b8cf510d81 Move the inline cost analysis's primary cost query to TTI instead of the
old CodeMetrics system. TTI has the specific advantage of being
extensible and customizable by targets to reflect target-specific cost
metrics.

llvm-svn: 173032
2013-01-21 12:05:16 +00:00
Chandler Carruth 42f3dceb63 Now that the inline cost analysis is a pass, we can easily have it
depend on and use other analyses (as long as they're either immutable
passes or CGSCC passes of course -- nothing in the pass manager has been
fixed here). Leverage this to thread TargetTransformInfo down through
the inline cost analysis.

No functionality changed here, this just threads things through.

llvm-svn: 173031
2013-01-21 11:55:09 +00:00
Chandler Carruth 4319e2948d Make the inline cost a proper analysis pass. This remains essentially
a dynamic analysis done on each call to the routine. However, now it can
use the standard pass infrastructure to reference other analyses,
instead of a silly setter method. This will become more interesting as
I teach it about more analysis passes.

This updates the two inliner passes to use the inline cost analysis.
Doing so highlights how utterly redundant these two passes are. Either
we should find a cheaper way to do always inlining, or we should merge
the two and just fiddle with the thresholds to get the desired behavior.
I'm leaning increasingly toward the latter as it would also remove the
Inliner sub-class split.

llvm-svn: 173030
2013-01-21 11:39:18 +00:00
Chandler Carruth 9fb823bbd4 Move all of the header files which are involved in modelling the LLVM IR
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.

There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.

The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.

I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).

I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.

llvm-svn: 171366
2013-01-02 11:36:10 +00:00
Bill Wendling 698e84fc4f Remove the Function::getFnAttributes method in favor of using the AttributeSet
directly.

This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.

llvm-svn: 171253
2012-12-30 10:32:01 +00:00
Chandler Carruth 86ed53089f Fix a stunning oversight in the inline cost analysis. It was never
propagating one of the values it simplified to a constant across
a myriad of instructions. Notably, ptrtoint instructions when we had
a constant pointer (say, 0) didn't propagate that, blocking a massive
number of down-stream optimizations.

This was uncovered when investigating why we fail to inline and delete
the boilerplate in:

  void f() {
    std::vector<int> v;
    v.push_back(1);
  }

It turns out most of the efforts I've made thus far to improve the
analysis weren't making it far purely because of this. After this is
fixed, the store-to-load forwarding patch enables LLVM to optimize the
above to an empty function. We still can't nuke a second push_back, but
for different reasons.

There is a very real chance this will cause somewhat noticable changes
in inlining behavior, so please let me know if you see regressions (or
improvements!) because of this patch.

llvm-svn: 171196
2012-12-28 14:43:42 +00:00
Chandler Carruth 753e21d057 Teach the inline cost analysis about calls that can be simplified and
how to propagate constants through insert and extract value
instructions.

With the recent improvements to instsimplify, this allows inline cost
analysis to constant fold through intrinsic functions, including notably
the with.overflow intrinsic math routines which often show up inside of
STL abstractions. This is yet another piece in the puzzle of breaking
down the code for:

  void f() {
    std::vector<int> v;
    v.push_back(1);
  }

But it still isn't enough. There are a pile of bugs in inline cost still
blocking this.

llvm-svn: 171195
2012-12-28 14:23:32 +00:00
James Molloy 4f6fb953a7 Add a new attribute, 'noduplicate'. If a function contains a noduplicate call, the call cannot be duplicated - Jump threading, loop unrolling, loop unswitching, and loop rotation are inhibited if they would duplicate the call.
Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage).

llvm-svn: 170704
2012-12-20 16:04:27 +00:00