Commit Graph

270 Commits

Author SHA1 Message Date
Eugene Zelenko f27d161bf0 [Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 316187
2017-10-19 21:21:30 +00:00
Vivek Pandya 9590658fb8 [NFC] Convert OptimizationRemarkEmitter old emit() calls to new closure
parameterized emit() calls

Summary: This is not functional change to adopt new emit() API added in r313691.

Reviewed By: anemet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38285

llvm-svn: 315476
2017-10-11 17:12:59 +00:00
Adam Nemet 0965da2055 Rename OptimizationDiagnosticInfo.* to OptimizationRemarkEmitter.*
Sync it up with the name of the class actually defined here.  This has been
bothering me for a while...

llvm-svn: 315249
2017-10-09 23:19:02 +00:00
Sanjoy Das def1729dc4 Use a BumpPtrAllocator for Loop objects
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.

Reviewers: chandlerc

Subscribers: mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D38201

llvm-svn: 314375
2017-09-28 02:45:42 +00:00
Adam Nemet 15fccf0009 Allow ORE.emit to take a closure to delay building the remark object
In the lambda we are now returning the remark by value so we need to preserve
its type in the insertion operator.  This requires making the insertion
operator generic.

I've also converted a few cases to use the new API.  It seems to work pretty
well.  See the LoopUnroller for a slightly more interesting case.

llvm-svn: 313691
2017-09-19 23:00:55 +00:00
Haicheng Wu 0812c5bea3 [InlineCost] Add cl::opt to allow full inline cost to be computed for debugging purposes.
Currently, the inline cost model will bail once the inline cost exceeds the
inline threshold in order to avoid unnecessary compile-time. However, when
debugging it is useful to compute the full cost, so this command line option
is added to override the default behavior.

I took over this work from Chad Rosier (mcrosier@codeaurora.org).

Differential Revision: https://reviews.llvm.org/D35850

llvm-svn: 311371
2017-08-21 20:00:09 +00:00
Sam Elliott e604b563ea Emit only A Single Opt Remark When Inlining
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33786

Reviewers: anemet, davidxl, chandlerc

Reviewed By: anemet

Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D36054

llvm-svn: 311349
2017-08-21 16:45:47 +00:00
Sam Elliott 7fe0aaa140 Revert "Emit only A Single Opt Remark When Inlining"
Reverting due to clang build failure

llvm-svn: 311274
2017-08-20 06:55:10 +00:00
Sam Elliott 785dd75369 Emit only A Single Opt Remark When Inlining
Summary:
This updates the Inliner to only add a single Optimization
Remark when Inlining, rather than an Analysis Remark and an
Optimization Remark.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33786

Reviewers: anemet, davidxl, chandlerc

Reviewed By: anemet

Subscribers: haicheng, fhahn, mehdi_amini, dblaikie, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D36054

llvm-svn: 311273
2017-08-20 06:43:34 +00:00
Chandler Carruth 95055d8f8b [PM] Fix a bug where through CGSCC iteration we can get
infinite-inlining across multiple runs of the inliner by keeping a tiny
history of internal-to-SCC inlining decisions.

This is still a bit gross, but I don't yet have any fundamentally better
ideas and numerous people are blocked on this to use new PM and ThinLTO
together.

The core of the idea is to detect when we are about to do an inline that
has a chance of re-splitting an SCC which we have split before with
a similar inlining step. That is a critical component in the inlining
forming a cycle and so far detects all of the various cyclic patterns
I can come up with as well as the original real-world test case (which
comes from a ThinLTO build of libunwind).

I've added some tests that I think really demonstrate what is going on
here. They are essentially state machines that march the inliner through
various steps of a cycle and check that we stop when the cycle is closed
and that we actually did do inlining to form that cycle.

A lot of thanks go to Eric Christopher and Sanjoy Das for the help
understanding this issue and improving the test cases.

The biggest "yuck" here is the layering issue -- the CGSCC pass manager
is providing somewhat magical state to the inliner for it to use to make
itself converge. This isn't great, but I don't honestly have a lot of
better ideas yet and at least seems nicely isolated.

I have tested this patch, and it doesn't block *any* inlining on the
entire LLVM test suite and SPEC, so it seems sufficiently narrowly
targeted to the issue at hand.

We have come up with hypothetical issues that this patch doesn't cover,
but so far none of them are practical and we don't have a viable
solution yet that covers the hypothetical stuff, so proceeding here in
the interim. Definitely an area that we will be back and revisiting in
the future.

Differential Revision: https://reviews.llvm.org/D36188

llvm-svn: 309784
2017-08-02 02:09:22 +00:00
Chandler Carruth 06a86301a1 [PM/LCG] Follow-up fix to r308088 to handle deletion of library
functions.

In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.

While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.

We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.

This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.

llvm-svn: 308417
2017-07-19 04:12:25 +00:00
Chandler Carruth bd9c29039e [PM] Finish implementing and fix a chain of bugs uncovered by testing
the invalidation propagation logic from an SCC to a Function.

I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.

However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.

Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.

Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.

llvm-svn: 307487
2017-07-09 03:59:31 +00:00
David Blaikie 6d0f39476a Inliner: Avoid calling shouldInline until it's absolutely necessary
This restores the order of evaluation (& conditionalized evaluation) of
isTriviallyDeadInstruction, InlineHistoryIncludes, and shouldInline
(with the addition of a shouldInline call after
isTriviallyDeadInstruction) from before r305245.

llvm-svn: 305267
2017-06-13 02:24:09 +00:00
David Blaikie ae8c4af4ac Inliner: Don't remove calls to readnone+nounwind (but not always_inline) functions in the AlwaysInliner
llvm-svn: 305245
2017-06-12 23:01:17 +00:00
David Blaikie cb9327b02d Inliner: Don't touch indirect calls
Other comments/implications are that this isn't intended behavior (nor
perserved/reimplemented in the new inliner) & complicates fixing the
'inlining' of trivially dead calls without consulting the cost function
first.

llvm-svn: 305052
2017-06-09 03:29:20 +00:00
Easwaran Raman f5f9160072 [ProfileSummary] Make getProfileCount a non-static member function.
This change is required because the notion of count is different for
sample profiling and getProfileCount will need to determine the
underlying profile type.

Differential revision: https://reviews.llvm.org/D33012

llvm-svn: 302597
2017-05-09 23:21:10 +00:00
Evgeny Astigeevich 7823c66e05 r286814 resulted that CallPenalty can be subtracted twice:
- First time, during calculation of the cost in InlineCost.cpp
- Second time, during calculation of the cost in Inliner.cpp

This patches fixes this.

Differential Revision: https://reviews.llvm.org/D31137

llvm-svn: 298496
2017-03-22 12:01:57 +00:00
Chandler Carruth 814e0df1c5 [PM/Inliner] Fix a bug in r297374 where we would leave stale calls in
the work queue and crash when trying to visit them after deleting the
function containing those calls.

llvm-svn: 297940
2017-03-16 10:45:42 +00:00
Chandler Carruth 20e588e1af [PM/Inliner] Make the new PM's inliner process call edges across an
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.

This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.

Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.

Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;

void g(const char *);

template <bool K, int N> void f(bool *B, bool *E) {
  if (K)
    g(str<N>);
  if (B == E)
    return;
  if (*B)
    f<true, N + 1>(B + 1, E);
  else
    f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }

extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```

When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.

What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.

The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.

We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.

I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.

The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
   really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
   we consider for inlining.

llvm-svn: 297374
2017-03-09 11:35:40 +00:00
Taewook Oh f22fa72e4a Do not apply redundant LastCallToStaticBonus
Summary:
As written in the comments above, LastCallToStaticBonus is already applied to
the cost if Caller has only one user, so it is redundant to reapply the bonus
here.

If the only user is not a caller, TotalSecondaryCost will not be adjusted
anyway because callerWillBeRemoved is false. If there's no caller at all, we
don't need to care about TotalSecondaryCost because
inliningPreventsSomeOuterInline is false.

Reviewers: chandlerc, eraman

Reviewed By: eraman

Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29169

llvm-svn: 295075
2017-02-14 17:30:05 +00:00
Chandler Carruth aaad9f84be [PM/LCG] Teach the LazyCallGraph how to replace a function without
disturbing the graph or having to update edges.

This is motivated by porting argument promotion to the new pass manager.
Because of how LLVM IR Function objects work, in order to change their
signature a new object needs to be created. This is efficient and
straight forward in the IR but previously was very hard to implement in
LCG. We could easily replace the function a node in the graph
represents. The challenging part is how to handle updating the edges in
the graph.

LCG previously used an edge to a raw function to represent a node that
had not yet been scanned for calls and references. This was the core
of its laziness. However, that model causes this kind of update to be
very hard:
1) The keys to lookup an edge need to be `Function*`s that would all
   need to be updated when we update the node.
2) There will be some unknown number of edges that haven't transitioned
   from `Function*` edges to `Node*` edges.

All of this complexity isn't necessary. Instead, we can always build
a node around any function, always pointing edges at it and always using
it as the key to lookup an edge. To maintain the laziness, we need to
sink the *edges* of a node into a secondary object and explicitly model
transitioning a node from empty to populated by scanning the function.
This design seems much cleaner in a number of ways, but importantly
there is now exactly *one* place where the `Function*` has to be
updated!

Some other cleanups that fall out of this include having something to
model the *entry* edges more accurately. Rather than hand rolling parts
of the node in the graph itself, we have an explicit `EdgeSequence`
object that gives us exactly the functionality needed. We also have
a consistent place to define the edge iterators and can use them for
both the entry edges and the internal edges of the graph.

The API used to model the separation between a node and its edges is
intentionally very thin as most clients are expected to deal with nodes
that have populated edges. We model this exactly as an optional does
with an additional method to populate the edges when that is
a reasonable thing for a client to do. This is based on API design
suggestions from Richard Smith and David Blaikie, credit goes to them
for helping pick how to model this without it being either too explicit
or too implicit.

The patch is somewhat noisy due to shifting around iterator types and
new syntax for walking the edges of a node, but most of the
functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.

Differential Revision: https://reviews.llvm.org/D29577

llvm-svn: 294653
2017-02-09 23:24:13 +00:00
Peter Collingbourne cea1e4e79a De-duplicate some code for creating an AARGetter suitable for the legacy PM.
I'm about to use this in a couple more places.

Differential Revision: https://reviews.llvm.org/D29793

llvm-svn: 294648
2017-02-09 23:11:52 +00:00
Adam Nemet e7bdf227f6 [Inliner] Fold analysis remarks into missed remarks
This significantly reduces the noise level of these messages.

llvm-svn: 293492
2017-01-30 16:22:45 +00:00
Haicheng Wu f8dc2d8c8b [Inliner] Fix a comment to match the code. NFC.
TotalAltCost => TotalSecondaryCost

Differential Revision: https://reviews.llvm.org/D29231

llvm-svn: 293490
2017-01-30 16:15:14 +00:00
Chandler Carruth 6acdca78a0 [PH] Replace uses of AssertingVH from members of analysis results with
a lazy-asserting PoisoningVH.

AssertVH is fundamentally incompatible with cache-invalidation of
analysis results. The invaliadtion happens after the AssertingVH has
already fired. Instead, use a PoisoningVH that will assert if the
dangling handle is ever used rather than merely be assigned or
destroyed.

This patch also removes all of the (numerous) doomed attempts to work
around this fundamental incompatibility. It is a pretty significant
simplification IMO.

The most interesting change is in the Inliner where we still do some
clearing because we don't want to rely on the coarse grained
invalidation strategy of the containing pass manager. However, I prefer
the approach that contains this logic to the cleanup phase of the
Inliner, and I think we could enhance the CGSCC analysis management
layer to make this even better in the future if desired.

The rest is straight cleanup.

I've also added a test for one of the harder cases to work around: when
a *module analysis* contains many AssertingVHes pointing at functions.

Differential Revision: https://reviews.llvm.org/D29006

llvm-svn: 292928
2017-01-24 12:55:57 +00:00
Chandler Carruth 9524af4aac [PM] Clear any analyses for a dead function after inlining it and before
clearing its body. This is essential to avoid triggering asserting value
handles in analyses on the function's body.

I'm working on a test case for this behavior in LLVM, but Clang has
a great one that managed to trigger this on all of the bots already.

llvm-svn: 292770
2017-01-23 07:03:41 +00:00
Chandler Carruth b698d5964d [PM] Fix a really nasty bug introduced when adding PGO support to the
new PM's inliner.

The bug happens when we refine an SCC after having computed a proxy for
the FunctionAnalysisManager, and then proceed to compute fresh analyses
for functions in the *new* SCC using the manager provided by the old
SCC's proxy. *And* when we manage to mutate a function in this new SCC
in a way that invalidates those analyses. This can be... challenging to
reproduce.

I've managed to contrive a set of functions that trigger this and added
a test case, but it is a bit brittle. I've directly checked that the
passes run in the expected ways to help avoid the test just becoming
silently irrelevant.

This gets the new PM back to passing the LLVM test suite after the PGO
improvements landed.

llvm-svn: 292757
2017-01-22 10:34:01 +00:00
Chandler Carruth d4be9f4b8d [PM] Add some debug logging to the new PM inliner to make it easier to
trace its behavior.

llvm-svn: 292756
2017-01-22 10:33:58 +00:00
Easwaran Raman 12585b0148 Improve PGO support for the new inliner
This adds the following to the new PM based inliner in PGO mode:

* Use block frequency analysis to derive callsite's profile count and use
that to adjust thresholds of hot and cold callsites.

* Incrementally update the BFI of the caller after a callee gets inlined
into it. This incremental update is only within an invocation of the run
method - BFI is not preserved across calls to run.
Update the function entry count of the callee after inlining it into a
caller.

* I've tuned the thresholds for the hot and cold callsites using a hacked
up version of the old inliner that explicitly computes BFI on a set of
internal benchmarks and spec. Once the new PM based pipeline stabilizes
(IIRC Chandler mentioned there are known issues) I'll benchmark this
again and adjust the thresholds if required.
Inliner PGO support.

Differential revision: https://reviews.llvm.org/D28331

llvm-svn: 292666
2017-01-20 22:44:04 +00:00
Chandler Carruth 9900d18bab [PM] Teach the inliner's call graph update to handle inserting new edges
when they are call edges at the leaf but may (transitively) be reached
via ref edges.

It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.

Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.

llvm-svn: 290649
2016-12-28 03:13:12 +00:00
Chandler Carruth 141bf5d14d [PM] Add one of the features left out of the initial inliner patch:
skipping indirectly recursive inline chains.

To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.

This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.

llvm-svn: 290590
2016-12-27 06:46:20 +00:00
Chandler Carruth 03130d981c [PM] Teach the inliner in the new PM to merge attributes after inlining.
Also enable the new PM in the attributes test case which caught this
issue.

llvm-svn: 290572
2016-12-27 03:39:54 +00:00
Chandler Carruth 6e9bb7e064 [PM] Teach the always inliner in the new pass manager to support
removing fully-dead comdats without removing dead entries in comdats
with live members.

This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.

I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).

llvm-svn: 290557
2016-12-26 23:43:27 +00:00
Easwaran Raman 180bd9f6b3 Pass GetAssumptionCache to InlineFunctionInfo constructor
Differential revision: https://reviews.llvm.org/D28038

llvm-svn: 290295
2016-12-22 01:07:01 +00:00
Chandler Carruth 1d96311447 [PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.

Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
  this at all. Active discussion and investigation is going on to remove
  it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
  Why? Because it adds what I suspect is inappropriate coupling for
  little or no benefit. We will have an outer iteration system that
  tracks devirtualization including that from function passes and
  iterates already. We should improve that rather than approximate it
  here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
  reason at all.

The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.

A summary of the different things happening here:

1) Adding the usual new PM class and rigging.

2) Fixing minor underlying assumptions in the inline cost analysis or
   inline logic that don't generally hold in the new PM world.

3) Adding the core pass logic which is in essence a loop over the calls
   in the nodes in the call graph. This is a bit duplicated from the old
   inliner, but only a handful of lines could realistically be shared.
   (I tried at first, and it really didn't help anything.) All told,
   this is only about 100 lines of code, and most of that is the
   mechanics of wiring up analyses from the new PM world.

4) Updating the LazyCallGraph (in the new PM) based on the *newly
   inlined* calls and references. This is very minimal because we cannot
   form cycles.

5) When inlining removes the last use of a function, eagerly nuking the
   body of the function so that any "one use remaining" inline cost
   heuristics are immediately refined, and queuing these functions to be
   completely deleted once inlining is complete and the call graph
   updated to reflect that they have become dead.

6) After all the inlining for a particular function, updating the
   LazyCallGraph and the CGSCC pass manager to reflect the
   function-local simplifications that are done immediately and
   internally by the inline utilties. These are the exact same
   fundamental set of CG updates done by arbitrary function passes.

7) Adding a bunch of test cases to specifically target CGSCC and other
   subtle aspects in the new PM world.

Many thanks to the careful review from Easwaran and Sanjoy and others!

Differential Revision: https://reviews.llvm.org/D24226

llvm-svn: 290161
2016-12-20 03:15:32 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Simon Pilgrim 7d18a70dac Fix spelling mistakes in Transforms comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287488
2016-11-20 13:19:49 +00:00
Xinliang David Li f450b88191 Fix typo
llvm-svn: 285978
2016-11-04 03:00:52 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Dehao Chen 5461d8bdb5 Refactor the ProfileSummaryInfo to use doInitialization and doFinalization to handle Module update.
Summary: This refactors the change in r282616

Reviewers: davidxl, eraman, mehdi_amini

Subscribers: mehdi_amini, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D25041

llvm-svn: 282630
2016-09-28 21:00:58 +00:00
Adam Nemet c507ac96f5 [Inliner] Port all opt remarks to new streaming API
llvm-svn: 282559
2016-09-27 23:47:03 +00:00
Adam Nemet 04758ba385 Shorten DiagnosticInfoOptimizationRemark* to OptimizationRemark*. NFC
With the new streaming interface, these class names need to be typed a
lot and it's way too looong.

llvm-svn: 282544
2016-09-27 22:19:23 +00:00
Adam Nemet 1142147e41 [Inliner] Fold the analysis remark into the missed remark
There is really no reason for these to be separate.

The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed.  There,
you have to query analysis to get the full picture.

I think we should just explain the reason for missing the optimization
in the missed remark when possible.  Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.

llvm-svn: 282542
2016-09-27 21:58:17 +00:00
Adam Nemet a62b7e1a28 Output optimization remarks in YAML
(Re-committed after moving the template specialization under the yaml
namespace.  GCC was complaining about this.)

This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282539
2016-09-27 20:55:07 +00:00
Adam Nemet cc2a3fa8e8 Revert "Output optimization remarks in YAML"
This reverts commit r282499.

The GCC bots are failing

llvm-svn: 282503
2016-09-27 16:39:24 +00:00
Adam Nemet 92e928c10a Output optimization remarks in YAML
This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282499
2016-09-27 16:15:16 +00:00
Adam Nemet cef3314156 [Inliner] Report when inlining fails because callee's def is unavailable
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.

Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered.  I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off.  As before without -Rpass-with-hotness
there is no overhead.

Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.

Reviewers: eraman, chandlerc, davidxl, hfinkel

Subscribers: tejohnson, Prazek, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D23415

llvm-svn: 279860
2016-08-26 20:21:05 +00:00
Chandler Carruth f702d8ecb6 [Inliner] Add a flag to disable manual alloca merging in the Inliner.
This is off for now while testing can take place to make sure that in
fact we do sufficient stack coloring to fully obviate the manual alloca
array merging.

Some context on why we should be using stack coloring rather than
merging allocas in this way:

LLVM relies very heavily on analyzing pointers as coming from different
allocas in order to make aliasing decisions. These are some of the most
powerful aliasing signals available in LLVM. So merging allocas is an
extremely destructive operation on the LLVM IR -- it takes away highly
valuable and hard to reconstruct information.

As a consequence, inlined functions which happen to have array allocas
that this pattern matches will fail to be properly interleaved unless
SROA manages to hoist everything to an SSA register. Instead, the
inliner will have added an unnecessary dependence that one inlined
function execute after the other because they will have been rewritten
to refer to the same memory.

All that said, folks will reasonably want some time to experiment here
and make sure there are no significant regressions. A flag should give
us an easy knob to test.

For more context, see the thread here:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.html
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html

Differential Revision: https://reviews.llvm.org/D23052

llvm-svn: 278892
2016-08-17 02:40:23 +00:00
Piotr Padlewski d89875ca39 Changed sign of LastCallToStaticBouns
Summary:
I think it is much better this way.
When I firstly saw line:
  Cost += InlineConstants::LastCallToStaticBonus;
I though that this is a bug, because everywhere where the cost is being reduced
it is usuing -=.

Reviewers: eraman, tejohnson, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23222

llvm-svn: 278290
2016-08-10 21:15:22 +00:00