Commit Graph

2975 Commits

Author SHA1 Message Date
Dehao Chen a53219a5f2 Refactor SampleProfile.cpp to move computation inside a branch. (NFC)
llvm-svn: 292803
2017-01-23 17:09:02 +00:00
Chandler Carruth e8c66b2766 [PM] Replace the hard invalidate in JumpThreading for LVI with correct
invalidation of deleted functions in GlobalDCE.

This was always testing a bug really triggered in GlobalDCE. Right now
we have analyses with asserting value handles into IR. As long as those
remain, when *deleting* an IR unit, we cannot wait for the normal
invalidation scheme to kick in even though it was designed to work
correctly in the face of these kinds of deletions. Instead, the pass
needs to directly handle invalidating the analysis results pointing at
that IR unit.

I've tought the Inliner about this and this patch teaches GlobalDCE.
This will handle the asserting VH case in the existing test as well as
other issues of the same fundamental variety. I've moved the test into
the GlobalDCE directory and added a comment explaining what is going on.

Note that we cannot simply require LVI here because LVI is too lazy.

llvm-svn: 292773
2017-01-23 08:33:24 +00:00
Chandler Carruth 9524af4aac [PM] Clear any analyses for a dead function after inlining it and before
clearing its body. This is essential to avoid triggering asserting value
handles in analyses on the function's body.

I'm working on a test case for this behavior in LLVM, but Clang has
a great one that managed to trigger this on all of the bots already.

llvm-svn: 292770
2017-01-23 07:03:41 +00:00
Chandler Carruth b698d5964d [PM] Fix a really nasty bug introduced when adding PGO support to the
new PM's inliner.

The bug happens when we refine an SCC after having computed a proxy for
the FunctionAnalysisManager, and then proceed to compute fresh analyses
for functions in the *new* SCC using the manager provided by the old
SCC's proxy. *And* when we manage to mutate a function in this new SCC
in a way that invalidates those analyses. This can be... challenging to
reproduce.

I've managed to contrive a set of functions that trigger this and added
a test case, but it is a bit brittle. I've directly checked that the
passes run in the expected ways to help avoid the test just becoming
silently irrelevant.

This gets the new PM back to passing the LLVM test suite after the PGO
improvements landed.

llvm-svn: 292757
2017-01-22 10:34:01 +00:00
Chandler Carruth d4be9f4b8d [PM] Add some debug logging to the new PM inliner to make it easier to
trace its behavior.

llvm-svn: 292756
2017-01-22 10:33:58 +00:00
Anmol P. Paralkar 910dc8de3f MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve-debug-info
Summary:
Under option -mergefunc-preserve-debug-info we:
- Do not create a new function for a thunk.
- Retain the debug info for a thunk's parameters (and associated
  instructions for the debug info) from the entry block.
  Note: -debug will display the algorithm at work.
- Create debug-info for the call (to the shared implementation) made by
  a thunk and its return value.
- Erase the rest of the function, retaining the (minimally sized) entry
  block to create a thunk.
- Preserve a thunk's call site to point to the thunk even when both occur
  within the same translation unit, to aid debugability. Note that this
  behaviour differs from the underlying -mergefunc implementation which
  modifies the thunk's call site to point to the shared implementation
  when both occur within the same translation unit.

Reviewers: echristo, eeckstein, dblaikie, aprantl, friss

Reviewed By: aprantl

Subscribers: davide, fhahn, jfb, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D28075

llvm-svn: 292702
2017-01-21 02:02:56 +00:00
Peter Collingbourne b365d921cf LowerTypeTests: Fix use-after-free. Found by asan/msan.
llvm-svn: 292700
2017-01-21 01:57:44 +00:00
Peter Collingbourne 67addbcacf LowerTypeTests: Simplify; always create SizeM1 with type IntPtrTy, move initialization out of if statement.
llvm-svn: 292674
2017-01-20 23:22:28 +00:00
Dehao Chen 77079003dd Add indirect call promotion to SamplePGO
Summary: This patch adds metadata for indirect call promotion in the sample profile loader.

Reviewers: xur, davidxl, dnovillo

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28923

llvm-svn: 292672
2017-01-20 22:56:07 +00:00
Easwaran Raman 12585b0148 Improve PGO support for the new inliner
This adds the following to the new PM based inliner in PGO mode:

* Use block frequency analysis to derive callsite's profile count and use
that to adjust thresholds of hot and cold callsites.

* Incrementally update the BFI of the caller after a callee gets inlined
into it. This incremental update is only within an invocation of the run
method - BFI is not preserved across calls to run.
Update the function entry count of the callee after inlining it into a
caller.

* I've tuned the thresholds for the hot and cold callsites using a hacked
up version of the old inliner that explicitly computes BFI on a set of
internal benchmarks and spec. Once the new PM based pipeline stabilizes
(IIRC Chandler mentioned there are known issues) I'll benchmark this
again and adjust the thresholds if required.
Inliner PGO support.

Differential revision: https://reviews.llvm.org/D28331

llvm-svn: 292666
2017-01-20 22:44:04 +00:00
Peter Collingbourne e02b74e294 IPO, LTO: Plumb the summary from the LTO API into the pass manager.
Differential Revision: https://reviews.llvm.org/D28840

llvm-svn: 292661
2017-01-20 22:18:52 +00:00
Teresa Johnson 4566c6db87 [ThinLTO] Drop non-prevailing non-ODR weak to declarations
Summary:
Allow non-ODR weak/linkonce non-prevailing copies to be marked
as available_externally in the index. Add support for dropping these to
declarations in the backend.

Reviewers: mehdi_amini, pcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28806

llvm-svn: 292656
2017-01-20 21:54:58 +00:00
Peter Collingbourne f04a390099 LowerTypeTests: Implement importing of type identifiers.
To import a type identifier we read the summary and create external
references to the symbols defined when exporting.

Differential Revision: https://reviews.llvm.org/D28546

llvm-svn: 292654
2017-01-20 21:49:34 +00:00
Peter Collingbourne ee1416037e LowerTypeTests: Compute SizeM1BitWidth in exportTypeId. NFCI.
This avoids needing to store it in a separate field in TypeIdLowering.

llvm-svn: 292647
2017-01-20 20:57:40 +00:00
Dehao Chen 94f369fc7f clang-format SampleProfile.cpp (NFC)
llvm-svn: 292533
2017-01-19 23:20:31 +00:00
Peter Collingbourne 22d9d3cdce LowerTypeTests: Implement exporting of type identifiers.
Type identifiers are exported by:
- Adding coarse-grained information about how to test the type
  identifier to the summary.
- Creating symbols in the object file (aliases and absolute symbols)
  containing fine-grained information about the type identifier.

Differential Revision: https://reviews.llvm.org/D28424

llvm-svn: 292462
2017-01-19 01:20:11 +00:00
Peter Collingbourne 20a00933fb ThinLTOBitcodeWriter: Clear comdats on filtered globals.
Differential Revision: https://reviews.llvm.org/D28839

llvm-svn: 292431
2017-01-18 20:03:02 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Teresa Johnson 83aaf358fd [ThinLTO] Import static functions from the same module as caller
Summary:
We can sometimes end up with multiple copies of a local function that
have the same GUID in the index. This happens when there are local
functions with the same name that are in different source files with the
same name (but in different directories), and they were compiled in
their own directory so had the same path at compile time.

In this case make sure we import the copy in the caller's module. While
it isn't a correctness problem (the renamed reference which is based on the
module IR hash will be unique since the module must have had an
externally visible function that was imported), importing the wrong copy
will result in lost performance opportunity since it won't be referenced
and inlined.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28440

llvm-svn: 291841
2017-01-12 22:04:45 +00:00
Peter Collingbourne 7636532c1b LowerTypeTests: Represent the memory region size with the constant size-1.
This means that we can use a shorter instruction sequence in the case where
the size is a power of two and on the boundary between two representations.

Differential Revision: https://reviews.llvm.org/D28421

llvm-svn: 291706
2017-01-11 21:32:10 +00:00
Peter Collingbourne 6bca5a0d82 Re-apply r291205, "LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.", with a fix for an off-by-one error.
llvm-svn: 291699
2017-01-11 20:28:46 +00:00
Ivan Krasin 42e6b4fd98 Revert rL291205 because it breaks Chrome tests under CFI.
Summary:
Revert LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.

This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.

Original URL: https://reviews.llvm.org/D28341

Reviewers: pcc

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D28532

llvm-svn: 291684
2017-01-11 16:54:04 +00:00
Peter Collingbourne d79e49d807 LowerTypeTests: Thread summary and action from the API and command line into the pass.
Also move command line handling out of the pass constructor and into
a separate function.

Differential Revision: https://reviews.llvm.org/D28422

llvm-svn: 291323
2017-01-07 01:17:24 +00:00
Peter Collingbourne 81271b7bd2 LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.
This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.

Differential Revision: https://reviews.llvm.org/D28341

llvm-svn: 291205
2017-01-06 02:22:47 +00:00
Teresa Johnson 6c475a7595 ThinLTO: add early "dead-stripping" on the Index
Summary:
Using the linker-supplied list of "preserved" symbols, we can compute
the list of "dead" symbols, i.e. the one that are not reachable from
a "preserved" symbol transitively on the reference graph.
Right now we are using this information to mark these functions as
non-eligible for import.

The impact is two folds:
- Reduction of compile time: we don't import these functions anywhere
  or import the function these symbols are calling.
- The limited number of import/export leads to better internalization.

Patch originally by Mehdi Amini.

Reviewers: mehdi_amini, pcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23488

llvm-svn: 291177
2017-01-05 21:34:18 +00:00
Teresa Johnson 519465b993 [ThinLTO] Subsume all importing checks into a single flag
Summary:
This adds a new summary flag NotEligibleToImport that subsumes
several existing flags (NoRename, HasInlineAsmMaybeReferencingInternal
and IsNotViableToInline). It also subsumes the checking of references
on the summary that was being done during the thin link by
eligibleForImport() for each candidate. It is much more efficient to
do that checking once during the per-module summary build and record
it in the summary.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28169

llvm-svn: 291108
2017-01-05 14:32:16 +00:00
Peter Collingbourne b2ce2b6805 IR: Module summary representation for type identifiers; summary test scaffolding for lowertypetests.
Set up basic YAML I/O support for module summaries, plumb the summary into
the pass and add a few command line flags to test YAML I/O support. Bitcode
support to come separately, as will the code in LowerTypeTests that actually
uses the summary. Also add a couple of tests that pass by virtue of the pass
doing nothing with the summary (which happens to be the correct thing to do
for those tests).

Differential Revision: https://reviews.llvm.org/D28041

llvm-svn: 291069
2017-01-05 03:39:00 +00:00
Mehdi Amini 19ef4fad91 Use lazy-loading of Metadata in MetadataLoader when importing is enabled (NFC)
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.

Two main current limitations are:

1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.

These two limitations can be lifted in a subsequent improvement if
needed.

With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).

Reviewers: pcc, tejohnson, dexonsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28113

llvm-svn: 291027
2017-01-04 22:54:33 +00:00
Davide Italiano b672537cbf [PMBuilder] Remove RunFloat2Int cl::opt.
The pass has been on by default for a long time without problems.

llvm-svn: 290814
2017-01-02 17:49:18 +00:00
Chandler Carruth 9900d18bab [PM] Teach the inliner's call graph update to handle inserting new edges
when they are call edges at the leaf but may (transitively) be reached
via ref edges.

It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.

Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.

llvm-svn: 290649
2016-12-28 03:13:12 +00:00
Chandler Carruth 141bf5d14d [PM] Add one of the features left out of the initial inliner patch:
skipping indirectly recursive inline chains.

To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.

This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.

llvm-svn: 290590
2016-12-27 06:46:20 +00:00
Chandler Carruth 03130d981c [PM] Teach the inliner in the new PM to merge attributes after inlining.
Also enable the new PM in the attributes test case which caught this
issue.

llvm-svn: 290572
2016-12-27 03:39:54 +00:00
Chandler Carruth 6e9bb7e064 [PM] Teach the always inliner in the new pass manager to support
removing fully-dead comdats without removing dead entries in comdats
with live members.

This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.

I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).

llvm-svn: 290557
2016-12-26 23:43:27 +00:00
Davide Italiano fe7a3ee51e [NewGVN] Add a flag to enable the pass via `-mllvm`.
NewGVN can be tested passing `-mllvm -enable-newgvn` to clang.

Differential Revision:  https://reviews.llvm.org/D28059

llvm-svn: 290548
2016-12-26 18:26:19 +00:00
Chandler Carruth 4eaff12ba2 [PM] Teach the always inlining test case to be much more strict about
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.

Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.

Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?

Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.

llvm-svn: 290457
2016-12-23 23:33:35 +00:00
Mehdi Amini 94f86ad4e0 Function-import: Disable IRVerifier on lazy-loaded modules: the ODR TypeUniquing generates invalid debug info.
llvm-svn: 290442
2016-12-23 19:19:44 +00:00
Mehdi Amini fc06b83ee7 Fix build after r290437 (missing include)
llvm-svn: 290438
2016-12-23 18:04:51 +00:00
Mehdi Amini 9a9077fdad FunctionImport: fix typo '#ifndef NDEBUG' instead of '#ifndef DEBUG'
llvm-svn: 290437
2016-12-23 17:59:24 +00:00
Mehdi Amini 96cdc49305 [ThinLTO] Verify lazy-loaded source module for function importing when assertions are enabled (NFC)
llvm-svn: 290416
2016-12-23 05:16:19 +00:00
Evgeniy Stepanov 27d4c9b71b [cfi] Emit jump tables as a function-level inline asm.
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.

The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.

This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.

Differential Revision: https://reviews.llvm.org/D28012

llvm-svn: 290384
2016-12-22 22:22:35 +00:00
Easwaran Raman 180bd9f6b3 Pass GetAssumptionCache to InlineFunctionInfo constructor
Differential revision: https://reviews.llvm.org/D28038

llvm-svn: 290295
2016-12-22 01:07:01 +00:00
Adam Nemet 32e6a34c02 [LDist] Match behavior between invoking via optimization pipeline or opt -loop-distribute
In r267672, where the loop distribution pragma was introduced, I tried
it hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when ran via the optimization pipeline).

As MichaelZ has discovered this has the unintended consequence of
breaking a very common developer work-flow to reproduce compilations
using opt: First you print the pass pipeline of clang
with -debug-pass=Arguments and then invoking opt with the returned
arguments.

clang -debug-pass will include -loop-distribute but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma.
While through opt (default=on) we will try to distribute all loops.

This changes opt's default to off as well to match clang.  The tests are
modified to explicitly enable the transformation.

llvm-svn: 290235
2016-12-21 04:07:40 +00:00
Peter Collingbourne 598bd2a262 IPO: Remove the ModuleSummary argument to the FunctionImport pass. NFCI.
No existing client is passing a non-null value here. This will come back
in a slightly different form as part of the type identifier summary work.

Differential Revision: https://reviews.llvm.org/D28006

llvm-svn: 290222
2016-12-21 00:50:12 +00:00
Chandler Carruth 1d96311447 [PM] Provide an initial, minimal port of the inliner to the new pass manager.
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.

Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
  this at all. Active discussion and investigation is going on to remove
  it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
  Why? Because it adds what I suspect is inappropriate coupling for
  little or no benefit. We will have an outer iteration system that
  tracks devirtualization including that from function passes and
  iterates already. We should improve that rather than approximate it
  here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
  reason at all.

The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.

A summary of the different things happening here:

1) Adding the usual new PM class and rigging.

2) Fixing minor underlying assumptions in the inline cost analysis or
   inline logic that don't generally hold in the new PM world.

3) Adding the core pass logic which is in essence a loop over the calls
   in the nodes in the call graph. This is a bit duplicated from the old
   inliner, but only a handful of lines could realistically be shared.
   (I tried at first, and it really didn't help anything.) All told,
   this is only about 100 lines of code, and most of that is the
   mechanics of wiring up analyses from the new PM world.

4) Updating the LazyCallGraph (in the new PM) based on the *newly
   inlined* calls and references. This is very minimal because we cannot
   form cycles.

5) When inlining removes the last use of a function, eagerly nuking the
   body of the function so that any "one use remaining" inline cost
   heuristics are immediately refined, and queuing these functions to be
   completely deleted once inlining is complete and the call graph
   updated to reflect that they have become dead.

6) After all the inlining for a particular function, updating the
   LazyCallGraph and the CGSCC pass manager to reflect the
   function-local simplifications that are done immediately and
   internally by the inline utilties. These are the exact same
   fundamental set of CG updates done by arbitrary function passes.

7) Adding a bunch of test cases to specifically target CGSCC and other
   subtle aspects in the new PM world.

Many thanks to the careful review from Easwaran and Sanjoy and others!

Differential Revision: https://reviews.llvm.org/D24226

llvm-svn: 290161
2016-12-20 03:15:32 +00:00
Adrian Prantl bceaaa9643 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Evgeniy Stepanov 95294127d0 Revert "[GVNHoist] Move GVNHoist to function simplification part of pipeline."
This reverts r289696, which caused TSan perf regression.

See PR31382.

llvm-svn: 290030
2016-12-17 01:53:15 +00:00
Adrian Prantl 73ec065604 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Adrian Prantl 74a835cda0 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Teresa Johnson edddca22c9 [ThinLTO] Thin link efficiency: More efficient export list computation
Summary:
Instead of checking whether a global referenced by a function being
imported is defined in the same module, speculatively always add the
referenced globals to the module's export list. After all imports are
computed, for each module prune any not in its defined set from its
export list.

For a huge C++ app with aggressive importing thresholds, even with
D27687 we spent a lot of time invoking modulePath() from
exportGlobalInModule (modulePath() was still the 2nd hottest routine in
profile). The reason is that with comdat/linkonce the summary lists for
each GUID can be long. For the app in question, for example, we were
invoking exportGlobalInModule almost 2 million times, and we traversed
an average of 63 entries in the summary list each time.

This patch reduced the thin link time for the app by about 10% (on top
of D27687) when using aggressive importing thresholds, and about 3.5% on
average with default importing thresholds.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27755

llvm-svn: 289918
2016-12-16 04:11:51 +00:00
Adrian Prantl 03c6d31a3b Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Peter Collingbourne 7a4be21d46 Add missing library dep.
llvm-svn: 289903
2016-12-16 00:43:00 +00:00
Adrian Prantl ce13935776 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Peter Collingbourne 1398a32e28 IPO: Introduce ThinLTOBitcodeWriter pass.
This pass prepares a module containing type metadata for ThinLTO by splitting
it into regular and thin LTO parts if possible, and writing both parts to
a multi-module bitcode file. Modules that do not contain type metadata are
written unmodified as a single module.

All globals with type metadata are added to the regular LTO module, and
the rest are added to the thin LTO module.

Differential Revision: https://reviews.llvm.org/D27324

llvm-svn: 289899
2016-12-16 00:26:30 +00:00
Teresa Johnson 19f2aa7891 [ThinLTO] Thin link efficiency improvement: don't re-export globals (NFC)
Summary:
We were reinvoking exportGlobalInModule numerous times redundantly.
No need to re-export globals referenced by a global that was already
imported from its module. This resulted in a large speedup in the thin
link for a big application, particularly when importing aggressiveness
was cranked up.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27687

llvm-svn: 289896
2016-12-15 23:50:06 +00:00
Teresa Johnson eb0ac24172 [ThinLTO] Revert part of r289843 that belonged to another patch.
The code change for D27687 accidentally got committed along with the
main change in r289843. Revert it temporarily, so that I can recommit it
along with its test as intended.

llvm-svn: 289875
2016-12-15 21:39:42 +00:00
Teresa Johnson 0c3f57b133 [ThinLTO] Remove stale comment (NFC)
This should have been removed with r288446.

llvm-svn: 289871
2016-12-15 20:53:31 +00:00
Teresa Johnson 475b51a700 [ThinLTO] Thin link efficiency: skip candidate added later with higher threshold (NFC)
Summary:
Thin link efficiency improvement. After adding an importing candidate to
the worklist we might have later added it again with a higher threshold.
Skip it when popped from the worklist if we recorded a higher threshold
than the current worklist entry, it will get processed again at the
higher threshold when that entry is popped.

This required adding the summary's GUID to the worklist, so that it can
be used to query the recorded highest threshold for it when we pop from the
worklist.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27696

llvm-svn: 289867
2016-12-15 20:48:19 +00:00
Teresa Johnson 1b859a2306 [ThinLTO] Ensure callees get hot threshold when first seen on cold path
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.

Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.

llvm-svn: 289843
2016-12-15 18:21:01 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Dehao Chen 40dd8c5109 Only sets profile summary when it was not preset.
Summary: SampleProfileLoader pass may be invoked twice by LTO. The 2nd pass should not append more summary info as it is already preset by the 1st pass.

Reviewers: eraman, davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D27733

llvm-svn: 289725
2016-12-14 22:06:49 +00:00
Dehao Chen fb699619a0 Fix the bug in r289714 (NFC).
llvm-svn: 289724
2016-12-14 22:03:08 +00:00
Dehao Chen a99e082e15 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289714
2016-12-14 21:40:47 +00:00
Geoff Berry ca11a1e147 [GVNHoist] Move GVNHoist to function simplification part of pipeline.
Summary:
Move GVNHoist to later in the optimization pipeline, specifically, to
the function simplification part of the pipeline.  The new pipeline
location allows GVNHoist to run on a function after its callees have
been inlined but before the function has been considered for inlining
into its callers, exposing more opportunities for hoisting.

Performance results on AArch64 kryo:
Improvements:
  Benchmarks/CoyoteBench/fftbench  -24.952%
  spec2006/bzip2                    -4.071%
  internal bmark                    -3.177%
  Benchmarks/PAQ8p/paq8p            -1.754%
  spec2000/perlbmk                  -1.328%
  spec2006/h264ref                  -1.140%

Regressions:
  internal bmark                    +1.818%
  Benchmarks/mafft/pairlocalalign   +1.084%

Reviewers: sebpop, dberlin, hiraditya

Subscribers: aemerson, mehdi_amini, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27722

llvm-svn: 289696
2016-12-14 19:38:22 +00:00
Dehao Chen 23025f8483 revert r289669 which breaks bots
llvm-svn: 289676
2016-12-14 17:23:16 +00:00
Dehao Chen cb61c94d87 Create SampleProfileLoader pass in llvm instead of clang
Summary: We used to create SampleProfileLoader pass in clang. This makes LTO/ThinLTO unable to add this pass in the linker plugin. This patch moves the SampleProfileLoader pass creation from clang to llvm pass manager builder.

Reviewers: tejohnson, davidxl, dnovillo

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D27743

llvm-svn: 289669
2016-12-14 16:49:28 +00:00
Dehao Chen 0f35fa907d Change CoverageTracker from a global variable to member variable to avoid breaking thread-safety. (NFC)
llvm-svn: 289603
2016-12-13 22:13:18 +00:00
Teresa Johnson a29bd6ffcc [ThinLTO] Remove useless code (NFC)
Should have been removed in r288446.

llvm-svn: 289466
2016-12-12 20:34:28 +00:00
Peter Collingbourne 8786754cc3 WholeProgramDevirt: Teach the pass to handle structs of arrays.
This will become necessary in some cases once D22296 lands.

llvm-svn: 289165
2016-12-09 01:10:11 +00:00
Peter Collingbourne 7a1e5bbe4e Make WholeProgramDevirt understand ConstStruct vtables.
Based on a patch by LemonBoy!

Differential Revision: https://reviews.llvm.org/D26581

llvm-svn: 289162
2016-12-09 00:33:27 +00:00
Evgeniy Stepanov 0c8957c198 CFI-icall on Thumb
Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb.
Use b.w branch instruction.
Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set.

This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes) but that needs unwinding instructions, etc.

Differential Revision: https://reviews.llvm.org/D27499

llvm-svn: 289008
2016-12-08 00:32:26 +00:00
Benjamin Kramer b1332d8bf6 Try unbreaking the MSVC build.
llvm-svn: 288907
2016-12-07 13:35:11 +00:00
Benjamin Kramer 926ab5b00b [LowerTypeTests] Use the TrailingObjects infrastructure for trailing objects.
Also avoid allocating ~3x as much memory as needed.

llvm-svn: 288904
2016-12-07 12:31:45 +00:00
Peter Collingbourne 7357b2ad62 LowerTypeTests: Improve performance by optimising type metadata queries.
Requesting metadata for a global is a relatively expensive operation as it
involves a map lookup, but it's one that we need to do relatively frequently in
this pass to collect the list of type metadata nodes associated with a global.
This change improves the performance of type metadata queries by prebuilding
data structures that keep the global together with its list of type metadata,
and changing the pass to use that data structure wherever we were previously
passing global references around.

This change also eliminates some O(N^2) behavior by collecting the list of
globals associated with each type identifier during the first pass over the
list of globals rather than visiting each global to compute that list every
time we add a new type identifier.

Reduces pass runtime on a module containing Chrome's vtables from over 60s
to 0.9s.

Differential Revision: https://reviews.llvm.org/D27484

llvm-svn: 288859
2016-12-06 23:02:13 +00:00
Peter Collingbourne bc0705240e IR: Move NumElements field from {Array,Vector}Type to SequentialType.
Now that PointerType is no longer a SequentialType, all SequentialTypes
have an associated number of elements, so we can move that information to
the base class, allowing for a number of simplifications.

Differential Revision: https://reviews.llvm.org/D27122

llvm-svn: 288464
2016-12-02 03:20:58 +00:00
Peter Collingbourne 4568158c4d IR: Change PointerType to derive from Type rather than SequentialType.
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html

This is for a couple of reasons:

- Values of type PointerType are unlike the other SequentialTypes (arrays
  and vectors) in that they do not hold values of the element type. By moving
  PointerType we can unify certain aspects of how the other SequentialTypes
  are handled.
- PointerType will have no place in the SequentialType hierarchy once
  pointee types are removed, so this is a necessary step towards removing
  pointee types.

Differential Revision: https://reviews.llvm.org/D26595

llvm-svn: 288462
2016-12-02 03:05:41 +00:00
Peter Collingbourne ab85225be4 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00
Teresa Johnson 185b4ab6d4 [ThinLTO] Stop importing constant global vars as copies in the backend
Summary:
We were doing an optimization in the ThinLTO backends of importing
constant unnamed_addr globals unconditionally as a local copy (regardless
of whether the thin link decided to import them). This should be done in
the thin link instead, so that resulting exported references are marked
and promoted appropriately, but will need a summary enhancement to mark
these variables as constant unnamed_addr.

The function import logic during the thin link was trying to handle
this proactively, by conservatively marking all values referenced in
the initializer lists of exported global variables as also exported.
However, this only handled values referenced directly from the
initializer list of an exported global variable. If the value is itself
a constant unnamed_addr variable, we could end up exporting its
references as well. This caused multiple issues. The first is that the
transitively exported references weren't promoted. Secondly, some could
not be promoted/renamed (e.g. they had a section or other constraint).
recursively, instead of just adding the first level of initializer list
references to the ExportList directly.

Remove this optimization and the associated handling in the function
import backend. SPEC measurements indicate we weren't getting much
from it in any case.

Fixes PR31052.

Reviewers: mehdi_amini

Subscribers: krasin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26880

llvm-svn: 288446
2016-12-02 01:02:30 +00:00
Peter Collingbourne 863cbfbeba Object: Extract a ModuleSymbolTable class from IRObjectFile.
This class represents a symbol table built from in-memory IR. It provides
access to GlobalValues and should only be used if such access is required
(e.g. in the LTO implementation). We will eventually change IRObjectFile
to read from a bitcode symbol table rather than using ModuleSymbolTable,
so it would not be able to expose the module.

Differential Revision: https://reviews.llvm.org/D27073

llvm-svn: 288319
2016-12-01 06:51:47 +00:00
David Blaikie 831b652020 Use CallSite to simplify code
llvm-svn: 288192
2016-11-29 19:42:27 +00:00
David Majnemer d5648c7a7d Replace some callers of setTailCall with setTailCallKind
We were a little sloppy with adding tailcall markers.  Be more
consistent by using setTailCallKind instead of setTailCall.

llvm-svn: 287955
2016-11-25 22:35:09 +00:00
Dehao Chen 554f500ae2 Before sample pgo annotation, do not inline a function that has no debug info. (NFC)
If there is no debug info in the callee, inlining it will not help annotator. This avoids infinite loop as reported in PR/31119.

llvm-svn: 287710
2016-11-22 22:50:01 +00:00
Davide Italiano 2ae76dd239 [GlobalSplit] Port to the new pass manager.
llvm-svn: 287511
2016-11-21 00:28:23 +00:00
Simon Pilgrim 7d18a70dac Fix spelling mistakes in Transforms comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287488
2016-11-20 13:19:49 +00:00
Chris Bieneman 05c279fc4b [CMake] NFC. Updating CMake dependency specifications
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.

llvm-svn: 287206
2016-11-17 04:36:50 +00:00
Peter Collingbourne f72a8d4e08 Introduce GlobalSplit pass.
This pass splits globals into elements using inrange annotations on
getelementptr indices.

Differential Revision: https://reviews.llvm.org/D22295

llvm-svn: 287178
2016-11-16 23:40:26 +00:00
Teresa Johnson 4fef68cb8d [ThinLTO] Only promote exported locals as marked in index
Summary:
We have always speculatively promoted all renamable local values
(except const non-address taken variables) for both the exporting
and importing module. We would then internalize them back based on
the ThinLink results if they weren't actually exported. This is
inefficient, and results in unnecessary renames. It also meant we
had to check the non-renamability of a value in the summary, which
was already checked during function importing analysis in the ThinLink.

Made renameModuleForThinLTO (which does the promotion/renaming) instead
use the index when exporting, to avoid unnecessary renames/promotions.
For importing modules, we can simply promoted all values as any local
we import by definition is exported and needs promotion.

This required changes to the method used by the FunctionImport pass
(only invoked from 'opt' for testing) and when invoked from llvm-link,
since neither does a ThinLink. We simply conservatively mark all locals
in the index as promoted, which preserves the current aggressive
promotion behavior.

I also needed to change an llvm-lto based test where we had previously
been aggressively promoting values that weren't importable (aliasees),
but now will not promote.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26467

llvm-svn: 286871
2016-11-14 19:21:41 +00:00
Teresa Johnson d5033a4576 [ThinLTO] Make inline assembly handling more efficient in summary
Summary:
The change in r285513 to prevent exporting of locals used in
inline asm added all locals in the llvm.used set to the reference
set of functions containing inline asm. Since these locals were marked
NoRename, this automatically prevented importing of the function.

Unfortunately, this caused an explosion in the summary reference lists
in some cases. In my particular example, it happened for a large protocol
buffer generated C++ file, where many of the generated functions
contained an inline asm call. It was exacerbated when doing a ThinLTO
PGO instrumentation build, where the PGO instrumentation included
thousands of private __profd_* values that were added to llvm.used.

We really only need to include a single llvm.used local (NoRename) value
in the reference list of a function containing inline asm to block it
being imported. However, it seems cleaner to add a flag to the summary
that explicitly describes this situation, which is what this patch does.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26402

llvm-svn: 286840
2016-11-14 16:40:19 +00:00
Peter Collingbourne d9445c49ad Bitcode: Change module reader functions to return an llvm::Expected.
Differential Revision: https://reviews.llvm.org/D26562

llvm-svn: 286752
2016-11-13 07:00:17 +00:00
Evgeniy Stepanov 1fe189d795 [cfi] Fix weak functions handling.
When a function pointer is replaced with a jumptable pointer, special
case is needed to preserve the semantics of extern_weak functions.
Since a jumptable entry can not be extern_weak, we emulate that
behaviour by replacing all references to F (the extern_weak function)
with the following expression: F != nullptr ? JumpTablePtr : nullptr.

Extra special care is needed for global initializers, since most (or
probably all) backends can not lower an initializer that includes
this kind of constant expression. Initializers like that are replaced
with a global constructor (i.e. a runtime initializer).

llvm-svn: 286636
2016-11-11 21:39:26 +00:00
Erik Eckstein 4d6fb72aa9 Make the FunctionComparator of the MergeFunctions pass a stand-alone utility.
This is pure refactoring. NFC.

This change moves the FunctionComparator (together with the GlobalNumberState
utility) in to a separate file so that it can be used by other passes.
For example, the SwiftMergeFunctions pass in the Swift compiler:
https://github.com/apple/swift/blob/master/lib/LLVMPasses/LLVMMergeFunctions.cpp

Details of the change:

*) The big part is just moving code out of MergeFunctions.cpp into FunctionComparator.h/cpp
*) Make FunctionComparator member functions protected (instead of private)
   so that a derived comparator class can use them.

Following refactoring helps to share code between the base FunctionComparator
class and a derived class:

*) Add a beginCompare() function
*) Move some basic function property comparisons into a separate function compareSignature()
*) Do the GEP comparison inside cmpOperations() which now has a new
   needToCmpOperands reference parameter

https://reviews.llvm.org/D25385

llvm-svn: 286632
2016-11-11 21:15:13 +00:00
Peter Collingbourne 6de481a378 Bitcode: Change getModuleSummaryIndex() to return an llvm::Expected.
Differential Revision: https://reviews.llvm.org/D26539

llvm-svn: 286624
2016-11-11 19:50:39 +00:00
Evgeniy Stepanov f48ffab554 [cfi] Implement cfi-icall using inline assembly.
The current implementation is emitting a global constant that happens
to evaluate to the same bytes + relocation as a jump instruction on
X86. This does not work for PIE executables and shared libraries
though, because we end up with a wrong relocation type. And it has no
chance of working on ARM/AArch64 which use different relocation types
for jump instructions (R_ARM_JUMP24) that is never generated for
data.

This change replaces the constant with module-level inline assembly
followed by a hidden declaration of the jump table. Works fine for
ARM/AArch64, but has some drawbacks.
* Extra symbols are added to the static symbol table, which inflate
the size of the unstripped binary a little. Stripped binaries are not
affected. This happens because jump table declarations must be
external (because their body is in the inline asm).
* Original functions that were anonymous are now named
<original name>.cfi, and it affects symbolization sometimes. This is
necessary because the only user of these functions is the (inline
asm) jump table, so they had to be added to @llvm.used, which does
not allow unnamed functions.

llvm-svn: 286611
2016-11-11 18:49:09 +00:00
Dehao Chen 5492f8646c Add comments about why we put LoopSink pass at the very late stage.
llvm-svn: 286480
2016-11-10 17:42:18 +00:00
Peter Collingbourne 7f00d0a125 Bitcode: Change the materializer interface to return llvm::Error.
Differential Revision: https://reviews.llvm.org/D26439

llvm-svn: 286382
2016-11-09 17:49:19 +00:00
Dehao Chen 947dbe1254 Enable Loop Sink pass for functions that has profile.
Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.

Reviewers: davidxl, hfinkel

Subscribers: mehdi_amini, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26155

llvm-svn: 286325
2016-11-09 00:58:19 +00:00
Chad Rosier fbc7b7d154 Fix typo in comment. NFC.
llvm-svn: 286270
2016-11-08 19:10:25 +00:00
Chad Rosier c244349b85 Remove unused include. NFC.
llvm-svn: 286250
2016-11-08 16:51:19 +00:00
Chad Rosier 611b73b1f9 Fix 80-column violations. NFC.
llvm-svn: 286117
2016-11-07 16:28:04 +00:00
Xinliang David Li f450b88191 Fix typo
llvm-svn: 285978
2016-11-04 03:00:52 +00:00
Teresa Johnson 58fbc916a0 [ThinLTO] Rename HasSection to NoRename (NFC)
Summary:
This is in preparation for a change to utilize this flag for symbols
referenced/defined in either inline or module level assembly.

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26048

llvm-svn: 285376
2016-10-28 02:24:59 +00:00
Peter Collingbourne 4f3b2df9bb GlobalDCE: Restore a statement accidentally removed in r285048.
llvm-svn: 285052
2016-10-25 02:57:27 +00:00
Peter Collingbourne 7695cb6da8 GlobalDCE: Deduplicate code. NFCI.
llvm-svn: 285048
2016-10-25 01:58:26 +00:00
Rong Xu 1c0e9b97d2 Conditionally eliminate library calls where the result value is not used
Summary:
This pass shrink-wraps a condition to some library calls where the call
result is not used. For example:
   sqrt(val);
 is transformed to
   if (val < 0)
     sqrt(val);
Even if the result of library call is not being used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note in many functions, the error condition solely depends on the incoming
parameter. In this optimization, we can generate the condition can lead to
the errno to shrink-wrap the call. Since the chances of hitting the error
condition is low, the runtime call is effectively eliminated.

These partially dead calls are usually results of C++ abstraction penalty
exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
in SPEC2006.

Reviewers: hfinkel, mehdi_amini, davidxl

Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz

Differential Revision: https://reviews.llvm.org/D24414

llvm-svn: 284542
2016-10-18 21:36:27 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
David Callahan c1051ab26e Modify df_iterator to support post-order actions
Summary: This makes a change to the state used to maintain visited information for depth first iterator. We know assume a method "completed(...)" which is called after all children of a node have been visited. In all existing cases, this method does nothing so this patch has no functional changes.  It will however allow a client to distinguish back from cross edges in a DFS tree.

Reviewers: nadav, mehdi_amini, dberlin

Subscribers: MatzeB, mzolotukhin, twoh, freik, llvm-commits

Differential Revision: https://reviews.llvm.org/D25191

llvm-svn: 283391
2016-10-05 21:36:16 +00:00
Sanjoy Das 0359a193a7 [PruneEH] Be correct in the face IPO
This fixes one spot I had missed in r265762.  Credit goes to Philip
Reames for spotting this one!

llvm-svn: 283137
2016-10-03 19:35:30 +00:00
Mehdi Amini 117296c0a0 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Piotr Padlewski d28694739c [thinlto] Don't decay threshold for hot callsites
Summary:
We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24976

llvm-svn: 282833
2016-09-30 03:01:17 +00:00
Piotr Padlewski ba72b95f7b [thinlto] Add cold-callsite import heuristic
Summary:
Not tunned up heuristic, but with this small heuristic there is about
+0.10% improvement on SPEC 2006

Reviewers: tejohnson, mehdi_amini, eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24940

llvm-svn: 282733
2016-09-29 17:32:07 +00:00
Dehao Chen 5461d8bdb5 Refactor the ProfileSummaryInfo to use doInitialization and doFinalization to handle Module update.
Summary: This refactors the change in r282616

Reviewers: davidxl, eraman, mehdi_amini

Subscribers: mehdi_amini, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D25041

llvm-svn: 282630
2016-09-28 21:00:58 +00:00
Adam Nemet c507ac96f5 [Inliner] Port all opt remarks to new streaming API
llvm-svn: 282559
2016-09-27 23:47:03 +00:00
Adam Nemet 04758ba385 Shorten DiagnosticInfoOptimizationRemark* to OptimizationRemark*. NFC
With the new streaming interface, these class names need to be typed a
lot and it's way too looong.

llvm-svn: 282544
2016-09-27 22:19:23 +00:00
Adam Nemet 1142147e41 [Inliner] Fold the analysis remark into the missed remark
There is really no reason for these to be separate.

The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed.  There,
you have to query analysis to get the full picture.

I think we should just explain the reason for missing the optimization
in the missed remark when possible.  Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.

llvm-svn: 282542
2016-09-27 21:58:17 +00:00
Adam Nemet a62b7e1a28 Output optimization remarks in YAML
(Re-committed after moving the template specialization under the yaml
namespace.  GCC was complaining about this.)

This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282539
2016-09-27 20:55:07 +00:00
Adam Nemet cc2a3fa8e8 Revert "Output optimization remarks in YAML"
This reverts commit r282499.

The GCC bots are failing

llvm-svn: 282503
2016-09-27 16:39:24 +00:00
Adam Nemet 92e928c10a Output optimization remarks in YAML
This allows various presentation of this data using an external tool.
This was first recommended here[1].

As an example, consider this module:

  1 int foo();
  2 int bar();
  3
  4 int baz() {
  5   return foo() + bar();
  6 }

The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):

  remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
  remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)

Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:

  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 10 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: foo
    - String:  will not be inlined into
    - Caller: baz
  ...
  --- !Missed
  Pass:            inline
  Name:            NotInlined
  DebugLoc:        { File: /tmp/s.c, Line: 5, Column: 18 }
  Function:        baz
  Hotness:         30
  Args:
    - Callee: bar
    - String:  will not be inlined into
    - Caller: baz
  ...

This is a summary of the high-level decisions:

* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:

   ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
                DEBUG_TYPE, "NotInlined", &I)
            << NV("Callee", Callee) << " will not be inlined into "
            << NV("Caller", CS.getCaller()) << setIsVerbose());

NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.

Subsequent patches will update ORE users to use the new streaming API.

* I am using YAML I/O for writing the YAML file.  YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types.  Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.

On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).

* The YAML stream is stored in the LLVM context.

* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".

* As before hotness is computed in the analysis pass instead of
DiganosticInfo.  This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.

[1] https://reviews.llvm.org/D19678#419445

Differential Revision: https://reviews.llvm.org/D24587

llvm-svn: 282499
2016-09-27 16:15:16 +00:00
Ivan Krasin 4ff4f21e15 Revert r277556. Add -lowertypetests-bitsets-level to control bitsets generation
Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.

Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.

Reviewers: pcc

Subscribers: kcc

Differential Revision: https://reviews.llvm.org/D24948

llvm-svn: 282461
2016-09-27 00:29:53 +00:00
Peter Collingbourne 53a852b648 LowerTypeTests: Remove unused variable.
llvm-svn: 282456
2016-09-26 23:56:17 +00:00
Peter Collingbourne 6ed92e3f53 LowerTypeTests: Create LowerTypeTestsModule class and move implementation there. Related simplifications.
llvm-svn: 282455
2016-09-26 23:54:39 +00:00
Piotr Padlewski d9830eb79f [thinlto] Basic thinlto fdo heuristic
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.

I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.

I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.

Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.

Reviewers: eraman, mehdi_amini, tejohnson

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D24638

llvm-svn: 282437
2016-09-26 20:37:32 +00:00
Dehao Chen 160fbc3f95 Change the basic block weight calculation algorithm to use max instead of voting.
Summary: Now that we have more precise debug info, we should change back to use maximum to get basic block weight.

Reviewers: dnovillo

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D24788

llvm-svn: 282084
2016-09-21 16:26:51 +00:00
Arnold Schwaighofer f62ba1031f DeadArgElim: Don't mark swifterror arguments as unused
Replacing swifterror arguments with undef creates invalid IR.

rdar://28300490

llvm-svn: 282075
2016-09-21 15:29:08 +00:00
Dehao Chen 20866ed57e Handle early inline for hot callsites that reside in the same basic block.
Summary: Callsites in the same basic block should share the same hotness. This patch checks for the hottest callsite in the same basic block, and use the hotness for all callsites in that basic block for early inline decisions. It also fixes the test to add "-S" so theat the "CHECK-NOT" is actually checking the content.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24734

llvm-svn: 281927
2016-09-19 18:38:14 +00:00
Dehao Chen 82667d04c5 Only set branch weight during sample pgo annotation when max_weight of the branch is non-zero. Otherwise use default static profile to set branch probability.
Summary: It does not make sense to set equal weights for all unkown branches as we have static branch prediction available.

Reviewers: dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24732

llvm-svn: 281912
2016-09-19 16:33:41 +00:00
Dehao Chen 38e3731c47 Use call target count to derive the call instruction weight
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles that could be moved across basic block boundaries. This patches uses call target count profile to annotate call instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24410

llvm-svn: 281911
2016-09-19 16:06:37 +00:00
Dehao Chen 41cde0b986 Handle Invoke during sample profiler annotation: make it inlinable.
Summary: Previously we reline on inst-combine to remove inlinable invoke instructions. This causes trouble because a few extra optimizations are schedule early that could introduce too much CFG change (e.g. simplifycfg removes too much control flow). This patch handles invoke instruction in-place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D24409

llvm-svn: 281870
2016-09-18 23:11:37 +00:00
Teresa Johnson fbb431b292 [ThinLTO] Ensure anonymous globals renamed even at -O0
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.

Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.

Fixes PR30419.

Reviewers: mehdi_amini

Subscribers: probinson, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24701

llvm-svn: 281840
2016-09-17 20:40:16 +00:00
Mehdi Amini a53d49e1b5 Don't create a SymbolTable in Function when the LLVMContext discards value names (NFC)
The ValueSymbolTable is used to detect name conflict and rename
instructions automatically. This is not needed when the value
names are automatically discarded by the LLVMContext.
No functional change intended, just saving a little bit of memory.

This is a recommit of r281806 after fixing the accessor to return
a pointer instead of a reference and updating all the call-sites.

llvm-svn: 281813
2016-09-17 06:00:02 +00:00
Mehdi Amini 27d2379b4e Rename NameAnonFunctions to NameAnonGlobals to match what it is doing (NFC)
llvm-svn: 281745
2016-09-16 16:56:30 +00:00
Mehdi Amini d880309835 [GlobalOpt] Dead Eliminate declarations
GlobalOpt is already dead-code-eliminating global definitions. With
this change it also takes care of declarations.
Hopefully this should make it now a strict superset of GlobalDCE.
This is important for LTO/ThinLTO as we don't want the linker to see
"undefined reference" when it processes the input files: it could
prevent proper internalization (or even load an extra file from a
static archive, changing the behavior of the program!).

llvm-svn: 281653
2016-09-15 20:26:27 +00:00
Peter Collingbourne d4135bbc30 DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

llvm-svn: 281284
2016-09-13 01:12:59 +00:00
David Majnemer c83044d9bb [FunctionAttrs] Don't try to infer returned if it is already on an argument
Trying to infer the 'returned' attribute if an argument is already
'returned' can lead to verification failure: inference might determine
that a different argument is passed through which would result in two
different arguments marked as 'returned'.

This fixes PR30350.

llvm-svn: 281221
2016-09-12 16:04:59 +00:00
Tim Shen 48f814e8a3 s/static inline/static/ for headers I have changed in r279475. NFC.
llvm-svn: 280257
2016-08-31 16:48:13 +00:00
Teresa Johnson 8c1bc986b5 [ThinLTO] Indirect call promotion fixes for promoted local functions
Summary:
Fix a couple issues limiting the application of indirect call promotion
in ThinLTO mode:
- Invoke indirect call promotion before globalopt, since it may
  eliminate imported functions which appear unreferenced.
- Invoke indirect call promotion with InLTO=true so that the PGOFuncName
  metadata is used to get the name for locals which would have been
  renamed during promotion.

Reviewers: davidxl, mehdi_amini

Subscribers: Prazek, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D24004

llvm-svn: 280024
2016-08-29 22:46:56 +00:00
Adam Nemet cef3314156 [Inliner] Report when inlining fails because callee's def is unavailable
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.

Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered.  I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off.  As before without -Rpass-with-hotness
there is no overhead.

Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.

Reviewers: eraman, chandlerc, davidxl, hfinkel

Subscribers: tejohnson, Prazek, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D23415

llvm-svn: 279860
2016-08-26 20:21:05 +00:00
Chandler Carruth 8882346842 [PM] Introduce basic update capabilities to the new PM's CGSCC pass
manager, including both plumbing and logic to handle function pass
updates.

There are three fundamentally tied changes here:
1) Plumbing *some* mechanism for updating the CGSCC pass manager as the
   CG changes while passes are running.
2) Changing the CGSCC pass manager infrastructure to have support for
   the underlying graph to mutate mid-pass run.
3) Actually updating the CG after function passes run.

I can separate them if necessary, but I think its really useful to have
them together as the needs of #3 drove #2, and that in turn drove #1.

The plumbing technique is to extend the "run" method signature with
extra arguments. We provide the call graph that intrinsically is
available as it is the basis of the pass manager's IR units, and an
output parameter that records the results of updating the call graph
during an SCC passes's run. Note that "...UpdateResult" isn't a *great*
name here... suggestions very welcome.

I tried a pretty frustrating number of different data structures and such
for the innards of the update result. Every other one failed for one
reason or another. Sometimes I just couldn't keep the layers of
complexity right in my head. The thing that really worked was to just
directly provide access to the underlying structures used to walk the
call graph so that their updates could be informed by the *particular*
nature of the change to the graph.

The technique for how to make the pass management infrastructure cope
with mutating graphs was also something that took a really, really large
number of iterations to get to a place where I was happy. Here are some
of the considerations that drove the design:

- We operate at three levels within the infrastructure: RefSCC, SCC, and
  Node. In each case, we are working bottom up and so we want to
  continue to iterate on the "lowest" node as the graph changes. Look at
  how we iterate over nodes in an SCC running function passes as those
  function passes mutate the CG. We continue to iterate on the "lowest"
  SCC, which is the one that continues to contain the function just
  processed.

- The call graph structure re-uses SCCs (and RefSCCs) during mutation
  events for the *highest* entry in the resulting new subgraph, not the
  lowest. This means that it is necessary to continually update the
  current SCC or RefSCC as it shifts. This is really surprising and
  subtle, and took a long time for me to work out. I actually tried
  changing the call graph to provide the opposite behavior, and it
  breaks *EVERYTHING*. The graph update algorithms are really deeply
  tied to this particualr pattern.

- When SCCs or RefSCCs are split apart and refined and we continually
  re-pin our processing to the bottom one in the subgraph, we need to
  enqueue the newly formed SCCs and RefSCCs for subsequent processing.
  Queuing them presents a few challenges:
  1) SCCs and RefSCCs use wildly different iteration strategies at
     a high level. We end up needing to converge them on worklist
     approaches that can be extended in order to be able to handle the
     mutations.
  2) The order of the enqueuing need to remain bottom-up post-order so
     that we don't get surprising order of visitation for things like
     the inliner.
  3) We need the worklists to have set semantics so we don't duplicate
     things endlessly. We don't need a *persistent* set though because
     we always keep processing the bottom node!!!! This is super, super
     surprising to me and took a long time to convince myself this is
     correct, but I'm pretty sure it is... Once we sink down to the
     bottom node, we can't re-split out the same node in any way, and
     the postorder of the current queue is fixed and unchanging.
  4) We need to make sure that the "current" SCC or RefSCC actually gets
     enqueued here such that we re-visit it because we continue
     processing a *new*, *bottom* SCC/RefSCC.

- We also need the ability to *skip* SCCs and RefSCCs that get merged
  into a larger component. We even need the ability to skip *nodes* from
  an SCC that are no longer part of that SCC.

This led to the design you see in the patch which uses SetVector-based
worklists. The RefSCC worklist is always empty until an update occurs
and is just used to handle those RefSCCs created by updates as the
others don't even exist yet and are formed on-demand during the
bottom-up walk. The SCC worklist is pre-populated from the RefSCC, and
we push new SCCs onto it and blacklist existing SCCs on it to get the
desired processing.

We then *directly* update these when updating the call graph as I was
never able to find a satisfactory abstraction around the update
strategy.

Finally, we need to compute the updates for function passes. This is
mostly used as an initial customer of all the update mechanisms to drive
their design to at least cover some real set of use cases. There are
a bunch of interesting things that came out of doing this:

- It is really nice to do this a function at a time because that
  function is likely hot in the cache. This means we want even the
  function pass adaptor to support online updates to the call graph!

- To update the call graph after arbitrary function pass mutations is
  quite hard. We have to build a fairly comprehensive set of
  data structures and then process them. Fortunately, some of this code
  is related to the code for building the cal graph in the first place.
  Unfortunately, very little of it makes any sense to share because the
  nature of what we're doing is so very different. I've factored out the
  one part that made sense at least.

- We need to transfer these updates into the various structures for the
  CGSCC pass manager. Once those were more sanely worked out, this
  became relatively easier. But some of those needs necessitated changes
  to the LazyCallGraph interface to make it significantly easier to
  extract the changed SCCs from an update operation.

- We also need to update the CGSCC analysis manager as the shape of the
  graph changes. When an SCC is merged away we need to clear analyses
  associated with it from the analysis manager which we didn't have
  support for in the analysis manager infrsatructure. New SCCs are easy!
  But then we have the case that the original SCC has its shape changed
  but remains in the call graph. There we need to *invalidate* the
  analyses associated with it.

- We also need to invalidate analyses after we *finish* processing an
  SCC. But the analyses we need to invalidate here are *only those for
  the newly updated SCC*!!! Because we only continue processing the
  bottom SCC, if we split SCCs apart the original one gets invalidated
  once when its shape changes and is not processed farther so its
  analyses will be correct. It is the bottom SCC which continues being
  processed and needs to have the "normal" invalidation done based on
  the preserved analyses set.

All of this is mostly background and context for the changes here.

Many thanks to all the reviewers who helped here. Especially Sanjoy who
caught several interesting bugs in the graph algorithms, David, Sean,
and others who all helped with feedback.

Differential Revision: http://reviews.llvm.org/D21464

llvm-svn: 279618
2016-08-24 09:37:14 +00:00
Tim Shen f2187ed321 [GraphTraits] Replace all NodeType usage with NodeRef
This should finish the GraphTraits migration.

Differential Revision: http://reviews.llvm.org/D23730

llvm-svn: 279475
2016-08-22 21:09:30 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Chandler Carruth 67fc52f067 [PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.

This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:

- Array alloca merging
  - To support the above, bottom-up inlining with careful history
    tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
  Instead, it focuses on inlining functions with that attribute.

The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.

The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.

The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.

One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.

Anyways, hopefully a reasonable starting point for this pass.

Differential Revision: https://reviews.llvm.org/D23299

llvm-svn: 278896
2016-08-17 02:56:20 +00:00
Chandler Carruth f702d8ecb6 [Inliner] Add a flag to disable manual alloca merging in the Inliner.
This is off for now while testing can take place to make sure that in
fact we do sufficient stack coloring to fully obviate the manual alloca
array merging.

Some context on why we should be using stack coloring rather than
merging allocas in this way:

LLVM relies very heavily on analyzing pointers as coming from different
allocas in order to make aliasing decisions. These are some of the most
powerful aliasing signals available in LLVM. So merging allocas is an
extremely destructive operation on the LLVM IR -- it takes away highly
valuable and hard to reconstruct information.

As a consequence, inlined functions which happen to have array allocas
that this pattern matches will fail to be properly interleaved unless
SROA manages to hoist everything to an SSA register. Instead, the
inliner will have added an unnecessary dependence that one inlined
function execute after the other because they will have been rewritten
to refer to the same memory.

All that said, folks will reasonably want some time to experiment here
and make sure there are no significant regressions. A flag should give
us an easy knob to test.

For more context, see the thread here:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.html
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html

Differential Revision: https://reviews.llvm.org/D23052

llvm-svn: 278892
2016-08-17 02:40:23 +00:00
Duncan P. N. Exon Smith 9e3edad932 IPO: Swap || operands to avoid dereferencing end()
IsOperandBundleUse conveniently indicates  whether
std::next(F->arg_begin(),UseIndex) will get to (or past) end().  Check
it first to avoid dereferencing end().

llvm-svn: 278884
2016-08-17 01:23:58 +00:00
Mehdi Amini 88c491ddec FunctionImport: missed one occurence of ImportListForModule to rename (NFC)
llvm-svn: 278778
2016-08-16 05:49:12 +00:00
Mehdi Amini 9b490f10e1 FunctionImport: rename ImportsForModule to ImportList for consistency (NFC)
llvm-svn: 278777
2016-08-16 05:47:12 +00:00
Mehdi Amini cdbcbf7477 [LTO] Simplify APIs and constify (NFC)
Summary:
Multiple APIs were taking a StringMap for the ImportLists containing
the entries for for all the modules while operating on a single entry
for the current module. Instead we can pass the desired ModuleImport
directly. Also some of the APIs were not const, I believe just to be
able to use operator[] on the StringMap.

Reviewers: tejohnson

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23537

llvm-svn: 278776
2016-08-16 05:46:05 +00:00
Teresa Johnson 6107a4195d [ThinLTO] Remove functions resolved to available_externally from comdats
Summary:
thinLTOResolveWeakForLinkerModule needs to drop any preempted weak symbols
that were converted to available_externally from comdats, otherwise we
will get a verification failure (since available_externally is a
declaration for the linker, and no declarations can be in a comdat).

Reviewers: mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23015

llvm-svn: 278739
2016-08-15 21:00:04 +00:00
Dehao Chen c0a1e432c7 Fine tuning of sample profile propagation algorithm.
Summary: The refined propagation algorithm is more accurate and robust.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23224

llvm-svn: 278522
2016-08-12 16:22:12 +00:00
Ivan Krasin 89439a7939 WholeProgramDevirt: initialize WasDevirt in all constructors.
Summary: This is a follow up to r278389 and r278442.

Differential Revision: https://reviews.llvm.org/D23438

llvm-svn: 278455
2016-08-12 01:40:10 +00:00
Piotr Padlewski 332b3b2210 Don't import variadic functions
Summary:
This patch adds IsVariadicFunction bit to summary in order
to not import variadic functions. Inliner doesn't inline
variadic functions because it is hard to reason about it.

This one small fix improves Importer by about 16%
(going from 86% to 100% of imported functions that are
inlined anywhere)
on some spec benchmarks like 'int' and others.

Reviewers: eraman, mehdi_amini, tejohnson

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23339

llvm-svn: 278432
2016-08-11 22:13:57 +00:00
David Majnemer 0a16c22846 Use range algorithms instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278417
2016-08-11 21:15:00 +00:00
Ivan Krasin f3403fd2c8 WholeProgramDevirt: generate more detailed and accurate remarks.
Summary:
Keep track of all methods for which we have devirtualized at least
one call and then print them sorted alphabetically. That allows to
avoid duplicates and also makes the order deterministic.

Add optimization names into the remarks, so that it's easier to
understand how has each method been devirtualized.

Fix a bug when wrong methods could have been reported for
tryVirtualConstProp.

Reviewers: kcc, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23297

llvm-svn: 278389
2016-08-11 19:09:02 +00:00
Easwaran Raman 61edc107bb Add a new method to create SimpleInliner instance and make pre-inliner use this.
This adds a createFunctionInliningPass pass that takes an InlineParams object and use this to create the pre-inliner pass. This prevents the regular inliner's threshold flag from influencing the preinliner.

Differential revision: https://reviews.llvm.org/D23377

llvm-svn: 278377
2016-08-11 18:24:08 +00:00
Eugene Zelenko cdc7161281 Fix some Clang-tidy modernize and Include What You Use warnings.
Differential revision: https://reviews.llvm.org/D23291

llvm-svn: 278364
2016-08-11 17:20:18 +00:00
Piotr Padlewski d89875ca39 Changed sign of LastCallToStaticBouns
Summary:
I think it is much better this way.
When I firstly saw line:
  Cost += InlineConstants::LastCallToStaticBonus;
I though that this is a bug, because everywhere where the cost is being reduced
it is usuing -=.

Reviewers: eraman, tejohnson, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23222

llvm-svn: 278290
2016-08-10 21:15:22 +00:00
Easwaran Raman 1c57cc2b68 Do not directly use inline threshold cl options in cost analysis.
This adds an InlineParams struct which is populated from the command line options by getInlineParams and passed to getInlineCost for the call analyzer to use.

Differential revision: https://reviews.llvm.org/D22120

llvm-svn: 278189
2016-08-10 00:48:04 +00:00
Adam Nemet 896c09bd10 [Inliner,OptDiag] Add hotness attribute to opt diagnostics
Summary:
The inliner not being a function pass requires the work-around of
generating the OptimizationRemarkEmitter and in turn BFI on demand.
This will go away after the new PM is ready.

BFI is only computed inside ORE if the user has requested hotness
information for optimization diagnostitics (-pass-remark-with-hotness at
the 'opt' level).  Thus there is no additional overhead without the
flag.

Reviewers: hfinkel, davidxl, eraman

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22694

llvm-svn: 278185
2016-08-10 00:44:44 +00:00
Sean Silva fd03ac6a0c Consistently use ModuleAnalysisManager
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.

Thanks to David for the suggestion.

llvm-svn: 278078
2016-08-09 00:28:38 +00:00
Derek Schuff b7d6d9e3cd [WebAssembly] Fix CFI index to account for padding nullptr function
The WebAssembly linker now creates a dummy function at index 0 to
prevent miscomparisons with the NULL pointer, see
https://github.com/WebAssembly/binaryen/pull/658. Thanks to pcc for
pointing out this problem!

Patch by Dominic Chen

Differential Revision: https://reviews.llvm.org/D23137

llvm-svn: 278073
2016-08-08 23:56:01 +00:00
Sebastian Pop bfb96c5bfd GVN-hoist: enable by default
llvm-svn: 278010
2016-08-08 14:46:15 +00:00
Sean Silva 0873e7d218 Add some comments linking back to PR28400.
Thanks to Mehdi for the suggestion!

llvm-svn: 277984
2016-08-08 07:03:49 +00:00
Sean Silva 744f7a843f [PM] Invalidate CallGraphAnalysis because it holds AssertingVH
This is essentially PR28400. The fix here is similar to that implemented
in r274656.

llvm-svn: 277980
2016-08-08 05:38:01 +00:00
Benjamin Kramer 41e66dade1 [Inliner] Use function_ref for functors which are never taken ownership of.
llvm-svn: 277922
2016-08-06 12:33:46 +00:00
Matthias Braun 9a0035d8d2 Revert "(refs/bisect/bad) GVN-hoist: enable by default"
GVN-Hoist appears to miscompile llvm-testsuite
SingleSource/Benchmarks/Misc/fbench.c at the moment.

I filed http://llvm.org/PR28880

This reverts commit r277786.

llvm-svn: 277909
2016-08-06 02:23:15 +00:00
Ivan Krasin b05e06e4fd WholeProgramDevirt: print remarks with devirtualized method names.
Summary:
Chrome on Linux uses WholeProgramDevirt for speed ups, and it's
important to detect regressions on both sides: the toolchain,
if fewer methods get devirtualized after an update, and Chrome,
if an innocently looking change caused many hot methods become
virtual again.

The need to track devirtualized methods is not Chrome-specific,
but it's probably the only user of the pass at this time.

Reviewers: kcc

Differential Revision: https://reviews.llvm.org/D23219

llvm-svn: 277856
2016-08-05 19:45:16 +00:00
Sebastian Pop c33f0e25c9 GVN-hoist: enable by default
llvm-svn: 277786
2016-08-04 23:49:07 +00:00
Bruno Cardoso Lopes bd887581fc Revert "GVN-hoist: enable by default" & "Make GVN Hoisting obey optnone/bisect."
This reverts commits r277685 & r277688. r277685 broke compiler-rt
compilation http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/23335
and r277685 is a followup from it.

llvm-svn: 277690
2016-08-04 04:16:24 +00:00
Sebastian Pop 70ffe6523f GVN-hoist: enable by default
As we addressed all compilation time problems with GVN-hoist
https://llvm.org/bugs/show_bug.cgi?id=28670
this patch turns GVN-hoist back by default.

Differential Revision: https://reviews.llvm.org/D23136

llvm-svn: 277685
2016-08-04 01:59:42 +00:00
Chandler Carruth 8562d3a5e4 [Inliner] clang-format various parts of the inliner prior to changes
here. NFC.

llvm-svn: 277557
2016-08-03 01:02:31 +00:00
Ivan Krasin 3aade11252 Add -lowertypetests-bitsets-level to control bitsets generation.
Summary:
Sometimes, bitsets could get really large (>300k entries) and
we might want to drop a check, as it would have a too much cost.

Adding a flag to control how much penalty are we willing to pay
for bitsets.

Reviewers: kcc

Differential Revision: https://reviews.llvm.org/D23088

llvm-svn: 277556
2016-08-03 00:59:38 +00:00
Sean Silva f801575fd0 CodeExtractor : Add ability to preserve profile data.
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.

Patch by River Riddle!

Differential Revision: https://reviews.llvm.org/D22744

llvm-svn: 277411
2016-08-02 02:15:45 +00:00
Tim Shen b44909eccb [ADT] NFC: Generalize GraphTraits requirement of "NodeType *" in interfaces to "NodeRef", and migrate SCCIterator.h to use NodeRef
Summary: By generalize the interface, users are able to inject more flexible Node token into the algorithm, for example, a pair of vector<Node>* and index integer. Currently I only migrated SCCIterator to use NodeRef, but more is coming. It's a NFC.

Reviewers: dblaikie, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22937

llvm-svn: 277399
2016-08-01 22:32:20 +00:00
Derek Schuff c64d7655b2 [WebAssembly] Support CFI for WebAssembly target
Summary: This patch implements CFI for WebAssembly. It modifies the
LowerTypeTest pass to pre-assign table indexes to functions that are
called indirectly, and lowers type checks to test against the
appropriate table indexes. It also modifies the WebAssembly backend to
support a special ".indidx" assembly directive that propagates the table
index assignments out to the linker.

Patch by Dominic Chen

Differential Revision: https://reviews.llvm.org/D21768

llvm-svn: 277398
2016-08-01 22:25:02 +00:00
Sean Silva 423c7149dc Revert r277313 and r277314.
They seem to trigger an LSan failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/15140/steps/check-llvm%20asan/logs/stdio

Revert "Add the tests for r277313"

This reverts commit r277314.

Revert "CodeExtractor : Add ability to preserve profile data."

This reverts commit r277313.

llvm-svn: 277317
2016-08-01 04:16:09 +00:00
Sean Silva 6208924323 CodeExtractor : Add ability to preserve profile data.
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.

Patch by River Riddle!

Differential Revision: https://reviews.llvm.org/D22744

llvm-svn: 277313
2016-08-01 02:59:26 +00:00
David Majnemer d536f2328e [ConstnatFolding] Teach the folder how to fold ConstantVector
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.

Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.

llvm-svn: 277099
2016-07-29 03:27:26 +00:00
Piotr Padlewski 84abc74f2c Added ThinLTO inlining statistics
Summary:
copypasta doc of ImportedFunctionsInliningStatistics class
 \brief Calculate and dump ThinLTO specific inliner stats.
 The main statistics are:
 (1) Number of inlined imported functions,
 (2) Number of imported functions inlined into importing module (indirect),
 (3) Number of non imported functions inlined into importing module
 (indirect).
 The difference between first and the second is that first stat counts
 all performed inlines on imported functions, but the second one only the
 functions that have been eventually inlined to a function in the importing
 module (by a chain of inlines). Because llvm uses bottom-up inliner, it is
 possible to e.g. import function `A`, `B` and then inline `B` to `A`,
 and after this `A` might be too big to be inlined into some other function
 that calls it. It calculates this statistic by building graph, where
 the nodes are functions, and edges are performed inlines and then by marking
 the edges starting from not imported function.

 If `Verbose` is set to true, then it also dumps statistics
 per each inlined function, sorted by the greatest inlines count like
 - number of performed inlines
 - number of performed inlines to importing module

Reviewers: eraman, tejohnson, mehdi_amini

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D22491

llvm-svn: 277089
2016-07-29 00:27:16 +00:00
David Majnemer 6e9b47bc8a Add EP_CGSCCOptimizerLate extension point to PassManagerBuilder
The EP_CGSCCOptimizerLate extension point allows adding CallGraphSCC
passes at the end of the main CallGraphSCC passes and before any
function simplification passes run by CGPassManager.

Patch by Gor Nishanov!

Differential Revision: https://reviews.llvm.org/D22897

llvm-svn: 276953
2016-07-28 03:28:43 +00:00
Sean Silva 519323db58 Cleanup : Reformat PartialInliner.cpp to have current LLVM style conventions
Modify the variable names and code style to be that of modern LLVM.

Patch by River Riddle!

Differential Revision: https://reviews.llvm.org/D22743

llvm-svn: 276610
2016-07-25 05:57:59 +00:00
Sean Silva fe5abd5e0c Fix : Partial Inliner requires AssumptionCacheTracker
The public InlineFunction utility assumes that the passed in
InlineFunctionInfo has a valid AssumptionCacheTracker.

Patch by River Riddle!

Differential Revision: https://reviews.llvm.org/D22706

llvm-svn: 276609
2016-07-25 05:00:00 +00:00
Xinliang David Li 9239245401 [Profile] Use explicit flag to enable IR PGO
Patch by Jake VanAdrighem

Differential Revision: http://reviews.llvm.org/D22607

llvm-svn: 276516
2016-07-23 04:28:52 +00:00
Sean Silva ab6a683765 Avoid using a raw AssumptionCacheTracker in various inliner functions.
This unblocks the new PM part of River's patch in
https://reviews.llvm.org/D22706

Conveniently, this same change was needed for D21921 and so these
changes are just spun out from there.

llvm-svn: 276515
2016-07-23 04:22:50 +00:00
Alina Sbirlea ba21ffebff Add flag to PassManagerBuilder to disable GVN Hoist Pass.
Summary:
Adding a flag to diable GVN Hoisting by default.
Note: The GVN Hoist Pass causes some Halide tests to hang. Halide will disable the pass while investigating.

Reviewers: llvm-commits, chandlerc, spop, dberlin

Subscribers: mehdi_amini

Differential Revision: https://reviews.llvm.org/D22639

llvm-svn: 276479
2016-07-22 22:02:19 +00:00
David Majnemer 522a91181a Don't remove side effecting instructions due to ConstantFoldInstruction
Just because we can constant fold the result of an instruction does not
imply that we can delete the instruction.  It may have side effects.

This fixes PR28655.

llvm-svn: 276389
2016-07-22 04:54:44 +00:00
George Burgess IV f84bb6fa67 Make help text more consistent. NFC.
llvm-svn: 276205
2016-07-20 23:14:29 +00:00
David Majnemer 5246e0b2c2 [FunctionAttrs] Correct the safety analysis for inference of 'returned'
We skipped over ReturnInsts which didn't return an argument which would
lead us to incorrectly conclude that an argument returned by another
ReturnInst was 'returned'.

This reverts commit r275756.

This fixes PR28610.

llvm-svn: 276008
2016-07-19 18:50:26 +00:00
Teresa Johnson 2124157102 [PM] Port FunctionImport Pass to new PM
Summary: Port FunctionImport Pass to new PM.

Reviewers: mehdi_amini, davide

Subscribers: davidxl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22475

llvm-svn: 275916
2016-07-18 21:22:24 +00:00
NAKAMURA Takumi 966bde50c3 Revert r275678, "Revert "Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute""
This reverts also r275029, "Update Clang tests after adding inference for the returned argument attribute"

It broke LTO build. Seems miscompilation.

llvm-svn: 275756
2016-07-18 03:23:25 +00:00
Teresa Johnson cd21a646f6 [ThinLTO] Perform profile-guided indirect call promotion
Summary:
To enable profile-guided indirect call promotion in ThinLTO mode, we
simply add call graph edges for each profitable target from the profile
to the summaries, then the summary-guided importing will consider the
callee for importing as usual.

Also we need to enable the indirect call promotion pass creation in the
PassManagerBuilder when PerformThinLTO=true (we are in the ThinLTO
backend), so that the newly imported functions are considered for
promotion in the backends.

The IC promotion profiles refer to callees by GUID, which required
adding GUIDs to the per-module VST in bitcode (and assigning them
valueIds similar to how they are assigned valueIds in the combined
index).

Reviewers: mehdi_amini, xur

Subscribers: mehdi_amini, davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21932

llvm-svn: 275707
2016-07-17 14:47:01 +00:00
Hal Finkel 660096b260 Revert "Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute"
This reverts commit r275042; the initial commit triggered self-hosting failures
on ARM/AArch64. James Molloy identified the problematic backend code, which has
been disabled in r275677. Trying again...

Original commit message:

Let FuncAttrs infer the 'returned' argument attribute

A function can have one argument with the 'returned' attribute, indicating that
the associated argument is always the return value of the function. Add
FuncAttrs inference logic.

llvm-svn: 275678
2016-07-16 07:21:28 +00:00
Rong Xu 96a19d35ae [PGO] IRPGO pre-cleanup pass changes
This patch adds a selected set of cleanup passes including a pre-inline pass
before LLVM IR PGO instrumentation. The inline is only intended to apply those
obvious/trivial ones before instrumentation so that much less instrumentation
is needed to get better profiling information. This will drastically improve
the instrumented code performance for large C++ applications. Another benefit
is the context sensitive counts that can potentially improve the PGO
optimization.

Differential Revision: http://reviews.llvm.org/D21405

llvm-svn: 275588
2016-07-15 18:10:49 +00:00
Sebastian Pop 4177480aad code hoisting pass based on GVN
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.

Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.

Differential Revision: http://reviews.llvm.org/D19338

llvm-svn: 275561
2016-07-15 13:45:20 +00:00
Nico Weber 755cd760cd Revert r275401, it caused PR28551.
llvm-svn: 275420
2016-07-14 14:41:25 +00:00
Sebastian Pop 63847d04e7 code hoisting pass based on GVN
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.

Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.

Differential Revision: http://reviews.llvm.org/D19338

llvm-svn: 275401
2016-07-14 12:18:53 +00:00
Ivan Krasin 5474645dc8 Print remarks from WholeProgramDevirt pass for each call site.
Summary:
It's useful to have some visibility about which call sites are devirtualized,
especially for debug purposes. Another use case is a regression test on the
application side (like, Chromium).

Reviewers: pcc

Differential Revision: http://reviews.llvm.org/D22252

llvm-svn: 275145
2016-07-12 02:38:37 +00:00
Davide Italiano e8ae0b5eb4 [PM/IPO] Port LowerTypeTests to the new PassManager.
There's a little bit of churn in this patch because the initialization
mechanism is now shared between the old and the new PM. Other than
that, it's just a pretty mechanical translation.

llvm-svn: 275082
2016-07-11 18:10:06 +00:00
Davide Italiano 12a115683b [LowerTypeTests] Don't rely on doInitialization().
In preparation for porting this pass to the new PM (which has no
doInitialization()).

Differential Revision:  http://reviews.llvm.org/D22223

llvm-svn: 275074
2016-07-11 17:00:31 +00:00
Dehao Chen 9232f98279 Implement callsite-hotness based inline cost for Sample-based PGO
Summary:
For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch.

E.g.

if (A1 && A2 && A3 && ..... && A10) {
  for (i=0; i < 100000000; i++) {
    callsite();
  }
}

Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value.

In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.

Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.

Reviewers: davidxl, eraman, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22118

llvm-svn: 275073
2016-07-11 16:48:54 +00:00
Dehao Chen 29d2641f52 Tune the weight propagation algorithm for sample profile.
Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by incorrect basic block profile, as shown in the updated unittest.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D22180

llvm-svn: 275072
2016-07-11 16:40:17 +00:00
Hal Finkel 02012bcfee Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute
Reverting r275027 and r275033. These seem to cause miscompiles on the AArch64 buildbot.

llvm-svn: 275042
2016-07-11 04:51:23 +00:00
Hal Finkel ce881a41f9 Don't use a SmallSet for returned attribute inference
Suggested post-commit by David Majnemer on IRC (following-up on a pre-commit
review comment).

llvm-svn: 275033
2016-07-11 01:14:21 +00:00
Hal Finkel d66a7b05db Let FuncAttrs infer the 'returned' argument attribute
A function can have one argument with the 'returned' attribute, indicating that
the associated argument is always the return value of the function. Add
FuncAttrs inference logic.

Differential Revision: http://reviews.llvm.org/D22202

llvm-svn: 275027
2016-07-10 22:02:55 +00:00
Benjamin Kramer 5f7edcf953 [ArgPromote] Use function_ref and for-range loops.
No functionality change intended.

llvm-svn: 274973
2016-07-09 10:36:36 +00:00
Davide Italiano 92b933a55c [PM] Port CrossDSOCFI to the new pass manager.
llvm-svn: 274962
2016-07-09 03:25:35 +00:00
Piotr Padlewski 3b77612839 Add 'thinlto_src_module' md with asserts or -enable-import-metadata
Summary:
This way the metadata will be only generated when asserts enabled,
or when -enable-import-metadata specified

FIXED missing colon on requires.

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D22167

llvm-svn: 274947
2016-07-08 23:01:49 +00:00
Piotr Padlewski d4b792346c Revert "Add 'thinlto_src_module' md with asserts or -enable-import-metadata"
Reverting because of 17463
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17463

This reverts commit d20cb431bba2ba43b4c65a8556cff445bfefbb7c.

llvm-svn: 274946
2016-07-08 22:55:48 +00:00
Piotr Padlewski d6efefa2b8 Add 'thinlto_src_module' md with asserts or -enable-import-metadata
Summary:
This way the metadata will be only generated when asserts enabled,
or when -enable-import-metadata specified

Reviewers: tejohnson, eraman, mehdi_amini

Subscribers: mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D22167

llvm-svn: 274938
2016-07-08 21:25:39 +00:00
Dehao Chen 429f5c735f Remove inline hints computation from SampleProfile.cpp
Summary: As we will move to use uniformed hotness check in inliner, we do not need inline hints in SampleProfile pass any more.

Reviewers: dnovillo, davidxl

Subscribers: eraman, llvm-commits

Differential Revision: http://reviews.llvm.org/D19287

llvm-svn: 274918
2016-07-08 20:12:44 +00:00
Davide Italiano b4b9db81f2 [CrossDSOCFI] Change the pass so that it doesn't require doInitialization()
Differential Revision:  http://reviews.llvm.org/D21357

llvm-svn: 274910
2016-07-08 19:30:06 +00:00
Piotr Padlewski 6deaa6afae Add 'thinlto_src_module' metadata to imported function
Added metadata to be able to make statistics on how many functions
that have been imported have been removed. Also module name might
be helpfull when debugging.

Reviewers: tejohnson, eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D21943

llvm-svn: 274668
2016-07-06 20:26:25 +00:00
Piotr Padlewski 1f685e0186 NFC changed names in FunctionImport
llvm-svn: 274649
2016-07-06 18:12:23 +00:00
George Burgess IV bfa401e5ad [CFLAA] Split into Anders+Steens analysis.
StratifiedSets (as implemented) is very fast, but its accuracy is also
limited. If we take a more aggressive andersens-like approach, we can be
way more accurate, but we'll also end up being slower.

So, we've decided to split CFLAA into CFLSteensAA and CFLAndersAA.

Long-term, we want to end up in a place where CFLSteens is queried
first; if it can provide an answer, great (since queries are basically
map lookups). Otherwise, we'll fall back to CFLAnders, BasicAA, etc.

This patch splits everything out so we can try to do something like
that when we get a reasonable CFLAnders implementation.

Patch by Jia Chen.

Differential Revision: http://reviews.llvm.org/D21910

llvm-svn: 274589
2016-07-06 00:26:41 +00:00
Sean Silva 997cbea05b [PM] Some preparatory refactoring to minimize the diff of D21921
llvm-svn: 274456
2016-07-03 03:35:03 +00:00
Sean Silva 45835e731d Remove dead TLI arg of isKnownNonNull and propagate deadness. NFC.
This actually uncovered a surprisingly large chain of ultimately unused
TLI args.
From what I can gather, this argument is a remnant of when
isKnownNonNull would look at the TLI directly.
The current approach seems to be that InferFunctionAttrs runs early in
the pipeline and uses TLI to annotate the TLI-dependent non-null
information as return attributes.

This also removes the dependence of functionattrs on TLI altogether.

llvm-svn: 274455
2016-07-02 23:47:27 +00:00
Sean Silva 0fb7774f91 [PM] Some preparatory refactoring to minimize the diff of D21921
The main change here is just moving stuff to static functions.

llvm-svn: 274446
2016-07-02 19:12:56 +00:00
Sean Silva e2133e7c32 [PM] Preparatory cleanups to ArgumentPromotion.
This pulls some obvious changes out of http://reviews.llvm.org/D21921 to
minimize the diff.

llvm-svn: 274445
2016-07-02 18:59:51 +00:00
Duncan P. N. Exon Smith 9d1f156418 Revert "code hoisting pass based on GVN"
This reverts commit r274305, since it breaks self-hosting:
  http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/22349/
  http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17232

Note that the blamelist on lab.llvm.org:8011 is incorrect.  The previous
build was r274299, but somehow r274305 wasn't included in the blamelist:
  http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules

llvm-svn: 274320
2016-07-01 01:51:40 +00:00
Sebastian Pop 5c5798c57c code hoisting pass based on GVN
This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining.

Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki.
Important algorithmic contributions by Daniel Berlin under the form of reviews.

Differential Revision: http://reviews.llvm.org/D19338

llvm-svn: 274305
2016-07-01 00:24:31 +00:00
Easwaran Raman 1832bf6aee [PM] Port PartialInlining to the new PM
Differential revision: http://reviews.llvm.org/D21699

llvm-svn: 273894
2016-06-27 16:50:18 +00:00
Benjamin Kramer 135f735af1 Apply clang-tidy's modernize-loop-convert to most of lib/Transforms.
Only minor manual fixes. No functionality change intended.

llvm-svn: 273808
2016-06-26 12:28:59 +00:00
David Majnemer e14e7bc4b8 Revert "[SimplifyCFG] Stop inserting calls to llvm.trap for UB"
This reverts commit r273778, it seems to break UBSan :/

llvm-svn: 273779
2016-06-25 08:19:55 +00:00
David Majnemer d346a37737 [SimplifyCFG] Stop inserting calls to llvm.trap for UB
SimplifyCFG had logic to insert calls to llvm.trap for two very
particular IR patterns: stores and invokes of undef/null.

While InstCombine canonicalizes certain undefined behavior IR patterns
to stores of undef, phase ordering means that this cannot be relied upon
in general.

There are much better tools than llvm.trap: UBSan and ASan.

N.B. I could be argued into reverting this change if a clear argument as
to why it is important that we synthesize llvm.trap for stores, I'd be
hard pressed to see why it'd be useful for invokes...

llvm-svn: 273778
2016-06-25 08:04:19 +00:00
Peter Collingbourne 0312f614b1 IR: Introduce llvm.type.checked.load intrinsic.
This intrinsic safely loads a function pointer from a virtual table pointer
using type metadata. This intrinsic is used to implement control flow integrity
in conjunction with virtual call optimization. The virtual call optimization
pass will optimize away llvm.type.checked.load intrinsics associated with
devirtualized calls, thereby removing the type check in cases where it is
not needed to enforce the control flow integrity constraint.

This patch also introduces the capability to copy type metadata between
global variables, and teaches the virtual call optimization pass to do so.

Differential Revision: http://reviews.llvm.org/D21121

llvm-svn: 273756
2016-06-25 00:23:04 +00:00
Dehao Chen c66a06ad0e Hookup ProfileSummary with SampleProfilerLoader
Summary: Set ProfileSummary in SampleProfilerLoader.

Reviewers: davidxl, eraman

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D21702

llvm-svn: 273745
2016-06-24 22:57:06 +00:00
Peter Collingbourne 995d6cc8f9 Fix unused variable warning in -asserts builds.
llvm-svn: 273737
2016-06-24 21:37:11 +00:00
Peter Collingbourne 7efd750607 IR: New representation for CFI and virtual call optimization pass metadata.
The bitset metadata currently used in LLVM has a few problems:

1. It has the wrong name. The name "bitset" refers to an implementation
   detail of one use of the metadata (i.e. its original use case, CFI).
   This makes it harder to understand, as the name makes no sense in the
   context of virtual call optimization.

2. It is represented using a global named metadata node, rather than
   being directly associated with a global. This makes it harder to
   manipulate the metadata when rebuilding global variables, summarise it
   as part of ThinLTO and drop unused metadata when associated globals are
   dropped. For this reason, CFI does not currently work correctly when
   both CFI and vcall opt are enabled, as vcall opt needs to rebuild vtable
   globals, and fails to associate metadata with the rebuilt globals. As I
   understand it, the same problem could also affect ASan, which rebuilds
   globals with a red zone.

This patch solves both of those problems in the following way:

1. Rename the metadata to "type metadata". This new name reflects how
   the metadata is currently being used (i.e. to represent type information
   for CFI and vtable opt). The new name is reflected in the name for the
   associated intrinsic (llvm.type.test) and pass (LowerTypeTests).

2. Attach metadata directly to the globals that it pertains to, rather
   than using the "llvm.bitsets" global metadata node as we are doing now.
   This is done using the newly introduced capability to attach
   metadata to global variables (r271348 and r271358).

See also: http://lists.llvm.org/pipermail/llvm-dev/2016-June/100462.html

Differential Revision: http://reviews.llvm.org/D21053

llvm-svn: 273729
2016-06-24 21:21:32 +00:00
Peter Collingbourne 6d88fde3af IR: Introduce Module::global_objects().
This is a convenience iterator that allows clients to enumerate the
GlobalObjects within a Module.

Also start using it in a few places where it is obviously the right thing
to use.

Differential Revision: http://reviews.llvm.org/D21580

llvm-svn: 273470
2016-06-22 20:29:42 +00:00
Dehao Chen 071bb9d7af Pass AssumptionCacheTracker from SampleProfileLoader to Inliner
Summary: Inliner needs ACT when calling InlineFunction. Instead of nullptr, we need to pass it in from SampleProfileLoader

Reviewers: davidxl

Subscribers: eraman, vsk, danielcdh, llvm-commits

Differential Revision: http://reviews.llvm.org/D21205

llvm-svn: 273199
2016-06-20 20:53:40 +00:00
Benjamin Kramer 1afc1de406 Apply another batch of fixes from clang-tidy's performance-unnecessary-value-param.
Contains some manual fixes. No functionality change intended.

llvm-svn: 273047
2016-06-17 20:41:14 +00:00
Chandler Carruth 164a2aa6f4 [PM] Remove support for omitting the AnalysisManager argument to new
pass manager passes' `run` methods.

This removes a bunch of SFINAE goop from the pass manager and just
requires pass authors to accept `AnalysisManager<IRUnitT> &` as a dead
argument. This is a small price to pay for the simplicity of the system
as a whole, despite the noise that changing it causes at this stage.

This will also helpfull allow us to make the signature of the run
methods much more flexible for different kinds af passes to support
things like intelligently updating the pass's progression over IR units.

While this touches many, many, files, the changes are really boring.
Mostly made with the help of my trusty perl one liners.

Thanks to Sean and Hal for bouncing ideas for this with me in IRC.

llvm-svn: 272978
2016-06-17 00:11:01 +00:00
Justin Lebar c05f3c9942 [IR] [DAE] Copy comdats during DAE, and don't copy comdats in GlobalObject::copyAttributesFrom.
Summary: This reverts the changes to Globals.cpp and IRMover.cpp in
"[IR] Copy comdats in GlobalObject::copyAttributesFrom" (D20631,
rL270743).

The DeadArgElim test is left unchanged, and we change DAE to explicitly
copy comdats.

The reverted change breaks copyAttributesFrom when the destination lives
in a different module from the source.  The decision in D21255 was to
revert this patch and handle comdat copying separately from
copyAttributesFrom.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D21403

llvm-svn: 272855
2016-06-15 23:20:15 +00:00
David Majnemer cbf614a93b Remove the ScalarReplAggregates pass
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM.  LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.

Differential Revision: http://reviews.llvm.org/D21316

llvm-svn: 272737
2016-06-15 00:19:09 +00:00
Davide Italiano d737dd2ec6 [PM] Port WholeProgramDevirt to the new pass manager.
llvm-svn: 272721
2016-06-14 21:44:19 +00:00
Peter Collingbourne 96efdd6107 IR: Introduce local_unnamed_addr attribute.
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.

This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
  the normal rule that the global must have a unique address can be broken without
  being observable by the program by performing comparisons against the global's
  address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
  its own copy of the global if it requires one, and the copy in each linkage unit
  must be the same)
- It is a constant or a function (which means that the program cannot observe that
  the unique-address rule has been broken by writing to the global)

Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.

See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.

Part of the fix for PR27553.

Differential Revision: http://reviews.llvm.org/D20348

llvm-svn: 272709
2016-06-14 21:01:22 +00:00
Sean Silva e3bb457423 [PM] Port DeadArgumentElimination to the new PM
The approach taken here follows r267631.

deadarghaX0r should be easy to port when the time comes to add new-PM
support to bugpoint.

llvm-svn: 272507
2016-06-12 09:16:39 +00:00
Sean Silva f5080194fd [PM] Port ReversePostOrderFunctionAttrs to the new PM
Below are my super rough notes when porting. They can probably serve as
a basic guide for porting other passes to the new PM. As I port more
passes I'll expand and generalize this and make a proper
docs/HowToPortToNewPassManager.rst document. There is also missing
documentation for general concepts and API's in the new PM which will
require some documentation.
Once there is proper documentation in place we can put up a list of
passes that have to be ported and game-ify/crowdsource the rest of the
porting (at least of the middle end; the backend is still unclear).

I will however be taking personal responsibility for ensuring that the
LLD/ELF LTO pipeline is ported in a timely fashion. The remaining passes
to be ported are (do something like
`git grep "<the string in the bullet point below>"` to find the pass):

General Scalar:
[ ] Simplify the CFG
[ ] Jump Threading
[ ] MemCpy Optimization
[ ] Promote Memory to Register
[ ] MergedLoadStoreMotion
[ ] Lazy Value Information Analysis

General IPO:
[ ] Dead Argument Elimination
[ ] Deduce function attributes in RPO

Loop stuff / vectorization stuff:
[ ] Alignment from assumptions
[ ] Canonicalize natural loops
[ ] Delete dead loops
[ ] Loop Access Analysis
[ ] Loop Invariant Code Motion
[ ] Loop Vectorization
[ ] SLP Vectorizer
[ ] Unroll loops

Devirtualization / CFI:
[ ] Cross-DSO CFI
[ ] Whole program devirtualization
[ ] Lower bitset metadata

CGSCC passes:
[ ] Function Integration/Inlining
[ ] Remove unused exception handling info
[ ] Promote 'by reference' arguments to scalars

Please let me know if you are interested in working on any of the passes
in the above list (e.g. reply to the post-commit thread for this patch).
I'll probably be tackling "General Scalar" and "General IPO" first FWIW.

Steps as I port "Deduce function attributes in RPO"
---------------------------------------------------

(note: if you are doing any work based on these notes, please leave a
note in the post-commit review thread for this commit with any
improvements / suggestions / incompleteness you ran into!)

Note: "Deduce function attributes in RPO" is a module pass.

1. Do preparatory refactoring.

Do preparatory factoring. In this case all I had to do was to pull out a static helper (r272503).
(TODO: give more advice here e.g. if pass holds state or something)

2. Rename the old pass class.

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename class ReversePostOrderFunctionAttrs -> ReversePostOrderFunctionAttrsLegacyPass
in preparation for adding a class ReversePostOrderFunctionAttrs as the pass in the new PM.
(edit: actually wait what? The new class name will be
ReversePostOrderFunctionAttrsPass, so it doesn't conflict. So this step is
sort of useless churn).

llvm/include/llvm/InitializePasses.h
llvm/lib/LTO/LTOCodeGenerator.cpp
llvm/lib/Transforms/IPO/IPO.cpp
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename initializeReversePostOrderFunctionAttrsPass -> initializeReversePostOrderFunctionAttrsLegacyPassPass
(note that the "PassPass" thing falls out of `s/ReversePostOrderFunctionAttrs/ReversePostOrderFunctionAttrsLegacyPass/`)
Note that the INITIALIZE_PASS macro is what creates this identifier name, so renaming the class requires this renaming too.

Note that createReversePostOrderFunctionAttrsPass does not need to be
renamed since its name is not generated from the class name.

3. Add the new PM pass class.

In the new PM all passes need to have their
declaration in a header somewhere, so you will often need to add a header.
In this case
llvm/include/llvm/Transforms/IPO/FunctionAttrs.h is already there because
PostOrderFunctionAttrsPass was already ported.
The file-level comment from the .cpp file can be used as the file-level
comment for the new header. You may want to tweak the wording slightly
from "this file implements" to "this file provides" or similar.

Add declaration for the new PM pass in this header:

    class ReversePostOrderFunctionAttrsPass
        : public PassInfoMixin<ReversePostOrderFunctionAttrsPass> {
    public:
      PreservedAnalyses run(Module &M, AnalysisManager<Module> &AM);
    };

Its name should end with `Pass` for consistency (note that this doesn't
collide with the names of most old PM passes). E.g. call it
`<name of the old PM pass>Pass`.

Also, move the doxygen comment from the old PM pass to the declaration of
this class in the header.
Also, include the declaration for the new PM class
`llvm/Transforms/IPO/FunctionAttrs.h` at the top of the file (in this case,
it was already done when the other pass in this file was ported).

Now define the `run` method for the new class.
The main things here are:
a) Use AM.getResult<...>(M) to get results instead of `getAnalysis<...>()`

b) If the old PM pass would have returned "false" (i.e. `Changed ==
false`), then you should return PreservedAnalyses::all();

c) In the old PM getAnalysisUsage method, observe the calls
   `AU.addPreserved<...>();`.

   In the case `Changed == true`, for each preserved analysis you should do
   call `PA.preserve<...>()` on a PreservedAnalyses object and return it.
   E.g.:

       PreservedAnalyses PA;
       PA.preserve<CallGraphAnalysis>();
       return PA;

Note that calls to skipModule/skipFunction are not supported in the new PM
currently, so optnone and optimization bisect support do not work. You can
just drop those calls for now.

4. Add the pass to the new PM pass registry to make it available in opt.

In llvm/lib/Passes/PassBuilder.cpp add a #include for your header.
`#include "llvm/Transforms/IPO/FunctionAttrs.h"`
In this case there is already an include (from when
PostOrderFunctionAttrsPass was ported).

Add your pass to llvm/lib/Passes/PassRegistry.def
In this case, I added
`MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())`
The string is from the `INITIALIZE_PASS*` macros used in the old pass
manager.

Then choose a test that uses the pass and use the new PM `-passes=...` to
run it.
E.g. in this case there is a test that does:
; RUN: opt < %s -basicaa -functionattrs -rpo-functionattrs -S | FileCheck %s
I have added the line:
; RUN: opt < %s -aa-pipeline=basic-aa -passes='require<targetlibinfo>,cgscc(function-attrs),rpo-functionattrs' -S | FileCheck %s
The `-aa-pipeline=basic-aa` and
`require<targetlibinfo>,cgscc(function-attrs)` are what is needed to run
functionattrs in the new PM (note that in the new PM "functionattrs"
becomes "function-attrs" for some reason). This is just pulled from
`readattrs.ll` which contains the change from when functionattrs was ported
to the new PM.
Adding rpo-functionattrs causes the pass that was just ported to run.

llvm-svn: 272505
2016-06-12 07:48:51 +00:00
Sean Silva adc7939525 Factor out a helper. NFC
Prep for porting to new PM.

llvm-svn: 272503
2016-06-12 05:44:51 +00:00
Easwaran Raman 71069cf67d Use ProfileSummaryInfo in inline cost analysis.
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.

Differential revision: http://reviews.llvm.org/D21045

llvm-svn: 272321
2016-06-09 22:23:21 +00:00
Teresa Johnson 7ab1f69272 [ThinLTO/gold] Enable summary-based internalization
Summary: Enable existing summary-based importing support in the gold-plugin.

Reviewers: mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: http://reviews.llvm.org/D21080

llvm-svn: 272239
2016-06-09 01:14:13 +00:00
Manuel Jacob a485984c0c [PM] Schedule InstSimplify after late LICM run, to clean up LCSSA nodes.
Summary:
The module pass pipeline includes a late LICM run after loop
unrolling.  LCSSA is implicitly run as a pass dependency of LICM.  However no
cleanup pass was run after this, so the LCSSA nodes ended in the optimized output.

Reviewers: hfinkel, mehdi_amini

Subscribers: majnemer, bruno, mzolotukhin, mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D20606

llvm-svn: 271602
2016-06-02 22:14:26 +00:00
Erik Eckstein 0c48dd8ca5 Fix a crash in MergeFunctions related to ordering of weak/strong functions
The assumption, made in insert() that weak functions are always inserted after strong functions,
is only true in the first round of adding functions.
In subsequent rounds this is no longer guaranteed , because we might remove a strong function from the tree (because it's modified) and add it later,
where an equivalent weak function already exists in the tree.
This change removes the assert in insert() and explicitly enforces a weak->strong order.
This also removes the need of two separate loops in runOnModule().

llvm-svn: 271299
2016-05-31 17:20:23 +00:00
Xinliang David Li d38392ecd6 [PM] Port the Sample FDO to new PM (part-2)
llvm-svn: 271072
2016-05-27 23:20:16 +00:00
Xinliang David Li e897edbd36 [PM] Port the Sample FDO to new PM (part-1)
llvm-svn: 271062
2016-05-27 22:30:44 +00:00
Dehao Chen 80b16d4135 Remove sample profile dependency to instcombine, which is not a analysis pass.
Summary: This patch removes dependency from sample profile pass to instcombine pass.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20501

llvm-svn: 271009
2016-05-27 16:14:15 +00:00
Benjamin Kramer 4fed928f53 Avoid some copies by using const references.
clang-tidy's performance-unnecessary-copy-initialization with some manual
fixes. No functional changes intended.

llvm-svn: 270988
2016-05-27 12:30:51 +00:00
Peter Collingbourne fad596aa81 Move whole-program virtual call optimization pass after function attribute inference in LTO pipeline.
As a result of D18634 we no longer infer certain attributes on linkonce_odr
functions at compile time, and may only infer them at LTO time. The readnone
attribute in particular is required for virtual constant propagation (part
of whole-program virtual call optimization) to work correctly.

This change moves the whole-program virtual call optimization pass after
the function attribute inference passes, and enables the attribute inference
passes at opt level 1, so that virtual constant propagation has a chance to
work correctly for linkonce_odr functions.

Differential Revision: http://reviews.llvm.org/D20643

llvm-svn: 270765
2016-05-25 21:26:14 +00:00
Teresa Johnson 04c9a2d63d [ThinLTO] Refactor ODR resolution and internalization (NFC)
Move the now index-based ODR resolution and internalization routines out
of ThinLTOCodeGenerator.cpp and into either LTO.cpp (index-based
analysis) or FunctionImport.cpp (index-driven optimizations).
This is to enable usage by other linkers.

llvm-svn: 270698
2016-05-25 14:03:11 +00:00
David Majnemer 124bdb7497 [FunctionAttrs] Volatile loads should disable readonly
A volatile load has side effects beyond what callers expect readonly to
signify.  For example, it is not safe to reorder two function calls
which each perform a volatile load to the same memory location.

llvm-svn: 270671
2016-05-25 05:53:04 +00:00
Andrew Kaylor 9c81d0fdeb Avoid including AlwaysInliner pass in opt-bisect search.
Differential Revision: http://reviews.llvm.org/D19640

llvm-svn: 270495
2016-05-23 21:57:54 +00:00
Mark Lacey 9b5fcf65ec Functions with differing phis should not be merged.
Check that the incoming blocks of phi nodes are identical, and block
function merging if they are not.

rdar://problem/26255167

Differential Revision: http://reviews.llvm.org/D20462

llvm-svn: 270250
2016-05-20 18:39:11 +00:00
Xinliang David Li 72616180df Rename pass name to prepare to new PM porting /NFC
llvm-svn: 269586
2016-05-15 01:04:24 +00:00