when they are call edges at the leaf but may (transitively) be reached
via ref edges.
It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.
Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.
llvm-svn: 290649
skipping indirectly recursive inline chains.
To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.
This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.
llvm-svn: 290590
removing fully-dead comdats without removing dead entries in comdats
with live members.
This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.
I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).
llvm-svn: 290557
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.
Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
this at all. Active discussion and investigation is going on to remove
it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
Why? Because it adds what I suspect is inappropriate coupling for
little or no benefit. We will have an outer iteration system that
tracks devirtualization including that from function passes and
iterates already. We should improve that rather than approximate it
here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
reason at all.
The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.
A summary of the different things happening here:
1) Adding the usual new PM class and rigging.
2) Fixing minor underlying assumptions in the inline cost analysis or
inline logic that don't generally hold in the new PM world.
3) Adding the core pass logic which is in essence a loop over the calls
in the nodes in the call graph. This is a bit duplicated from the old
inliner, but only a handful of lines could realistically be shared.
(I tried at first, and it really didn't help anything.) All told,
this is only about 100 lines of code, and most of that is the
mechanics of wiring up analyses from the new PM world.
4) Updating the LazyCallGraph (in the new PM) based on the *newly
inlined* calls and references. This is very minimal because we cannot
form cycles.
5) When inlining removes the last use of a function, eagerly nuking the
body of the function so that any "one use remaining" inline cost
heuristics are immediately refined, and queuing these functions to be
completely deleted once inlining is complete and the call graph
updated to reflect that they have become dead.
6) After all the inlining for a particular function, updating the
LazyCallGraph and the CGSCC pass manager to reflect the
function-local simplifications that are done immediately and
internally by the inline utilties. These are the exact same
fundamental set of CG updates done by arbitrary function passes.
7) Adding a bunch of test cases to specifically target CGSCC and other
subtle aspects in the new PM world.
Many thanks to the careful review from Easwaran and Sanjoy and others!
Differential Revision: https://reviews.llvm.org/D24226
llvm-svn: 290161
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:
va_start(ValueArgs, Desc);
with Desc being a StringRef.
Differential Revision: https://reviews.llvm.org/D25342
llvm-svn: 283671
There is really no reason for these to be separate.
The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed. There,
you have to query analysis to get the full picture.
I think we should just explain the reason for missing the optimization
in the missed remark when possible. Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.
llvm-svn: 282542
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282499
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.
Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered. I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off. As before without -Rpass-with-hotness
there is no overhead.
Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'. As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.
Reviewers: eraman, chandlerc, davidxl, hfinkel
Subscribers: tejohnson, Prazek, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D23415
llvm-svn: 279860
This is off for now while testing can take place to make sure that in
fact we do sufficient stack coloring to fully obviate the manual alloca
array merging.
Some context on why we should be using stack coloring rather than
merging allocas in this way:
LLVM relies very heavily on analyzing pointers as coming from different
allocas in order to make aliasing decisions. These are some of the most
powerful aliasing signals available in LLVM. So merging allocas is an
extremely destructive operation on the LLVM IR -- it takes away highly
valuable and hard to reconstruct information.
As a consequence, inlined functions which happen to have array allocas
that this pattern matches will fail to be properly interleaved unless
SROA manages to hoist everything to an SSA register. Instead, the
inliner will have added an unnecessary dependence that one inlined
function execute after the other because they will have been rewritten
to refer to the same memory.
All that said, folks will reasonably want some time to experiment here
and make sure there are no significant regressions. A flag should give
us an easy knob to test.
For more context, see the thread here:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.htmlhttp://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html
Differential Revision: https://reviews.llvm.org/D23052
llvm-svn: 278892
Summary:
I think it is much better this way.
When I firstly saw line:
Cost += InlineConstants::LastCallToStaticBonus;
I though that this is a bug, because everywhere where the cost is being reduced
it is usuing -=.
Reviewers: eraman, tejohnson, mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23222
llvm-svn: 278290
Summary:
The inliner not being a function pass requires the work-around of
generating the OptimizationRemarkEmitter and in turn BFI on demand.
This will go away after the new PM is ready.
BFI is only computed inside ORE if the user has requested hotness
information for optimization diagnostitics (-pass-remark-with-hotness at
the 'opt' level). Thus there is no additional overhead without the
flag.
Reviewers: hfinkel, davidxl, eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22694
llvm-svn: 278185
Summary:
copypasta doc of ImportedFunctionsInliningStatistics class
\brief Calculate and dump ThinLTO specific inliner stats.
The main statistics are:
(1) Number of inlined imported functions,
(2) Number of imported functions inlined into importing module (indirect),
(3) Number of non imported functions inlined into importing module
(indirect).
The difference between first and the second is that first stat counts
all performed inlines on imported functions, but the second one only the
functions that have been eventually inlined to a function in the importing
module (by a chain of inlines). Because llvm uses bottom-up inliner, it is
possible to e.g. import function `A`, `B` and then inline `B` to `A`,
and after this `A` might be too big to be inlined into some other function
that calls it. It calculates this statistic by building graph, where
the nodes are functions, and edges are performed inlines and then by marking
the edges starting from not imported function.
If `Verbose` is set to true, then it also dumps statistics
per each inlined function, sorted by the greatest inlines count like
- number of performed inlines
- number of performed inlines to importing module
Reviewers: eraman, tejohnson, mehdi_amini
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D22491
llvm-svn: 277089
This unblocks the new PM part of River's patch in
https://reviews.llvm.org/D22706
Conveniently, this same change was needed for D21921 and so these
changes are just spun out from there.
llvm-svn: 276515
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.
Differential revision: http://reviews.llvm.org/D21045
llvm-svn: 272321
The implemented heuristic has a large body of code which better sits
in its own function for better readability. It also allows adding more
heuristics easier in the future.
llvm-svn: 268107
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
This patch provides the following infrastructure for PGO enhancements in inliner:
Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.
Differential Revision: http://reviews.llvm.org/D16381
llvm-svn: 262636
parts of the AA interface out of the base class of every single AA
result object.
Because this logic reformulates the query in terms of some other aspect
of the API, it would easily cause O(n^2) query patterns in alias
analysis. These could in turn be magnified further based on the number
of call arguments, and then further based on the number of AA queries
made for a particular call. This ended up causing problems for Rust that
were actually noticable enough to get a bug (PR26564) and probably other
places as well.
When originally re-working the AA infrastructure, the desire was to
regularize the pattern of refinement without losing any generality.
While I think it was successful, that is clearly proving to be too
costly. And the cost is needless: we gain no actual improvement for this
generality of making a direct query to tbaa actually be able to
re-use some other alias analysis's refinement logic for one of the other
APIs, or some such. In short, this is entirely wasted work.
To the extent possible, delegation to other API surfaces should be done
at the aggregation layer so that we can avoid re-walking the
aggregation. In fact, this significantly simplifies the logic as we no
longer need to smuggle the aggregation layer into each alias analysis
(or the TargetLibraryInfo into each alias analysis just so we can form
argument memory locations!).
However, we also have some delegation logic inside of BasicAA and some
of it even makes sense. When the delegation logic is baking in specific
knowledge of aliasing properties of the LLVM IR, as opposed to simply
reformulating the query to utilize a different alias analysis interface
entry point, it makes a lot of sense to restrict that logic to
a different layer such as BasicAA. So one aspect of the delegation that
was in every AA base class is that when we don't have operand bundles,
we re-use function AA results as a fallback for callsite alias results.
This relies on the IR properties of calls and functions w.r.t. aliasing,
and so seems a better fit to BasicAA. I've lifted the logic up to that
point where it seems to be a natural fit. This still does a bit of
redundant work (we query function attributes twice, once via the
callsite and once via the function AA query) but it is *exactly* twice
here, no more.
The end result is that all of the delegation logic is hoisted out of the
base class and into either the aggregation layer when it is a pure
retargeting to a different API surface, or into BasicAA when it relies
on the IR's aliasing properties. This should fix the quadratic query
pattern reported in PR26564, although I don't have a stand-alone test
case to reproduce it.
It also seems general goodness. Now the numerous AAs that don't need
target library info don't carry it around and depend on it. I think
I can even rip out the general access to the aggregation layer and only
expose that in BasicAA as it is the only place where we re-query in that
manner.
However, this is a non-trivial change to the AA infrastructure so I want
to get some additional eyes on this before it lands. Sadly, it can't
wait long because we should really cherry pick this into 3.8 if we're
going to go this route.
Differential Revision: http://reviews.llvm.org/D17329
llvm-svn: 262490
Summary:
Passes that call `getAnalysisIfAvailable<T>` also need to call
`addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the
legacy pass manager that it uses `T`. This contract was being
violated by passes that used `createLegacyPMAAResults`. This change
fixes this by exposing a helper in AliasAnalysis.h,
`addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults
and does the right thing when called from `getAnalysisUsage`.
Reviewers: chandlerc
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17010
llvm-svn: 260183
InlineCostAnalysis is an analysis pass without any need for it to be one.
Once it stops being an analysis pass, it doesn't maintain any useful state
and the member functions inside can be made free functions. NFC.
Differential Revision: http://reviews.llvm.org/D15701
llvm-svn: 256521
This reapplies r256277 with two changes:
- In emitFnAttrCompatCheck, change FuncName's type to std::string to fix
a use-after-free bug.
- Remove an unnecessary install-local target in lib/IR/Makefile.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 256304
This reapplies r252990 and r252949. I've added member function getKind
to the Attr classes which returns the enum or string of the attribute.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 256277
This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold
callees and uses values of inlinehint-threshold and inlinecold-threshold
respectively as the thresholds for such callees.
Differential Revision: http://reviews.llvm.org/D15245
llvm-svn: 256222
This reapplies r252949. I've changed the type of FuncName to be
std::string instead of StringRef in emitFnAttrCompatCheck.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 252990
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 252949