With the recent addition of new parameter MergeAttributes (D134117),
callers need to specify several default parameters before getting to
specify the new parameter.
This patch reorders the parameters so that callers do not have to
specify as many default parameters.
Differential Revision: https://reviews.llvm.org/D134125
In the past, we've had a bug resulting in a compiler crash after
forgetting to merge function attributes (D105729).
This patch teaches InlineFunction to merge function attributes. This
way, we minimize the "time" when the IR is valid, but the function
attributes are not.
Differential Revision: https://reviews.llvm.org/D134117
DefaultInlineOrder was largely an exercise in generalizing the
traversal order of call sites within the inliner.
Now that the module inliner is starting to form its shape, there is no
point in sharing DefaultInlineOrder between the module inliner and the
CGSCC inliner. DefaultInlineOrder and all the other inline orders are
mutually exclusive in the following sense:
- The use of DefaultInlineOrder doesn't make sense in the module
inliner because there is no priority inherent in the order in which
call sites are added to the list of call sites -- SmallVector.
- The use of any other inline order doesn't make sense in the CGSCC
inliner because little prioritization can be done within one CGSCC.
This patch essentially reverts the addition of DefaultInlineOrder so
that the loop structure of Inliner.cpp looks like the state just
before we started working on the module inliner (circa June 2021).
At the same time, ww remove the choice of DefaultInlineOrder from
UseInlinePriority.
Differential Revision: https://reviews.llvm.org/D134080
Previously if the inliner split an SCC such that an empty one remained, the MLInlineAdvisor could potentially lose track of the EdgeCount if a subsequent CGSCC pass modified the calls of a function that was initially in the SCC pre-split. Saving the seen nodes in onPassEntry resolves this.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127693
Adds option to print the contents of the Inline Advisor after each SCC Inliner pass
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D127689
Since the size of most of SCC's is 1, the PriorityInlineOrder would not change the inline
order in SCC inliner.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D123608
Introduce a new attribute "function-inline-cost-multiplier" which
multiplies the inline cost of a call site (or all calls to a callee) by
the multiplier.
When processing the list of calls created by inlining, check each call
to see if the new call's callee is in the same SCC as the original
callee. If so, set the "function-inline-cost-multiplier" attribute of
the new call site to double the original call site's attribute value.
This does not happen when the original call site is intra-SCC.
This is an alternative to D120584, which marks the call sites as
noinline.
Hopefully fixes PR45253.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D121084
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.
Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.
This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while cgscc passes are
run over the module; and cgscc pass manager invariants.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115847
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the Module-wide performance of CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which
is the case here)
Differential Revision: https://reviews.llvm.org/D116964
This reverts commit fd4808887e.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
In a CGSCC pass manager, we may visit the same function multiple times
due to SCC mutations. In the inliner pipeline, this results in running
the function simplification pipeline on a function multiple times even
if it hasn't been changed since the last function simplification
pipeline run.
We use a newly introduced analysis to keep track of whether or not a
function has changed since the last time the function simplification
pipeline has run on it. If we see this analysis available for a function
in a CGSCCToFunctionPassAdaptor, we skip running the function passes on
the function. The analysis is queried at the end of the function passes
so that it's available after the first time the function simplification
pipeline runs on a function. This is a per-adaptor option so it doesn't
apply to every adaptor.
The goal of this is to improve compile times. However, currently we
can't turn this on by default at least for the higher optimization
levels since the function simplification pipeline is not robust enough
to be idempotent in many cases, resulting in performance regressions if
we stop running the function simplification pipeline on a function
multiple times. We may be able to turn this on for -O1 in the near
future, but turning this on for higher optimization levels would require
more investment in the function simplification pipeline.
Heavily inspired by D98103.
Example compile time improvements with flag turned on:
https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D113947
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.
So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.
This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.
However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.
Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).
Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructionshttps://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss
D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation. D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D113304
Adds the following switches:
1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:
1. Original: defers to original advisor
2. AlwaysInline: inline all sites not in replay
3. NeverInline: inline no sites not in replay
2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:
1. Line
2. LineColumn
3. LineDiscriminator
4. LineColumnDiscriminator
Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.
All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking in how to apply replay. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang
An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks which closer matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.
Updated various tests to cover the new switches and negative remarks
Testing:
ninja check-all
Reviewed By: wenlei, mtrofin
Differential Revision: https://reviews.llvm.org/D112040
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from dwarf information and generating the equivalent remarks.
This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself.
Testing:
ninja check-all
Reviewed By: wenlei, mtrofin
Differential Revision: https://reviews.llvm.org/D110658
If another inlining session came after a ModuleInlinerWrapperPass, the
advisor alanysis would still be cached, but its Result would be cleared.
We need to clear both.
This addresses PR52118
Differential Revision: https://reviews.llvm.org/D111586
This also removes the need to disable the mandatory inlining phase in
tests.
In a departure from the previous remark, we don't output a 'cost' in
this case, because there's no such thing. We just report that inlining
happened because of the attribute.
Differential Revision: https://reviews.llvm.org/D110891
Bisecting and reducing opt pipelines that includes the
ModuleInlinerWrapperPass has turned out to be a bit problematic.
This is far from perfect (it still lacks information about inline
advisor params etc.), but it should give some kind of hint to what
the wrapped pipeline looks like when using -print-pipeline-passes.
Reviewed By: aeubanks, mtrofin
Differential Revision: https://reviews.llvm.org/D109878
In default pipelines the ModuleInlinerWrapperPass is adding the
InlinerPass to the pipeline twice, once due to MandatoryFirst (passing
true in the ctor) and then a second time with false as argument.
To make it possible to bisect and reduce opt test cases for this
part of the pipeline we need to be able to choose between the two
different variants of the InlinerPass when running opt. This patch is
changing 'inline' to a CGSCC_PASS_WITH_PARAMS in the PassRegistry,
making it possible run opt with both -passes=cgscc(inline) and
-passes=cgscc(inline<only-mandatory>).
Reviewed By: aeubanks, mtrofin
Differential Revision: https://reviews.llvm.org/D109877
In weird cases, the inliner will inline internal recursive functions,
sometimes causing them to have no more uses, in which case the
inliner will mark the function to be deleted. The function is
actually deleted after the call to
updateCGAndAnalysisManagerForCGSCCPass(). In
updateCGAndAnalysisManagerForCGSCCPass(), UR.UpdatedC may be set to
the SCC containing the function to be deleted. Then the inliner calls
CG.removeDeadFunction() which can cause that SCC to be deleted, even
though it's still stored in UR.UpdatedC.
We could potentially check in the wrappers/pass managers if UR.UpdatedC
is in UR.InvalidatedSCCs before doing anything with it, but it's safer
to do this as close to possible to the call to CG.removeDeadFunction()
to avoid issues with allocating a new SCC in the same address as
the deleted one.
It's hard to find a small test case since we need to have recursive
internal functions be reachable from non-internal functions, yet they
need to become non-recursive and not referenced by other functions when
inlined.
Similar to https://reviews.llvm.org/D106306.
Fixes PR50788.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D106405
Just like intrinsics are not tracked for IFI.InlinedCalls, they should not be tracked for IFI.InlinedCallSites.
In the current top-of-tree this change is a NFC, but the full restrict patches (D68484) potentially trigger an read-after-free
if intrinsics are also added to the InlindeCallSites, due to a late optimization potentially removing some of the inlined intrinsics.
Also see https://lists.llvm.org/pipermail/llvm-dev/2021-July/151722.html for a discussion about the problem.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D105805
The patch templatize PriorityInlinerOrder so that it can accept any type priority metric.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D104972
This patch makes PriorityInlineOrder lazily updated.
The PriorityInlineOrder would lazily update the desirability of a call site if it's decreasing.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D104654
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D104028
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D104028
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order,
e.g. use queue or priority queue.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D103315
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order, i.e. use queue or priority queue.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D103315
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
The ModulePassManager should already have taken care of all analysis
invalidation. Without this change, upcoming changes will cause more
invalidation than necessary.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D101320