NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
When inlining functions containing allocas of scalable vectors we
cannot specify the size in the lifetime markers, since we don't
know this at compile time.
Added new test here:
test/Transforms/Inline/AArch64/sve-alloca-merge.ll
Differential Revision: https://reviews.llvm.org/D87139
As discussed in D86843, -earlycse-debug-hash should be used in more regression
tests to catch inconsistency between the hashing and the equivalence check.
Differential Revision: https://reviews.llvm.org/D86863
The NPM doesn't support call-site alwaysinline as described in the comments.
Also make NPM runs more similar to legacy PM runs.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D86663
We only need the C++ type and the corresponding TF Enum. The other
parameter was used for the output spec json file, but we can just
standardize on the C++ type name there.
Differential Revision: https://reviews.llvm.org/D86549
If we use training algorithms that don't need partial rewards, we don't
need to worry about an ir2native model. In that case, training logs
won't contain a 'delta_size' feature either (since that's the partial
reward).
Differential Revision: https://reviews.llvm.org/D86481
Different training algorithms may produce models that, besides the main
policy output (i.e. inline/don't inline), produce additional outputs
that are necessary for the next training stage. To facilitate this, in
development mode, we require the training policy infrastructure produce
a description of the outputs that are interesting to it, in the form of
a JSON file. We special-case the first entry in the JSON file as the
inlining decision - we care about its value, so we can guide inlining
during training - but treat the rest as opaque data that we just copy
over to the training log.
Differential Revision: https://reviews.llvm.org/D85674
Allow inlining only when the Callee has a subset of the Caller's
features. In principle, we should be able to inline regardless of any
features because WebAssembly supports features at module granularity,
not function granularity, but without this restriction it would be
possible for a module to "forget" about features if all the functions
that used them were inlined.
Requested in PR46812.
Differential Revision: https://reviews.llvm.org/D85494
We don't want mandatory events in the training log. We do want to handle
them, to keep the native size accounting accurate, but that's all.
Fixed the code, also expanded the test to capture this.
Differential Revision: https://reviews.llvm.org/D85373
If an analysis is actually invalidated, there's already a log statement
for that: 'Invalidating analysis: FooAnalysis'.
Otherwise the statement is not very useful.
Reviewed By: asbirlea, ychen
Differential Revision: https://reviews.llvm.org/D84981
To match NewPM pass name, and also for readability.
Also rename rpo-functionattrs -> rpo-function-attrs while we're here.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D84694
Summary:
This is the InlineAdvisor used in 'development' mode. It enables two
scenarios:
- loading models via a command-line parameter, thus allowing for rapid
training iteration, where models can be used for the next exploration
phase without requiring recompiling the compiler. This trades off some
compilation speed for the added flexibility.
- collecting training logs, in the form of tensorflow.SequenceExample
protobufs. We generate these as textual protobufs, which simplifies
generation and testing. The protobufs may then be readily consumed by a
tensorflow-based training algorithm.
To speed up training, training logs may also be collected from the
'default' training policy. In that case, this InlineAdvisor does not
use a model.
RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html
Reviewers: jdoerfert, davidxl
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83733
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
Previously the NPM inliner would skip all potential inlines in an
optnone function, but alwaysinline callees should be inlined regardless
of optnone.
Fixes inline-optnone.ll under NPM.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D83021
Assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that it isn't the
case, and happily crashes otherwise.
Minimal reduced reproducer: run `opt -alignment-from-assumptions` on
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }
; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
ret i32 0
}
; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1
attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }
This is what we'd have with -mllvm -enable-knowledge-retention
This reverts commit c95ffadb24.
The legacy pass is called "loop-unroll", but in the new PM it's called "unroll".
Also applied to unroll-and-jam and unroll-full.
Fixes various check-llvm tests when NPM is turned on.
Reviewed By: Whitney, dmgreen
Differential Revision: https://reviews.llvm.org/D82590
If the GEP instruction contanins only constants as its arguments,
then it should be recognized as a constant. For now, there was
also added a flag to turn off this simplification if it causes
any regressions ("disable-gep-const-evaluation") which is off
by default. Once I gather needed data of the effectiveness of
this simplification, the flag will be deleted.
Reviewers: apilipenko, davidxl, mtrofin
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D81026
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
All other floating point math optimization related attribute are merged
in a conservative way during function inlining. This commit adds the
merge rule for the 'no-signed-zeros-fp-math' attribute.
Differential Revision: https://reviews.llvm.org/D81714
Some sequences of optimizations can generate call sites which may never be
executed during runtime, and through constant propagation result in dynamic
allocas being converted to static allocas with very large allocation amounts.
The inliner tries to move these to the caller's entry block, resulting in the
stack limits being reached/bypassed. Avoid inlining functions if this would
result.
The threshold of 64k currently doesn't get triggered on the test suite with an
-Os LTO build on arm64, care should be taken in changing this in future to avoid
needlessly pessimising inlining behaviour.
Differential Revision: https://reviews.llvm.org/D81765
This patch enables printing of constants to see which instructions were
constant-folded. Needed for tests and better visiual analysis of
inliner's work.
Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D81024
This class allows to see the inliner's decisions for better
optimization verifications and tests. To use, use flag
"-passes="print<inline-cost>"".
This is the second attempt to integrate the patch.
The problem from the first try has been discussed and
fixed in D82205.
Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev
Reviewed By: mtrofin
Differential revision: https://reviews.llvm.org/D81743
Summary:
Add call site location info into inline remarks so we can differentiate inline sites.
This can be useful for inliner tuning. We can also reconstruct full hierarchical inline
tree from parsing such remarks. The messege of inline remark is also tweaked so we can
differentiate SampleProfileLoader inline from CGSCC inline.
Reviewers: wmi, davidxl, hoy
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D82213
Summary:
this reduces significantly the number of assumes generated without aftecting too much
the information that is preserved. this improves the compile-time cost
of enable-knowledge-retention significantly.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79650
If the GEP instruction contanins only constants as its arguments,
then it should be recognized as a constant. For now, there was
also added a flag to turn off this simplification if it causes
any regressions ("disable-gep-const-evaluation") which is off
by default. Once I gather needed data of the effectiveness of
this simplification, the flag will be deleted.
Reviewers: apilipenko, davidxl, mtrofin
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D81026
This patch enables printing of constants to see which instructions were
constant-folded. Needed for tests and better visiual analysis of
inliner's work.
Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D81024
This class allows to see the inliner's decisions for better
optimization verifications and tests. To use, use flag
"-passes="print<inline-cost>"".
Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev
Reviewed By: mtrofin
Differential revision: https://reviews.llvm.org/D81743
Summary:
this reduces significantly the number of assumes generated without aftecting too much
the information that is preserved. this improves the compile-time cost
of enable-knowledge-retention significantly.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79650
Summary:
When an SCC got split due to inlining, we have two mechanisms for reprocessing the updated SCC, first is UR.UpdatedC
that repeatedly rerun the new, current SCC; second is a worklist for all newly split SCCs. We can avoid rerun of
the same SCC when the SCC is set to be processed by both mechanisms *back to back*. In pathological cases, such redundant
rerun could cause exponential size growth due to inlining along cycles, even when there's no SCC mutation and hence
convergence is not a problem.
Note that it's ok to have SCC updated and rerun immediately, and also in the work list if we have actually moved an SCC
to be topologically "below" the current one due to merging. In that case, we will need to revisit the current SCC after
those moved SCCs. For that reason, the redundant avoidance here only targets back to back rerun of the same SCC - the
case described by the now removed FIXME comment.
Reviewers: chandlerc, wmi
Subscribers: llvm-commits, hoy
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80589
- Renaming the printer class, flag
- Refactoring
- Changing some tests
This patch is a preparational stage for introducing a new printing pass and new
functionality to the existing Annotation Writer. I plan to extend
this functionality for this tool to be more useful when looking at the inline
process.
Allow InvokeInst to have the second optional prof branch weight for
its unwind branch. InvokeInst is a terminator with two successors.
It might have its unwind branch taken many times. If so
the BranchProbabilityInfo unwind branch heuristic can be inaccurate.
This patch allows a higher accuracy calculated with both branch
weights set.
Changes:
- A new section about InvokeInst is added to
the BranchWeightMetadata page. It states the old information that
missed in the doc and adds new about the second branch weight.
- Verifier is changed to allow either 1 or 2 branch weights
for InvokeInst.
- A new test is written for BranchProbabilityInfo to demonstrate
the main improvement of the simple fix in calcMetadataWeights().
- Several new testcases are created for Inliner. Those check that
both weights are accounted for invoke instruction weight
calculation.
- PGOUseFunc::setBranchWeights() is fixed to be applicable to
InvokeInst.
Reviewers: davidxl, reames, xur, yamauchi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80618
When sampleFDO is enabled, people may expect they can use
-fno-profile-sample-use to opt-out using sample profile for a certain file.
That could be either for debugging purpose or for performance tuning purpose.
However, when thinlto is enabled, if a function in file A compiled with
-fno-profile-sample-use is imported to another file B compiled with
-fprofile-sample-use, the inlined copy of the function in file B may still
get its profile annotated.
The inconsistency may even introduce profile unused warning because if the
target is not compiled with explicit debug information flag, the function
in file A won't have its debug information enabled (debug information will
be enabled implicitly only when -fprofile-sample-use is used). After it is
imported into file B which is compiled with -fprofile-sample-use, profile
annotation for the outline copy of the function will fail because the
function has no debug information, and that will trigger profile unused
warning.
We add a new attribute use-sample-profile to control whether a function
will use its sample profile no matter for its outline or inline copies.
That will make the behavior of -fno-profile-sample-use consistent.
Differential Revision: https://reviews.llvm.org/D79959