This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
Summary:
The function importer already decided what symbols need to be pulled
in. Also these magically added ones will not be in the export list
for the source module, which can confuse the internalizer for
instance.
Reviewers: tejohnson, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19096
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266948
Summary:
This patch refined the instruction weight anootation algorithm:
1. Do not use dbg_value intrinsics for annotation.
2. Annotate cold calls if the call is inlined in profile, but not inlined before preparation. This indicates that the annotation preparation step found no sample for the inlined callsite, thus the call should be very cold.
Reviewers: dnovillo, davidxl
Subscribers: mgrang, llvm-commits
Differential Revision: http://reviews.llvm.org/D19286
llvm-svn: 266936
Summary:
This patch prevents importing from (and therefore exporting from) any
module with a "llvm.used" local value. Local values need to be promoted
and renamed when importing, and their presense on the llvm.used variable
indicates that there are opaque uses that won't see the rename. One such
example is a use in inline assembly.
See also the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html
As part of this, move collectUsedGlobalVariables out of Transforms/Utils
and into IR/Module so that it can be used more widely. There are several
other places in LLVM that used copies of this code that can be cleaned
up as a follow on NFC patch.
Reviewers: joker.eph
Subscribers: pcc, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18986
llvm-svn: 266877
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
This is a requirement for the cache handling in D18494
Differential Revision: http://reviews.llvm.org/D18908
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266519
To be able to work accurately on the reference graph when taking
decision about internalizing, promoting, renaming, etc. We need
to have the alias information explicit.
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266517
Allow explicit section for indirectly called functions in cfi-icall.
Jumptables for functions in the same type class must be contiguous, so they
always go to the default text section.
Fixes PR25079.
llvm-svn: 266486
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.
Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.
Motivation
----------
Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.
We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.
Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.
http://reviews.llvm.org/D19034
<rdar://problem/25256815>
llvm-svn: 266446
Summary:
This IR pass is helpful for GPUs, and other targets with divergent
branches. It's a nop on targets without divergent branches.
Reviewers: chandlerc
Subscribers: llvm-commits, jingyue, rnk, joker.eph, tra
Differential Revision: http://reviews.llvm.org/D18626
llvm-svn: 266399
Summary:
To be able to work accurately on the reference graph when taking decision
about internalizing, promoting, renaming, etc. We need to have the alias
information explicit.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266214
This will save a bunch of copies / initialization of intermediate
datastructure, and (hopefully) simplify the code.
This also abstract the symbol preservation mechanism outside of the
Internalization pass into the client code, which is not forced
to keep a map of strings for instance (ThinLTO will prefere hashes).
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266163
Summary:
For correct handling of alias to nameless
function, we need to be able to refer them through a GUID in the summary.
Here we name them using a hash of the non-private global names in the module.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18883
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266132
Summary:
The function import pass was computing all the imports for all the
modules in the index, and only using the imports for the current module.
Change this to instead compute only for the given module. This means
that the exports list can't be populated, but they weren't being used
anyway.
Longer term, the linker can collect all the imports and export lists
and serialize them out for consumption by the distributed backend
processes which use this pass.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18945
llvm-svn: 266125
r237193 fix handling of alloca size / align in MergeFunctions, but only tested one and didn't follow FunctionComparator::cmpOperations's usual comparison pattern. It also didn't update Instruction.cpp:haveSameSpecialState which I'll do separately.
llvm-svn: 266022
Summary:
This is the first step in also serializing the index out to LLVM
assembly.
The per-module summary written to bitcode is moved out of the bitcode
writer and to a new analysis pass (ModuleSummaryIndexWrapperPass).
The pass itself uses a new builder class to compute index, and the
builder class is used directly in places where we don't have a pass
manager (e.g. llvm-as).
Because we are computing summaries outside of the bitcode writer, we no
longer can use value ids created by the bitcode writer's
ValueEnumerator. This required changing the reference graph edge type
to use a new ValueInfo class holding a union between a GUID (combined
index) and Value* (permodule index). The Value* are converted to the
appropriate value ID during bitcode writing.
Also, this enables removal of the BitWriter library's dependence on the
Analysis library that was previously required for the summary computation.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18763
llvm-svn: 265941
Summary:
Fixes PR26774.
If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".
Motivation:
I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard. So transforming:
```
void f(unsigned x) {
unsigned t = 5 / x;
(void)t;
}
```
to
```
void f(unsigned x) { }
```
is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).
Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM. For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).
Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have. This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.
For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store. As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal. The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal. Such a
refined variant will look like it is `readonly`. However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.
Note: this is not just a problem with atomics or with linking
differently optimized object files. See PR26774 for more realistic
examples that involved neither.
This patch:
This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time. It then changes a set of IPO passes to bail out if they see
such a function.
Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18634
llvm-svn: 265762
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.
The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).
This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.
As a follow-up I'll add:
bool operator<(AtomicOrdering, AtomicOrdering) = delete;
bool operator>(AtomicOrdering, AtomicOrdering) = delete;
bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.
Reviewers: jyknight, reames
Subscribers: jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D18775
llvm-svn: 265602
Summary:
To aid in debugging, dump out the correlation between value names and
GUID for each source module when it is materialized. This will make it
easier to comprehend the earlier summary-based function importing debug
trace which only has access to and prints the GUIDs.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18556
llvm-svn: 265326
Summary: This should make the code more readable, especially all the map declarations.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18721
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265215
This is intended to be used for ThinLTO incremental build.
Differential Revision: http://reviews.llvm.org/D18213
This is a recommit of r265095 after fixing the Windows issues.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265111
This reverts commit r265096, r265095, and r265094.
Windows build is broken, and the validation does not pass.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265102
This is intended to be used for ThinLTO incremental build.
Differential Revision: http://reviews.llvm.org/D18213
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265095
This patch simply mirrors the attributes we give to @llvm.nvvm.reflect
to the __nvvm_reflect libdevice call. This shaves about 30% of the code
in libdevice away because of CSE opportunities. It's also helps us
figure out that libdevice implementations of transcendental functions
don't have side-effects.
llvm-svn: 265060
Summary:
This gives callers flexibility to pass lambdas with captures, which lets
callers avoid the C-style void*-ptr closure style. (Currently, callers
in clang store state in the PassManagerBuilderBase arg.)
No functional change, and the new API is backwards-compatible.
Reviewers: chandlerc
Subscribers: joker.eph, cfe-commits
Differential Revision: http://reviews.llvm.org/D18613
llvm-svn: 264918
Summary:
Add a statistic to count the number of imported functions. Also, add a
new -print-imports option to emit a trace of imported functions, that
works even for an NDEBUG build.
Note that emitOptimizationRemark does not work for the above printing as
it expects a Function object and DebugLoc, neither of which we have
with summary-based importing.
This is part 2 of D18487, the first part was committed separately as
r264536.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18487
llvm-svn: 264537
With r264503, aliases are now being added to the GlobalsToImport set
even when their aliasees can't be imported due to their linkage type.
While the importing worked correctly (the aliases imported as
declarations) due to the logic in doImportAsDefinition, there is no
point to adding them to the GlobalsToImport set.
Additionally, with D18487 it was resulting in incorrectly printing a
message indicating that the alias was imported.
To avoid this, delay adding aliases to the GlobalsToImport set until
after the linkage type of the aliasee is checked.
This patch is part of D18487.
llvm-svn: 264536
Summary:
Now that the summary contains the full reference/call graph, we can
replace the existing function importer that loads and inspect the IR
to iteratively walk the call graph by a traversal based purely on the
summary information. Decouple the actual importing decision from any
IR manipulation.
Reviewers: tejohnson
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18343
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264503
Summary:
ThinLTO is relying on linkInModule to import selected function.
However a lot of "magic" was hidden in linkInModule and the IRMover,
who would rename and promote global variables on the fly.
This is moving to an approach where the steps are decoupled and the
client is reponsible to specify the list of globals to import.
As a consequence some test are changed because they were relying on
the previous behavior which was importing the definition of *every*
single global without control on the client side.
Now the burden is on the client to decide if a global has to be imported
or not.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18122
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263863
The latent bug that LLE exposed in the LoopVectorizer was resolved
(PR26952).
The pass can be disabled with -mllvm -enable-loop-load-elim=0
llvm-svn: 263595
Since the static getGlobalIdentifier and getGUID methods are now called
for global values other than functions, reflect that by moving these
methods to the GlobalValue class.
llvm-svn: 263524
(Resubmitting after fixing missing file issue)
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263513
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263490
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Originally landed as r261544, then reverted in r261544 for (incidental)
build breakage. Re-landed here with no changes.
Reviewers: chandlerc, jingyue
Subscribers: llvm-commits, tra, jhen, hfinkel
Differential Revision: http://reviews.llvm.org/D17739
llvm-svn: 263481
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date: Fri Mar 11 17:15:50 2016 +0000
Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
91177308-0d34-0410-b5e6-96231b3b80d8
until we can figure out what to do about clang and Release build testing.
This reverts commit 263258.
llvm-svn: 263321
Summary:
This patch adds support for including a full reference graph including
call graph edges and other GV references in the summary.
The reference graph edges can be used to make importing decisions
without materializing any source modules, can be used in the plugin
to make file staging decisions for distributed build systems, and is
expected to have other uses.
The call graph edges are recorded in each function summary in the
bitcode via a list of <CalleeValueIds, StaticCount> tuples when no PGO
data exists, or <CalleeValueId, StaticCount, ProfileCount> pairs when
there is PGO, where the ValueId can be mapped to the function GUID via
the ValueSymbolTable. In the function index in memory, the call graph
edges reference the target via the CalleeGUID instead of the
CalleeValueId.
The reference graph edges are recorded in each summary record with a
list of referenced value IDs, which can be mapped to value GUID via the
ValueSymbolTable.
Addtionally, a new summary record type is added to record references
from global variable initializers. A number of bitcode records and data
structures have been renamed to reflect the newly expanded scope of the
summary beyond functions. More cleanup will follow.
Reviewers: joker.eph, davidxl
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17212
llvm-svn: 263275
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
This was originally a pointer to support pass managers which didn't use
AnalysisManagers. However, that doesn't realistically come up much and
the complexity of supporting it doesn't really make sense.
In fact, *many* parts of the pass manager were just assuming the pointer
was never null already. This at least makes it much more explicit and
clear.
llvm-svn: 263219
tests to run GVN in both modes.
This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.
Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.
I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.
Differential Revision: http://reviews.llvm.org/D18019
llvm-svn: 263208
llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself.
Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function.
Ideally this should be NFC, but in reality its possible that a function:
has no !dbg (in which case there's likely a bug somewhere in an opt pass), or
that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will
Reviewed by Duncan Exon Smith.
Differential Revision: http://reviews.llvm.org/D18074
llvm-svn: 263184
As part of r251146 InstCombine was extended to call computeKnownBits on
every value in the function to determine whether it happens to be
constant. This increases typical compiletime by 1-3% (5% in irgen+opt
time) in my measurements. On the other hand this case did not trigger
once in the whole llvm-testsuite.
This patch introduces the notion of ExpensiveCombines which are only
enabled for OptLevel > 2. I removed the check in InstructionSimplify as
that is called from various places where the OptLevel is not known but
given the rarity of the situation I think a check in InstCombine is
enough.
Differential Revision: http://reviews.llvm.org/D16835
llvm-svn: 263047
This reverts commit r262250.
It causes SPEC2006/gcc to generate wrong result (166.s) in AArch64 when
running with *ref* data set. The error happens with
"-Ofast -flto -fuse-ld=gold" or "-O3 -fno-strict-aliasing".
llvm-svn: 262839
This patch provides the following infrastructure for PGO enhancements in inliner:
Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.
Differential Revision: http://reviews.llvm.org/D16381
llvm-svn: 262636
Summary: With discriminator, LineLocation can uniquely identify a callsite without the need to specifying callee name. Remove Callee function name from the key, and put it in the value (FunctionSamples).
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17827
llvm-svn: 262634
parts of the AA interface out of the base class of every single AA
result object.
Because this logic reformulates the query in terms of some other aspect
of the API, it would easily cause O(n^2) query patterns in alias
analysis. These could in turn be magnified further based on the number
of call arguments, and then further based on the number of AA queries
made for a particular call. This ended up causing problems for Rust that
were actually noticable enough to get a bug (PR26564) and probably other
places as well.
When originally re-working the AA infrastructure, the desire was to
regularize the pattern of refinement without losing any generality.
While I think it was successful, that is clearly proving to be too
costly. And the cost is needless: we gain no actual improvement for this
generality of making a direct query to tbaa actually be able to
re-use some other alias analysis's refinement logic for one of the other
APIs, or some such. In short, this is entirely wasted work.
To the extent possible, delegation to other API surfaces should be done
at the aggregation layer so that we can avoid re-walking the
aggregation. In fact, this significantly simplifies the logic as we no
longer need to smuggle the aggregation layer into each alias analysis
(or the TargetLibraryInfo into each alias analysis just so we can form
argument memory locations!).
However, we also have some delegation logic inside of BasicAA and some
of it even makes sense. When the delegation logic is baking in specific
knowledge of aliasing properties of the LLVM IR, as opposed to simply
reformulating the query to utilize a different alias analysis interface
entry point, it makes a lot of sense to restrict that logic to
a different layer such as BasicAA. So one aspect of the delegation that
was in every AA base class is that when we don't have operand bundles,
we re-use function AA results as a fallback for callsite alias results.
This relies on the IR properties of calls and functions w.r.t. aliasing,
and so seems a better fit to BasicAA. I've lifted the logic up to that
point where it seems to be a natural fit. This still does a bit of
redundant work (we query function attributes twice, once via the
callsite and once via the function AA query) but it is *exactly* twice
here, no more.
The end result is that all of the delegation logic is hoisted out of the
base class and into either the aggregation layer when it is a pure
retargeting to a different API surface, or into BasicAA when it relies
on the IR's aliasing properties. This should fix the quadratic query
pattern reported in PR26564, although I don't have a stand-alone test
case to reproduce it.
It also seems general goodness. Now the numerous AAs that don't need
target library info don't carry it around and depend on it. I think
I can even rip out the general access to the aggregation layer and only
expose that in BasicAA as it is the only place where we re-query in that
manner.
However, this is a non-trivial change to the AA infrastructure so I want
to get some additional eyes on this before it lands. Sadly, it can't
wait long because we should really cherry pick this into 3.8 if we're
going to go this route.
Differential Revision: http://reviews.llvm.org/D17329
llvm-svn: 262490
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17742
llvm-svn: 262419
Summary:
I re-benchmarked this and results are similar to original results in
D13259:
On ARM64:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
SingleSource/Benchmarks/Polybench/stencils/adi -19.78%
On x86:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -27.14%
And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop
Distribution.
In terms of compile time, there is ~5% increase on both
SingleSource/Benchmarks/Misc/oourafft and
SingleSource/Benchmarks/Linkpack/linkpack-pc. These are both very tiny
loop-intensive programs where SCEV computations dominates compile time.
The reason that time spent in SCEV increases has to do with the design
of the old pass manager. If a transform pass does not preserve an
analysis we *invalidate* the analysis even if there was *no*
modification made by the transform pass.
This means that currently we don't take advantage of LLE and LV sharing
the same analysis (LAA) and unfortunately we recompute LAA *and* SCEV
for LLE.
(There should be a way to work around this limitation in the case of
SCEV and LAA since both compute things on demand and internally cache
their result. Thus we could pretend that transform passes preserve
these analyses and manually invalidate them upon actual modification.
On the other hand the new pass manager is supposed to solve so I am not
sure if this is worthwhile.)
Reviewers: hfinkel, dberlin
Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16300
llvm-svn: 262250
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16180
llvm-svn: 261736
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Reviewers: chandlerc, jingyue
Subscribers: hfinkel, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17317
llvm-svn: 261544
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.
Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators. (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)
There should be NFC here.
llvm-svn: 261498
convert one test to use this.
This is a particularly significant milestone because it required
a working per-function AA framework which can be queried over each
function from within a CGSCC transform pass (and additionally a module
analysis to be accessible). This is essentially *the* point of the
entire pass manager rewrite. A CGSCC transform is able to query for
multiple different function's analysis results. It works. The whole
thing appears to actually work and accomplish the original goal. While
we were able to hack function attrs and basic-aa to "work" in the old
pass manager, this port doesn't use any of that, it directly leverages
the new fundamental functionality.
For this to work, the CGSCC framework also has to support SCC-based
behavior analysis, etc. The only part of the CGSCC pass infrastructure
not sorted out at this point are the updates in the face of inlining and
running function passes that mutate the call graph.
The changes are pretty boring and boiler-plate. Most of the work was
factored into more focused preperatory patches. But this is what wires
it all together.
llvm-svn: 261203
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel (even if
it is not totally "free": projects like clang are reusing product
from the "compile phase" for multiple link, think about
libLLVMSupport reused for opt, llc, etc.).
This pipeline is based on the proposal in D13443 for full LTO. We
didn't move forward on this proposal because the LTO link was far too
long after that. We believe that we can afford it with ThinLTO.
The ThinLTO pipeline integrates in the regular O2/O3 flow:
- The compile phase perform the inliner with a somehow lighter
function simplification. (TODO: tune the inliner thresholds here)
This is intendend to simplify the IR and get rid of obvious things
like linkonce_odr that will be inlined.
- The link phase will run the pipeline from the start, extended with
some specific passes that leverage the augmented knowledge we have
during LTO. Especially after the inliner is done, a sequence of
globalDCE/globalOpt is performed, followed by another run of the
"function simplification" passes. It is not clear if this part
of the pipeline will stay as is, as the split model of ThinLTO
does not allow the same benefit as FullLTO without added tricks.
The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO and it will evolve, but this should provide a good starting
point.
Reviewers: tejohnson
Differential Revision: http://reviews.llvm.org/D17115
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261029
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261028
than the SCC object, and have it scan the instruction stream directly
rather than relying on call records.
This makes the behavior of this routine consistent between libc routines
and LLVM intrinsics for libc routines. We can go and start teaching it
about those being norecurse, but we should behave the same for the
intrinsic and the libc routine rather than differently. I chatted with
James Molloy and the inconsistency doesn't seem intentional and likely
is due to intrinsic calls not being modelled in the call graph analyses.
This also fixes a bug where we would deduce norecurse on optnone
functions, when generally we try to handle optnone functions as-if they
were replaceable and thus unanalyzable.
llvm-svn: 260813
node set rather than walking the SCC directly.
This directly exposes the functions and has already had null entries
filtered out. We also don't need need to handle optnone as it has
already been handled in the caller -- we never try to remove convergent
when there are optnone functions in the SCC.
With this change, the code for removing convergent should work with the
new pass manager and a different SCC analysis.
llvm-svn: 260668
with the test for a non-convergent intrinsic call.
While it is possible to use the call records to search for function
calls, we're going to do an instruction scan anyways to find the
intrinsics, we can handle both cases while scanning instructions. This
will also make the logic more amenable to the new pass manager which
doesn't use the same call graph structure.
My next patch will remove use of CallGraphNode entirely and allow this
code to work with both the old and new pass manager. Fortunately, it
should also get strictly simpler without changing functionality.
llvm-svn: 260666
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel.
This pipeline is based on the proposal in D13443 for full LTO. We ]
didn't move forward on this proposal because the link was far too long
after that.
This patch refactor the "function simplification" passes that are part
of the inliner loop in a helper function (this part is NFC and can be
commited separately to simplify the diff). The ThinLTO pipeline
integrates in the regular O2/O3 flow:
- The compile phase perform the inliner with a somehow lighter
function simplification. (TODO: tune the inliner thresholds here)
This is intendend to simplify the IR and get rid of obvious things
like linkonce_odr that will be inlined.
- The link phase will run the pipeline from the start, extended with
some specific passes that leverage the augmented knowledge we have
during LTO. Especially after the inliner is done, a sequence of
globalDCE/globalOpt is performed, followed by another run of the
"function simplification" passes.
The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO but this should provide a good starting point.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits, dexonsmith
Differential Revision: http://reviews.llvm.org/D17115
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260604
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260603
Make sure we split ":" from the end of the global function id (which
is <path>:<function> for local functions) instead of the beginning to
avoid splitting at the wrong place for Windows file paths that contain
a ":".
llvm-svn: 260469
The current function importer will walk the callgraph, importing
transitively any callee that is below the threshold. This can
lead to import very deep which is costly in compile time and not
necessarily beneficial as most of the inline would happen in
imported function and not necessarilly in user code.
The actual factor has been carefully chosen by flipping a coin ;)
Some tuning need to be done (just at the existing limiting threshold).
Reviewers: tejohnson
Differential Revision: http://reviews.llvm.org/D17082
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260466
There is not reason to pass an array of "char *" to rebuild a set if
the client already has one.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260462
This restores commit r260408, along with a fix for a bot failure.
The bot failure was caused by dereferencing a unique_ptr in the same
call instruction parameter list where it was passed via std::move.
Apparently due to luck this was not exposed when I built the compiler
with clang, only with gcc.
llvm-svn: 260442
Summary:
This patch uses the lower 64-bits of the MD5 hash of a function name as
a GUID in the function index, instead of storing function names. Any
local functions are first given a global name by prepending the original
source file name. This is the same naming scheme and GUID used by PGO in
the indexed profile format.
This change has a couple of benefits. The primary benefit is size
reduction in the combined index file, for example 483.xalancbmk's
combined index file was reduced by around 70%. It should also result in
memory savings for the index file in memory, as the in-memory map is
also indexed by the hash instead of the string.
Second, this enables integration with indirect call promotion, since the
indirect call profile targets are recorded using the same global naming
convention and hash. This will enable the function importer to easily
locate function summaries for indirect call profile targets to enable
their import and subsequent promotion.
The original source file name is recorded in the bitcode in a new
module-level record for use in the ThinLTO backend pipeline.
Reviewers: davidxl, joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D17028
llvm-svn: 260408
Summary:
As discussed on IRC, move the ThinLTOGlobalProcessing code out of
the linker, and into TransformUtils. The name of the class is changed
to FunctionImportGlobalProcessing.
Reviewers: joker.eph, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17081
llvm-svn: 260395
Summary:
Remove the convergent attribute on any functions which provably do not
contain or invoke any convergent functions.
After this change, we'll be able to modify clang to conservatively add
'convergent' to all functions when compiling CUDA.
Reviewers: jingyue, joker.eph
Subscribers: llvm-commits, tra, jhen, hfinkel, resistor, chandlerc, arsenm
Differential Revision: http://reviews.llvm.org/D17013
llvm-svn: 260319
This pass implements whole program optimization of virtual calls in cases
where we know (via bitset information) that the list of callees is fixed. This
includes the following:
- Single implementation devirtualization: if a virtual call has a single
possible callee, replace all calls with a direct call to that callee.
- Virtual constant propagation: if the virtual function's return type is an
integer <=64 bits and all possible callees are readnone, for each class and
each list of constant arguments: evaluate the function, store the return
value alongside the virtual table, and rewrite each virtual call as a load
from the virtual table.
- Uniform return value optimization: if the conditions for virtual constant
propagation hold and each function returns the same constant value, replace
each virtual call with that constant.
- Unique return value optimization for i1 return values: if the conditions
for virtual constant propagation hold and a single vtable's function
returns 0, or a single vtable's function returns 1, replace each virtual
call with a comparison of the vptr against that vtable's address.
Differential Revision: http://reviews.llvm.org/D16795
llvm-svn: 260312
FunctionAttrs does an "optimistic" analysis of SCCs as a unit, which
means normally it is able to disregard calls from an SCC into itself.
However, calls and invokes with operand bundles are allowed to have
memory effects not fully described by the memory effects on the call
target, so we can't be optimistic around operand-bundled calls from an
SCC into itself.
llvm-svn: 260244
Summary:
Passes that call `getAnalysisIfAvailable<T>` also need to call
`addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the
legacy pass manager that it uses `T`. This contract was being
violated by passes that used `createLegacyPMAAResults`. This change
fixes this by exposing a helper in AliasAnalysis.h,
`addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults
and does the right thing when called from `getAnalysisUsage`.
Reviewers: chandlerc
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17010
llvm-svn: 260183
Summary:
When alias analysis is uncertain about the aliasing between any two accesses,
it will return MayAlias. This uncertainty from alias analysis restricts LICM
from proceeding further. In cases where alias analysis is uncertain we might
use loop versioning as an alternative.
Loop Versioning will create a version of the loop with aggressive aliasing
assumptions in addition to the original with conservative (default) aliasing
assumptions. The version of the loop making aggressive aliasing assumptions
will have all the memory accesses marked as no-alias. These two versions of
loop will be preceded by a memory runtime check. This runtime check consists
of bound checks for all unique memory accessed in loop, and it ensures the
lack of memory aliasing. The result of the runtime check determines which of
the loop versions is executed: If the runtime check detects any memory
aliasing, then the original loop is executed. Otherwise, the version with
aggressive aliasing assumptions is used.
The pass is off by default and can be enabled with command line option
-enable-loop-versioning-licm.
Reviewers: hfinkel, anemet, chatur01, reames
Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga,
llvm-commits
Differential Revision: http://reviews.llvm.org/D9151
llvm-svn: 259986
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
* __cfi_check gets a 3rd argument: ubsan handler data
* Instead of trapping on failure, call __cfi_check_fail which must be
present in the module (generated in the frontend).
llvm-svn: 258746
Summary:
Make sure that any new and optimized objects created during GlobalOPT copy all the attributes from the base object.
A good example of improper behavior in the current implementation is section information associated with the GlobalObject. If a section was set for it, and GlobalOpt is creating/modifying a new object based on this one (often copying the original name), without this change new object will be placed in a default section, resulting in inappropriate properties of the new variable.
The argument here is that if customer specified a section for a variable, any changes to it that compiler does should not cause it to change that section allocation.
Moreover, any other properties worth representation in copyAttributesFrom() should also be propagated.
Reviewers: jmolloy, joker-eph, joker.eph
Subscribers: slarin, joker.eph, rafael, tobiasvk, llvm-commits
Differential Revision: http://reviews.llvm.org/D16074
llvm-svn: 258556
Summary:
Since we are currently not doing incremental importing there is
no need to link metadata as a postpass. The module linker will
only link in the imported subroutines due to the functionality
added by r256003.
(Note that the metadata postpass linking functionalitiy is still
used by llvm-link, and may be needed here in the future if a more
incremental strategy is adopted.)
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16424
llvm-svn: 258458
This patch includes the passmanagerbuilder change that enables IR level PGO instrumentation. It adds two passmanagerbuilder options: -profile-generate=<profile_filename> and -profile-use=<profile_filename>. The new options are primarily for debug purpose.
Reviewers: davidxl, silvas
Differential Revision: http://reviews.llvm.org/D15828
llvm-svn: 258420
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
Loop trip counts can often be resolved during LTO. We should obviously be unrolling small loops once those trip counts have been resolved, but we weren't.
llvm-svn: 257767
The findExternalCalls routine ignores calls to functions already
defined in the dest module. This was not handling the case where
the definition in the current module is actually an alias to a
function call.
llvm-svn: 257493
It's strange that LoopInfo mostly owns the Loop objects, but that it
defers deleting them to the loop pass manager. Instead, change the
oddly named "updateUnloop" to "markAsRemoved" and have it queue the
Loop object for deletion. We can't delete the Loop immediately when we
remove it, since we need its pointer identity still, so we'll mark the
object as "invalid" so that clients can see what's going on.
llvm-svn: 257191
Due to the new in-place ThinLTO symbol handling support added in
r257174, we now invoke renameModuleForThinLTO on the current
module from within the FunctionImport pass.
Additionally, renameModuleForThinLTO no longer needs to return the
Module as it is performing the renaming in place on the one provided.
This commit will be immediately preceeded by a companion clang patch to
remove its invocation of renameModuleForThinLTO.
llvm-svn: 257181
The function importer was still materializing metadata when modules were
loaded for function importing. We only want to materialize it when we
are going to invoke the metadata linking postpass. Materializing it
before function importing is not only unnecessary, but also causes
metadata referenced by imported functions to be mapped in early, and
then not connected to the rest of the module level metadata when it is
ultimately linked in.
Augmented the test case to specifically check for the metadata being
properly connected, which it wasn't before this fix.
llvm-svn: 257171
a top-down manner into a true top-down or RPO pass over the call graph.
There are specific patterns of function attributes, notably the
norecurse attribute, which are most effectively propagated top-down
because all they us caller information.
Walk in RPO over the call graph SCCs takes the form of a module pass run
immediately after the CGSCC pass managers postorder walk of the SCCs,
trying again to deduce norerucrse for each singular SCC in the call
graph.
This removes a very legacy pass manager specific trick of using a lazy
revisit list traversed during finalization of the CGSCC pass. There is
no analogous finalization step in the new pass manager, and a lazy
revisit list is just trying to produce an RPO iteration of the call
graph. We can do that more directly if more expensively. It seems
unlikely that this will be the expensive part of any compilation though
as we never examine the function bodies here. Even in an LTO run over
a very large module, this should be a reasonable fast set of operations
over a reasonably small working set -- the function call graph itself.
In the future, if this really is a compile time performance issue, we
can look at building support for both post order and RPO traversals
directly into a pass manager that builds and maintains the PO list of
SCCs.
Differential Revision: http://reviews.llvm.org/D15785
llvm-svn: 257163
Summary: The example in desc should match with actual option name
Reviewers: jmolloy
Differential Revision: http://reviews.llvm.org/D15800
llvm-svn: 256951
Most of the properties of memset_pattern16 can be now covered by the generic attributes and inferred by InferFunctionAttrs. The only exceptions are:
- We don't yet have a writeonly attribute for the first argument.
- We don't have an attribute for modeling the access size facts encoded in MemoryLocation.cpp.
Differential Revision: http://reviews.llvm.org/D15879
llvm-svn: 256911
This patch removes the isOperatorNewLike predicate since it was only being used to establish a non-null return value and we have attributes specifically for that purpose with generic handling. To keep approximate the same behaviour for existing frontends, I added the various operator new like (i.e. instances of operator new) to InferFunctionAttrs. It's not really clear to me why this isn't handled in Clang, but I didn't want to break existing code and any subtle assumptions it might have.
Once this patch is in, I'm going to start separating the isAllocLike family of predicates. These appear to be being used for a mixture of things which should be more clearly separated and documented. Today, they're being used to indicate (at least) aliasing facts, CSE-ability, and default values from an allocation site.
Differential Revision: http://reviews.llvm.org/D15820
llvm-svn: 256787
InlineCostAnalysis is an analysis pass without any need for it to be one.
Once it stops being an analysis pass, it doesn't maintain any useful state
and the member functions inside can be made free functions. NFC.
Differential Revision: http://reviews.llvm.org/D15701
llvm-svn: 256521
a standalone pass.
There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.
In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.
The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.
Differential Revision: http://reviews.llvm.org/D15676
llvm-svn: 256466
is (by default) run much earlier than FuncitonAttrs proper.
This allows forcing optnone or other widely impactful attributes. It is
also a bit simpler as the force attribute behavior needs no specific
iteration order.
I've added the pass into the default module pass pipeline and LTO pass
pipeline which mirrors where function attrs itself was being run.
Differential Revision: http://reviews.llvm.org/D15668
llvm-svn: 256465
This reapplies r256277 with two changes:
- In emitFnAttrCompatCheck, change FuncName's type to std::string to fix
a use-after-free bug.
- Remove an unnecessary install-local target in lib/IR/Makefile.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 256304
This reapplies r252990 and r252949. I've added member function getKind
to the Attr classes which returns the enum or string of the attribute.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 256277
The code for deleting dead global variables and functions was
duplicated.
This is in preparation for also deleting dead global aliases.
llvm-svn: 256274
This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold
callees and uses values of inlinehint-threshold and inlinecold-threshold
respectively as the thresholds for such callees.
Differential Revision: http://reviews.llvm.org/D15245
llvm-svn: 256222
Make personality functions, prefix data, and prologue data hungoff
operands of Function.
This is based on the email thread "[RFC] Clean up the way we store
optional Function data" on llvm-dev.
Thanks to sanjoyd, majnemer, rnk, loladiro, and dexonsmith for feedback!
Includes a fix to scrub value subclass data in dropAllReferences. Does not
use binary literals.
Differential Revision: http://reviews.llvm.org/D13829
llvm-svn: 256095
Make personality functions, prefix data, and prologue data hungoff
operands of Function.
This is based on the email thread "[RFC] Clean up the way we store
optional Function data" on llvm-dev.
Thanks to sanjoyd, majnemer, rnk, loladiro, and dexonsmith for feedback!
Includes a fix to scrub value subclass data in dropAllReferences.
Differential Revision: http://reviews.llvm.org/D13829
llvm-svn: 256093
Make personality functions, prefix data, and prologue data hungoff
operands of Function.
This is based on the email thread "[RFC] Clean up the way we store
optional Function data" on llvm-dev.
Thanks to sanjoyd, majnemer, rnk, loladiro, and dexonsmith for feedback!
Differential Revision: http://reviews.llvm.org/D13829
llvm-svn: 256090
Summary:
Second patch split out from http://reviews.llvm.org/D14752.
Maps metadata as a post-pass from each module when importing complete,
suturing up final metadata to the temporary metadata left on the
imported instructions.
This entails saving the mapping from bitcode value id to temporary
metadata in the importing pass, and from bitcode value id to final
metadata during the metadata linking postpass.
Depends on D14825.
Reviewers: dexonsmith, joker.eph
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D14838
llvm-svn: 255909
As of r255720, the loop pass manager will DTRT when passes update the
loop info for removed loops, so they no longer need to reach into
LPPassManager APIs to do this kind of transformation. This change very
nearly removes the need for the LPPassManager to even be passed into
loop passes - the only remaining pass that uses the LPM argument is
LoopUnswitch.
llvm-svn: 255797
An LTO pass that generates a __cfi_check() function that validates a
call based on a hash of the call-site-known type and the target
pointer.
llvm-svn: 255693
This patch does two things:
1. mem2reg is now run immediately after globalopt. Now that globalopt
can localize variables more aggressively, it makes sense to lower
them to SSA form earlier rather than later so they can benefit from
the full set of optimization passes.
2. More scalar optimizations are run after the loop optimizations in
LTO mode. The loop optimizations (especially indvars) can clean up
scalar code sufficiently to make it worthwhile running more scalar
passes. I've particularly added SCCP here as it isn't run anywhere
else in the LTO pass pipeline.
Mem2reg is super cheap and shouldn't affect compilation time at all. The
rest of the added passes are in the LTO pipeline only so doesn't affect
the vast majority of compilations, just the link step.
llvm-svn: 255634
This patch converts code that has access to a LLVMContext to not take a
diagnostic handler.
This has a few advantages
* It is easier to use a consistent diagnostic handler in a single program.
* Less clutter since we are not passing a handler around.
It does make it a bit awkward to implement some C APIs that return a
diagnostic string. I will propose new versions of these APIs and
deprecate the current ones.
llvm-svn: 255571
DenseMap is the wrong data structure to use for sample records and call
sites. The keys are too large, causing massive core memory growth when
reading profiles.
Before this patch, a 21Mb input profile was causing the compiler to grow
to 3Gb in memory. By switching to std::map, the compiler now grows to
300Mb in memory.
There still are some opportunities for memory footprint reduction. I'll
be looking at those next.
llvm-svn: 255389
Added some missing spaces between the module identifier and the start of
the debug message. Also added a ":" after the module identifier to make
this look a little nicer.
llvm-svn: 255259
- This simplifies the CallSite class, arg_begin / arg_end are now
simple wrapper getters.
- In several places, we were creating CallSite instances solely to call
arg_begin and arg_end. With this change, that's no longer required.
llvm-svn: 255226
loading the source Module, linking the function in the destination
module, and destroying the source Module before repeating with the
next function to import (potentially from the same Module).
Ideally we would keep the source Module alive and import the next
Function needed from this Module. Unfortunately this is not possible
because the linker does not leave it in a usable state.
However we can do better by first computing the list of all candidates
per Module, and only then load the source Module and import all the
function we need for it.
The trick to process callees is to materialize function in the source
module when building the list of function to import, and inspect them
in their source module, collecting the list of callees for each
callee.
When we move the the actual import, we will import from each source
module exactly once. Each source module is loaded exactly once.
The only drawback it that it requires to have all the lazy-loaded
source Module in memory at the same time.
Currently this patch already improves considerably the link time,
a multithreaded link of llvm-dis on my laptop was:
real 1m12.175s user 6m32.430s sys 0m10.529s
and is now:
real 0m40.697s user 2m10.237s sys 0m4.375s
Note: this is the full link time (linker+Import+Optimizer+CodeGen)
Differential Revision: http://reviews.llvm.org/D15178
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255100
For an invoke with operand bundles, the [op_begin(), op_end()-3] range
can contain things other than invoke arguments. This change teaches
PruneEH to use arg_begin() and arg_end() explicitly.
llvm-svn: 255073
Summary:
Add a field on the PassManagerBuilder that clang or gold can use to pass
down a pointer to the function index in memory to use for importing when
the ThinLTO backend is triggered. Add support to supply this to the
function import pass.
Reviewers: joker.eph, dexonsmith
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D15024
llvm-svn: 254926
For efficiency reason, when importing multiple functions for the same Module,
we can avoid reparsing it every time.
Differential Revision: http://reviews.llvm.org/D15102
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 254486
When linking static archive, there is no individual module files to
load. Instead they can be mmap'ed and could be initialized from a
buffer directly. The callback provide flexibility to override the
scheme for loading module from the summary.
Differential Revision: http://reviews.llvm.org/D15101
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 254479
This fixes buildbots in systems that std::to_string is not present. It
also tidies the output of the diagnostic to render doubles a bit better
(thanks Ben Kramer for help with string streams and format).
llvm-svn: 254261
This adds two thresholds to the sample profiler to affect inlining
decisions: the concept of global hotness and coldness.
Functions that have accumulated more than a certain fraction of samples at
runtime, are annotated with the InlineHint attribute. Conversely,
functions that accumulate less than a certain fraction of samples, are
annotated with the Cold attribute.
This is very similar to the hints emitted by Clang when using
instrumentation profiles.
Notice that this is a very blunt instrument. A function may have
globally collected a significant fraction of samples, but that does not
necessarily mean that every callsite for that function is hot.
Ideally, we would annotate each callsite with the samples collected at
that callsite. This way, the inliner can incorporate all these weights
into its cost model.
Once the inliner offers this functionality, we can change the hints
emitted here to a more precise per-callsite annotation. For now, this is
providing some measure of speedups with our internal benchmarks. I've
observed speedups of up to 23% (though the geo mean is about 3%). I expect
these numbers to improve as the inliner gets better annotations.
llvm-svn: 254212
Based on testing of internal benchmarks, I'm lowering this threshold to
a value of 0.1%. This means that SamplePGO will respect 99.9% of the
original inline decisions when following a profile.
The performance difference is noticeable in some tests. With the
previous threshold, the speedups over baseline -O2 was about 0.63%. With
the new default, the speedups are around 3% on average.
The point of this threshold is not to do more aggressive inlining. When
an inlined callsite crosses this threshold, SamplePGO will redo the
inline decision so that it can better apply the input profile.
By respecting most original inline decisions, we can apply more of the
input profile because the shape of the code follows the profile more
closely.
In the next series, I'll be looking at adding some inline hints for the
cold callsites and for toplevel functions that are hot/cold as well.
llvm-svn: 254211
They are as much trouble as aliases to declarations. They are requiring
the code generator to define a symbol with the same value as another
symbol, but the second symbol is undefined.
If representing this is important for some optimization, we could add
support for available_externally aliases. They would be *required* to
point to a declaration (or available_externally definition).
llvm-svn: 254170
Add a simple initial heuristic to control importing based on the number
of instructions recorded in the function's summary. Add option to
control the limit, and test using option.
llvm-svn: 254036
When the original binary is executed and sampled, the resulting profile
contains information on the original inline stack. We currently follow
the original inline plan if we notice that the inlined callsite has more
than 0 samples to it.
A better way is to determine whether the callsite is actually worth
inlining. If the callsite accumulates a small fraction of the samples
spent in the parent function, then we don't want to bother inlining it
(as it means that the callsite is actually cold).
This patch introduces a threshold expressed in percentage of samples
in relation to the parent function. If the callsite uses less than N%
of the total samples used by its parent, the original inline decision is
not re-applied.
I've set the threshold to the very arbitrary value of 5%. I'm yet to do
any actual experiments to see what's a good value. I wanted to separate
the basic mechanism from the tuning.
llvm-svn: 254034
This patch implements a minimum spanning tree (MST) based instrumentation for
PGO. The use of MST guarantees minimum number of CFG edges getting
instrumented. An addition optimization is to instrument the less executed
edges to further reduce the instrumentation overhead. The patch contains both the
instrumentation and the use of the profile to set the branch weights.
Differential Revision: http://reviews.llvm.org/D12781
llvm-svn: 254021
Analyze imported function bodies and add any new external calls to
the worklist for importing. Currently no controls on the importing
so this will end up importing everything possible in the call tree
below the importing module. Basic profitability checks coming next.
Update test to check for iteratively inlined functions.
llvm-svn: 254011
Skip imports for weak_any aliases as well. Fix the test to check
non-import of weak aliases and functions, and import of normal alias.
llvm-svn: 253991
Summary:
This is a helper to perform cross-module import for ThinLTO. Right now
it is importing naively every possible called functions.
Reviewers: tejohnson
Subscribers: dexonsmith, llvm-commits
Differential Revision: http://reviews.llvm.org/D14914
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 253954
The existing coverage tracker counts the number of records that were used
from the input profile. An alternative view of coverage is to check how
many available samples were applied.
This way, if the profile contains several records with few samples, it
doesn't really matter much that they were not applied. The more
interesting records to apply are the ones that contribute many samples.
llvm-svn: 253912
If a function was originally inlined but not actually hot at runtime,
its samples will not be counted inside the parent function. This throws
off the coverage calculation because it expects to find more used
records than it should.
Fixed by ignoring functions that will not be inlined into the parent.
Currently, this is inlined functions with 0 samples. In subsequent
patches, I'll change this to mean "cold" functions.
llvm-svn: 253716
This reverts r253661.
Turns out that the assignment is not redundant (despite the Clang static analyzer claiming the opposite).
The variable is being used by the lambda function AddUsersToWorklistIfCapturing().
llvm-svn: 253696
While debugging some sampling coverage problems, I found this useful:
When applying samples from a profile, it helps to also know what line
offset and discriminator the sample belongs to. This makes it easy to
correlate against the input profile.
llvm-svn: 253670
We currently bail out of global localization if the global has non-instruction users. However, often these can be simple bitcasts or constant-GEPs, which we can easily turn into instructions before localizing. Be a bit more aggressive.
llvm-svn: 253584
This provides a way to force a function to have certain attributes from the command line. This can be useful when debugging or doing workload exploration, where manually editing IR is tedious or not possible (due to build systems etc).
The syntax is -force-attribute=function_name:attribute_name
All function attributes are parsed except alignstack as it requires an argument.
llvm-svn: 253550
While setting function attributes we check all instructions that may access memory. For a call instruction we check all arguments. The special check is required for pointers.
I added vector-of-pointers to the call arguments types that should be checked.
Differential Revision: http://reviews.llvm.org/D14693
llvm-svn: 253363
Address Duncan Exon Smith's comments on D14148, which was added after the patch had been LGTM'd and committed:
* clang-format one area where whitespace diffs occurred.
* Add a threshold to limit the store/load dominance checks as they are quadratic.
llvm-svn: 253192
Global to local demotion can speed up programs that use globals a lot. It is particularly useful with LTO, when the entire call graph is known and most functions have been internalized.
For a global to be demoted, it must only be accessed by one function and that function:
1. Must never recurse directly or indirectly, else the GV would be clobbered.
2. Must never rely on the value in GV at the start of the function (apart from the initializer).
GlobalOpt can already do this, but it is hamstrung and only ever tries to demote globals inside "main", because C++ gives extra guarantees about how main is called - once and only once.
In LTO mode, we can often prove the first property (if the function is internal by this point, we know enough about the callgraph to determine if it could possibly recurse). FunctionAttrs now infers the "norecurse" attribute for this reason.
The second property can be proven for a subset of functions by proving that all loads from GV are dominated by a store to GV. This is conservative in the name of compile time - this only requires a DominatorTree which is fairly cheap in the grand scheme of things. We could do more fancy stuff with MemoryDependenceAnalysis too to catch more cases but this appears to catch most of the useful ones in my testing.
llvm-svn: 253168
This reapplies r252949. I've changed the type of FuncName to be
std::string instead of StringRef in emitFnAttrCompatCheck.
Original commit message for r252949:
Provide a way to specify inliner's attribute compatibility and merging
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 252990
rules using table-gen. NFC.
This commit adds new classes CompatRule and MergeRule to Attributes.td,
which are used to generate code to check attribute compatibility and
merge attributes of the caller and callee.
rdar://problem/19836465
llvm-svn: 252949
A function can be marked as norecurse if:
* The SCC to which it belongs has cardinality 1; and either
a) It does not call any non-norecurse function. This includes self-recursion; or
b) It only has one callsite and the function that callsite is within is marked norecurse.
a) is best propagated bottom-up and b) is best propagated top-down.
We build up the norecurse attributes bottom-up using the existing SCC pass, and mark functions with no obvious recursion (but not provably norecurse) to sweep later, top-down.
llvm-svn: 252862
When working with tokens, it is often the case that one has instructions
which consume a token and produce a new token. Currently, we have no
mechanism to represent an initial token state.
Instead, we can create a notional "empty token" by inventing a new
constant which captures the semantics we would like. This new constant
is called ConstantTokenNone and is written textually as "token none".
Differential Revision: http://reviews.llvm.org/D14581
llvm-svn: 252811
When GlobalOpt splits an internal, global variable with an aggregate type, it
should propagate the externally_initialized flag to the newly created globals.
This makes the pass safe for our downstream use of this flag, while still
allowing some useful optimisations (such as removing dead parts of the split
aggregate) to be performed.
Differential Revision: http://reviews.llvm.org/D13382
llvm-svn: 252490
Summary:
Teach the FunctionAttrs to do the right thing for IR with operand
bundles.
Reviewers: reames, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14408
llvm-svn: 252387
Summary:
This change fixes an iterator wraparound bug in
`determinePointerReadAttrs`.
Ideally, ++'ing off the `end()` of an iplist should result in a failed
assert, but currently iplist seems to silently wrap to the head of the
list on `end()++`. This is why the bad behavior is difficult to
demonstrate.
Reviewers: chandlerc, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14350
llvm-svn: 252386
Some implicit ilist iterator conversions have crept back into Analysis,
Transforms, Hexagon, and llvm-stress. This removes them.
I'll commit a patch immediately after this to disallow them (in a
separate patch so that it's easy to revert if necessary).
llvm-svn: 252371
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.
For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.
This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.
Since this is an IR change, a bitcode upgrade has been provided.
Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.
Differential Revision: http://reviews.llvm.org/D14265
llvm-svn: 252219
Summary:
Remove the loop over the uses of the CallSite in ArgumentUsesTracker.
Since we have the `Use *` for actual argument operand, we can just use
pointer subtraction.
The time complexity remains the same though (except for a vararg
argument) -- `std::advance` is O(UseIndex) for the ArgumentList
iterator.
The real motivation is to make a later change adding support for operand
bundles simpler.
Reviewers: reames, chandlerc, nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14363
llvm-svn: 252141
Summary:
The goal of this pass is to perform store-to-load forwarding across the
backedge of a loop. E.g.:
for (i)
A[i + 1] = A[i] + B[i]
=>
T = A[0]
for (i)
T = T + B[i]
A[i + 1] = T
The pass relies on loop dependence analysis via LoopAccessAnalisys to
find opportunities of loop-carried dependences with a distance of one
between a store and a load. Since it's using LoopAccessAnalysis, it was
easy to also add support for versioning away may-aliasing intervening
stores that would otherwise prevent this transformation.
This optimization is also performed by Load-PRE in GVN without the
option of multi-versioning. As was discussed with Daniel Berlin in
http://reviews.llvm.org/D9548, this is inferior to a more loop-aware
solution applied here. Hopefully, we will be able to remove some
complexity from GVN/MemorySSA as a consequence.
In the long run, we may want to extend this pass (or create a new one if
there is little overlap) to also eliminate loop-indepedent redundant
loads and store that *require* versioning due to may-aliasing
intervening stores/loads. I have some motivating cases for store
elimination. My plan right now is to wait for MemorySSA to come online
first rather than using memdep for this.
The main motiviation for this pass is the 456.hmmer loop in SPECint2006
where after distributing the original loop and vectorizing the top part,
we are left with the critical path exposed in the bottom loop. Being
able to promote the memory dependence into a register depedence (even
though the HW does perform store-to-load fowarding as well) results in a
major gain (~20%). This gain also transfers over to x86: it's
around 8-10%.
Right now the pass is off by default and can be enabled
with -enable-loop-load-elim. On the LNT testsuite, there are two
performance changes (negative number -> improvement):
1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the
critical paths is reduced
2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this
outside of LNT
The pass is scheduled after the loop vectorizer (which is after loop
distribution). The rational is to try to reuse LAA state, rather than
recomputing it. The order between LV and LLE is not critical because
normally LV does not touch scalar st->ld forwarding cases where
vectorizing would inhibit the CPU's st->ld forwarding to kick in.
LoopLoadElimination requires LAA to provide the full set of dependences
(including forward dependences). LAA is known to omit loop-independent
dependences in certain situations. The big comment before
removeDependencesFromMultipleStores explains why this should not occur
for the cases that we're interested in.
Reviewers: dberlin, hfinkel
Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13259
llvm-svn: 252017
This reverts commit r251837, due to a number of bot failures of the form:
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-fast/llvm.obj/tools/llvm-link/Release+Asserts/llvm-link.o:llvm-link.cpp:function
loadIndex(llvm::LLVMContext&, llvm::Module const*): error: undefined
reference to
'llvm::object::FunctionIndexObjectFile::create(llvm::MemoryBufferRef,
llvm::LLVMContext&, llvm::Module const*, bool)'
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-fast/llvm.obj/tools/llvm-link/Release+Asserts/llvm-link.o:llvm-link.cpp:function
loadIndex(llvm::LLVMContext&, llvm::Module const*): error: undefined
reference to 'llvm::object::FunctionIndexObjectFile::takeIndex()'
I'm not sure why these are happening - I added Object to the requred
libraries in tools/llvm-link/LLVMBuild.txt and the LLVM_LINK_COMPONENTS
in tools/llvm-link/CMakeLists.txt. Confirmed for my build that these
symbols come out of libLLVMObject.a. What am I missing?
llvm-svn: 251841
Summary:
Support for necessary linkage changes and symbol renaming during
ThinLTO function importing.
Also includes llvm-link support for manually importing functions
and associated llvm-link based tests.
Note that this does not include support for intelligently importing
metadata, which is currently imported duplicate times. That support will
be in the follow-on patch, and currently is ignored by the tests.
Reviewers: dexonsmith, joker.eph, davidxl
Subscribers: tobiasvk, tejohnson, llvm-commits
Differential Revision: http://reviews.llvm.org/D13515
llvm-svn: 251837
The initial coverage checking code for sample records failed to count
records inside inlined profiles. This change fixes the oversight.
llvm-svn: 251752
from its pass harness by providing a lambda to query for AA results.
This allows the legacy pass to easily provide a lambda that uses the
special helpers to construct function AA results from a legacy CGSCC
pass. With the new pass manager (the next patch) the lambda just
directly wraps the intuitive query API.
llvm-svn: 251715
transformations in FunctionAttrs rather than building a new one each
time.
This isn't trivial because there are different heuristics from different
passes for exactly what set they want. The primary difference is whether
an *overridable* function completely disables the synthesis of
attributes. I've modeled this by directly testing for overridable, and
using the common set that excludes external and opt-none functions.
This does cause some changes by disabling more optimizations in the face
of opt-none. Specifically, we were still optimizing *calls* to opt-none
functions based on their attributes, just not the bodies. It seems
better to be conservative on both fronts given the intended semanticas
here (best effort to not assume or disturb anything). I've not tried to
test this change as it seems complex, brittle, and not important to the
implicit contract of opt-none. Instead, it seems more like a choice that
should be dictated by the simplified implementation and the change to be
acceptable differences within the space of opt-none.
A big benefit here is that these transformations no longer rely on the
legacy pass manager's SCC types, they just work on generic sets of
function pointers. This will make it easy to re-use their logic in the
new pass manager.
I've also made the transforms static functions instead of members where
trivial while I was touching the signatures.
llvm-svn: 251640
This adds the flag -mllvm -sample-profile-check-coverage=N to the
SampleProfile pass. N is the percent of input sample records that the
user expects to apply. If the pass does not use N% (or more) of the
sample records in the input, it emits a warning.
This is useful to detect some forms of stale profiles. If the code has
drifted enough from the original profile, there will be records that do
not match the IR anymore.
This will not detect cases where a sample profile record for line L is
referring to some other instructions that also used to be at line L.
llvm-svn: 251568
The pass was keeping around a lot of per-function data (visited blocks,
edges, dominance, etc) that is just taking up memory for no reason. In
fact, from function to function it could potentially confuse the
propagator since some maps are indexed by line offsets which can be
common between functions.
llvm-svn: 251531
I think these were affected by a change way back when to stop printing newlines in Value::dump() by default. This change simply allows the debug output to be readable.
NFC.
llvm-svn: 251517
When emitting a remark for a conditional branch annotation, the remark
uses the line location information of the conditional branch in the
message. In some cases, that information is unavailable and the
optimization would segfaul. I'm still not sure whether this is a bug or
WAI, but the optimizer should not die because of this.
llvm-svn: 251420
No functionality changed here, but the indentation is substantially
reduced and IMO the code is much easier to read. I've also added some
helpful comments.
This is just a clean-up I wrote while studying the code, and that has
been in my backlog for a while.
llvm-svn: 251381
This adds a couple of optimization remarks to the SamplePGO
transformation. When it decides to inline a hot function (to mimic the
inline stack and repeat useful inline decisions in the original build).
It will also report branch destinations. For instance, given the code
fragment:
6 if (i < 1000)
7 sum -= i;
8 else
9 sum += -i * rand();
If the 'else' branch is taken most of the time, building this code with
-Rpass=sample-profile will produce:
a.cc:9:14: remark: most popular destination for conditional branches at small.cc:6:9 [-Rpass=sample-profile]
sum += -i * rand();
^
llvm-svn: 251330
In some cases (as illustrated in the unittest), lineno can be less than the heade_lineno because the function body are included from some other files. In this case, offset will be negative. This patch makes clang still able to match the profile to IR in this situation.
http://reviews.llvm.org/D13914
llvm-svn: 250873
This adjusts all integers in the reader/writer to reflect the types
stored on profile files. They should all be unsigned 32-bit or 64-bit
values. Changed all associated internal types to be uint32_t or
uint64_t.
The only place that needed some adjustments is in the sample profile
transformation. Altough the weight read from the profile are 64-bit
values, the internal API for branch weights only accepts 32-bit values.
The pass now saturates weights that overflow uint32_t.
llvm-svn: 250427
Now that all the known faults with GlobalsAA have been fixed, flip the big switch on -enable-non-lto-gmr again.
Feel free to pester me with any more bugs found, and don't hesitate to flip the switch back off.
llvm-svn: 250157
GlobalOpt currently merges stores into the initialisers of internal,
externally_initialized globals, but should not do so as the value of the global
may change between the initialiser and any code in the module being run.
llvm-svn: 250035
http://reviews.llvm.org/D13576
As we are using hierarchical profile, there is no need to keep HeaderLineno a member variable. This is because each level of the inline stack will have its own header lineno. One should use the head lineno of its own inline stack level instead of the actual symbol.
llvm-svn: 249848
Otherwise, the map will observe changes as long as MergeFunctions is alive. This
is bad because follow-up passes could replace-all-uses-with on the key of an
entry in the map. The value handle callback of ValueMap however asserts that the
key type matches.
rdar://22971893
llvm-svn: 249327
Place new and update dbg.declare calls immediately after the
corresponding alloca.
Current code in replaceDbgDeclareForAlloca puts the new dbg.declare
at the end of the basic block. LLVM codegen has problems emitting
debug info in a situation when dbg.declare appears after all uses of
the variable. This usually kinda works for inlining and ASan (two
users of this function) but not for SafeStack (see the pending change
in http://reviews.llvm.org/D13178).
llvm-svn: 248769
Patch by Jake VanAdrighem!
Summary:
Fix the way we sort the llvm.used and llvm.compiler.used members.
This bug seems to have been introduced in rL183756 through a set of improper casts to GlobalValue*. In subsequent patches this problem was missed and transformed into a getName call on a ConstantExpr.
Reviewers: silvas
Subscribers: silvas, llvm-commits
Differential Revision: http://reviews.llvm.org/D12851
llvm-svn: 248728
Loop unswitching produces conditional branches with constant condition,
and it's beneficial for later passes to clean this up with simplify-cfg.
We do this after the second invocation of loop-unswitch, but not after
the first one. Not doing so might cause problem for passes like
LoopUnroll, whose estimate of loop body size would be less accurate.
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D13064
llvm-svn: 248460
Invoking a function which returns an aggregate can sometimes be
transformed to return a scalar value. However, this means that we need
to create an insertvalue instruction(s) to recreate the correct
aggregate type. We achieved this by inserting an insertvalue
instruction at the invoke's normal successor. However, this is not
feasible if the normal successor uses the invoke's return value inside a
PHI node.
Instead, split the edge between the invoke and the unwind successor and
create the insertvalue instruction in the new basic block. The new
basic block's successor will be the old invoke successor which leaves
us with IR which is well behaved.
This fixes PR24906.
llvm-svn: 248387
evaluate whether 'readonly' or 'readnone' apply to a given function.
This both reduces indentation and will make it easy to share the logic
with a new pass manager implementation.
llvm-svn: 248181
This was a flawed change - it just caused the getElementType call to be
deferred until later, when we really need to remove it. Now that the IR
for GlobalAliases has been updated, the root cause is addressed that way
instead and this change is no longer needed (and in fact gets in the way
- because we want to pass the pointee type directly down further).
Follow up patches to push this through GlobalValue, bitcode format, etc,
will come along soon.
This reverts commit 236160.
llvm-svn: 247585
GetElementPointers must have the first argument's type compared
for structural equivalence. Previously the code erroneously compared the
pointer's type, but this code was dead because all pointer types (of the
same address space) are the same. The pointee must be compared instead
(using the type stored in the GEP, not from the pointer type which will
be erased anyway).
Author: jrkoenig
Reviewers: dschuff, nlewycky, jfb
Subscribers: nlewycky, llvm-commits
Differential revision: http://reviews.llvm.org/D12820
llvm-svn: 247570
of a method and into a re-usable static helper. We can potentially use
this function from the implementation of a new pass manager oriented
version of the pass. Also add some better documentation of exactly what
the semantic model of this routine is (it isn't trivial) and use a more
modern naming convention for it.
llvm-svn: 247524
static function rather than a method. It just needed access to
TargetLibraryInfo, and this way it can be easily reused between the
current FunctionAttrs implementation and any port for the new pass
manager.
llvm-svn: 247522
comments, deleting duplicate comments, moving comments to consistently
live on the definition since these are all really internal routines,
etc. NFC.
llvm-svn: 247520
This change correctly sets the attributes on the callsites
generated in thunks. This makes sure things such as sret, sext, etc.
are correctly set, so that the call can be a proper tailcall.
Also, the transfer of attributes in the replaceDirectCallers function
appears to be unnecessary, but until this is confirmed it will remain.
Author: jrkoenig
Reviewers: dschuff, jfb
Subscribers: llvm-commits, nlewycky
Differential revision: http://reviews.llvm.org/D12581
llvm-svn: 247313
Visit disjoint sets in a deterministic order based on the maximum BitSetNM
index, otherwise the order in which we visit them will depend on pointer
comparisons. This was being exposed by MSan.
llvm-svn: 247201
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
This change extends the bitset lowering pass to support bitsets that may
contain either functions or global variables. A function bitset is lowered to
a jump table that is laid out before one of the functions in the bitset.
Also add support for non-string bitset identifier names. This allows for
distinct metadata nodes to stand in for names with internal linkage,
as done in D11857.
Differential Revision: http://reviews.llvm.org/D11856
llvm-svn: 247080
Summary:
This patch introduces a side table in Merge Functions to
efficiently remove functions from the function set when functions
they refer to are merged. Previously these functions would need to
be compared lg(N) times to find the appropriate FunctionNode in the
tree to defer. With the recent determinism changes, this comparison
is more expensive. In addition, the removal function would not always
actually remove the function from the set (i.e. after remove(F),
there would sometimes still be a node in the tree which contains F).
With these changes, these functions are properly deferred, and so more
functions can be merged. In addition, when there are many merged
functions (and thus more deferred functions), there is a speedup:
chromium: 48678 merged -> 49380 merged; 6.58s -> 5.49s
libxul.so: 41004 merged -> 41030 merged; 8.02s -> 6.94s
mysqld: 1607 merged -> 1607 merged (same); 0.215s -> 0.212s (probably noise)
Author: jrkoenig
Reviewers: jfb, dschuff
Subscribers: llvm-commits, nlewycky
Differential revision: http://reviews.llvm.org/D12537
llvm-svn: 246735
Teach FunctionAttr to infer the nonnull attribute on return values of functions which never return a potentially null value. This is done both via a conservative local analysis for the function itself and a optimistic per-SCC analysis. If no function in the SCC returns anything which could be null (other than values from other functions in the SCC), we can conclude no function returned a null pointer. Even if some function within the SCC returns a null pointer, we may be able to locally conclude that some don't.
Differential Revision: http://reviews.llvm.org/D9688
llvm-svn: 246476
Summary:
This patch removes two remaining places where pointer value comparisons
are used to order functions: comparing range annotation metadata, and comparing
block address constants. (These are both rare cases, and so no actual
non-determinism was observed from either case).
The fix for range metadata is simple: the annotation always consists of a pair
of integers, so we just order by those integers.
The fix for block addresses is more subtle. Two constants are the same if they
are the same basic block in the same function, or if they refer to corresponding
basic blocks in each respective function. Note that in the first case, merging
is trivially correct. In the second, the correctness of merging relies on the
fact that the the values of block addresses cannot be compared. This change is
actually an enhancement, as these functions could not previously be merged (see
merge-block-address.ll).
There is still a problem with cross function block addresses, in that constants
pointing to a basic block in a merged function is not updated.
This also more robustly compares floating point constants by all fields of their
semantics, and fixes a dyn_cast/cast mixup.
Author: jrkoenig
Reviewers: dschuff, nlewycky, jfb
Subscribers llvm-commits
Differential revision: http://reviews.llvm.org/D12376
llvm-svn: 246305
The problem here were the function analyses invoked by the function pass
manager from the new IPO pass. I looked at other IPO passes needing
dominance information and the only one that requires it (partial
inliner) does not use the standard dependency mechanism.
This patch mimics what the partial inliner does to compute dominance,
post-dominance and loop info. One thing I like about this approach is
that I can delay the computation of all this until I actually need it.
This should bring the ASAN buildbot back to green. If there's a better
way to fix this, I'll do it in a follow-up patch.
llvm-svn: 246066
Summary: When comparing basic blocks, there is an additional check that two Value*'s should have the same ID, which interferes with merging equivalent constants of different kinds (such as a ConstantInt and a ConstantPointerNull in the included testcase). The cmpValues function already ensures that the two values in each function are the same, so removing this check should not cause incorrect merging.
Also, the type comparison is redundant, based on reviewing the code and testing on the test suite and several large LTO bitcodes.
Author: jrkoenig
Reviewers: nlewycky, jfb, dschuff
Subscribers: llvm-commits
Differential revision: http://reviews.llvm.org/D12302
llvm-svn: 246001
Eventually, we will need sample profiles to be incorporated into the
inliner's cost models. To do this, we need the sample profile pass to
be a module pass.
This patch makes no functional changes beyond the mechanical adjustments
needed to run SampleProfile as a module pass.
llvm-svn: 245940
It doesn't solve the problem, when for example we load something, and
then assume that it is the same as some constant value, because
globalopt will fail on unknown load instruction. The proposed solution
would be to skip some instructions that we can't evaluate and they are
safe to skip (f.e. load, assume and many others) and see if they are
required to perform optimization (f.e. we don't care about ephemeral
instructions that may appear using @llvm.assume())
http://reviews.llvm.org/D12266
llvm-svn: 245919
TL-DR: SROA is followed by EarlyCSE which requires the DominatorTree.
There is no reason not to require it up-front for SROA.
Some history is necessary to understand why we ended-up here.
r123437 switched the second (Legacy)SROA in the optimizer pipeline to
use SSAUpdater in order to avoid recomputing the costly
DominanceFrontier. The purpose was to speed-up the compile-time.
Later r123609 removed the need for the DominanceFrontier in
(Legacy)SROA.
Right after, some cleanup was made in r123724 to remove any reference
to the DominanceFrontier. SROA existed in two flavors: SROA_SSAUp and
SROA_DT (the latter replacing SROA_DF).
The second argument of `createScalarReplAggregatesPass` was renamed
from `UseDomFrontier` to `UseDomTree`.
I believe this is were a mistake was made. The pipeline was not
updated and the call site was still:
PM->add(createScalarReplAggregatesPass(-1, false));
At that time, SROA was immediately followed in the pipeline by
EarlyCSE which required alread the DominatorTree. Not requiring
the DominatorTree in SROA didn't save anything, but unfortunately
it was lost at this point.
When the new SROA Pass was introduced in r163965, I believe the goal
was to have an exact replacement of the existing SROA, this bug
slipped through.
You can see currently:
$ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure
...
...
FunctionPass Manager
SROA
Dominator Tree Construction
Early CSE
After this patch:
$ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure
...
...
FunctionPass Manager
Dominator Tree Construction
SROA
Early CSE
This improves the compile time from 88s to 23s for PR17855.
https://llvm.org/bugs/show_bug.cgi?id=17855
And from 113s to 12s for PR16756
https://llvm.org/bugs/show_bug.cgi?id=16756
Reviewers: chandlerc
Differential Revision: http://reviews.llvm.org/D12267
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245820
Summary:
Merge functions previously relied on unsigned comparisons of pointer values to
order functions. This caused observable non-determinism in the compiler for
large bitcode programs. Basically, opt -mergefuncs program.bc | md5sum produces
different hashes when run repeatedly on the same machine. Differing output was
observed on three large bitcodes, but it was less frequent on the smallest file.
It is possible that this only manifests on the large inputs, hence remaining
undetected until now.
This patch fixes this by removing (almost, see below) all places where
comparisons between pointers are used to order functions. Most of these changes
are local, but the comparison of global values requires assigning an identifier
to each local in the order it is visited. This is very similar to the way the
comparison function identifies Value*'s defined within a function. Because the
order of visiting the functions and their subparts is deterministic, the
identifiers assigned to the globals will be as well, and the order of functions
will be deterministic.
With these changes, there is no more observed non-determinism. There is also
only minor slowdowns (negligible to 4%) compared to the baseline, which is
likely a result of the fact that global comparisons involve hash lookups and not
just pointer comparisons.
The one caveat so far is that programs containing BlockAddress constants can
still be non-deterministic. It is not clear what the right solution is here. In
particular, even if the global numbers are used to order by function, we still
need a way to order the BasicBlock*'s. Unfortunately, we cannot just bail out
and fail to order the functions or consider them equal, because we require a
total order over functions. Note that programs with BlockAddress constants are
relatively rare, so the impact of leaving this in is minor as long as this pass
is opt-in.
Author: jrkoenig
Reviewers: nlewycky, jfb, dschuff
Subscribers: jevinskie, llvm-commits, chapuni
Differential revision: http://reviews.llvm.org/D12168
llvm-svn: 245762
folding the code into the main Analysis library.
There already wasn't much of a distinction between Analysis and IPA.
A number of the passes in Analysis are actually IPA passes, and there
doesn't seem to be any advantage to separating them.
Moreover, it makes it hard to have interactions between analyses that
are both local and interprocedural. In trying to make the Alias Analysis
infrastructure work with the new pass manager, it becomes particularly
awkward to navigate this split.
I've tried to find all the places where we referenced this, but I may
have missed some. I have also adjusted the C API to continue to be
equivalently functional after this change.
Differential Revision: http://reviews.llvm.org/D12075
llvm-svn: 245318
This patch makes the Merge Functions pass faster by calculating and comparing
a hash value which captures the essential structure of a function before
performing a full function comparison.
The hash is calculated by hashing the function signature, then walking the basic
blocks of the function in the same order as the main comparison function. The
opcode of each instruction is hashed in sequence, which means that different
functions according to the existing total order cannot have the same hash, as
the comparison requires the opcodes of the two functions to be the same order.
The hash function is a static member of the FunctionComparator class because it
is tightly coupled to the exact comparison function used. For example, functions
which are equivalent modulo a single variant callsite might be merged by a more
aggressive MergeFunctions, and the hash function would need to be insensitive to
these differences in order to exploit this.
The hashing function uses a utility class which accumulates the values into an
internal state using a standard bit-mixing function. Note that this is a different interface
than a regular hashing routine, because the values to be hashed are scattered
amongst the properties of a llvm::Function, not linear in memory. This scheme is
fast because only one word of state needs to be kept, and the mixing function is
a few instructions.
The main runOnModule function first computes the hash of each function, and only
further processes functions which do not have a unique function hash. The hash
is also used to order the sorted function set. If the hashes differ, their
values are used to order the functions, otherwise the full comparison is done.
Both of these are helpful in speeding up MergeFunctions. Together they result in
speedups of 9% for mysqld (a mostly C application with little redundancy), 46%
for libxul in Firefox, and 117% for Chromium. (These are all LTO builds.) In all
three cases, the new speed of MergeFunctions is about half that of the module
verifier, making it relatively inexpensive even for large LTO builds with
hundreds of thousands of functions. The same functions are merged, so this
change is free performance.
Author: jrkoenig
Reviewers: nlewycky, dschuff, jfb
Subscribers: llvm-commits, aemerson
Differential revision: http://reviews.llvm.org/D11923
llvm-svn: 245140
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.
There are several applications for such a type but my immediate
motivation stems from WinEH. Our personality routine enforces a
single-entry - single-exit regime for cleanups. After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together. We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.
Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup. This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.
What is the burden to the optimizer? Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate, optimizations have to watch out for
such instructions anyway. There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.
Differential Revision: http://reviews.llvm.org/D11861
llvm-svn: 245029
its creation function.
This required shifting a bunch of method definitions to be out-of-line
so that we could leave most of the implementation guts in the .cpp file.
llvm-svn: 245021
I've used forward declarations and reorderd the source code some to make
this reasonably clean and keep as much of the code as possible in the
source file, including all the stratified set details. Just the basic AA
interface and the create function are in the header file, and the header
file is now included into the relevant locations.
llvm-svn: 245009
Summary:
For LTO we need to enable this pass in the LTO pipeline,
as it is skipped during the "-flto -c" compile step (when PrepareForLTO is
set).
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11919
llvm-svn: 244622
This is the first mechanical step in preparation for making this and all
the other alias analysis passes available to the new pass manager. I'm
factoring out all the totally boring changes I can so I'm moving code
around here with no other changes. I've even minimized the formatting
churn.
I'll reformat and freshen comments on the interface now that its located
in the right place so that the substantive changes don't triger this.
llvm-svn: 244197
The only place that tries to return a CallGraph by value
(CallGraphAnalysis::run) doesn't seem to be used right now, but it's a
reasonable bit of cleanup anyway.
llvm-svn: 244122
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
llvm-svn: 243994
This change was done as an audit and is by inspection. The new EH
system is still very much a work in progress. NFC for the landingpad
case.
llvm-svn: 243965
Summary:
This threshold limited FunctionAttrs ability to prove arguments to be read-only.
In NVPTX, a specialized instruction ld.global.nc can be used to load memory
with non-coherent texture cache. We notice that in SHOC [1] benchmark, some
function arguments are not marked with readonly because FunctionAttrs reaches
a hardcoded threshold when analysis uses.
Removing this threshold won't cause significant regression in compilation time, because the worst-case time complexity of the algorithm is still O(# of instructions) for each parameter.
Patched by Xuetian Weng.
[1] https://github.com/vetter/shoc
Reviewers: nlewycky, jingyue, nicholas
Subscribers: nicholas, test, llvm-commits
Differential Revision: http://reviews.llvm.org/D11311
llvm-svn: 243141
We had a few places where we did
for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) {
but those could instead do
for (auto *EltTy : STy->elements()) {
llvm-svn: 243136
the general GMR-in-non-LTO flag.
Without this, we have the global information during the CGSCC pipeline
for GVN and such, but don't have it available during the late loop
optimizations such as the vectorizer. Moreover, after the CGSCC pipeline
has finished we have substantially more accurate and refined call graph
information, function annotations, etc, which will make GMR even more
powerful than it is early in the pipelien.
Note that we have to play silly games with preserving AliasAnalysis
(which is now trivially preserved) in order to let a module analysis
magically be preserved into the entire function pass pipeline.
Simultaneously we have to not make GMR an immutable pass in order to be
able to re-run it and collect fresh data on the final call graph.
llvm-svn: 242999
preparation for de-coupling the AA implementations.
In order to do this, they had to become fake-scoped using the
traditional LLVM pattern of a leading initialism. These can't be actual
scoped enumerations because they're bitfields and thus inherently we use
them as integers.
I've also renamed the behavior enums that are specific to reasoning
about the mod/ref behavior of functions when called. This makes it more
clear that they have a very narrow domain of applicability.
I think there is a significantly cleaner API for all of this, but
I don't want to try to do really substantive changes for now, I just
want to refactor the things away from analysis groups so I'm preserving
the exact original design and just cleaning up the names, style, and
lifting out of the class.
Differential Revision: http://reviews.llvm.org/D10564
llvm-svn: 242963
Summary:
While working on a project I wound up generating a fairly large lookup table (10k entries) of callbacks inside of a static constructor. Clang was taking upwards of ~10 minutes to compile the lookup table. I generated a smaller test case (http://www.inolen.com/static_initializer_test.ll) that, after running with -ftime-report, pointed fingers at GlobalOpt and MemCpyOptimizer.
Running globalopt took around ~9 minutes. The slowdown came from how GlobalOpt merged stores from static constructors individually into the global initializer in EvaluateStaticConstructor. For each store it discovered and wanted to commit, it would copy the existing global initializer and then merge in the individual store. I changed this so that stores are now grouped by global, and sorted from most significant to least significant by their GEP indexes (e.g. a store to GEP 0, 0 comes before GEP 0, 0, 1). With this representation, the existing initializer can be copied and all new stores merged into it in a single pass.
With this patch and http://reviews.llvm.org/D11198, the lookup table that was taking ~10 minutes to compile now compiles in around 5 seconds. I've ran 'make check' and the test-suite, which all passed.
I'm not really sure who to tag as a reviewer, Lang mentioned that Chandler may be appropriate.
Reviewers: chandlerc, nlewycky
Subscribers: nlewycky, llvm-commits
Differential Revision: http://reviews.llvm.org/D11200
llvm-svn: 242935
pipeline.
Even before I started improving its runtime, it was already crazy fast
once the call graph exists, and if we can get it to be conservatively
correct, will still likely catch a lot of interesting and useful cases.
So it may well be useful to enable by default.
But more importantly for me, this should make it easier for me to test
that changes aren't breaking it in fundamental ways by enabling it for
normal builds.
llvm-svn: 242895
part of simplifying its interface and usage in preparation for porting
to work with the new pass manager.
Note that this will likely expose that we have dead arguments, members,
and maybe even pass requirements for AA. I'll be cleaning those up in
seperate patches. This just zaps the actual update API.
Differential Revision: http://reviews.llvm.org/D11325
llvm-svn: 242881
We insert a bitcast which obfuscates the getCalledFunction for the utility
function which looks up attributes from the called function. Loosing ABI
changing parameter attributes is a bad thing.
rdar://21516488
llvm-svn: 242807
Not sure if the optimizer will save the call as getCalledFunction()
is not a trivial access function but the code is clearer this way.
llvm-svn: 242641
Internalizing an individual comdat group member without also internalizing
the other members of the comdat can break comdat semantics. For example,
if a module contains a reference to an internalized comdat member, and the
linker chooses a comdat group from a different object file, this will break
the reference to the internalized member.
This change causes the internalizer to only internalize comdat members if all
other members of the comdat are not externally visible. Once a comdat group
has been fully internalized, there is no need to apply comdat rules to its
members; later optimization passes (e.g. globaldce) can legally drop individual
members of the comdat. So we drop the comdat attribute from all comdat members.
Differential Revision: http://reviews.llvm.org/D10679
llvm-svn: 242423
Self-referential constants containing references to a merged function
no longer cause the MergeFunctions pass to infinite loop. Also adds a
reproduction IR which would otherwise fail, which was isolated from a similar
issue in Chromium.
Author: jrkoenig
Reviewers: nlewycky, jfb
Subscribers: llvm-commits, nlewycky, jfb
Differential Revision: http://reviews.llvm.org/D11208
llvm-svn: 242337
No in-tree alias analysis used this facility, and it was not called in
any particularly rigorous way, so it seems unlikely to be correct.
Note that one of the only stateful AA implementations in-tree,
GlobalsModRef is completely broken currently (and any AA passes like it
are equally broken) because Module AA passes are not effectively
invalidated when a function pass that fails to update the AA stack runs.
Ultimately, it doesn't seem like we know how we want to build stateful
AA, and until then trying to support and maintain correctness for an
untested API is essentially impossible. To that end, I'm planning to rip
out all of the update API. It can return if and when we need it and know
how to build it on top of the new pass manager and as part of *tested*
stateful AA implementations in the tree.
Differential Revision: http://reviews.llvm.org/D10889
llvm-svn: 241975
After changes in rL231820 loop re-rotation is performed even in -Oz mode. Since loop rotation is disabled for -Oz, it seems loop re-rotation should be disabled too.
Differential Revision: http://reviews.llvm.org/D10961
llvm-svn: 241897
This change includes a fix for https://code.google.com/p/chromium/issues/detail?id=499508#c3,
which required updating the visibility for symbols with eliminated definitions.
--Original Commit Message--
Add new EliminateAvailableExternally module pass, which is performed in
O2 compiles just before GlobalDCE, unless we are preparing for LTO.
This pass eliminates available externally globals (turning them into
declarations), regardless of whether they are dead/unreferenced, since
we are guaranteed to have a copy available elsewhere at link time.
This enables additional opportunities for GlobalDCE.
If we are preparing for LTO (e.g. a -flto -c compile), the pass is not
included as we want to preserve available externally functions for possible
link time inlining. The FE indicates whether we are doing an -flto compile
via the new PrepareForLTO flag on the PassManagerBuilder.
llvm-svn: 241466
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.
Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.
Differential Revision: http://reviews.llvm.org/D10941
llvm-svn: 241413
The PruneEH pass tries to annotate functions as 'noreturn' if it doesn't
see a ReturnInst. However, a naked function containing inline assembly
can contain control flow leaving the function.
This fixes PR23971.
llvm-svn: 240876
It is possible for a global to be substituted with another global of a
different type or a different kind (i.e. an alias) at IR link time. One
example of this scenario is when a Microsoft ABI vtable is substituted with
an alias referring to a larger vtable containing an RTTI reference.
This will cause the global to be RAUW'd with a possibly bitcasted reference
to the other global. This will of course also affect any references to the
global in bitset metadata.
The right way to handle such metadata is simply to ignore it. This is sound
because the linked module should contain another copy of the bitset entries as
applied to the new global.
llvm-svn: 240866
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
The personality routine currently lives in the LandingPadInst.
This isn't desirable because:
- All LandingPadInsts in the same function must have the same
personality routine. This means that each LandingPadInst beyond the
first has an operand which produces no additional information.
- There is ongoing work to introduce EH IR constructs other than
LandingPadInst. Moving the personality routine off of any one
particular Instruction and onto the parent function seems a lot better
than have N different places a personality function can sneak onto an
exceptional function.
Differential Revision: http://reviews.llvm.org/D10429
llvm-svn: 239940
This is now living in MemoryLocation, which is what it pertains to. It
is also an enum there rather than a static data member which is left
never defined.
llvm-svn: 239886
that it is its own entity in the form of MemoryLocation, and update all
the callers.
This is an entirely mechanical change. References to "Location" within
AA subclases become "MemoryLocation", and elsewhere
"AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps
out-of-tree folks update.
llvm-svn: 239885
This patch adds the safe stack instrumentation pass to LLVM, which separates
the program stack into a safe stack, which stores return addresses, register
spills, and local variables that are statically verified to be accessed
in a safe way, and the unsafe stack, which stores everything else. Such
separation makes it much harder for an attacker to corrupt objects on the
safe stack, including function pointers stored in spilled registers and
return addresses. You can find more information about the safe stack, as
well as other parts of or control-flow hijack protection technique in our
OSDI paper on code-pointer integrity (http://dslab.epfl.ch/pubs/cpi.pdf)
and our project website (http://levee.epfl.ch).
The overhead of our implementation of the safe stack is very close to zero
(0.01% on the Phoronix benchmarks). This is lower than the overhead of
stack cookies, which are supported by LLVM and are commonly used today,
yet the security guarantees of the safe stack are strictly stronger than
stack cookies. In some cases, the safe stack improves performance due to
better cache locality.
Our current implementation of the safe stack is stable and robust, we
used it to recompile multiple projects on Linux including Chromium, and
we also recompiled the entire FreeBSD user-space system and more than 100
packages. We ran unit tests on the FreeBSD system and many of the packages
and observed no errors caused by the safe stack. The safe stack is also fully
binary compatible with non-instrumented code and can be applied to parts of
a program selectively.
This patch is our implementation of the safe stack on top of LLVM. The
patches make the following changes:
- Add the safestack function attribute, similar to the ssp, sspstrong and
sspreq attributes.
- Add the SafeStack instrumentation pass that applies the safe stack to all
functions that have the safestack attribute. This pass moves all unsafe local
variables to the unsafe stack with a separate stack pointer, whereas all
safe variables remain on the regular stack that is managed by LLVM as usual.
- Invoke the pass as the last stage before code generation (at the same time
the existing cookie-based stack protector pass is invoked).
- Add unit tests for the safe stack.
Original patch by Volodymyr Kuznetsov and others at the Dependable Systems
Lab at EPFL; updates and upstreaming by myself.
Differential Revision: http://reviews.llvm.org/D6094
llvm-svn: 239761
It is valid for globals to be unnamed, but aliases must have a name. To avoid
creating invalid IR, we need to assign names to any aliases we create that
point to unnamed objects that have been moved into combined globals.
llvm-svn: 239590
If the first argument to a function is a 'this' argument and the second
has the sret attribute, the ArgumentPromotion pass may promote the 'this'
argument to more than one argument, violating the IR constraint that 'sret'
may only be applied to the first or second argument.
Although this IR constraint is arguably unnecessary, it highlighted the fact
that ArgPromotion does not need to preserve this attribute. Dropping the
attribute reduces register pressure in the backend by avoiding the register
copy required by sret. Because sret implies noalias, we also replace the
former with the latter.
Differential Revision: http://reviews.llvm.org/D10353
llvm-svn: 239488
O2 compiles just before GlobalDCE, unless we are preparing for LTO.
This pass eliminates available externally globals (turning them into
declarations), regardless of whether they are dead/unreferenced, since
we are guaranteed to have a copy available elsewhere at link time.
This enables additional opportunities for GlobalDCE.
If we are preparing for LTO (e.g. a -flto -c compile), the pass is not
included as we want to preserve available externally functions for possible
link time inlining. The FE indicates whether we are doing an -flto compile
via the new PrepareForLTO flag on the PassManagerBuilder.
llvm-svn: 239480
that was resetting it.
Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis.
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.
Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.
rdar://problem/13752163
Differential Revision: http://reviews.llvm.org/D10099
llvm-svn: 239427
We don't want to replace function A by Function B in one module and Function B
by Function A in another module.
If these functions are marked with linkonce_odr we would end up with a function
stub calling B in one module and a function stub calling A in another module. If
the linker decides to pick these two we will have two stubs calling each other.
rdar://21265586
llvm-svn: 239367
port it to the new pass manager.
All this does is extract the inner "location" class used by AA into its
own full fledged type. This seems *much* cleaner as MemoryDependence and
soon MemorySSA also use this heavily, and it doesn't make much sense
being inside the AA infrastructure.
This will also make it much easier to break apart the AA infrastructure
into something that stands on its own rather than using the analysis
group design.
There are a few places where this makes APIs not make sense -- they were
taking an AliasAnalysis pointer just to build locations. I'll try to
clean those up in follow-up commits.
Differential Revision: http://reviews.llvm.org/D10228
llvm-svn: 239003
If the type isn't trivially moveable emplace can skip a potentially
expensive move. It also saves a couple of characters.
Call sites were found with the ASTMatcher + some semi-automated cleanup.
memberCallExpr(
argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
hasArgument(0, bindTemporaryExpr(
hasType(recordDecl(hasNonTrivialDestructor())),
has(constructExpr()))),
unless(isInTemplateInstantiation()))
No functional change intended.
llvm-svn: 238602
Summary:
In case of functions that have a pointer argument and only pass it to
each other, the function attributes pass deduces that the pointer should
get the readnone attribute, but fails to remove a readonly attribute
that may already have been present.
Reviewers: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9995
llvm-svn: 238152
InstructionCombiningPass was added after LoopUnrollPass in r237395. Because
InstructionCombiningPass is strictly more powerful than InstructionSimplifierPass,
remove the unnecessary InstructionSimplifierPass.
Differential Revision: http://reviews.llvm.org/D9838
llvm-svn: 237702
Summary:
This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.
Credit goes to Jingyue Wu for writing an earlier version of this pass.
Patched by Bjarke Roune.
Test Plan:
This patch adds a set of tests in test/Transforms/SpeculativeExecution/spec.ll
The pass is controlled by a flag which defaults to having the pass not run.
Reviewers: eliben, dberlin, meheff, jingyue, hfinkel
Reviewed By: jingyue, hfinkel
Subscribers: majnemer, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9360
llvm-svn: 237459
This is to cleanup some redundency generated by LoopUnroll pass. Such redundency may not be cleaned up by existing passes after LoopUnroll.
Differential Revision: http://reviews.llvm.org/D9777
llvm-svn: 237395
Summary:
This implements the initial version as was proposed earlier this year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html).
Since then Loop Access Analysis was split out from the Loop Vectorizer
and was made into a separate analysis pass. Loop Distribution becomes
the second user of this analysis.
The pass is off by default and can be enabled
with -enable-loop-distribution. There is currently no notion of
profitability; if there is a loop with dependence cycles, the pass will
try to split them off from other memory operations into a separate loop.
I decided to remove the control-dependence calculation from this first
version. This and the issues with the PDT are actively discussed so it
probably makes sense to treat it separately. Right now I just mark all
terminator instruction required which keeps identical CFGs for each
distributed loop. This seems to be working pretty well for 456.hmmer
where even though there is an empty if-then block in the distributed
loop initially, it gets completely removed.
The pass keeps DominatorTree and LoopInfo updated. I've tested this
with -loop-distribute-verify with the testsuite where we distribute ~90
loops. SimplifyLoop is violated in some cases and I have a FIXME
covering this.
Reviewers: hfinkel, nadav, aschwaighofer
Reviewed By: aschwaighofer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8831
llvm-svn: 237358
We already had a method to iterate over all the incoming values of a PHI. This just changes all eligible code to use it.
Ineligible code included anything which cared about the index, or was also trying to get the i'th incoming BB.
llvm-svn: 237169
Clang regressions were caused by more stringent assertion checking
introduced by this change. Small fix needed to clang has been committed
in r236751.
llvm-svn: 236752
This makes use of the new API which can remove attributes from a set given a builder.
This is much faster than creating a temporary set and reduces llc time by about 0.3% which was all spent creating temporary attributes sets on the context.
llvm-svn: 236668
COMDAT groups which have become rendered unused because of inline are
discardable if we can prove that we've made the group empty.
This fixes PR22285.
llvm-svn: 236539
Many of the callers already have the pointer type anyway, and for the
couple of callers that don't it's pretty easy to call PointerType::get
on the pointee type and address space.
This avoids LLParser from using PointerType::getElementType when parsing
GlobalAliases from IR.
llvm-svn: 236160
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
Move isDereferenceablePointer function to Analysis. This function recursively tracks dereferencability over a chain of values like other functions in ValueTracking.
This refactoring is motivated by further changes to support dereferenceable_or_null attribute (http://reviews.llvm.org/D8650). isDereferenceablePointer will be extended to perform context-sensitive analysis and IR is not a good place to have such functionality.
Patch by: Artur Pilipenko <apilipenko@azulsystems.com>
Differential Revision: reviews.llvm.org/D9075
llvm-svn: 235611
Stop using `DIDescriptor` and its subclasses in the `DebugInfoFinder`
API, as well as the rest of the API hanging around in `DebugInfo.h`.
llvm-svn: 235240
Change `DICompileUnit::replaceSubprograms()` and
`DICompileUnit::replaceGlobalVariables()` to match the `MDCompileUnit`
equivalents that they're wrapping.
llvm-svn: 234852
Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses. Note
that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`.
llvm-svn: 234850
Gut all the non-pointer API from the variable wrappers, except an
implicit conversion from `DIGlobalVariable` to `DIDescriptor`. Note
that if you're updating out-of-tree code, `DIVariable` wraps
`MDLocalVariable` (`MDVariable` is a common base class shared with
`MDGlobalVariable`).
llvm-svn: 234840
The only difference between the two is a `dyn_cast<>` to
`GlobalVariable`. If optimizations have left anything behind when a
global gets replaced, then it doesn't seem like the debug info is dead.
I can't seem to find an optimization that would leave behind a
non-`GlobalVariable` without nulling the reference entirely, so I
haven't added a testcase (but I'll be deleting `getGlobal()` in a future
commit).
llvm-svn: 234792
CallSite roughly behaves as a common base CallInst and InvokeInst. Bring
the behavior closer to that model by making upcasts explicit. Downcasts
remain implicit and work as before.
Following dyn_cast as a mental model checking whether a Value *V isa
CallSite now looks like this:
if (auto CS = CallSite(V)) // think dyn_cast
instead of:
if (CallSite CS = V)
This is an extra token but I think it is slightly clearer. Making the
ctor explicit has the advantage of not accidentally creating nullptr
CallSites, e.g. when you pass a Value * to a function taking a CallSite
argument.
llvm-svn: 234601
Remove `DIDescriptor::Verify()` and the `Verify()`s from subclasses.
They had already been gutted, and just did an `isa<>` check.
In a couple of cases I've temporarily dropped the check entirely, but
subsequent commits are going to disallow conversions to the
`DIDescriptor`s directly from `MDNode`, so the checks will come back in
another form soon enough.
llvm-svn: 234201
The plan here is to push the API changes out from the common components
(like Constant::getGetElementPtr and IRBuilder::CreateGEP related
functions) and just update callers to either pass the type if it's
obvious, or pass null.
Do this with LoadInst as well and anything else that comes up, then to
start porting specific uses to not pass null anymore - this may require
some refactoring in each case.
llvm-svn: 234042
Require the pointee type to be passed explicitly and assert that it is
correct. For now it's possible to pass nullptr here (and I've done so in
a few places in this patch) but eventually that will be disallowed once
all clients have been updated or removed. It'll be a long road to get
all the way there... but if you have the cahnce to update your callers
to pass the type explicitly without depending on a pointer's element
type, that would be a good thing to do soon and a necessary thing to do
eventually.
llvm-svn: 233938
This pushes the use of PointerType::getElementType up into several
callers - I'll essentially just have to keep pushing that up the stack
until I can eliminate every call to it...
llvm-svn: 233604
This re-adds float2int to the tree, after fixing PR23038. It turns
out the argument to APSInt() is true-if-unsigned, rather than
true-if-signed :(. Added testcase and explanatory comment.
llvm-svn: 233370
This caused PR23008, compiles failing with: "Use still stuck around after Def is
destroyed: %.sroa.speculated"
Also reverting follow-up r233064.
llvm-svn: 233105
It is possible to have code that converts from integer to float, performs operations then converts back, and the result is provably the same as if integers were used.
This can come from different sources, but the most obvious is a helper function that uses floats but the arguments given at an inlined callsites are integers.
This pass considers all integers requiring a bitwidth less than or equal to the bitwidth of the mantissa of a floating point type (23 for floats, 52 for doubles) as exactly representable in floating point.
To reduce the risk of harming efficient code, the pass only attempts to perform complete removal of inttofp/fptoint operations, not just move them around.
llvm-svn: 233062
Remove `DebugInfoVerifierLegacyPass` and the `-verify-di` pass.
Instead, call into the `DebugInfoVerifier` from inside
`VerifierLegacyPass::finalizeModule()`. This better matches the logic
in `verifyModule()` (used by the new PassManager), avoids requiring two
separate passes to verify the IR, and makes the API for "add a pass to
verify the IR" simple.
Note: the `-verify-debug-info` flag still works (for now, at least;
eventually it might make sense to just remove it).
llvm-svn: 232772
Each use of the byte array uses a different alias. This makes the
backend less likely to reuse previously computed byte array addresses,
improving the security of the CFI mechanism based on this pass.
Differential Revision: http://reviews.llvm.org/D8455
llvm-svn: 232770
This change also introduces a link-time optimization level of 1. This
optimization level runs only the globaldce pass as well as cleanup passes for
passes that run at -O0, specifically simplifycfg which cleans up lowerbitsets.
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266951.html
llvm-svn: 232769
`StripDebug` was only used by tools/opt/opt.cpp in
`AddStandardLinkPasses()`, but opt.cpp adds the same pass based on its
command-line flag before it calls `AddStandardLinkPasses()`. Stripping
debug info twice isn't very useful.
llvm-svn: 232765
When we encounter a global with a comdat, rather than iterating over
every global in the module to find globals in the same comdat, store the
members in a multimap. This effectively lowers the complexity to O(N log N),
improving performance significantly for large modules such as might be
encountered during LTO.
It looks like we used to do something like this until r219191.
No functional change.
Differential Revision: http://reviews.llvm.org/D8431
llvm-svn: 232743
This involved threading the type-to-gep through a data structure, since
the code was relying on the pointer type to carry this information. I
imagine there will be a lot of this work across the project... slow
work chasing each use case, but the assertions will help keep me honest.
llvm-svn: 232277
Adding nullptr to all the IRBuilder stuff because it's the first thing
that fails to build when testing without the back-compat functions, so
I'll keep having to re-add these locally for each chunk of migration I
do. Might as well check them in to save me the churn. Eventually I'll
have to migrate these too, but I'm going breadth-first.
llvm-svn: 232270
I'm just going to migrate these in a pretty ad-hoc & incremental way -
providing the backwards compatible API for now, then locally removing
it, fixing a few callers, adding it back in and commiting those callers.
Rinse, repeat.
The assertions should ensure that if I get this wrong we'll find out
about it and not just have one giant patch to revert, recommit, revert,
recommit, etc.
llvm-svn: 232240
The linker on that platform may re-order symbols or strip dead symbols, which
will break bit set checks. Avoid this by hiding the symbols from the linker.
llvm-svn: 232235
It's firstly committed at r231630, and reverted at r231635.
Function pass InstructionSimplifier is inserted as barrier to
make sure loop unroll pass won't affect on LICM pass.
llvm-svn: 232011
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
Runtime unrollng will introduce a runtime check in loop prologue.
If the unrolled loop is a inner loop, then the proglogue will be inside
the outer loop. LICM pass can help to promote the runtime check out if
the checked value is loop invariant.
llvm-svn: 231630
This pass interchanges loops to provide a more cache-friendly memory access.
For e.g. given a loop like -
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
A[j][i] = A[j][i]+B[j][i];
is interchanged to -
for(int j=0;j<N;j++)
for(int i=0;i<N;i++)
A[j][i] = A[j][i]+B[j][i];
This pass is currently disabled by default.
To give a brief introduction it consists of 3 stages-
LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.
LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.
TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.
A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499
llvm-svn: 231458
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
By loading from indexed offsets into a byte array and applying a mask, a
program can test bits from the bit set with a relatively short instruction
sequence. For example, suppose we have 15 bit sets to lay out:
A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
L (4 bits), M (3 bits), N (2 bits), O (1 bit)
These bits can be laid out in a 16-byte array like this:
Byte Offset
0123456789ABCDEF
Bit
7 HHHHHHHHHIIIIIII
6 GGGGGGGGGGJJJJJJ
5 FFFFFFFFFFFKKKKK
4 EEEEEEEEEEEELLLL
3 DDDDDDDDDDDDDMMM
2 CCCCCCCCCCCCCCNN
1 BBBBBBBBBBBBBBBO
0 AAAAAAAAAAAAAAAA
For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
in 1-2 machine instructions on x86, or 4-6 instructions on ARM.
This uses the LPT multiprocessor scheduling algorithm to lay out the bits
efficiently.
Saves ~450KB of instructions in a recent build of Chromium.
Differential Revision: http://reviews.llvm.org/D7954
llvm-svn: 231043
This change aligns globals to the next highest power of 2 bytes, up to a
maximum of 128. This makes it more likely that we will be able to compress
bit sets with a greater alignment. In many more cases, we can now take
advantage of a new optimization also introduced in this patch that removes
bit set checks if the bit set is all ones.
The 128 byte maximum was found to provide the best tradeoff between instruction
overhead and data overhead in a recent build of Chromium. It allows us to
remove ~2.4MB of instructions at the cost of ~250KB of data.
Differential Revision: http://reviews.llvm.org/D7873
llvm-svn: 230540
The builder is based on a layout algorithm that tries to keep members of
small bit sets together. The new layout compresses Chromium's bit sets to
around 15% of their original size.
Differential Revision: http://reviews.llvm.org/D7796
llvm-svn: 230394
This patch introduces a new mechanism that allows IR modules to co-operatively
build pointer sets corresponding to addresses within a given set of
globals. One particular use case for this is to allow a C++ program to
efficiently verify (at each call site) that a vtable pointer is in the set
of valid vtable pointers for the class or its derived classes. One way of
doing this is for a toolchain component to build, for each class, a bit set
that maps to the memory region allocated for the vtables, such that each 1
bit in the bit set maps to a valid vtable for that class, and lay out the
vtables next to each other, to minimize the total size of the bit sets.
The patch introduces a metadata format for representing pointer sets, an
'@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals
and builds the bitsets, and documents the new feature.
Differential Revision: http://reviews.llvm.org/D7288
llvm-svn: 230054
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
llvm-svn: 229202
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.
This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.
The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".
llvm-svn: 229094
I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It
actually depends on the index too, which means we need to be careful about how
the results are combined before return. In particular if a single Use returns
Live, that counts for the entire object, at the granularity we're considering.
llvm-svn: 228885
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.
Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman
llvm-svn: 228798
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.
Also add some landingpads to invalid LLVM IR test cases that lack them.
Over-the-shoulder reviewed by David Majnemer.
llvm-svn: 228782
Unless we meet an insertvalue on a path from some value to a return, that value
will be live if *any* of the return's components are live, so all of those
components must be added to the MaybeLiveUses.
Previously we were deleting arguments if sub-value 0 turned out to be dead.
llvm-svn: 228731
Some parts of DeadArgElim were only considering the individual fields
of StructTypes separately, but others (where insertvalue &
extractvalue instructions occur) also looked into ArrayTypes.
This one is an actual bug; the mismatch can lead to an argument being
considered used by a return sub-value that isn't being tracked (and
hence is dead by default). It then gets incorrectly eliminated.
llvm-svn: 228559
Previously, a non-extractvalue use of an aggregate return value meant
the entire return was considered live (the algorithm gave up
entirely). This was correct, but conservative. It's better to actually
look at that Use, making the analysis results apply to all sub-values
under consideration.
E.g.
%val = call { i32, i32 } @whatever()
[...]
ret { i32, i32 } %val
The return is using the entire aggregate (sub-values 0 and 1). We can
still simplify @whatever if we can prove that this return is itself
unused.
Also unifies the logic slightly between aggregate and non-aggregate
cases..
llvm-svn: 228558
analyses back into the LTO code generator.
The pass manager builder (and the transforms library in general)
shouldn't be referencing the target machine at all.
This makes the LTO population work like the others -- the data layout
and target transform info need to be pre-populated.
llvm-svn: 227576
SplitLandingPadPredecessors and remove the Pass argument from its
interface.
Another step to the utilities being usable with both old and new pass
managers.
llvm-svn: 226426
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.
Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.
llvm-svn: 226157
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.
This is in preparation for porting this analysis to the new pass
manager.
No functionality changed, and updates inbound for Clang and Polly.
llvm-svn: 226078
The functions {pred,succ,use,user}_{begin,end} exist, but many users
have to check *_begin() with *_end() by hand to determine if the
BasicBlock or User is empty. Fix this with a standard *_empty(),
demonstrating a few usecases.
llvm-svn: 225760
a cache of assumptions for a single function, and an immutable pass that
manages those caches.
The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating an separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.
Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.
For now, this adds boiler plate to the various passes, but this kind of
boiler plate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.
llvm-svn: 225131
- by Ella Bolshinsky
The alias analysis is used define whether the given instruction
is a barrier for store sinking. For 2 identical stores, following
instructions are checked in the both basic blocks, to determine
whether they are sinking barriers.
http://reviews.llvm.org/D6420
llvm-svn: 224247
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532. Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.
I have a follow-up patch prepared for `clang`. If this breaks other
sub-projects, I apologize in advance :(. Help me compile it on Darwin
I'll try to fix it. FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.
This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.
Here's a quick guide for updating your code:
- `Metadata` is the root of a class hierarchy with three main classes:
`MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from
the `Value` class hierarchy. It is typeless -- i.e., instances do
*not* have a `Type`.
- `MDNode`'s operands are all `Metadata *` (instead of `Value *`).
- `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.
If you're referring solely to resolved `MDNode`s -- post graph
construction -- just use `MDNode*`.
- `MDNode` (and the rest of `Metadata`) have only limited support for
`replaceAllUsesWith()`.
As long as an `MDNode` is pointing at a forward declaration -- the
result of `MDNode::getTemporary()` -- it maintains a side map of its
uses and can RAUW itself. Once the forward declarations are fully
resolved RAUW support is dropped on the ground. This means that
uniquing collisions on changing operands cause nodes to become
"distinct". (This already happened fairly commonly, whenever an
operand went to null.)
If you're constructing complex (non self-reference) `MDNode` cycles,
you need to call `MDNode::resolveCycles()` on each node (or on a
top-level node that somehow references all of the nodes). Also,
don't do that. Metadata cycles (and the RAUW machinery needed to
construct them) are expensive.
- An `MDNode` can only refer to a `Constant` through a bridge called
`ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).
As a side effect, accessing an operand of an `MDNode` that is known
to be, e.g., `ConstantInt`, takes three steps: first, cast from
`Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
third, cast down to `ConstantInt`.
The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
metadata schema owners transition away from using `Constant`s when
the type isn't important (and they don't care about referring to
`GlobalValue`s).
In the meantime, I've added transitional API to the `mdconst`
namespace that matches semantics with the old code, in order to
avoid adding the error-prone three-step equivalent to every call
site. If your old code was:
MDNode *N = foo();
bar(isa <ConstantInt>(N->getOperand(0)));
baz(cast <ConstantInt>(N->getOperand(1)));
bak(cast_or_null <ConstantInt>(N->getOperand(2)));
bat(dyn_cast <ConstantInt>(N->getOperand(3)));
bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));
you can trivially match its semantics with:
MDNode *N = foo();
bar(mdconst::hasa <ConstantInt>(N->getOperand(0)));
baz(mdconst::extract <ConstantInt>(N->getOperand(1)));
bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2)));
bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3)));
bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));
and when you transition your metadata schema to `MDInt`:
MDNode *N = foo();
bar(isa <MDInt>(N->getOperand(0)));
baz(cast <MDInt>(N->getOperand(1)));
bak(cast_or_null <MDInt>(N->getOperand(2)));
bat(dyn_cast <MDInt>(N->getOperand(3)));
bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));
- A `CallInst` -- specifically, intrinsic instructions -- can refer to
metadata through a bridge called `MetadataAsValue`. This is a
subclass of `Value` where `getType()->isMetadataTy()`.
`MetadataAsValue` is the *only* class that can legally refer to a
`LocalAsMetadata`, which is a bridged form of non-`Constant` values
like `Argument` and `Instruction`. It can also refer to any other
`Metadata` subclass.
(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)
llvm-svn: 223802
Patch by Ben Gamari!
This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute. There are a two primary usecases
that these attributes aim to serve,
1. Function prologue sigils
2. Function hot-patching: Enable the user to insert `nop` operations
at the beginning of the function which can later be safely replaced
with a call to some instrumentation facility
3. Runtime metadata: Allow a compiler to insert data for use by the
runtime during execution. GHC is one example of a compiler that
needs this functionality for its tables-next-to-code functionality.
Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data was valid executable code.
Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.
The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.
The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.
References
----------
This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).
[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html
Test Plan: testsuite
Differential Revision: http://reviews.llvm.org/D6454
llvm-svn: 223189
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
llvm-svn: 222334
We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline.
r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and
'SLPVectorize' out of the desire to be able to disable vectorization using the
cl::opt flags 'vectorize-loops'/'slp-vectorize' which the before mentioned
fields default to.
Unfortunately, this turns off vectorization because those fields
default to false.
This commit adds flags to the LTO library to disable lto vectorization which
reconciles the desire to optionally disable vectorization during LTO and
the desired behavior of defaulting to enabled vectorization.
We really want tools to set PassManager flags directly to enable/disable
vectorization and not go the route via cl::opt flags *in*
PassManagerBuilder.cpp.
llvm-svn: 220652
Summary: Patches 202051 and 208013 added calls to LTO's PassManager which unconditionally add LoopVectorizePass and SLPVectorizerPass instead of following the logic in PassManagerBuilder::populateModulePassManager and honoring the -vectorize-loops -run-slp-after-loop-vectorization flags.
Reviewers: nadav, aschwaighofer, yijiang
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5884
llvm-svn: 220345
the IR going into it and to clean up the IR produced by the vectorizers.
Note that these are *off by default* right now while folks collect data
on whether the performance tradeoff is reasonable.
In a build of the 'opt' binary, I see about 2% compile time regression
due to this change on average. This is in my mind essentially the worst
expected case: very little of the opt binary is going to *benefit* from
these extra passes.
I've seen several benchmarks improve in performance my small amounts due
to running these passes, and there are certain (rare) cases where these
passes make a huge difference by either enabling the vectorizer at all
or by hoisting runtime checks out of the outer loop. My primary
motivation is to prevent people from seeing runtime check overhead in
benchmarks where the existing passes and optimizers would be able to
eliminate that.
I've chosen the sequence of passes based on the kinds of things that
seem likely to be relevant for the code at each stage: rotaing loops for
the vectorizer, finding correlated values, loop invariants, and
unswitching opportunities from any runtime checks, and cleaning up
commonalities exposed by the SLP vectorizer.
I'll be pinging existing threads where some of these issues have come up
and will start new threads to get folks to benchmark and collect data on
whether this is the right tradeoff or we should do something else.
llvm-svn: 219644
A function with discardable linkage cannot be discarded if its a member
of a COMDAT group without considering all the other COMDAT members as
well. This sort of thing is already handled by GlobalOpt/GlobalDCE.
This fixes PR21206.
llvm-svn: 219335
After some stellar (& inspired) help from Reid Kleckner providing a test
case for some rather unstable undefined behavior showing up as
assertions produced by r214761, I was able to fix this issue in DAE
involving the application of both varargs removal, followed by normal
argument removal.
Indeed I introduced this same bug into ArgumentPromotion (r212128) by
copying the code from DAE, and when I fixed the bug in ArgPromo
(r213805) and commented in that patch that I didn't need to address the
same issue in DAE because it was a single pass. Turns out it's two pass,
one for the varargs and one for the normal arguments, so the same fix is
needed (at least during varargs removal). So here it is.
(the observable/net effect of this bug, even when it didn't result in
assertion failure, is that debug info would describe the DAE'd function
in the abstract, but wouldn't provide high/low_pc, variable locations,
line table, etc (it would appear as though the function had been
entirely optimized away), see the original PR14016 for details of the
general problem)
I'm not recommitting the assertion just yet, as there's been another
regression of it since I last tried. It might just be a few test cases
weren't adequately updated after Adrian or Duncan's recent schema
changes.
llvm-svn: 219210
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash. The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).
Original commit message follows.
--
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
llvm-svn: 219010
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
llvm-svn: 218914
With this a DataLayoutPass can be reused for multiple modules.
Once we have doInitialization/doFinalization, it doesn't seem necessary to pass
a Module to the constructor.
Overall this change seems in line with the idea of making DataLayout a required
part of Module. With it the only way of having a DataLayout used is to add it
to the Module.
llvm-svn: 217548
This adds a ScalarEvolution-powered transformation that updates load, store and
memory intrinsic pointer alignments based on invariant((a+q) & b == 0)
expressions. Many of the simple cases we can get with ValueTracking, but we
still need something like this for the more complicated cases (such as those
with an offset) that require some algebra. Note that gcc's
__builtin_assume_aligned's optional third argument provides exactly for this
kind of 'misalignment' offset for which this kind of logic is necessary.
The primary motivation is to fixup alignments for vector loads/stores after
vectorization (and unrolling). This pass is added to the optimization pipeline
just after the SLP vectorizer runs (which, admittedly, does not preserve SE,
although I imagine it could). Regardless, I actually don't think that the
preservation matters too much in this case: SE computes lazily, and this pass
won't issue any SE queries unless there are any assume intrinsics, so there
should be no real additional cost in the common case (SLP does preserve DT and
LoopInfo).
llvm-svn: 217344
This adds an immutable pass, AssumptionTracker, which keeps a cache of
@llvm.assume call instructions within a module. It uses callback value handles
to keep stale functions and intrinsics out of the map, and it relies on any
code that creates new @llvm.assume calls to notify it of the new instructions.
The benefit is that code needing to find @llvm.assume intrinsics can do so
directly, without scanning the function, thus allowing the cost of @llvm.assume
handling to be negligible when none are present.
The current design is intended to be lightweight. We don't keep track of
anything until we need a list of assumptions in some function. The first time
this happens, we scan the function. After that, we add/remove @llvm.assume
calls from the cache in response to registration calls and ValueHandle
callbacks.
There are no new direct test cases for this pass, but because it calls it
validation function upon module finalization, we'll pick up detectable
inconsistencies from the other tests that touch @llvm.assume calls.
This pass will be used by follow-up commits that make use of @llvm.assume.
llvm-svn: 217334
This feeds AA through the IFI structure into the inliner so that
AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which
functions only access their arguments (instead of just hard-coding some
knowledge of memory intrinsics). Most of the information is only available from
BasicAA; this is important for preserving alias scoping information for
target-specific intrinsics when doing the noalias parameter attribute to
metadata conversion.
llvm-svn: 216866
Don't promote byval pointer arguments when when their size in bits is
not equal to their alloc size in bits. This can happen for x86_fp80,
where the size in bits is 80 but the alloca size in bits in 128.
Promoting these types can break passing unions of x86_fp80s and other
types.
Patch by Thomas Jablin!
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D5057
llvm-svn: 216693
Adding, removing, or changing non-pack parameters can change the ABI
classification of pack parameters. Clang and other frontends encode the
classification in the IR of the call site, but the callee side
determines it dynamically based on the number of registers consumed so
far. Changing the prototype affects the number of registers consumed
would break such code.
Dead argument elimination performs a similar task and already has a
similar check to avoid this problem.
Patch by Thomas Jablin!
llvm-svn: 216421
GlobalDCE deletes global vars and updates their initializers to nullptr
while leaving underlying constants to be cleaned up later by its uses.
The clean up may never happen, fix this by forcing it every time it's
safe to destroy constants.
Final patch by Rafael Espindola
http://reviews.llvm.org/D4931
<rdar://problem/17523868>
llvm-svn: 216390
attribute and function argument attribute synthesizing and propagating.
As with the other uses of this attribute, the goal remains a best-effort
(no guarantees) attempt to not optimize the function or assume things
about the function when optimizing. This is particularly useful for
compiler testing, bisecting miscompiles, triaging things, etc. I was
hitting specific issues using optnone to isolate test code from a test
driver for my fuzz testing, and this is one step of fixing that.
llvm-svn: 215538
GlobalOpt didn't know how to simulate InsertValueInst or
ExtractValueInst. Optimizing these is pretty straightforward.
N.B. This came up when looking at clang's IRGen for MS ABI member
pointers; they are represented as aggregates.
llvm-svn: 215184
This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is disabled by default so we can do performance testing - ideally we want to change to having the loop vectorizer running first, and the SLP vectorizer using its leftovers instead of the other way around.
llvm-svn: 214963
This is mostly a cleanup, but it changes a fairly old behavior.
Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.
Now to get a usable behavior out of opt one doesn't need the funny
looking command line:
opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts
llvm-svn: 214919
Ugh. Turns out not even transformation passes link in how to read IR.
I sincerely believe the buildbots will finally agree with my system
after this though. (I don't really understand why all of this has been
working on my system, but not on all the buildbots.)
Create a new tool called llvm-uselistorder to use for verifying use-list
order. For now, just dump everything from the (now defunct)
-verify-use-list-order pass into the tool.
This might be a better way to test use-list order anyway.
Part of PR5680.
llvm-svn: 213957
The dragonegg buildbot (and others?) started failing after
r213945/r213946 because `llvm-as` wasn't linking in the bitcode reader.
I think moving the verify functions to the same file as the verify pass
should fix the build. Adding a command-line option for maintaining
use-list order in assembly as a drive-by to prevent warnings about
unused static functions.
llvm-svn: 213947
Add a -verify-use-list-order pass, which shuffles use-list order, writes
to bitcode, reads back, and verifies that the (shuffled) order matches.
- The utility functions live in lib/IR/UseListOrder.cpp.
- Moved (and renamed) the command-line option to enable writing
use-lists, so that this pass can return early if the use-list orders
aren't being serialized.
It's not clear that this pass is the right direction long-term (perhaps
a separate tool instead?), but short-term it's a great way to test the
use-list order prototype. I've added an XFAIL-ed testcase that I'm
hoping to get working pretty quickly.
This is part of PR5680.
llvm-svn: 213945
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
1. To preserve noalias function attribute information when inlining
2. To provide the ability to model block-scope C99 restrict pointers
Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.
What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:
!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }
Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:
... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }
When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.
Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.
[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]
Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.
llvm-svn: 213864
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).
This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.
No functionality change intended.
llvm-svn: 213859
While the subprogram map cache used by Dead Argument Elimination works
there, I made a mistake when reusing it for Argument Promotion in
r212128 because ArgPromo may transform functions more than once whereas
DAE transforms each function only once, removing all the dead arguments
in one go.
To address this, ensure that the map is updated after each argument
promotion.
In retrospect it might be a little wasteful to create a map of all
subprograms when only handling a single CGSCC, but the alternative is
walking the debug info for each function in the CGSCC that gets updated.
It's not clear to me what the right tradeoff is there, but since the
current tradeoff seems to be working OK (and the code to keep things
updated is very cheap), let's stick with that for now.
llvm-svn: 213805
Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended.
Test Plan: All tests (make check-all) still pass.
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4481
llvm-svn: 213474
Merges equivalent loads on both sides of a hammock/diamond
and hoists into into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks it to the footer.
Can enable if conversion and tolerate better load misses
and store operand latencies.
llvm-svn: 213396
isDereferenceablePointer should not give up upon encountering any bitcast. If
we're casting from a pointer to a larger type to a pointer to a small type, we
can continue by examining the bitcast's operand. This missing capability
was noted in a comment in the function.
In order for this to work, isDereferenceablePointer now takes an optional
DataLayout pointer (essentially all callers already had such a pointer
available). Most code uses isDereferenceablePointer though
isSafeToSpeculativelyExecute (which already took an optional DataLayout
pointer), and to enable the LICM test case, LICM needs to actually provide its DL
pointer to isSafeToSpeculativelyExecute (which it was not doing previously).
llvm-svn: 212686
This reverts commit 5b55a47e94e28fbb56d0cd5d72c3db9105c15b4c.
A test case was found to crash after this was applied. I'll file a bug to track fixing this with the test case needed.
llvm-svn: 212550
This is useful for functions that are not actually available externally but
referenced by a vtable of some kind. Clang emits functions like this for the MS
ABI.
PR20182.
llvm-svn: 212337
Exposes more constant globals that can be removed by
the global optimizer. A specific example is the removal
of the static global block address array in
clang/test/CodeGen/indirect-goto.c. This change impacts only
lower optimization levels. With LTO interprocedural
const prop runs already before global opt.
llvm-svn: 212284
Matching behavior with DeadArgumentElimination (and leveraging some
now-common infrastructure), keep track of the function from debug info
metadata if arguments are promoted.
This may produce interesting debug info - since the arguments may be
missing or of different types... but at least backtraces, inlining, etc,
will be correct.
llvm-svn: 212128
There were transforms whose *intent* was to downgrade the linkage of
external objects to have internal linkage.
However, it fired on things with private linkage as well.
llvm-svn: 212104
This new IR facility allows us to represent the object-file semantic of
a COMDAT group.
COMDATs allow us to tie together sections and make the inclusion of one
dependent on another. This is required to implement features like MS
ABI VFTables and optimizing away certain kinds of initialization in C++.
This functionality is only representable in COFF and ELF, Mach-O has no
similar mechanism.
Differential Revision: http://reviews.llvm.org/D4178
llvm-svn: 211920
Folding a reference to a thread_local variable into another global
variable's initializer is very problematic, there is no relocation that
exists to represent such an access.
llvm-svn: 211762
Referencing a dllimport variable requires actually instructions, not
just a relocation. This fixes PR19955.
Differential Revision: http://reviews.llvm.org/D4249
llvm-svn: 211571
Patch removes rest part of code related to old implementation.
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
This one was the final patch.
llvm-svn: 211457
Added short description for new comparison algorithm, that introduces
total ordering among functions set.
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211456
Patch activates new implementation.
So from now, merging process should take time O(N*log(N)).
Where N size of module (we are free to measure it in
functions or in instructions). Internally FnTree represents
binary tree. So every lookup operation takes O(log(N)) time.
It is still not the last patch in series, we also have to
clean-up pass from old code, and update pass comments.
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211445
Patch removed next old FunctionComparator methods:
* enumerate
* isEquivalentOperation
* isEquivalentGEP
* isEquivalentType
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211444
introduced among functions set.
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211442
methods.
Patch changes return type of FunctionComparator::compare() and
FunctionComparator::compare(const BasicBlock*, const BasicBlock*)
methods from bool (equal or not) to {-1, 0, 1} (less, equal, great).
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 211437
Summary:
Different range metadata can lead to different optimizations in later
passes, possibly breaking the semantics of the merged function. So range
metadata must be taken into consideration when comparing Load
instructions.
Thanks!
llvm-svn: 211391
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.
As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.
At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.
By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.
Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.
Summary for out of tree users:
------------------------------
+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.
llvm-svn: 210903
It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables.
This also adds backend support for generating the jump-instruction tables on ARM and X86.
Note that since the jumptable attribute creates a second function pointer for a
function, any function marked with jumptable must also be marked with unnamed_addr.
llvm-svn: 210280
This extension point allows adding passes that perform peephole optimizations
similar to the instruction combiner. These passes will be inserted after
each instance of the instruction combiner pass.
Differential Revision: http://reviews.llvm.org/D3905
llvm-svn: 209595