The initialize function has an early return for AMDGPU targets. If taken,
the ShouldExtI32* initialization code will not be executed, resulting in
invalid values in the corresponding fields. Fix this by moving the code
to the top of the function.
llvm-svn: 287570
Currently LLVM assumes that a pointer addrspacecasted to a different addr space is equivalent to trunc or zext bitwise, which is not true. For example, in amdgcn target, when a null pointer is addrspacecasted from addr space 4 to 0, its value is changed from i64 0 to i32 -1.
This patch teaches LLVM not to assume known bits of addrspacecast instruction to its operand.
Differential Revision: https://reviews.llvm.org/D26803
llvm-svn: 287545
On some architectures (s390x, ppc64, sparc64, mips), C-level int is passed
as i32 signext instead of plain i32. Likewise, unsigned int may be passed
as i32, i32 signext, or i32 zeroext depending on the platform. Add this
information to TargetLibraryInfo, to be used whenever some LLVM pass
inserts a compiler-rt call to a function involving int parameters
or returns.
Differential Revision: http://reviews.llvm.org/D21739
llvm-svn: 287533
Summary:
CompareSCEVComplexity goes too deep (50+ on a quite a big unrolled loop) and runs almost infinite time.
Added cache of "equal" SCEV pairs to earlier cutoff of further estimation. Recursion depth limit was also introduced as a parameter.
Reviewers: sanjoy
Subscribers: mzolotukhin, tstellarAMD, llvm-commits
Differential Revision: https://reviews.llvm.org/D26389
llvm-svn: 287232
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.
llvm-svn: 287206
This adds support for TSan C++ exception handling, where we need to add extra calls to __tsan_func_exit when a function is exitted via exception mechanisms. Otherwise the shadow stack gets corrupted (leaked). This patch moves and enhances the existing implementation of EscapeEnumerator that finds all possible function exit points, and adds extra EH cleanup blocks where needed.
Differential Revision: https://reviews.llvm.org/D26177
llvm-svn: 286893
This restores the rest of r286297 (part was restored in r286475).
Specifically, it restores the part requiring adding a dependency from
the Analysis to Object library (downstream use changed to correctly
model split BitReader vs BitWriter libraries).
Original description of this part of patch follows:
Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.
This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).
Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.
Fixes PR30610.
llvm-svn: 286844
Summary:
The change in r285513 to prevent exporting of locals used in
inline asm added all locals in the llvm.used set to the reference
set of functions containing inline asm. Since these locals were marked
NoRename, this automatically prevented importing of the function.
Unfortunately, this caused an explosion in the summary reference lists
in some cases. In my particular example, it happened for a large protocol
buffer generated C++ file, where many of the generated functions
contained an inline asm call. It was exacerbated when doing a ThinLTO
PGO instrumentation build, where the PGO instrumentation included
thousands of private __profd_* values that were added to llvm.used.
We really only need to include a single llvm.used local (NoRename) value
in the reference list of a function containing inline asm to block it
being imported. However, it seems cleaner to add a flag to the summary
that explicitly describes this situation, which is what this patch does.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26402
llvm-svn: 286840
When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself.
However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:
int g() {
h();
}
int f() {
g();
}
llvm-svn: 286814
All existing callers were manually extracting information out of an existing
GEP instruction and passing it to getGEPExpr(). Simplify the interface by
changing it to take a GEPOperator instead.
llvm-svn: 286751
If the inrange keyword is present before any index, loading from or
storing to any pointer derived from the getelementptr has undefined
behavior if the load or store would access memory outside of the bounds of
the element selected by the index marked as inrange.
This can be used, e.g. for alias analysis or to split globals at element
boundaries where beneficial.
As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html
Differential Revision: https://reviews.llvm.org/D22793
llvm-svn: 286514
The r283656 did this in the remark arguments. We also need to do this
in the main function attribute as that is written to YAML as well.
llvm-svn: 286482
This restores the part of r286297 that didn't require adding a
dependency from the Analysis to Object library. There are two parts
to the original fix, and this will address the handling for the case
where locals are used in module level asm.
The part that requires functionality in libObject handles local defs
in module level asm, and was reverted because our downstream build
of clang builds lib/Bitcode into a single library, and this new
dependency introduced a cycle there. I am trying to get that fixed
(see D26502), so for now that change isn't being restored
llvm-svn: 286475
The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern
to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs.
If these were written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares.
Ie, running the optimizer pessimizes the codegen for this case without this patch:
define <4 x i32> @umax_vec(<4 x i32> %x) {
%cmp = icmp ugt <4 x i32> %x, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
%sel = select <4 x i1> %cmp, <4 x i32> %x, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
ret <4 x i32> %sel
}
$ ./opt umax.ll -S | ./llc -o - -mattr=avx
vpmaxud LCPI0_0(%rip), %xmm0, %xmm0
$ ./opt -instcombine umax.ll -S | ./llc -o - -mattr=avx
vpxor %xmm1, %xmm1, %xmm1
vpcmpgtd %xmm0, %xmm1, %xmm1
vmovaps LCPI0_0(%rip), %xmm2 ## xmm2 = [2147483647,2147483647,2147483647,2147483647]
vblendvps %xmm1, %xmm0, %xmm2, %xmm0
Differential Revision: https://reviews.llvm.org/D26096
llvm-svn: 286318
Summary:
This patch uses the same approach added for inline asm in r285513 to
similarly prevent promotion/renaming of locals used or defined in module
level asm.
All static global values defined in normal IR and used in module level asm
should be included on either the llvm.used or llvm.compiler.used global.
The former were already being flagged as NoRename in the summary, and
I've simply added llvm.compiler.used values to this handling.
Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.
This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).
Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.
Fixes PR30610.
Reviewers: mehdi_amini
Subscribers: johanengelen, krasin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26146
llvm-svn: 286297
Summary:
We've had support for auto upgrading old style scalar TBAA access
metadata tags into the "new" struct path aware TBAA metadata for 3 years
now. The only way to actually generate old style TBAA was explicitly
through the IRBuilder API. I think this is a good time for dropping
support for old style scalar TBAA.
I'm not removing support for textual or bitcode upgrade -- if you have
IR with the old style scalar TBAA tags that go through the AsmParser orf
the bitcode parser before LLVM sees them, they will keep working as
usual.
Note:
%val = load i32, i32* %ptr, !tbaa !N
!N = < scalar tbaa node >
is equivalent to
%val = load i32, i32* %ptr, !tbaa !M
!N = < scalar tbaa node >
!M = !{!N, !N, 0}
Reviewers: manmanren, chandlerc, sunfish
Subscribers: mcrosier, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D26229
llvm-svn: 286291
This additional information can be used to improve the locations when generating remarks for loops.
Patch by Florian Hahn.
Differential Revision: https://reviews.llvm.org/D25763
llvm-svn: 286227
With this we get a new field in the YAML record if the value being
streamed out has a debug location. For examples, please see the changes
to the tests.
This is then used in opt-viewer to display a link for the callee
function in the inlining remarks.
Differential Revision: https://reviews.llvm.org/D26366
llvm-svn: 286169
InstCombine should always canonicalize patterns like the one shown in the comment
when visiting 'select' insts in adjustMinMax().
Scalars were already handled there, and vector splats are handled after:
https://reviews.llvm.org/rL285732
llvm-svn: 285744
This is intended to make the semantic intent clearer.
The wrapper objects are now generic to avoid `const_cast` s. Since
`const` ness is part of the API of `MDNode::getMostGenericTBAA` (and
therefore I can't make things `const` all the way through without some
code churn outside TypeBasedAliasAnalysis.cpp), this seemed like the
cleanest solution.
llvm-svn: 285665
Summary:
Instead of using the workaround of suppressing the entire index for
modules that call inline asm that may reference locals, use the
NoRename flag on the summary for any locals in the llvm.used set, and
add a reference edge from any functions containing inline asm.
This avoids issues from having no summaries despite the module defining
global values, which was preventing more aggressive index-based
optimization. It will be followed by a subsequent patch to make a
similar fix for local references in module level asm (to fix PR30610).
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26121
llvm-svn: 285513
Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202.
There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding
smax pattern and commuted variants.
The codegen tests demonstrate the effect of ValueTracking on the backend via SelectionDAGBuilder. We
can't expose these differences minimally in IR because we don't have smin/smax intrinsics for IR.
Differential Revision: https://reviews.llvm.org/D26091
llvm-svn: 285499
Summary:
We were trying to add APInt values with different bit sizes after
visiting an addrspacecast instruction which changed the bit width
of the pointer.
Reviewers: majnemer, hfinkel
Subscribers: hfinkel, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24774
llvm-svn: 285407
Now LPPassManager will run LCSSA verification only for the top-level loop
which was processed on the current iteration.
Differential Revision: https://reviews.llvm.org/D25873
llvm-svn: 285394
Summary:
Previously we were creating the alias summary on the fly while writing
the summary to bitcode. This moves the creation of these summaries to
the module summary index builder where we build the rest of the summary
index.
This is going to be necessary for setting the NoRename flag for values
possibly used in inline asm or module level asm.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26049
llvm-svn: 285379
Summary:
Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`.
This is a folloup of rL258422 and
https://github.com/rust-lang/rust/pull/30917 where llvm failed to
optimize away the bounds checking in a binary search.
Patch by Arthur Silva!
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25941
llvm-svn: 285228
This reverts commit r285191.
LICM appears to rely on the Alias Set Tracker hitting lifetime markers to prevent
code from being moved outside of the original scope.
llvm-svn: 285227
There are two fixes here: one, AnalyzeUsesOfPointer can't return
false until it has checked all the uses of the pointer. Two, if a
global uses another global, we have to assume the address of the
first global escapes.
Fixes https://llvm.org/bugs/show_bug.cgi?id=30707 .
Differential Revision: https://reviews.llvm.org/D25798
llvm-svn: 285034
In BasicAA GEP operand values get adjusted ("wrap-around") based on the
pointersize. Otherwise, in non-64b modes, AA could report false negatives.
However, a wrap-around is valid only for a fully evaluated expression.
It had been introduced to fix an alias problem in
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160118/326163.html.
This commit restricts the wrap-around to constant gep operands only where the
value is known at compile-time.
llvm-svn: 284908
When we have a loop with a known upper bound on the number of iterations, and
furthermore know that either the number of iterations will be either exactly
that upper bound or zero, then we can fully unroll up to that upper bound
keeping only the first loop test to check for the zero iteration case.
Most of the work here is in plumbing this 'max-or-zero' information from the
part of scalar evolution where it's detected through to loop unrolling. I've
also gone for the safe default of 'false' everywhere but howManyLessThans which
could probably be improved.
Differential Revision: https://reviews.llvm.org/D25682
llvm-svn: 284818
This is to avoid inlining too many multiplication operands into a SCEV, which could
take exponential time in the worst case.
Reviewers: Sanjoy Das, Mehdi Amini, Michael Zolotukhin
Differential Revision: https://reviews.llvm.org/D25794
llvm-svn: 284784
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.
No functionality change intended.
llvm-svn: 284721
0 - X --> X, if X is 0 or the minimum signed value
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
I noticed this pattern might be created in the backend after the change from D25485,
so we'll want to add a similar fold for the DAG.
The use of computeKnownBits in InstSimplify may be something to investigate if the
compile time of InstSimplify is noticeable. We could replace computeKnownBits with
specific pattern matchers or limit the recursion.
Differential Revision: https://reviews.llvm.org/D25785
llvm-svn: 284649
In loops that look something like
i = n;
do {
...
} while(i++ < n+k);
where k is a constant, the maximum backedge count is k (in fact the backedge
count will be either 0 or k, depending on whether n+k wraps). More generally
for LHS < RHS if RHS-(LHS of first comparison) is a constant then the loop will
iterate either 0 or that constant number of times.
This allows for more loop unrolling with the recent upper bound loop unrolling
changes, and I'm working on a patch that will let loop unrolling additionally
make use of the loop being executed either 0 or k times (we need to retain the
loop comparison only on the first unrolled iteration).
Differential Revision: https://reviews.llvm.org/D25607
llvm-svn: 284465
Summary: The delinearization algorithm did not consider terms which had an extension without a multiply factor, i.e. a identify factor. We lose cases where size is char type where there will no multiply factor.
Reviewers: sanjoy, grosser
Subscribers: mzolotukhin, Eugene.Zelenko, llvm-commits, mssimpso, sanjoy, grosser
Differential Revision: https://reviews.llvm.org/D16492
llvm-svn: 284378
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.
Reviewers: majnemer, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25293
llvm-svn: 284061
Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049.
The original summary:
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
llvm-svn: 284053
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
Since this change is known to cause performance degradations in some cases it's commited under a temporary flag which is turned off by default.
Patch by Li Huang
Differential Revision: https://reviews.llvm.org/D18777
llvm-svn: 284022
The basic inlining operation makes the following changes to the call graph:
1) Add edges that were previously transitive edges. This is always trivial and
this patch gives the LCG helper methods to make this more convenient.
2) Remove the inlined edge. We had existing support for this, but it contained
bugs that needed to be fixed. Testing in the same pattern as the inliner
exposes these bugs very nicely.
3) Delete a function when it becomes dead because it is internal and all calls
have been inlined. The LCG had no support at all for this operation, so this
adds that support.
Two unittests have been added that exercise this specific mutation pattern to
the call graph. They were extremely effective in uncovering bugs. Sadly,
a large fraction of the code here is just to implement those unit tests, but
I think they're paying for themselves. =]
This was split out of a patch that actually uses the routines to
implement inlining in the new pass manager in order to isolate (with
unit tests) the logic that was entirely within the LCG.
Many thanks for the careful review from folks! There will be a few minor
follow-up patches based on the comments in the review as well.
Differential Revision: https://reviews.llvm.org/D24225
llvm-svn: 283982
For each block check that it doesn't have any uses outside of it's innermost loop.
Differential Revision: https://reviews.llvm.org/D25364
llvm-svn: 283877
Summary: This patch sets function as hot if function's entry count is hot/cold.
Reviewers: eraman, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25048
llvm-svn: 283852
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:
va_start(ValueArgs, Desc);
with Desc being a StringRef.
Differential Revision: https://reviews.llvm.org/D25342
llvm-svn: 283671
Summary:
When there is a call to an alias in the same module, we were not
adding a call edge. So we could incorrectly think that the alias
was dead if it was inlined in that function, despite having a
reference imported elsewhere. This resulted in unsats at link time.
Add a call edge when the call is to an alias.
Reviewers: davide, mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25384
llvm-svn: 283664
Summary:
While walking defs of pointer operands we were assuming that the pointer
size would remain constant. This is not true, because addresspacecast
instructions may cast the pointer to an address space with a different
pointer width.
This partial reverts r282612, which was a more conservative solution
to this problem.
Reviewers: reames, sanjoy, apilipenko
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24772
llvm-svn: 283557
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.
Differential Revision: https://reviews.llvm.org/D24462
llvm-svn: 283530
Summary:
The computeKnownBits and ComputeNumSignBits functions in ValueTracking can now do a simple look-through of ExtractElement.
Reviewers: majnemer, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24955
llvm-svn: 283434
The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized for them as the compiler runs, and in fact,
because the user likely does not know what optimization diagnostics their tools
might want to use, the user cannot provide a useful filter regardless. As such,
we shouldn't filter the diagnostics going to the output file.
Differential Revision: https://reviews.llvm.org/D25224
llvm-svn: 283236
Slightly improves the precision of GlobalsAA in certain situations, and
makes the behavior of optimization passes more predictable.
Differential Revision: https://reviews.llvm.org/D24104
llvm-svn: 283165
Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D24727
llvm-svn: 283099
This was first landed in rL283058 and subsequenlty reverted since a
change this depends on (rL283057) was buggy and had to be reverted.
llvm-svn: 283079
They've broken the sanitizer-bootstrap bots. Reverting while I investigate.
Original commit messages:
r283057: "[ConstantRange] Make getEquivalentICmp smarter"
r283058: "[SCEV] Rely on ConstantRange instead of custom logic; NFCI"
llvm-svn: 283062
(Recommit after making sure IsVerbose gets properly initialized in
DiagnosticInfoOptimizationBase. See previous commit that takes care of
this.)
OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.
Then we need the magic to be able to turn an LAA remark into an LV
remark. This is done via a new OptimizationRemark ctor.
llvm-svn: 282813
OptimizationRemarkAnalysis directly takes the role of the report that is
generated by LAA.
Then we need the magic to be able to turn an LAA remark into an LV
remark. This is done via a new OptimizationRemark ctor.
llvm-svn: 282758
Summary:
When using llc with -compile-twice, module is generated twice, but getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI will still get the old PSI with the original (invalidated) Module. This patch checks if the module has changed when calling getPSI, if yes, update the module and invalidate the Summary.
The bug does not show up in the current llc because PSI is not used in CodeGen yet. But with https://reviews.llvm.org/D24989, the bug will be exposed by test/CodeGen/PowerPC/pr26378.ll
Reviewers: eraman, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24993
llvm-svn: 282616
Pointers in different addrspaces can have different sizes, so it's not valid to look through addrspace cast calculating base and offset for a value.
This is similar to D13008.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D24729
llvm-svn: 282612
Summary:
Instead of creating and destroying SCEVUnionPredicate instances (which
internally creates and destroys a DenseMap), use temporary SmallPtrSet
instances of remember the set of predicates that will get reified into a
SCEVUnionPredicate.
Reviewers: silviu.baranga, sbaranga
Subscribers: sanjoy, mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25000
llvm-svn: 282606
Ever since LAA was split out into an analysis on its own, this function
stopped emitting the report directly. Instead it stores it to be
retrieved by the client which can then emit it as its own report
(e.g. -Rpass-analysis=loop-vectorize).
llvm-svn: 282561
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
I don't expect `PendingLoopPredicates` to have very many
elements (e.g. when -O3'ing the sqlite3 amalgamation,
`PendingLoopPredicates` has at most 3 elements). So now we use a
`SmallPtrSet` for it instead of the more heavyweight `DenseSet`.
llvm-svn: 282511
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282499
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
In a previous change I collapsed two different caches into one. When
doing that I noticed that ScalarEvolution's move constructor was not
moving those caches.
To keep the previous change simple, I've moved that bugfix into this
separate change.
llvm-svn: 282376
Both `loopHasNoSideEffects` and `loopHasNoAbnormalExits` involve walking
the loop and maintaining similar sorts of caches. This commit changes
SCEV to compute both the predicates via a single walk, and maintain a
single cache instead of two.
llvm-svn: 282375
This change simplifies a data structure optimization in the
`BackedgeTakenInfo` class for loops with exactly one computable exit.
I've sanity checked that this does not regress compile time performance,
using sqlite3's amalgamated build.
llvm-svn: 282365
There is no benefit in looking through assumptions on UndefValue to
guess known bits. Return early to avoid walking their use-lists, and
assert that all instances of ConstantData are handled here for similar
reasons (UndefValue was the only integer/pointer holdout).
llvm-svn: 282337
Check and return early for ConstantPointerNull and UndefValue
specifically in isKnownNonNullAt, and assert that ConstantData never
make it to isKnownNonNullFromDominatingCondition.
This confirms that isKnownNonNullFromDominatingCondition never walks
through the use-list of an instance of ConstantData. Given that such
use-lists cross module boundaries, it never really made sense to do so,
and was potentially very expensive.
llvm-svn: 282333
Summary: When identifying cold blocks, consider only the edge to the normal destination if the terminator is InvokeInst and let calcInvokeHeuristics() decide edge weights for the InvokeInst.
Reviewers: mcrosier, hfinkel, davidxl
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D24868
llvm-svn: 282262
computeKnownBits() already works for integer vectors, so allow vector types when calling that from InstCombine.
I don't think the change to use m_APInt in computeKnownBits is strictly necessary because we do check for
ConstantVector later, but it's more efficient to handle the splat case without needing to loop on vector elements.
This should work with InstSimplify, but doesn't yet, so I made that a FIXME comment on the test for PR24942:
https://llvm.org/bugs/show_bug.cgi?id=24942
Differential Revision: https://reviews.llvm.org/D24677
llvm-svn: 281777
Enhance SCEV to compute the trip count for some loops with unknown stride.
Patch by Pankaj Chawla
Differential Revision: https://reviews.llvm.org/D22377
llvm-svn: 281732
LazyCallGraph to support repeated, stable iterations, even in the face
of graph updates.
This is particularly important to allow the CGSCC pass manager to walk
the RefSCCs (and thus everything else) in a module more than once. Lots
of unittests and other tests were hard or impossible to write because
repeated CGSCC pass managers which didn't invalidate the LazyCallGraph
would conclude the module was empty after the first one. =[ Really,
really bad.
The interesting thing is that in many ways this simplifies the code. We
can now re-use the same code for handling reference edge insertion
updates of the RefSCC graph as we use for handling call edge insertion
updates of the SCC graph. Outside of adapting to the shared logic for
this (which isn't trivial, but is *much* simpler than the DFS it
replaces!), the new code involves putting newly created RefSCCs when
deleting a reference edge into the cached list in the correct way, and
to re-formulate the iterator to be stable and effective even in the face
of these kinds of updates.
I've updated the unittests for the LazyCallGraph to re-iterate the
postorder sequence and verify that this all works. We even check for
using alternating iterators to trigger the lazy formation of RefSCCs
after mutation has occured.
It's worth noting that there are a reasonable number of likely
simplifications we can make past this. It isn't clear that we need to
keep the "LeafRefSCCs" around any more. But I've not removed that mostly
because I want this to be a more isolated change.
Differential Revision: https://reviews.llvm.org/D24219
llvm-svn: 281716
The patch is to partially fix PR10584. Correlated Value Propagation queries LVI
to check non-null for pointer params of each callsite. If we know the def of
param is an alloca instruction, we know it is non-null and can return early from
LVI. Similarly, CVP queries LVI to check whether pointer for each mem access is
constant. If the def of the pointer is an alloca instruction, we know it is not
a constant pointer. These shortcuts can reduce the cost of CVP significantly.
Differential Revision: https://reviews.llvm.org/D18066
llvm-svn: 281586
value is a pointer.
This patch is to fix PR30213. When expanding an expr based on ValueOffsetPair,
if the value is of pointer type, we can only create a getelementptr instead
of sub expr.
Differential Revision: https://reviews.llvm.org/D24088
llvm-svn: 281439
The constant folder didn't know how to always fold bitcasts of constant integer
vectors. In particular, it was unable to handle the case where a constant vector
had some undef elements, and the resulting (i.e. bitcasted) vector type had more
elements than the original vector type.
Example:
%cast = bitcast <2 x i64><i64 undef, i64 2> to <4 x i32>
On a little endian target, %cast could have been folded to:
<4 x i32><i32 undef, i32 undef, i32 2, i32 0>
This patch improves the folding logic by teaching how to correctly propagate
undef elements in the folded vector.
Differential Revision: https://reviews.llvm.org/D24301
llvm-svn: 281343
Convert the previous introduced is-a relationship between the LVICache and LVIImple clases into a has-a relationship and hide all the implementation details of the cache from the lazy query layer.
The only slightly concerning change here is removing the addition of a queried block into the SeenBlock set in LVIImpl::getBlockValue. As far as I can tell, this was effectively dead code. I think it *used* to be the case that getCachedValueInfo wasn't const and might end up inserting elements in the cache during lookup. That's no longer true and hasn't been for a while. I did fixup the const usage to make that more obvious.
llvm-svn: 281272
Seperate the caching logic from the implementation of the lazy analysis. For the moment, the lazy analysis impl has a is-a relationship with the cache; this will change to a has-a relationship shortly. This was done as two steps merely to keep the changes simple and the diff understandable.
llvm-svn: 281266
Summary:
This will let e.g. the load/store vectorizer propagate this metadata
appropriately.
Reviewers: arsenm
Subscribers: tra, jholewinski, hfinkel, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23479
llvm-svn: 281153
make_scope_exit now that we have that utility.
This makes the code much more clear and readable by isolating the check.
It also makes it easy to go through and make sure all the interesting
update routines have a start and end verify so we don't slowly let the
graph drift into an invalid state.
llvm-svn: 280619
a postorder-sequence based update after edge insertion into a generic
helper function.
This separates the SCC-specific logic into two fairly simple lambdas and
extracts the rest into a generic helper template function. I think this
is a net win on its own merits because it disentangles different pieces
of the algorithm. Now there is one place that does the two-step
partition to identify a set of newly connected components and at the
same time update the postorder sequence.
However, I'm also hoping to re-use this an upcoming patch to update
a cached post-order sequence of RefSCCs when doing the analogous update
to the RefSCC graph, and I don't want to have two copies.
The diff is quite messy but this really is just moving things around and
making types generic rather than specific.
llvm-svn: 280618
We don't need to call `GetCompareTy(LHS)' every single time true or false is
returned from function SimplifyFCmpInst as suggested by Sanjay in review D24142.
llvm-svn: 280491
This patch fixes a crash caused by an incorrect folding of an ordered comparison
between a packed floating point vector and a splat vector of NaN.
An ordered comparison between a vector and a constant vector of NaN, should
always be folded into a constant vector where each element is i1 false.
Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar
'false'. Later on, this would cause an assertion failure, since the value type
of the folded value doesn't match the expected value type of the uses of the
original instruction: "Assertion failed: New->getType() == getType() &&
"replaceAllUses of value with new value of different type!".
This patch fixes the issue and adds a test case to the already existing test
InstSimplify/floating-point-compares.ll.
Differential Revision: https://reviews.llvm.org/D24143
llvm-svn: 280488
Summary:
Current implementation of LI verifier isn't ideal and fails to detect
some cases when LI is incorrect. For instance, it checks that all
recorded loops are in a correct form, but it has no way to check if
there are no more other (unrecorded in LI) loops in the function. This
patch adds a way to detect such bugs.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits, silvas, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23437
llvm-svn: 280280
There were paths where we wouldn't populate the visited set, causing us
to recurse forever if an SSA variable was defined in terms of itself.
This fixes PR30210.
llvm-svn: 280191
Or they were not instantiated as expected;
llvm::InnerAnalysisManagerProxy<llvm::AnalysisManager<llvm::Function>, llvm::LazyCallGraph::SCC>::PassID
llvm::InnerAnalysisManagerProxy<llvm::AnalysisManager<llvm::Function>, llvm::LazyCallGraph::SCC>::PassID
llvm-svn: 280105
Summary:
Changed this code because it was not very readable.
The one question that I got after changing it is, should we
count calls to intrinsics? We don't add them to caller summary,
so maybe we shouldn't also count them?
Reviewers: tejohnson, eraman, mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23949
llvm-svn: 280036
Fixed a bug in run-time checks for possible memory conflicts inside loop.
The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size".
Differential revision: https://reviews.llvm.org/D23176
llvm-svn: 279930
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.
Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered. I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off. As before without -Rpass-with-hotness
there is no overhead.
Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'. As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.
Reviewers: eraman, chandlerc, davidxl, hfinkel
Subscribers: tejohnson, Prazek, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D23415
llvm-svn: 279860
Summary: Dead store elimination gets very expensive when large numbers of instructions need to be analyzed. This patch limits the number of instructions analyzed per store to the value of the memdep-block-scan-limit parameter (which defaults to 100). This resulted in no observed difference in performance of the generated code, and no change in the statistics for the dead store elimination pass, but improved compilation time on some files by more than an order of magnitude.
Reviewers: dexonsmith, bruno, george.burgess.iv, dberlin, reames, davidxl
Subscribers: davide, chandlerc, dberlin, davidxl, eraman, tejohnson, mbodart, llvm-commits
Differential Revision: https://reviews.llvm.org/D15537
llvm-svn: 279833
This patch changes LLVM_CONSTEXPR variable declarations to const
variable declarations, since LLVM_CONSTEXPR expands to nothing if the
current compiler doesn't support constexpr. In all of the changed
cases, it looks like the code intended the variable to be const instead
of sometimes-constexpr sometimes-not.
llvm-svn: 279696
manager, including both plumbing and logic to handle function pass
updates.
There are three fundamentally tied changes here:
1) Plumbing *some* mechanism for updating the CGSCC pass manager as the
CG changes while passes are running.
2) Changing the CGSCC pass manager infrastructure to have support for
the underlying graph to mutate mid-pass run.
3) Actually updating the CG after function passes run.
I can separate them if necessary, but I think its really useful to have
them together as the needs of #3 drove #2, and that in turn drove #1.
The plumbing technique is to extend the "run" method signature with
extra arguments. We provide the call graph that intrinsically is
available as it is the basis of the pass manager's IR units, and an
output parameter that records the results of updating the call graph
during an SCC passes's run. Note that "...UpdateResult" isn't a *great*
name here... suggestions very welcome.
I tried a pretty frustrating number of different data structures and such
for the innards of the update result. Every other one failed for one
reason or another. Sometimes I just couldn't keep the layers of
complexity right in my head. The thing that really worked was to just
directly provide access to the underlying structures used to walk the
call graph so that their updates could be informed by the *particular*
nature of the change to the graph.
The technique for how to make the pass management infrastructure cope
with mutating graphs was also something that took a really, really large
number of iterations to get to a place where I was happy. Here are some
of the considerations that drove the design:
- We operate at three levels within the infrastructure: RefSCC, SCC, and
Node. In each case, we are working bottom up and so we want to
continue to iterate on the "lowest" node as the graph changes. Look at
how we iterate over nodes in an SCC running function passes as those
function passes mutate the CG. We continue to iterate on the "lowest"
SCC, which is the one that continues to contain the function just
processed.
- The call graph structure re-uses SCCs (and RefSCCs) during mutation
events for the *highest* entry in the resulting new subgraph, not the
lowest. This means that it is necessary to continually update the
current SCC or RefSCC as it shifts. This is really surprising and
subtle, and took a long time for me to work out. I actually tried
changing the call graph to provide the opposite behavior, and it
breaks *EVERYTHING*. The graph update algorithms are really deeply
tied to this particualr pattern.
- When SCCs or RefSCCs are split apart and refined and we continually
re-pin our processing to the bottom one in the subgraph, we need to
enqueue the newly formed SCCs and RefSCCs for subsequent processing.
Queuing them presents a few challenges:
1) SCCs and RefSCCs use wildly different iteration strategies at
a high level. We end up needing to converge them on worklist
approaches that can be extended in order to be able to handle the
mutations.
2) The order of the enqueuing need to remain bottom-up post-order so
that we don't get surprising order of visitation for things like
the inliner.
3) We need the worklists to have set semantics so we don't duplicate
things endlessly. We don't need a *persistent* set though because
we always keep processing the bottom node!!!! This is super, super
surprising to me and took a long time to convince myself this is
correct, but I'm pretty sure it is... Once we sink down to the
bottom node, we can't re-split out the same node in any way, and
the postorder of the current queue is fixed and unchanging.
4) We need to make sure that the "current" SCC or RefSCC actually gets
enqueued here such that we re-visit it because we continue
processing a *new*, *bottom* SCC/RefSCC.
- We also need the ability to *skip* SCCs and RefSCCs that get merged
into a larger component. We even need the ability to skip *nodes* from
an SCC that are no longer part of that SCC.
This led to the design you see in the patch which uses SetVector-based
worklists. The RefSCC worklist is always empty until an update occurs
and is just used to handle those RefSCCs created by updates as the
others don't even exist yet and are formed on-demand during the
bottom-up walk. The SCC worklist is pre-populated from the RefSCC, and
we push new SCCs onto it and blacklist existing SCCs on it to get the
desired processing.
We then *directly* update these when updating the call graph as I was
never able to find a satisfactory abstraction around the update
strategy.
Finally, we need to compute the updates for function passes. This is
mostly used as an initial customer of all the update mechanisms to drive
their design to at least cover some real set of use cases. There are
a bunch of interesting things that came out of doing this:
- It is really nice to do this a function at a time because that
function is likely hot in the cache. This means we want even the
function pass adaptor to support online updates to the call graph!
- To update the call graph after arbitrary function pass mutations is
quite hard. We have to build a fairly comprehensive set of
data structures and then process them. Fortunately, some of this code
is related to the code for building the cal graph in the first place.
Unfortunately, very little of it makes any sense to share because the
nature of what we're doing is so very different. I've factored out the
one part that made sense at least.
- We need to transfer these updates into the various structures for the
CGSCC pass manager. Once those were more sanely worked out, this
became relatively easier. But some of those needs necessitated changes
to the LazyCallGraph interface to make it significantly easier to
extract the changed SCCs from an update operation.
- We also need to update the CGSCC analysis manager as the shape of the
graph changes. When an SCC is merged away we need to clear analyses
associated with it from the analysis manager which we didn't have
support for in the analysis manager infrsatructure. New SCCs are easy!
But then we have the case that the original SCC has its shape changed
but remains in the call graph. There we need to *invalidate* the
analyses associated with it.
- We also need to invalidate analyses after we *finish* processing an
SCC. But the analyses we need to invalidate here are *only those for
the newly updated SCC*!!! Because we only continue processing the
bottom SCC, if we split SCCs apart the original one gets invalidated
once when its shape changes and is not processed farther so its
analyses will be correct. It is the bottom SCC which continues being
processed and needs to have the "normal" invalidation done based on
the preserved analyses set.
All of this is mostly background and context for the changes here.
Many thanks to all the reviewers who helped here. Especially Sanjoy who
caught several interesting bugs in the graph algorithms, David, Sean,
and others who all helped with feedback.
Differential Revision: http://reviews.llvm.org/D21464
llvm-svn: 279618
And add a FIXME because the helper excludes folds for vectors. It's
not clear yet how many of these are actually testable (and therefore
necessary?) because later analysis uses computeKnownBits and other
methods to catch many of these cases.
llvm-svn: 279492
This change cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks.
See https://reviews.llvm.org/D18777 for details.
llvm-svn: 279433
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later.
Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705
llvm-svn: 279326
Repeated inserts into AliasSetTracker have quadratic behavior - inserting a
pointer into AST is linear, since it requires walking over all "may" alias
sets and running an alias check vs. every pointer in the set.
We can avoid this by tracking the total number of pointers in "may" sets,
and when that number exceeds a threshold, declare the tracker "saturated".
This lumps all pointers into a single "may" set that aliases every other
pointer.
(This is a stop-gap solution until we migrate to MemorySSA)
This fixes PR28832.
Differential Revision: https://reviews.llvm.org/D23432
llvm-svn: 279274
directly produce the index as the value type result.
This requires making the index movable which is straightforward. It
greatly simplifies things by allowing us to completely avoid the builder
API and the layers of abstraction inherent there. Instead both pass
managers can directly construct these when run by value. They still
won't be constructed truly eagerly thanks to the optional in the legacy
PM. The code that directly builds the index can also just share a direct
function.
A notable change here is that the result type of the analysis for the
new PM is no longer a reference type. This was really problematic when
making changes to how we handle result types to make our interface
requirements *much* more strict and precise. But I think this is an
overall improvement.
Differential Revision: https://reviews.llvm.org/D23701
llvm-svn: 279216
number of assume intrinsics.
The classical way to have a cache-friendly vector style container when
we need queue semantics for BFS instead of stack semantics for DFS is to
use an ever-growing vector and an index. Erasing from the front requires
O(size) work, and unless we expect the worklist to grow *very* large,
its probably cheaper to just grow and race down the list.
But that makes it more bad that we're putting the assume intrinsics in
this at all. We end up looking at the (by definition empty) use list to
see if they're ephemeral (when we've already put them in that set), etc.
Instead, directly populate the worklist with the operands when we mark
the assume intrinsics as ephemeral. Also, test the visited set *before*
putting things into the worklist so we don't accumulate the same value
in the list 100s of times.
It would be nice to use a set-vector for this but I think its useful to
test the set earlier to avoid repeatedly querying whether the same
instruction is safe to speculate.
Hopefully with these changes the number of values pushed onto the
worklist is smaller, and we avoid quadratic work by letting it grow as
necessary.
Differential Revision: https://reviews.llvm.org/D23396
llvm-svn: 279099
Summary:
This is part of the "NodeType* -> NodeRef" migration. Notice that since
GraphWriter prints object address as identity, I added a static_assert on
NodeRef to be a pointer type.
Reviewers: dblaikie
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D23580
llvm-svn: 278966
Refactored so that a LSRUse owns its fixups, as oppsed to letting the
LSRInstance own them. This makes it easier to rate formulas for
LSRUses, since the fixups are available directly. The Offsets vector
has been removed since it was no longer necessary.
New target hook isFoldableMemAccessOffset(), which is used during formula
rating.
For SystemZ, this is useful to express that loads and stores with
float or vector types with a big/negative offset should be avoided in
loops. Without this, LSR will generate a lot of negative offsets that
would require extra instructions for loading the address.
Updated tests:
test/CodeGen/SystemZ/loop-01.ll
Reviewed by: Quentin Colombet and Ulrich Weigand.
https://reviews.llvm.org/D19152
llvm-svn: 278927
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
llvm-svn: 278902
When there's only one argument and it doesn't match one of the known
functions, return ARCInstKind::CallOrUser rather than falling through
to the two argument case. The old behaviour both incremented past and
dereferenced end().
llvm-svn: 278881
We are trying to prove that one group of operands is a subset of
another. We did this by populating two Sets and determining that every
element within one was inside the other.
However, this is unnecessary. We can simply construct a single set and
test if each operand is within it.
llvm-svn: 278641
Recursive calls to aliasCheck from alias[GEP|Select|PHI] may result in a second call to GetUnderlyingObject for a Value, whose underlying object is already computed. This patch ensures that in this situations, the underlying object is not computed again, and the result of the previous call is resued.
https://reviews.llvm.org/D22305
llvm-svn: 278519
Rewrite Visited[Cond] = getValueFromConditionImpl(..., Visited) statement which can lead to a memory corruption since getValueFromConditionImpl changes Visited map and invalidates the iterators.
llvm-svn: 278514
Summary:
Port the ModuleSummaryAnalysisWrapperPass to the new pass manager.
Use it in the ported BitcodeWriterPass (similar to how we use the
legacy ModuleSummaryAnalysisWrapperPass in the legacy WriteBitcodePass).
Also, pass the -module-summary opt flag through to the new pass
manager pipeline and through to the bitcode writer pass, and add
a test that uses it.
Reviewers: mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23439
llvm-svn: 278508
Take range metadata into account for conditions like this:
%length = load i32, i32* %length_ptr, !range !{i32 0, i32 2147483647}
%cmp = icmp ult i32 %a, %length
This is a common pattern for range checks where the length of the array is dynamically loaded.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D23267
llvm-svn: 278496
Currently LVI can only gather value constraints from comparisons like:
* icmp <pred> Val, ...
* icmp ult (add Val, Offset), ...
In fact we can handle any predicate in latter comparisons.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D23357
llvm-svn: 278493
This method had some duplicate code when we did or did not have a dom tree. Refactor
it to remove the duplication, but also clean up the control flow to have less duplication.
llvm-svn: 278450
There were 2 versions of this method. A public one which takes a
const Instruction* and a private implementation which takes a mutable
Value* and casts to an Instruction*.
There was no need for the 2 versions as all callers pass a const Instruction*
and there was no need for a mutable pointer as we only do analysis here.
llvm-svn: 278434
Summary:
This is an extension of the fix in r271424. That fix dealt with builder
insert points being moved by SCEV expansion, but only for the lifetime
of the expand call. This change modifies the interface so that LSR can
safely call expand multiple times at the same insert point and do the
right thing if one of the expansions decides to move the original insert
point.
This is a fix for PR28719.
Reviewers: sanjoy
Subscribers: llvm-commits, mcrosier, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23342
llvm-svn: 278413
Summary:
I think it is much better this way.
When I firstly saw line:
Cost += InlineConstants::LastCallToStaticBonus;
I though that this is a bug, because everywhere where the cost is being reduced
it is usuing -=.
Reviewers: eraman, tejohnson, mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23222
llvm-svn: 278290
Teach LVI how to gather information from conditions in the form of (cond1 && cond2). Our out-of-tree front-end emits range checks in this form.
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D23200
llvm-svn: 278231
Instead of returning bool and setting LVILatticeValue reference argument return LVILattice value. Use overdefined value to denote the case when we didn't gather any information from the condition.
This change was separated from the review "[LVI] Handle conditions in the form of (cond1 && cond2)" (https://reviews.llvm.org/D23200#inline-199531). Once getValueFromCondition returns LVILatticeValue we can cache the result in Visited map.
llvm-svn: 278224
The problem was triggered by my recent change in CVP (D23059). Current code expected that integer constants are represented by constantrange LVILatticeVal and never represented as LVILatticeVal with constant tag. That is true for ConstantInt constants, although ConstantExpr integer type constants are legally represented as constant LVILatticeVal.
This code fails with CVP change in:
@b = global i32 0, align 4
define void @test6(i32 %a) {
bb:
%add = add i32 %a, ptrtoint (i32* @b to i32)
ret void
}
Currently getConstantRange code is not executed by any of the upstream passes. I'm going to add a test case to test/Transforms/CorrelatedValuePropagation/add.ll once I resubmit the CVP change.
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D23194
llvm-svn: 278217
This adds an InlineParams struct which is populated from the command line options by getInlineParams and passed to getInlineCost for the call analyzer to use.
Differential revision: https://reviews.llvm.org/D22120
llvm-svn: 278189
Summary:
The inliner not being a function pass requires the work-around of
generating the OptimizationRemarkEmitter and in turn BFI on demand.
This will go away after the new PM is ready.
BFI is only computed inside ORE if the user has requested hotness
information for optimization diagnostitics (-pass-remark-with-hotness at
the 'opt' level). Thus there is no additional overhead without the
flag.
Reviewers: hfinkel, davidxl, eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22694
llvm-svn: 278185
The patch is to fix the bug in PR28705. It was caused by setting wrong return
value for SCEVExpander::findExistingExpansion. The return values of findExistingExpansion
have different meanings when the function is used in different ways so it is easy to make
mistake. The fix creates two new interfaces to replace SCEVExpander::findExistingExpansion,
and specifies where each interface is expected to be used.
Differential Revision: https://reviews.llvm.org/D22942
llvm-svn: 278161
The fix for PR28705 will be committed consecutively.
In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion.
However, const folding and sext/zext distribution can make the reuse still difficult.
A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and
S1 = S2 + C_a
S3 = S2 + C_b
where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as
V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a
complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused
by the fact that S3 is generated from S1 after const folding.
In order to do that, we represent ExprValueMap as a mapping from SCEV to
ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the
ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first
expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to
V1 - C_a + C_b.
Differential Revision: https://reviews.llvm.org/D21313
llvm-svn: 278160
Summary:
We teach alias analysis that invariant.start is readonly.
This helps with GVN and memcopy optimizations that currently treat.
invariant.start as a clobber.
We need to treat this as readonly, so that DSE does not incorrectly
remove stores prior to the invariant.start
Reviewers: sanjoy, reames, majnemer, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23214
llvm-svn: 278138
One exception here is LoopInfo which must forward-declare it (because
the typedef is in LoopPassManager.h which depends on LoopInfo).
Also, some includes for LoopPassManager.h were needed since that file
provides the typedef.
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278079
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278078
Besides a general consistently benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
Thanks to David for the suggestion.
llvm-svn: 278077
This reverts commit r278048. Something changed between the last time I
built this--it takes awhile on my ridiculously slow and ancient
computer--and now that broke this.
llvm-svn: 278053
Summary:
Based on two patches by Michael Mueller.
This is a target attribute that causes a function marked with it to be
emitted as "hotpatchable". This particular mechanism was originally
devised by Microsoft for patching their binaries (which they are
constantly updating to stay ahead of crackers, script kiddies, and other
ne'er-do-wells on the Internet), but is now commonly abused by Windows
programs to hook API functions.
This mechanism is target-specific. For x86, a two-byte no-op instruction
is emitted at the function's entry point; the entry point must be
immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding.
This padding is where the patch code is written. The two byte no-op is
then overwritten with a short jump into this code. The no-op is usually
a `movl %edi, %edi` instruction; this is used as a magic value
indicating that this is a hotpatchable function.
Reviewers: majnemer, sanjoy, rnk
Subscribers: dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D19908
llvm-svn: 278048
Gathering constantins from a condition on the false path ask makeAllowedICmpRegion about inverse predicate instead of inversing the resulting range.
This change was separated from the review "[LVI] Make LVI smarter about comparisons with non-constants" (https://reviews.llvm.org/D23205#inline-198361)
llvm-svn: 278009
Summary:
The correctness fix here is that when we CSE a load with another load,
we need to combine the metadata on the two loads. This matches the
behavior of other passes, like instcombine and GVN.
There's also a minor optimization improvement here: for load PRE, the
aliasing metadata on the inserted load should be the same as the
metadata on the original load. Not sure why the old code was throwing
it away.
Issue found by inspection.
Differential Revision: http://reviews.llvm.org/D21460
llvm-svn: 277977
Summary: Hot callsites should have higher threshold than inline hints. This patch uses separate threshold parameter for hot callsites.
Reviewers: davidxl, eraman
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D22368
llvm-svn: 277860
Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.
Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model to
to take this into account by making shifts by a uniform cheap again.
Differential Revision: https://reviews.llvm.org/D23049
llvm-svn: 277782
I'm removing a misplaced pair of more specific folds from InstCombine in this patch as well,
so we know where those folds are happening in InstSimplify.
llvm-svn: 277738
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.
Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
for a maximum size allowed and relies on the next condition checking for %4 for correctness.
This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).
Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.
Reviewers: arsenm, jlebar, tstellarAMD
Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23068
llvm-svn: 277735
This reverts commit r277611 and the followup r277614.
Bootstrap builds and chromium builds are crashing during inlining after
this change.
llvm-svn: 277642
We were able to figure out that the result of a call is some constant.
While propagating that fact, we added the constant to the value map.
This is problematic because it results in us losing the call site when
processing the value map.
This fixes PR28802.
llvm-svn: 277611
There were issues with simply reporting AttrUnknown on
previously-unknown values in CFLAnders. So, we now act *entirely*
conservatively for values we haven't seen before. As in the prior patch
(r277362), writing a lit test for this isn't exactly trivial. If someone
wants a test badly, I'm willing to try to write one.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D23077
llvm-svn: 277533
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277411
Summary: By generalize the interface, users are able to inject more flexible Node token into the algorithm, for example, a pair of vector<Node>* and index integer. Currently I only migrated SCCIterator to use NodeRef, but more is coming. It's a NFC.
Reviewers: dblaikie, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22937
llvm-svn: 277399
As it turns out, modref queries are broken with CFLAA. Specifically,
the data source we were using for determining modref behaviors
explicitly ignores operations on non-pointer values. So, it wouldn't
note e.g. storing an i32 to an i32* (or loading an i64 from an i64*).
It also ignores external function calls, rather than acting
conservatively for them.
(N.B. These operations, where necessary, *are* tracked by CFLAA; we just
use a different mechanism to do so. Said mechanism is relatively
imprecise, so it's unlikely that we can provide reasonably good modref
answers with it as implemented.)
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22978
llvm-svn: 277366
Currently, CFLAnders assumes that values it hasn't seen don't alias
anything. This patch fixes that. Given that the only way for this to
happen is to query AA, rely on specific transformations happening, then
query AA again (looking for a specific set of queries), lit testing is a
bit difficult. If someone really wants a test, I'm happy to add one.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22981
llvm-svn: 277362
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277313
Patch by Sunita Marathe
Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)
llvm-svn: 277189
An undef vector element can be treated as if it had any value. Folding
such a vector element to 0 in a bitcast can open up further folding
opportunities.
llvm-svn: 277104
ConstantExpr::getWithOperands does much of the hard work that
ConstantFoldInstOperandsImpl tries to do but more completely.
This lets us fold ExtractValue/InsertValue expressions.
llvm-svn: 277100
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
This patch fixes an assertion that fires when we try to add non-pointer
Values to the CFLGraph. Centralizing the check for whether something
is/isn't a pointer type isn't completely trivial (and, in some cases,
would end up being entirely redundant), but it may be beneficial to do
so if this trips us up more in the future.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22947
llvm-svn: 277096
Summary:
The motivation is the same as in D22141: In order to add the hotness
attribute to optimization remarks we need BFI to be available in all
passes that emit optimization remarks. BFI depends on BPI so unless we
make this lazy as well we would still compute BPI unconditionally.
The solution is to use the new LazyBPI pass in LazyBFI and only compute
BPI when computation of BFI is requested by the client.
I extended the laziness test using a LoopDistribute test to also cover
BPI.
Reviewers: hfinkel, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22835
llvm-svn: 277083
When folding an expression, we run ConstantFoldConstantExpression on
each operand of that expression.
However, ConstantFoldConstantExpression can fail and retur nullptr.
Previously, we would bail on further refining the expression.
Instead, use the original operand and see if we can refine a later
operand.
llvm-svn: 276959
This patch lets CFLAnders respond to mod-ref queries. It also includes
a small bugfix to CFLSteens.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22823
llvm-svn: 276939
Summary:
This lets us avoid creating and destroying a CallbackVH every time we
check the cache.
This is good for a 2% e2e speedup when compiling one of the large Eigen
tests at -O3.
FTR, I tried making the ValueCache hashtable one-level -- i.e., mapping
a pair (Value*, BasicBlock*) to a lattice value, and that didn't seem to
provide any additional improvement. Saving a word in LVILatticeVal by
merging the Tag and Val fields also didn't yield a speedup.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21951
llvm-svn: 276926
This unblocks the new PM part of River's patch in
https://reviews.llvm.org/D22706
Conveniently, this same change was needed for D21921 and so these
changes are just spun out from there.
llvm-svn: 276515
While we handed loads past the end of an array, we didn't handle loads
_before_ the array.
This fixes PR28062.
N.B. While the bug in the code is obvious, I am struggling to craft a
test case which is reasonable in size.
llvm-svn: 276510
This change lets us prove things like
"{X,+,10} s< 5000" implies "{X+7,+,10} does not sign overflow"
It does this by replacing replacing getConstantDifference by
computeConstantDifference (which is smarter) in
isImpliedCondOperandsViaRanges.
llvm-svn: 276505
This patch teaches FunctionInfo about offsets.
Like the last patch, this one doesn't introduce any visible
functionality change (the core algorithm knows nothing about offsets;
they're just plumbed through). Tests will come when we start acting
differently because of the offsets.
Patch by Jia Chen.
(N.B. I made a tiny change to Jia's patch to avoid warnings by GCC: I
put DenseMapInfo specializations in the `llvm` namespace. Only realized
that those appeared when compiling locally. :) )
Differential Revision: https://reviews.llvm.org/D22634
llvm-svn: 276486
rL245171 exposed a hole in InstSimplify that manifested in a strange way in PR28466:
https://llvm.org/bugs/show_bug.cgi?id=28466
It's possible to use trunc + icmp sgt/slt in place of an and + icmp eq/ne, so we need to
recognize that pattern to eliminate selects that are choosing between some value and some
bitmasked version of that value.
Note that there is significant room for improvement (refactoring) and enhancement (more
patterns, possibly in InstCombine rather than here).
Differential Revision: https://reviews.llvm.org/D22537
llvm-svn: 276341
std::numeric_limits<int64_t>::max() is not constexpr in VC 2013 headers,
and Clang complains that it isn't. MSVC 2013 itself is emitting a
dynamic initializer for this thing. Instead, use an enum.
llvm-svn: 276334
Having the added `\brief` made doxygen interpret it as the summary for
the `llvm` namespace (visible at:
http://llvm.org/doxygen/namespaces.html).
llvm-svn: 276331
(Also, refactor our constexpr handling to be less insane).
This patch lets us track field offsets in the CFL Graph, which is the
first step to making CFLAA field/offset sensitive. Woohoo! Note that
this patch shouldn't visibly change our behavior (since we make no use
of the offsets we're now tracking), so we can't quite add tests for this
yet.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22598
llvm-svn: 276201
The earlier change added hotness attribute to missed-optimization
remarks. This follows up with the analysis remarks (the ones explaining
the reason for the missed optimization).
llvm-svn: 276192
This helps because LoopAccessReport is passed around as a const
reference and we derive the basic block passed as the Value parameter
from the instruction in LoopAccessReport.
llvm-svn: 276191
In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion.
However, const folding and sext/zext distribution can make the reuse still difficult.
A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and
S1 = S2 + C_a
S3 = S2 + C_b
where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as
V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a
complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused
by the fact that S3 is generated from S1 after const folding.
In order to do that, we represent ExprValueMap as a mapping from SCEV to
ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the
ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first
expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to
V1 - C_a + C_b.
Differential Revision: https://reviews.llvm.org/D21313
llvm-svn: 276136
We just set PreserveLCSSA to always true since we don't have an
analogous method `mustPreserveAnalysisID(LCSSA)`.
Also port LoopInfo verifier pass to test LoopUnrollPass.
llvm-svn: 276063
This patch adds function summary support to CFLAnders. It also comes
with a lot of tests! Woohoo!
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22450
llvm-svn: 276026
This patch adds more specific edges to CFLAndersAliasAnalysis. The goal
of these edges is to give us more information about *how* two values
that MayAlias alias. With this, we can now tell cases like
a = b; // ergo, a may alias b
apart from
a = c;
b = c;
// so, a may alias b, but only because they were both assigned to c.
...And others.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22429
llvm-svn: 276023
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
A companion clang patch is at D22105
Differential Revision: https://reviews.llvm.org/D22106
llvm-svn: 275981
Summary:
The main goal is to able to start using the new OptRemarkEmitter
analysis from the LoopVectorizer. Since the vectorizer was recently
converted to the new PM, it makes sense to convert this analysis as
well.
This pass is currently tested through the LoopDistribution pass, so I am
also porting LoopDistribution to get coverage for this analysis with the
new PM.
Reviewers: davidxl, silvas
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D22436
llvm-svn: 275810
Summary:
To enable profile-guided indirect call promotion in ThinLTO mode, we
simply add call graph edges for each profitable target from the profile
to the summaries, then the summary-guided importing will consider the
callee for importing as usual.
Also we need to enable the indirect call promotion pass creation in the
PassManagerBuilder when PerformThinLTO=true (we are in the ThinLTO
backend), so that the newly imported functions are considered for
promotion in the backends.
The IC promotion profiles refer to callees by GUID, which required
adding GUIDs to the per-module VST in bitcode (and assigning them
valueIds similar to how they are assigned valueIds in the combined
index).
Reviewers: mehdi_amini, xur
Subscribers: mehdi_amini, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21932
llvm-svn: 275707
This patch adds proper handling of stratified attributes into our
anders-style CFLAA implementation. It also comes bundled with more
CFLAnders tests. :)
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22325
llvm-svn: 275604
This adds an incomplete anders-style implementation for CFLAA. It's
incomplete in that it's missing interprocedural analysis, attrs
handling, etc. and that it needs more tests. More tests and features
will be added in future commits.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22291
llvm-svn: 275602
Summary:
This is the first set of changes implementing the RFC from
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98334
This is a cross-sectional patch; rather than implementing the hotness
attribute for all optimization remarks and all passes in a patch set, it
implements it for the 'missed-optimization' remark for Loop
Distribution. My goal is to shake out the design issues before scaling
it up to other types and passes.
Hotness is computed as an integer as the multiplication of the block
frequency with the function entry count. It's only printed in opt
currently since clang prints the diagnostic fields directly. E.g.:
remark: /tmp/t.c:3:3: loop not distributed: use -Rpass-analysis=loop-distribute for more info (hotness: 300)
A new API added is similar to emitOptimizationRemarkMissed. The
difference is that it additionally takes a code region that the
diagnostic corresponds to. From this, hotness is computed using BFI.
The new API is exposed via an analysis pass so that it can be made
dependent on LazyBFI. (Thanks to Hal for the analysis pass idea.)
This feature can all be enabled by setDiagnosticHotnessRequested in the
LLVM context. If this is off, LazyBFI is not calculated (D22141) so
there should be no overhead.
A new command-line option is added to turn this on in opt.
My plan is to switch all user of emitOptimizationRemark* to use this
module instead.
Reviewers: hfinkel
Subscribers: rcox2, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D21771
llvm-svn: 275583
Calling getModRefInfo with a fence resulted in crashes because fences
don't have a memory location. Add a new predicate to Instruction
called isFenceLike which indicates that the instruction mutates memory
but not any single memory location in particular. In practice, it is a
proxy for the set of instructions which "mayWriteToMemory" but cannot be
used with MemoryLocation::get.
This fixes PR28570.
llvm-svn: 275581
Most possibly problem was caused by the same reason as PR28400. This change
bypasses it by using CallbackVH instead of AssertingVH.
Differential Revision: https://reviews.llvm.org/D20957
llvm-svn: 275563
Summary:
In preparation for changing GlobalsAA to stop assuming that intrinsics
can't read arbitrary globals, we need to make sure GlobalsAA is querying
function attributes rather than relying on this assumption.
This patch was inspired by: http://reviews.llvm.org/D20206
Reviewers: jmolloy, hfinkel
Subscribers: eli.friedman, llvm-commits
Differential Revision: https://reviews.llvm.org/D21318
llvm-svn: 275433
constant hoisting. It not only takes into account the number of uses and the
cost of expressions in which constants appear, but now also the resulting
integer range of the offsets. Thus, the algorithm maximizes the number of uses
within an integer range that will enable more efficient code generation. On
ARM, for example, this will enable code size optimisations because less
negative offsets will be created. Negative offsets/immediates are not supported
by Thumb1 thus preventing more compact instruction encoding.
Differential Revision: http://reviews.llvm.org/D21183
llvm-svn: 275382
Treat loads which clip before the start of a global initializer the same
way we treat clipping beyond the end of the initializer: use zeros.
llvm-svn: 275345
Summary:
This is necessary for D21771. In order to add the hotness attribute to
optimization remarks we need BFI to be available in all passes that emit
optimization remarks.
However we don't want to pay for computing BFI unless the hotness
attribute is requested.
This is achieved by making BFI lazy at the very high-level through a new
analysis pass -- BFI is not calculated unless requested.
I am adding a test to check the laziness under D21771 where the first
user of the analysis is added.
Reviewers: hfinkel, dexonsmith, davidxl
Subscribers: davidxl, dexonsmith, llvm-commits
Differential Revision: http://reviews.llvm.org/D22141
llvm-svn: 275250
Summary:
Refactored the profitability analysis out of the IC promotion pass and
into lib/Analysis so that it can be accessed by the summary index
builder in a follow-on patch to enable IC promotion in ThinLTO (D21932).
Reviewers: davidxl, xur
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22182
llvm-svn: 275216
This patch simplifies the graph builder by encoding nodes as {Value,
Dereference Level} pairs. This lets us kill edge types, and allows us to
get rid of hacks in StratifiedSets (like addAttrsBelow/...). This
simplification also allows us to remove InstantiatedRelations and
InstantiatedAttrs.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D22080
llvm-svn: 275122
Summary:
For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch.
E.g.
if (A1 && A2 && A3 && ..... && A10) {
for (i=0; i < 100000000; i++) {
callsite();
}
}
Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value.
In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.
Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.
Reviewers: davidxl, eraman, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22118
llvm-svn: 275073
This subtle change to getModRefInfo(Instruction, ImmutableCallSite) is to
ensure that the semantics are equal to that of getModRefInfo(CS1, CS2) when
the Instruction is a call-site.
This is now more in line with getModRefInfo generally: it returns Mod when
I modifies a memory location that is accessed (read or written) by CS and
Ref when I reads a memory location that is written by CS.
From a grep of the code, the only uses of this particular getModRefInfo
overload are in MemorySSA and MemCpyOptimizer, and they only care about
where the result is MR_NoModRef or not. Therefore, this change should have
no visible effect.
Separated out from D17279 upon request.
llvm-svn: 275065
For functions which are known to return a specific argument, pointer-comparison
folding can look through the function calls as part of its analysis.
Differential Revision: http://reviews.llvm.org/D9387
llvm-svn: 275039
For functions which are known to return their argument,
isDereferenceableAndAlignedPointer can examine the argument value.
Differential Revision: http://reviews.llvm.org/D9384
llvm-svn: 275038
When building SCEVs, if a function is known to return its argument, then we can
build the SCEV using the corresponding argument value.
Differential Revision: http://reviews.llvm.org/D9381
llvm-svn: 275037
If a function is known to return one of its arguments, we can use that in order
to compute known bits of the return value.
Differential Revision: http://reviews.llvm.org/D9397
llvm-svn: 275036
Motivated by the work on the llvm.noalias intrinsic, teach BasicAA to look
through returned-argument functions when answering queries. This is essential
so that we don't loose all other AA information when supplementing with
llvm.noalias.
Differential Revision: http://reviews.llvm.org/D9383
llvm-svn: 275035
This removes a few fields from the graph builder by making us compute
things (that we'd always compute anyway) more eagerly.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D22009
llvm-svn: 274957
We can fold truncs whose operand feeds from a load, if the trunc value
is available through a prior load/store.
This change is from: http://reviews.llvm.org/D21246, which folded the
trunc but missed the bitcast or ptrtoint/inttoptr required in the RAUW
call, when the load type didnt match the prior load/store type.
Differential Revision: http://reviews.llvm.org/D21791
llvm-svn: 274853
friend definitions.
Based on the experiments Sean Silva and Reid did, this seems the safest
course of action and also will work around a questionable warning
provided by GCC6 on the old form of the code. Thanks for Davide pointing
out the issue and other suggesting ways to fix.
llvm-svn: 274740
"More things" = StratifiedAttrs and various bits like interprocedural
summaries.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21964
llvm-svn: 274592
StratifiedSets (as implemented) is very fast, but its accuracy is also
limited. If we take a more aggressive andersens-like approach, we can be
way more accurate, but we'll also end up being slower.
So, we've decided to split CFLAA into CFLSteensAA and CFLAndersAA.
Long-term, we want to end up in a place where CFLSteens is queried
first; if it can provide an answer, great (since queries are basically
map lookups). Otherwise, we'll fall back to CFLAnders, BasicAA, etc.
This patch splits everything out so we can try to do something like
that when we get a reasonable CFLAnders implementation.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21910
llvm-svn: 274589
Summary:
This complements the earlier addition of IntrWriteMem and IntrWriteArgMem
LLVM intrinsic properties, see D18291.
Also start using the attribute for memset, memcpy, and memmove intrinsics,
and remove their special-casing in BasicAliasAnalysis.
Reviewers: reames, joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18714
llvm-svn: 274485
This actually uncovered a surprisingly large chain of ultimately unused
TLI args.
From what I can gather, this argument is a remnant of when
isKnownNonNull would look at the TLI directly.
The current approach seems to be that InferFunctionAttrs runs early in
the pipeline and uses TLI to annotate the TLI-dependent non-null
information as return attributes.
This also removes the dependence of functionattrs on TLI altogether.
llvm-svn: 274455
This patch makes CFLAA answer some ModRef queries. Because we don't
distinguish between reading/writing when making StratifiedSets, we're
unable to offer any of the readonly-related answers.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21858
llvm-svn: 274197
This is breaking an optimizaton remark test in clang. I've identified a couple fixes for that, but want to understand it better before I commit to anything.
llvm-svn: 274102
If a operation for a recurrence is an addition with no signed wrap and both input sign bits are 0, then the result sign bit must also be 0. Similar for the negative case.
I found this deficiency while playing around with a loop in the x86 backend that contained a signed division that could be optimized into an unsigned division if we could prove both inputs were positive. One of them being the loop induction variable. With this patch we can perform the conversion for this case. One of the test cases here is a contrived variation of the loop I was looking at.
Differential revision: http://reviews.llvm.org/D21493
llvm-svn: 274098
This patch enhances dot graph viewer to show hot regions
with hot bbs/edges displayed in red. The ratio of the bb
freq to the max freq of the function needs to be no less
than the value specified by view-hot-freq-percent option.
The default value is 10 (i.e. 10%).
llvm-svn: 273996
MBFI supports profile count dumping and function
name based filtering. Add these two feature to
BFI as well. The filtering option is shared between
BFI and MBFI: -view-bfi-func-name=..
llvm-svn: 273992
BFI and MBFI's dot traits class share most of the
code and all future enhancement. This patch extracts
common implementation into base class BFIDOTGraphTraitsBase.
This patch also enables BFI graph to show branch probability
on edges as MBFI does before.
llvm-svn: 273990
the new pass manager.
This adds operator<< overloads for the various bits of the
LazyCallGraph, dump methods for use from the debugger, and debug logging
using them to the CGSCC pass manager.
Having this was essential for debugging the call graph update patch, and
I've extracted what I could from that patch here to minimize the delta.
llvm-svn: 273961
Apparently, MSVC complains if there's an implicit conversion from
`unsigned` to `unsigned long long`, if the `unsigned` is the result of
a bit shift.
llvm-svn: 273955
It did not handle correctly cases without GEP.
The following loop wasn't vectorized:
for (int i=0; i<len; i++)
*to++ = *from++;
I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.
Re-commit rL273257 - revision: http://reviews.llvm.org/D20789
llvm-svn: 273864
This intrinsic safely loads a function pointer from a virtual table pointer
using type metadata. This intrinsic is used to implement control flow integrity
in conjunction with virtual call optimization. The virtual call optimization
pass will optimize away llvm.type.checked.load intrinsics associated with
devirtualized calls, thereby removing the type check in cases where it is
not needed to enforce the control flow integrity constraint.
This patch also introduces the capability to copy type metadata between
global variables, and teaches the virtual call optimization pass to do so.
Differential Revision: http://reviews.llvm.org/D21121
llvm-svn: 273756
The bitset metadata currently used in LLVM has a few problems:
1. It has the wrong name. The name "bitset" refers to an implementation
detail of one use of the metadata (i.e. its original use case, CFI).
This makes it harder to understand, as the name makes no sense in the
context of virtual call optimization.
2. It is represented using a global named metadata node, rather than
being directly associated with a global. This makes it harder to
manipulate the metadata when rebuilding global variables, summarise it
as part of ThinLTO and drop unused metadata when associated globals are
dropped. For this reason, CFI does not currently work correctly when
both CFI and vcall opt are enabled, as vcall opt needs to rebuild vtable
globals, and fails to associate metadata with the rebuilt globals. As I
understand it, the same problem could also affect ASan, which rebuilds
globals with a red zone.
This patch solves both of those problems in the following way:
1. Rename the metadata to "type metadata". This new name reflects how
the metadata is currently being used (i.e. to represent type information
for CFI and vtable opt). The new name is reflected in the name for the
associated intrinsic (llvm.type.test) and pass (LowerTypeTests).
2. Attach metadata directly to the globals that it pertains to, rather
than using the "llvm.bitsets" global metadata node as we are doing now.
This is done using the newly introduced capability to attach
metadata to global variables (r271348 and r271358).
See also: http://lists.llvm.org/pipermail/llvm-dev/2016-June/100462.html
Differential Revision: http://reviews.llvm.org/D21053
llvm-svn: 273729
This patch also has a refactor that kills StratifiedAttr, and leaves us
with StratifiedAttrs, because having both was mildly redundant.
This patch makes us correctly handle stratified attributes when doing
interprocedural analysis. It also adds another attribute, AttrCaller,
which acts like AttrUnknown. We can filter out AttrCaller values when
during interprocedural analysis, since the caller should have
information about what arguments it's passing to its callee.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21645
llvm-svn: 273636
Summary:
This instcombine rule folds away trunc operations that have value available from a prior load or store.
This kind of code can be generated as a result of GVN widening the load or from source code as well.
Reviewers: reames, majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21246
llvm-svn: 273608
Previously, we just unified any arguments that seemed to be related to
each other. With this patch, we now respect dereference levels, etc.
which should make us substantially more accurate. Proper handling of
StratifiedAttrs will be done in a later patch.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21536
llvm-svn: 273596
This was noted in http://reviews.llvm.org/D21610 . The previous code
predated the use of APInt ( http://reviews.llvm.org/rL47654 ), so it
had to account for the fixed width of uint64_t.
Now that we're using the variable width APInt, we can remove some
complexity.
llvm-svn: 273584
When simplifying a load we need to make sure that the type of the
simplified value matches the type of the instruction we're processing.
In theory, we can handle casts here as we deal with constant data, but
since it's not implemented at the moment, we at least need to bail out.
This fixes PR28262.
llvm-svn: 273562
This is similar to the computeKnownBits improvement in rL268479.
There's probably more we can do for vector logic instructions, but
this should let us see non-splat constant masking ops that can
become vector selects instead of and/andn/or sequences.
Differential Revision: http://reviews.llvm.org/D21610
llvm-svn: 273459
It did not handle correctly cases without GEP.
The following loop wasn't vectorized:
for (int i=0; i<len; i++)
*to++ = *from++;
I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.
Differential revision: http://reviews.llvm.org/D20789
llvm-svn: 273257
This patch makes us perform interprocedural analysis on functions that
don't have internal linkage. It also removes a test that should've been
deleted in an earlier commit (since other tests now cover everything
that the newly-removed test covers).
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21513
llvm-svn: 273229
This patch adds function summaries, so that we don't need to recompute
various properties about function parameters/return values at each
callsite of a function. It also adds many interprocedural tests for
CFLAA.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21475#inline-182390
llvm-svn: 273219
By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question
raised by PR27869:
https://llvm.org/bugs/show_bug.cgi?id=27869
...where InstCombine turns an icmp+zext into a shift causing us to miss the fold.
Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp.
Differential Revision: http://reviews.llvm.org/D21512
llvm-svn: 273200
On the surface, this might not look like it does anything... but
actually it brings in the declaration "extern template class
AnalysisManager<Loop>;", which suppresses the instantiation of the
constructor, which avoids the funny interaction between "extern
template" and -fvisibility-inlines-hidden.
llvm-svn: 273133
Access it through -passes=print-lcg-dot
Let me know any suggestions for changing the rendering; I'm not
particularly attached to what is implemented here.
llvm-svn: 273082
The way we elide max expressions when computing trip counts is incorrect
-- it breaks cases like this:
```
static int wrapping_add(int a, int b) {
return (int)((unsigned)a + (unsigned)b);
}
void test() {
volatile int end_buf = 2147483548; // INT_MIN - 100
int end = end_buf;
unsigned counter = 0;
for (int start = wrapping_add(end, 200); start < end; start++)
counter++;
print(counter);
}
```
Note: the `NoWrap` variable that was being tested has little to do with
the values flowing into the max expression; it is a property of the
induction variable.
test/Transforms/LoopUnroll/nsw-tripcount.ll was added to solely test
functionality I'm reverting in this change, so I've deleted the test
fully.
llvm-svn: 273079
This is a functional change for LLE and LDist. The other clients (LV,
LVerLICM) already had this explicitly enabled.
The temporary boolean parameter to LAA is removed that allowed turning
off speculation of symbolic strides. This makes LAA's caching interface
LAA::getInfo only take the loop as the parameter. This makes the
interface more friendly to the new Pass Manager.
The flag -enable-mem-access-versioning is moved from LV to a LAA which
now allows turning off speculation globally.
llvm-svn: 273064
pass manager passes' `run` methods.
This removes a bunch of SFINAE goop from the pass manager and just
requires pass authors to accept `AnalysisManager<IRUnitT> &` as a dead
argument. This is a small price to pay for the simplicity of the system
as a whole, despite the noise that changing it causes at this stage.
This will also helpfull allow us to make the signature of the run
methods much more flexible for different kinds af passes to support
things like intelligently updating the pass's progression over IR units.
While this touches many, many, files, the changes are really boring.
Mostly made with the help of my trusty perl one liners.
Thanks to Sean and Hal for bouncing ideas for this with me in IRC.
llvm-svn: 272978
This is still NFCI, so the list of clients that allow symbolic stride
speculation does not change (yes: LV and LoopVersioningLICM, no: LLE,
LDist). However since the symbolic strides are now managed by LAA
rather than passed by client a new bool parameter is used to enable
symbolic stride speculation.
The existing test Transforms/LoopVectorize/version-mem-access.ll checks
that stride speculation is performed for LV.
The previously added test Transforms/LoopLoadElim/symbolic-stride.ll
ensures that no speculation is performed for LLE.
The next patch will change the functionality and turn on symbolic stride
speculation in all of LAA's clients and remove the bool parameter.
llvm-svn: 272970
We should update results of the BranchProbabilityInfo after removing block in JumpThreading. Otherwise
we will get dangling pointer inside BranchProbabilityInfo cache.
Differential Revision: http://reviews.llvm.org/D20957
llvm-svn: 272891
This patch makes CFLAA ignore non-pointer values, since we can now
sanely do that with the escaping/unknown attributes. Additionally,
StratifiedAttrs make more sense to sit on nodes than edges (since
they're properties of values, and ultimately end up on the nodes of
StratifiedSets). So, this patch puts said attributes on nodes.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21387
llvm-svn: 272833
We would fail to validate the type of the tan function which would cause
downstream users of isValidProtoForLibFunc to assert.
This fixes PR28143.
llvm-svn: 272802
Use Optional<T> to denote the absence of a solution, not
SCEVCouldNotCompute. This makes the usage of SolveQuadraticEquation
somewhat simpler.
llvm-svn: 272752
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.
This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
the normal rule that the global must have a unique address can be broken without
being observable by the program by performing comparisons against the global's
address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
its own copy of the global if it requires one, and the copy in each linkage unit
must be the same)
- It is a constant or a function (which means that the program cannot observe that
the unique-address rule has been broken by writing to the global)
Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.
See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.
Part of the fix for PR27553.
Differential Revision: http://reviews.llvm.org/D20348
llvm-svn: 272709
This change teaches llvm::isGuaranteedToTransferExecutionToSuccessor
that calls to @llvm.assume always terminate. Most other relevant
intrinsics should be covered by the "CS.onlyReadsMemory() ||
CS.onlyAccessesArgMemory()" bit but we were missing @llvm.assumes
because we state that it clobbers memory.
Added an LICM test case, but this change is not specific to LICM.
llvm-svn: 272703
This patch also includes some refactoring.
Prior to this patch, we tagged all CFLAA attributes as unknown. This is
suboptimal, since it meant that any Value used as an argument would be
considered to alias any other Value that existed.
Now that we have the machinery to tag sets below the set for an
arbitrary value with attributes, it's okay to be less conservative with
arguments. (Specifically, we still tag the set under an argument with
unknown).
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21262
llvm-svn: 272690
This patch refactors CFLAA's graph building code. This makes keeping
track of common state (TargetLibraryInfo, ...) easier.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21261
llvm-svn: 272688
Summary:
The SimplifyLibCalls part of InstCombine generates calls to those otherwise.
I wonder if at some point we shouldn't just call disableAllFunctions() and
then enable functions on a whitelist basis...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96495
Reviewers: arsenm, tstellarAMD
Subscribers: llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21282
llvm-svn: 272664
This is a bit gnarly since LVI is maintaining its own cache.
I think this port could be somewhat cleaner, but I'd rather not spend
too much time on it while we still have the old pass hanging around and
limiting how much we can clean things up.
Once the old pass is gone it will be easier (less time spent) to clean
it up anyway.
This is the last dependency needed for porting JumpThreading which I'll
do in a follow-up commit (there's no printer pass for LVI or anything to
test it, so porting a pass that depends on it seems best).
I've been mostly following:
r269370 / D18834 which ported Dependence Analysis
r268601 / D19839 which ported BPI
llvm-svn: 272593
Summary:
AAResults::callCapturesBefore would previously ignore operand
bundles. It was possible for a later instruction to miss its memory
dependency on a call site that would only access the pointer through a
bundle.
Patch by Oscar Blumberg!
Reviewers: sanjoy
Differential Revision: http://reviews.llvm.org/D21286
llvm-svn: 272580
Summary:
Make isGuaranteedToExecute use the
isGuaranteedToTransferExecutionToSuccessor helper, and make that helper
a bit more accurate.
There's a potential performance impact here from assuming that arbitrary
calls might not return. This probably has little impact on loads and
stores to a pointer because most things alias analysis can reason about
are dereferenceable anyway. The other impacts, like less aggressive
hoisting of sdiv by a variable and less aggressive hoisting around
volatile memory operations, are unlikely to matter for real code.
This also impacts SCEV, which uses the same helper. It's a minor
improvement there because we can tell that, for example, memcpy always
returns normally. Strictly speaking, it's also introducing
a bug, but it's not any worse than everywhere else we assume readonly
functions terminate.
Fixes http://llvm.org/PR27857.
Reviewers: hfinkel, reames, chandlerc, sanjoy
Subscribers: broune, llvm-commits
Differential Revision: http://reviews.llvm.org/D21167
llvm-svn: 272489
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.
When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.
The analysis is split in two parts, RegUsageInfoCollector is the
MachineFunction Pass that runs post-RA and collect the list of
clobbered registers to produce a register mask.
An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector, and keep them available. A future tranformation
pass will use this information to update every call-sites after
instruction selection.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D20769
llvm-svn: 272403
Prior to this patch, we used argument/global stratified attributes in
order to note that a value could have come from either dereferencing a
global/arg, or from the assignment from a global/arg.
Now, AttrUnknown is placed on sets when we see a dereference, instead of
the global/arg attributes. This allows us to be more aggressive in the
future when we see global/arg attributes without AttrUnknown.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21110
llvm-svn: 272335
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.
Differential revision: http://reviews.llvm.org/D21045
llvm-svn: 272321
We can safely rely on a NoWrap add recurrence causing UB down the road
only if we know the loop does not have a exit expressed in a way that is
opaque to ScalarEvolution (e.g. by a function call that conditionally
calls exit(0)).
I believe with this change PR28012 is fixed.
Note: I had to change some llvm-lit tests in LoopReroll, since it looks
like they were depending on this incorrect behavior.
llvm-svn: 272237
This is NFC as far as externally visible behavior is concerned, but will
keep us from spinning in the worklist traversal algorithm unnecessarily.
llvm-svn: 272182
Absence of may-unwind calls is not enough to guarantee that a
UB-generating use of an add-rec poison in the loop latch will actually
cause UB. We also need to guard against calls that terminate the thread
or infinite loop themselves.
This partially addresses PR28012.
llvm-svn: 272181
The worklist algorithm introduced in rL271151 didn't check to see if the
direct users of the post-inc add recurrence propagates poison. This
change fixes the problem and makes the code structure more obvious.
Note for release managers: correctness wise, this bug wasn't a
regression introduced by rL271151 -- the behavior of SCEV around
post-inc add recurrences was strictly improved (in terms of correctness)
in rL271151.
llvm-svn: 272179
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
llvm-svn: 272126
This patch does a few things:
- Unifies AttrAll and AttrUnknown (since they were used for more or less
the same purpose anyway).
- Introduces AttrEscaped, an attribute that notes that a value escapes
our analysis for a given set, but not that an unknown value flows into
said set.
- Removes functions that take bit indices, since we also had functions
that took bitsets, and the use of both (with similar names) was
unclear and bug-prone.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21000
llvm-svn: 272040
In some cases, when simplifying with SCEV, we might consider pointer values as
just usual integer values. Thus, we might get a different type from what we
had originally in the map of simplified values, and hence we need to check
types before operating on the values.
This fixes PR28015.
llvm-svn: 271931
Now that `Value::getPointerDereferenceableBytes` looks beyond just
attributes, the name `isDereferenceableFromAttribute` is misleading.
Just inline the function, since it is small and only used once.
llvm-svn: 271456
... and merge into `Value::getPointerDereferenceableBytes`. This was
suggested by Artur Pilipenko in D20764 -- since we no longer allow loads
of unsized types, there is no need anymore to have this special logic.
llvm-svn: 271455
Summary:
Make sure that the SCEVExpander Builder insert point and any
saved/restored insert points are kept consistent (i.e. their Instruction
and BasicBlock match) when moving instructions in SCEVExpander.
This fixes an issue triggered by
http://reviews.llvm.org/D18001 [LSR] Create fewer redundant instructions.
Test case will be added in reapply commit of above change:
http://reviews.llvm.org/D18480 Reapply [LSR] Create fewer redundant instructions.
Reviewers: sanjoy
Subscribers: mzolotukhin, sanjoy, qcolombet, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20703
llvm-svn: 271424
This patch extends CFLAA to recognize allocation functions such as
malloc, free, etc, so we can treat them more aggressively.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D20776
llvm-svn: 271421
Patch by Taewook Oh
Summary: Patch for Bug 27478. Make BasicAliasAnalysis claims NoAlias if two GEPs index different fields of the same structure.
Reviewers: hfinkel, dberlin
Subscribers: dberlin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20665
llvm-svn: 271415
Summary:
Change some of the internal interfaces in Loads.cpp to keep track of the
number of bytes we're trying to prove dereferenceable using an explicit
`Size` parameter.
Before this, the `Size` parameter was implicitly inferred from the
pointee type of the pointer whose dereferenceability we were trying to
prove, causing us to be conservative around bitcasts. This was
unfortunate since bitcast instructions are no-ops and should never
break optimizations. With an explicit `Size` parameter, we're more
precise (as shown in the test cases), and the code is simpler.
We should eventually move towards a `DerefQuery` struct that groups
together a base pointer, an offset, a size and an alignment; but this
patch is a first step.
Reviewers: apilipenko, dblaikie, hfinkel, reames
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20764
llvm-svn: 271406
Code like the following is considered broken, and doesn't need to be
supported by our AA magicks:
void getFoo(int *P) {
int *PAlias = (int *)((char *)NULL + (uintptr_t)P);
}
This patch makes CFLAA drop support for code like this.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D20775
llvm-svn: 271322
This adds support to the backed to actually support SjLj EH as an exception
model. This is *NOT* the default model, and requires explicitly opting into it
from the frontend. GCC supports this model and for MinGW can still be enabled
via the `--using-sjlj-exceptions` options.
Addresses PR27749!
llvm-svn: 271244
Consolidate documentation by removing comments from the .cpp file where
the comments in the .cpp file were copy-pasted from the header.
llvm-svn: 271157
Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).
This was first checked in at r265912 but reverted in r265950 because it
exposed some issues around how SCEV handled post-inc add recurrences.
Those issues have now been fixed.
Reviewers: atrick, regehr
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18684
llvm-svn: 271152
Fixes PR27315.
The post-inc version of an add recurrence needs to "follow the same
rules" as a normal add or subtract expression. Otherwise we miscompile
programs like
```
int main() {
int a = 0;
unsigned a_u = 0;
volatile long last_value;
do {
a_u += 3;
last_value = (long) ((int) a_u);
if (will_add_overflow(a, 3)) {
// Leave, and don't actually do the increment, so no UB.
printf("last_value = %ld\n", last_value);
exit(0);
}
a += 3;
} while (a != 46);
return 0;
}
```
This patch changes SCEV to put no-wrap flags on post-inc add recurrences
only when the poison from a potential overflow will go ahead to cause
undefined behavior.
To avoid regressing performance too much, I've assumed infinite loops
without side effects is undefined behavior to prove poison<->UB
equivalence in more cases. This isn't ideal, but is not new to LLVM as
a whole, and far better than the situation I'm trying to fix.
llvm-svn: 271151
r270777 improved the precision of alloca vs. inbounbds GEP alias queries: if
we have (a) an inbounds GEP and (b) a pointer based on an alloca, and the
beginning of the object the GEP points to would have a negative offset with
respect to the alloca, then the GEP can not alias pointer (b).
This makes the same logic fire when (b) is based on a GlobalVariable instead
of an alloca.
Differential Revision: http://reviews.llvm.org/D20652
llvm-svn: 270893
The memory location that corresponds to a volatile operation is very
special. They are observed by the machine in ways which we cannot
reason about.
Differential Revision: http://reviews.llvm.org/D20555
llvm-svn: 270879
It turns out that too many passes are relying on alias analysis results
for control dependencies. Until we fix that by introducing a more accurate
modelling of control dependencies, special case assume in MemorySSA instead.
Also introduce tests to ensure we don't regress the FunctionAttrs or LICM
passes.
Differential Revision: http://reviews.llvm.org/D20658
llvm-svn: 270823
If a we have (a) a GEP and (b) a pointer based on an alloca, and the
beginning of the object the GEP points would have a negative offset with
repsect to the alloca, then the GEP can not alias pointer (b).
For example, consider code like:
struct { int f0, int f1, ...} foo;
...
foo alloca;
foo *random = bar(alloca);
int *f0 = &alloca.f0
int *f1 = &random->f1;
Which is lowered, approximately, to:
%alloca = alloca %struct.foo
%random = call %struct.foo* @random(%struct.foo* %alloca)
%f0 = getelementptr inbounds %struct, %struct.foo* %alloca, i32 0, i32 0
%f1 = getelementptr inbounds %struct, %struct.foo* %random, i32 0, i32 1
Assume %f1 and %f0 alias. Then %f1 would point into the object allocated
by %alloca. Since the %f1 GEP is inbounds, that means %random must also
point into the same object. But since %f0 points to the beginning of %alloca,
the highest %f1 can be is (%alloca + 3). This means %random can not be higher
than (%alloca - 1), and so is not inbounds, a contradiction.
Differential Revision: http://reviews.llvm.org/D20495
llvm-svn: 270777
Getting accurate locations for loops is important, because those locations are
used by the frontend to generate optimization remarks. Currently, optimization
remarks for loops often appear on the wrong line, often the first line of the
loop body instead of the loop itself. This is confusing because that line might
itself be another loop, or might be somewhere else completely if the body was
inlined function call. This happens because of the way we find the loop's
starting location. First, we look for a preheader, and if we find one, and its
terminator has a debug location, then we use that. Otherwise, we look for a
location on an instruction in the loop header.
The fallback heuristic is not bad, but will almost always find the beginning of
the body, and not the loop statement itself. The preheader location search
often fails because there's often not a preheader, and even when there is a
preheader, depending on how it was formed, it sometimes carries the location of
some preceeding code.
I don't see any good theoretical way to fix this problem. On the other hand,
this seems like a straightforward solution: Put the debug location in the
loop's llvm.loop metadata. A companion Clang patch will cause Clang to insert
llvm.loop metadata with appropriate locations when generating debugging
information. With these changes, our loop remarks have much more accurate
locations.
Differential Revision: http://reviews.llvm.org/D19738
llvm-svn: 270771
There was a typo in r267758. It caused invalid accesses when
given something like "void @free(...)", as NumParams == 0, and
we then try to look at the 0th parameter.
Turns out, most of these were untested; add both attribute
and missing-prototype checks for all libc libfuncs.
Differential Revision: http://reviews.llvm.org/D20543
llvm-svn: 270750
Summary:
**Description**
This makes `WidenIV::widenIVUse` (IndVarSimplify.cpp) fail to widen narrow IV uses in some cases. The latter affects IndVarSimplify which may not eliminate narrow IV's when there actually exists such a possibility, thereby producing ineffective code.
When `WidenIV::widenIVUse` gets a NarrowUse such as `{(-2 + %inc.lcssa),+,1}<nsw><%for.body3>`, it first tries to get a wide recurrence for it via the `getWideRecurrence` call.
`getWideRecurrence` returns recurrence like this: `{(sext i32 (-2 + %inc.lcssa) to i64),+,1}<nsw><%for.body3>`.
Then a wide use operation is generated by `cloneIVUser`. The generated wide use is evaluated to `{(-2 + (sext i32 %inc.lcssa to i64))<nsw>,+,1}<nsw><%for.body3>`, which is different from the `getWideRecurrence` result. `cloneIVUser` sees the difference and returns nullptr.
This patch also fixes the broken LLVM tests by adding missing <nsw> entries introduced by the correction.
**Minimal reproducer:**
```
int foo(int a, int b, int c);
int baz();
void bar()
{
int arr[20];
int i = 0;
for (i = 0; i < 4; ++i)
arr[i] = baz();
for (; i < 20; ++i)
arr[i] = foo(arr[i - 4], arr[i - 3], arr[i - 2]);
}
```
**Clang command line:**
```
clang++ -mllvm -debug -S -emit-llvm -O3 --target=aarch64-linux-elf test.cpp -o test.ir
```
**Expected result:**
The ` -mllvm -debug` log shows that all the IV's for the second `for` loop have been eliminated.
Reviewers: sanjoy
Subscribers: atrick, asl, aemerson, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D20058
llvm-svn: 270695
Similar in spirit to D20497 :
If all elements of a constant vector are known non-zero, then we can say that the
whole vector is known non-zero.
It seems like we could extend this to FP scalar/vector too, but isKnownNonZero()
says it only works for integers and pointers for now.
Differential Revision: http://reviews.llvm.org/D20544
llvm-svn: 270562
We could try harder to handle non-splat vector constants too,
but that seems much rarer to me.
Note that the div test isn't resolved because there's a check
for isIntegerTy() guarding that transform.
Differential Revision: http://reviews.llvm.org/D20497
llvm-svn: 270369
When it has a DataLayout, DecomposeGEPExpression() should return the same object
as GetUnderlyingObject(). Per the FIXME, it currently always has a DL, so the
runtime check is redundant and can become an assert.
llvm-svn: 270268
Before r257832, the threshold used by SimpleInliner was explicitly specified or generated from opt levels and passed to the base class Inliner's constructor. There, it was first overridden by explicitly specified -inline-threshold. The refactoring in r257832 did not preserve this behavior for all opt levels. This change brings back the original behavior.
Differential Revision: http://reviews.llvm.org/D20452
llvm-svn: 270153
This patch changes the order in which we attempt to prove the independence of
strided accesses. We previously did this after we knew the dependence distance
was positive. With this change, we check for independence before handling the
negative distance case. The patch prevents LAA from reporting forward
dependences for independent strided accesses.
This change was requested in the review of D19984.
llvm-svn: 270072
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount. This is the NUW variant of r269211 and fixes PR27691.
(Note: PR27691 is not a correct or stability bug, it was created to
track a pending task).
llvm-svn: 269790
This patch renames the option enabling the store-to-load forwarding conflict
detection optimization. This change was requested in the review of D20241.
llvm-svn: 269668
Also s/Cycles/Iters/ in NumCyclesForStoreLoadThroughMemory to make it
clear that this is not about clock cycles but loop cycles/iterations.
llvm-svn: 269667
Fix "Logic error" warnings of the type "Called C++ object pointer is
null" reported by Clang Static Analyzer on the following files:
lib/Analysis/ScalarEvolution.cpp,
lib/Analysis/LoopInfo.cpp.
Patch by Apelete Seketeli!
llvm-svn: 269424
Summary:
...loop after the last iteration.
This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.
This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
avoid this from becoming an excessively slow analysis.
The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up, I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look at.
Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrances around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.
Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when if we had continued, the cost would have gone back
below the threshold. These kinds of bugs can cause incredibly hard to
track down random changes to behavior.
We could use a techinque similar (but much simpler) within the inliner
as well to avoid considering speculated code in the inline cost.
Reviewers: chandlerc
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11758
llvm-svn: 269388
Summary:
Currently we consider such instructions as simplified, which is incorrect,
because if their user isn't simplified, we can't actually simplify them too.
This biases our estimates of profitability: for instance the analyzer expects
much more gains from unrolling memcpy loops than there actually are.
Reviewers: hfinkel, chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17365
llvm-svn: 269387
Ported DA to the new PM by splitting the former DependenceAnalysis Pass
into a DependenceInfo result type and DependenceAnalysisWrapperPass type
and adding a new PM-style DependenceAnalysis analysis pass returning the
DependenceInfo.
Patch by Philip Pfaffe, most of the review by Justin.
Differential Revision: http://reviews.llvm.org/D18834
llvm-svn: 269370
SCEVExpander::replaceCongruentIVs assumes the backedge value of an
SCEV-analysable PHI to always be an instruction, when this is not
necessarily true. For now address this by bailing out of the
optimization if the backedge value of the PHI is a non-Instruction.
llvm-svn: 269213
`SCEVExpander::replaceCongruentIVs` bypasses `hoistIVInc` if both the
original and the isomorphic increments are PHI nodes. Doing this can
break SSA if the isomorphic increment is not dominated by the original
increment. Get rid of the bypass, and let `hoistIVInc` do the right
thing.
Fixes PR27232 (compile time crash/hang).
llvm-svn: 269212
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount. This is not a problem for "normal" loops[0] that don't have
guards or assumes, but helps in cases where we have guards or assumes in
the loop that can be used to constrain incoming values over the backedge.
This partially fixes PR27691 (we still don't handle the NUW case).
[0]: for "normal" loops, in the cases where we'd be able to prove
no-wrap via isKnownPredicate, we'd also be able to compute a max
tripcount.
llvm-svn: 269211
Equivalent GEP indices with different types are treated as different
indices altogether, leading to an incorrect AA result. Fix the issue
by comparing indices based on their values.
Thanks to Mikael Holmén for reporting the issue!
Differential Revision: http://reviews.llvm.org/D19935
llvm-svn: 269197
Extract a part of isDereferenceableAndAlignedPointer functionality to Value:
Reviewed By: hfinkel, sanjoy
Differential Revision: http://reviews.llvm.org/D17611
llvm-svn: 269190
Do simplifications common to all shift instructions based on the amount shifted:
1. If the shift amount is known larger than the bitwidth, the result is undefined.
2. If the valid bits of the shift amount are all known to be 0, it's a shift by zero, so the shift operand is the result.
Note that we could generalize the shift-by-zero transform into a shift-by-constant if all of the valid bits in the shift
amount are known, but that would have to be done in InstCombine rather than here because it would mean we need to create
a new shift instruction.
Differential Revision: http://reviews.llvm.org/D19874
llvm-svn: 269114
The plan is to eventually make this logic simpler, however I expect it to
be a little tricky for the foreseeable future (at least until we're rid of
pointee types), so move it here so that it can be reused to build a summary
index for devirtualization.
Differential Revision: http://reviews.llvm.org/D20005
llvm-svn: 269081
This removes a redundant stride versioning step (we already
do it in getPtrStride, so it has no effect) and uses PSE to
get the SCEV expressions for the source and destination
(this might have changed when getPtrStride was called).
I discovered this through code inspection, and couldn't
produce a regression test for it.
llvm-svn: 269052
Summary:
The idea is very close to what we do for assume intrinsics: we mark the
guard intrinsics as writing to arbitrary memory to maintain control
dependence, but under the covers we teach AA that they do not mod any
particular memory location.
Reviewers: chandlerc, hfinkel, gbiv, reames
Subscribers: george.burgess.iv, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19575
llvm-svn: 269007
We can use calls to @llvm.experimental.guard to prove predicates,
relying on the fact that in all locations domianted by a call to
@llvm.experimental.guard the predicate it is guarding is known to be
true.
llvm-svn: 268997
When we encounter unsafe memory dependencies, loop distribution could
help.
Even though, the diagnostics is in LAA, it's only currently emitted in
the vectorizer.
llvm-svn: 268987
When deciding if a vector calculation can be done in a smaller bitwidth, use sign bit information from ValueTracking to add more information and allow more truncations.
llvm-svn: 268921
A number of libcalls don't exist in any particular lib but are, instead,
defined in math.h as inline functions (even in C mode!). Don't rely on
their existence when lowering @llvm.{cos,sin,floor,..}.f32, promote them
instead.
N.B. We had logic to handle FREM but were missing out on a number of
others. This change generalizes the FREM handling.
llvm-svn: 268875
This test was crashing, and currently it breaks bootstrapping clang with debuginfo
Differential Revision: http://reviews.llvm.org/D20008
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268715
This message used to be correct, when all we cared about was whether the
dependence was safe (i.e. NoDep) or unsafe. With the current more
precise characterization, this is a forward dep.
llvm-svn: 268695
We assumed that ConstantVectors would be rather uninteresting from the
perspective of analysis. However, this is not the case due to a quirk
of how LLVM handles vectors of i1. Vectors of i1 are not
ConstantDataVectors like vectors of i8, i16, i32 or i64 because i1's
SizeInBits differs from it's StoreSizeInBytes. This leads to it being
categorized as a ConstantVector instead of a ConstantDataVector.
Instead, treat ConstantVector more uniformly.
This fixes PR27591.
llvm-svn: 268479
A loop pass that didn't preserve this entire set of passes wouldn't
play well with other loop passes, since these are generally a basic
requirement to do any interesting transformations to a loop.
Adds a helper to get the set of analyses a loop pass should preserve,
and checks that any loop pass we run satisfies the requirement.
llvm-svn: 268444
In the "LoopDispositions:" section:
- Instead of printing out a list, print out a "dictionary" to make it
obvious by inspection which disposition is for which loop. This is
just a cosmetic change.
- Print dispositions for parent _and_ sibling loops. I will use this
to write a test case.
llvm-svn: 268405
Summary
When a non-escaping pointer is compared to a global value, the
comparison can be folded even if the corresponding malloc/allocation
call cannot be elided.
We need to make sure the global value is not null, since comparisons to
null cannot be folded.
In future, we should also handle cases when the the comparison
instruction dominates the pointer escape.
Reviewers: sanjoy
Subscribers s.egerton, llvm-commits
Differential Revision: http://reviews.llvm.org/D19549
llvm-svn: 268390
We were overly cautious in our analysis of loops which have invokes
which unwind to EH pads. The loop unroll transform is safe because it
only clones blocks in the loop body, it does not try to split critical
edges involving EH pads. Instead, move the necessary safety check to
LoopUnswitch.
N.B. The safety check for loop unswitch is covered by an existing test
which fails without it.
llvm-svn: 268357
that it computes. Currently this is used for testing and precision
tuning, but it might be used by optimizations later.
Differential Revision: http://reviews.llvm.org/D19179
llvm-svn: 268291
As shown in the diff, we used to add to CFLAA's cache by doing
`Cache[Fn] = buildSetsFrom(Fn)`. `buildSetsFrom(Fn)` may cause `Cache`
to reallocate its underlying storage, if this happens and `Cache[Fn]`
was evaluated prior to `buildSetsFrom(Fn)`, then we'll store the result
to a bad address.
Patch by Jia Chen.
llvm-svn: 268269
There are currently some bugs in tree around SCEV caching an incorrect
loop disposition. Printing out loop dispositions will let us write
whitebox tests as those are fixed.
The dispositions are printed as a list in "inside out" order,
i.e. innermost loop first.
llvm-svn: 268177
matchSelectPattern attempts to see through casts which mask min/max
patterns from being more obvious. Under certain circumstances, it would
misidentify a sequence of instructions as a min/max because it assumed
that folding casts would preserve the result. This is not the case for
floating point <-> integer casts.
This fixes PR27575.
llvm-svn: 268086
Summary:
Historically, we had a switch in the Makefiles for turning on "expensive
checks". This has never been ported to the cmake build, but the
(dead-ish) code is still around.
This will also make it easier to turn it on in buildbots.
Reviewers: chandlerc
Subscribers: jyknight, mzolotukhin, RKSimon, gberry, llvm-commits
Differential Revision: http://reviews.llvm.org/D19723
llvm-svn: 268050
I tried to be as close as possible to the strongest check that
existed before; cleaning these up properly is left for future work.
Differential Revision: http://reviews.llvm.org/D19469
llvm-svn: 267758
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
Summary:
Refine the workaround from r266877 that attempts to prevent
renaming of locals in inline assembly, so that in addition to looking
for a llvm.used local value, that there is at least one inline assembly
call in the module. Otherwise, debug functions added to the llvm.used
can block importing/exporting unnecessarily.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19573
llvm-svn: 267717
Extract a part of isDereferenceableAndAlignedPointer functionality to Value::getPointerDerferecnceableBytes. Currently it's a NFC, but in future I'm going to accumulate all the logic about value dereferenceability in this function similarly to Value::getPointerAlignment function (D16144).
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17572
llvm-svn: 267708
This is required to use this function from isSafeToSpeculativelyExecute
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16231
llvm-svn: 267692
When encountering a non-local pointer, LVI would eagerly scan the block for dereferences of the given object to prove the pointer to be non null. That's all well and good, but *then* we'd go recurse through our input blocks. As a result, we could end up scanning each and every block we traverse, even if the final definition was obviously non null or we found a constant value somewhere up the chain. The previous code papered over this by using the isKnownNonNull routine from value tracking. This made the duplication less painful in the common case.
Instead, we know do the block scan only *after* we've gotten the recursive results back. This lets us stop scanning individual blocks as soon as we've determined it to be non-null in any predecessor block and use our usual merge rules to propagate that information cheaply through successor blocks. For a pointer which can be found non-null, this does strictly less work and sometimes substaintially so.
Note that the case where we *can't* prove something non-null is still the really expensive case. We end up scanning each and every block looking for a dereference and never end up finding one.
llvm-svn: 267642
Previously we were recursing on our operands for unary and binary operators regardless of whether we knew how to reason about the operator in question. This has the effect of doing a potentially large amount of work, only to throw it away. By checking whether the operation is one LVI can handle, we can cut short the search and return the (overdefined) answer more quickly. The quality of the results produced should not change.
llvm-svn: 267626
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well.
This patch builds on 267609 which did the same thing for unary casts.
llvm-svn: 267620
Essentially, I was using the wrong size function. For types which were sized, but not primitive, I wasn't getting a useful size for the operand and failed an assert. I fixed this, and also added a guard that the input is a sized type. Test case is for the original mistake. I'm not sure how to actually exercise the sized type check.
llvm-svn: 267618
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well.
This patch implements only the unary operation case. Once this is in, I'll implement the same for the binary operations.
Differential Revision: http://reviews.llvm.org/D19492
llvm-svn: 267609
There has been much recent confusion about the partition in the lattice between constant and non-constant values. Hopefully, documenting this will prevent confusion going forward.
llvm-svn: 267440
This function handled both unary and binary operators. Cloning and specializing leads to much easier to follow code with minimal duplicatation.
llvm-svn: 267438
Summary:
This implements a new method of run-time checking the NoWrap
SCEV predicates, which should be easier to optimize and nicer
for targets that don't correctly handle multiplication/addition
of large integer types (like i128).
If the AddRec is {a,+,b} and the backedge taken count is c,
the idea is to check that |b| * c doesn't have unsigned overflow,
and depending on the sign of b, that:
a + |b| * c >= a (b >= 0) or
a - |b| * c <= a (b <= 0)
where the comparisons above are signed or unsigned, depending on
the flag that we're checking.
The advantage of doing this is that we avoid extending to a larger
type and we avoid the multiplication of large types (multiplying
i128 can be expensive).
Reviewers: sanjoy
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D19266
llvm-svn: 267389
Summary:
Remove the GlobalValueInfo and change the ModuleSummaryIndex to directly
reference summary objects. The info structure was there to support lazy
parsing of the combined index summary objects, which is no longer
needed and not supported.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19462
llvm-svn: 267344
Right now it only contains the LinkageType, but will be extended
with "hasSection", "isOptSize", "hasInlineAssembly", etc.
Differential Revision: http://reviews.llvm.org/D19404
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267319
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.
LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.
Differential Revision: http://reviews.llvm.org/D18367
llvm-svn: 267223
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.
Also includes a bonus feature: addends for COFF image-relative references.
Differential Revision: http://reviews.llvm.org/D17938
llvm-svn: 267211
Summary:
(... while still not using a PostDomTree)
The way we use isKnownNotFullPoison from SCEV today, the new CFG walking
logic will not trigger for any realistic cases -- it will kick in only
for situations where we could have merged the contiguous basic blocks
anyway[0], since the poison generating instruction dominates all of its
non-PHI uses (which are the only uses we consider right now).
However, having this change in place will allow a later bugfix to break
fewer llvm-lit tests.
[0]: i.e. cases where block A branches to block B and B is A's only
successor and A is B's only predecessor.
Reviewers: broune, bjarke.roune
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19212
llvm-svn: 267175
Summary:
Also adds a small comment blurb on control flow + no-wrap flags, since
that question came up a few days back on llvm-dev.
Reviewers: bjarke.roune, broune
Subscribers: sanjoy, mcrosier, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D19209
llvm-svn: 267110
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
This change adds a couple of test cases to make sure FindAvailableLoadedValue does the right thing. At the moment, the code added is dead, but separating it makes follow on changes far more obvious.
llvm-svn: 266999
No matter what value you OR in to A, the result of (or A, B) is going to be UGE A. When A and B are positive, it's SGE too. If A is negative, OR'ing a value into it can't make it positive, but can increase its value closer to -1, therefore (or A, B) is SGE A. Working through all possible combinations produces this truth table:
```
A is
+, -, +/-
F F F + B is
T F ? -
? F ? +/-
```
The related optimizations are flipping the 'slt' for 'sge' which always NOTs the result (if the result is known), and swapping the LHS and RHS while swapping the comparison predicate.
There are more idioms left to implement (aren't there always!) but I've stopped here because any more would risk becoming unreasonable for reviewers.
llvm-svn: 266939
Summary:
This patch prevents importing from (and therefore exporting from) any
module with a "llvm.used" local value. Local values need to be promoted
and renamed when importing, and their presense on the llvm.used variable
indicates that there are opaque uses that won't see the rename. One such
example is a use in inline assembly.
See also the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html
As part of this, move collectUsedGlobalVariables out of Transforms/Utils
and into IR/Module so that it can be used more widely. There are several
other places in LLVM that used copies of this code that can be cleaned
up as a follow on NFC patch.
Reviewers: joker.eph
Subscribers: pcc, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18986
llvm-svn: 266877
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic. If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.
Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.
llvm-svn: 266801
This patch improves SimplifyCFG to catch cases like:
if (a < b) {
if (a > b) <- known to be false
unreachable;
}
Phabricator Revision: http://reviews.llvm.org/D18905
llvm-svn: 266767
Rather than checking for the SCEV type prior to calling
getContantPart, perform the checks in the function. This reduces
the number of places where the checks are needed.
Differential Revision: http://reviews.llvm.org/D19241
llvm-svn: 266759
Summary:
Need to use predecessors for reverse graph, successors for forward graph.
succ_iterator/pred_iterator are not compatible, this patch is all the work necessary to work around that (which is what everywhere else does). Not sure if there is a better way, so cc'ing some random folks to take a gander :)
Reviewers: dblaikie, qcolombet, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18796
llvm-svn: 266718
Summary:
Calls to @llvm.experimental.deoptimize are expected to "never execute",
so optimize them as such.
Reviewers: chandlerc
Subscribers: junbuml, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19095
llvm-svn: 266654
This reverts commit r266477.
This commit introduces cyclic dependency. This commit has "Analysis" depend on "ProfileData",
while "ProfileData" depends on "Object", which depends on "BitCode", which
depends on "Analysis".
llvm-svn: 266619
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count.
Differential Revision: http://reviews.llvm.org/D18622
llvm-svn: 266477
Summary:
InlineCost's threshold is multiplied by this value. This lets us adjust
the inlining threshold up or down on a per-target basis. For example,
we might want to increase the threshold on targets where calls are
unusually expensive.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18560
llvm-svn: 266405
If the size of an AST entry changes, we also need to make sure we perform
necessary alias set merges, as the new size may overlap pointers in other sets.
We happen to run into this with memset, because memset allows an entry for a
i8* pointer to have a decidedly non-i8 size.
This fixes PR27262.
Differential Revision: http://reviews.llvm.org/D18939
llvm-svn: 266381
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.
This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).
For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.
This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).
As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.
Fixes PR16275.
llvm-svn: 266363
Summary:
If a PHI has an incoming undef, we can pretend that it is equal to one
non-undef, non-self incoming value.
This is particularly relevant in combination with the StructurizeCFG
pass, which introduces PHI nodes with undefs. Previously, this lead to
branch conditions that were uniform before StructurizeCFG to become
non-uniform afterwards, which confused the SIAnnotateControlFlow
pass.
This fixes a crash when Mesa radeonsi compiles a shader from
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex
Reviewers: arsenm, tstellarAMD, jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19013
llvm-svn: 266347
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.
Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.
The additional checking also acts as white-box testing for the getAsAddRec method.
Reviewers: anemet, sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18792
llvm-svn: 266334
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.
It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the
-NAN semantics.
N.B. Of course, it is only safe to do this if the intrinsic call is
marked nnan.
llvm-svn: 266279
This patch fixes calculating of builtin_object_size if it depends on a
condition. Before this patch compiler did not know how to calculate the
object size when it finds a condition that cannot be eliminated.
This patch enables calculating of builtin_object_size even in case when
condition cannot be eliminated by choosing minimum or maximum value as a
result from condition. Choosing minimum or maximum value from condition
is based on the second argument of __builtin_object_size function.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D18438
llvm-svn: 266193
Remove an ad-hoc transform in InstCombine and replace it with more
general machinery (ValueTracking, InstructionSimplify and VectorUtils).
This fixes PR27332.
llvm-svn: 266175
`allocsize` is a function attribute that allows users to request that
LLVM treat arbitrary functions as allocation functions.
This patch makes LLVM accept the `allocsize` attribute, and makes
`@llvm.objectsize` recognize said attribute.
The review for this was split into two patches for ease of reviewing:
D18974 and D14933. As promised on the revisions, I'm landing both
patches as a single commit.
Differential Revision: http://reviews.llvm.org/D14933
llvm-svn: 266032
Summary:
This is the first step in also serializing the index out to LLVM
assembly.
The per-module summary written to bitcode is moved out of the bitcode
writer and to a new analysis pass (ModuleSummaryIndexWrapperPass).
The pass itself uses a new builder class to compute index, and the
builder class is used directly in places where we don't have a pass
manager (e.g. llvm-as).
Because we are computing summaries outside of the bitcode writer, we no
longer can use value ids created by the bitcode writer's
ValueEnumerator. This required changing the reference graph edge type
to use a new ValueInfo class holding a union between a GUID (combined
index) and Value* (permodule index). The Value* are converted to the
appropriate value ID during bitcode writing.
Also, this enables removal of the BitWriter library's dependence on the
Analysis library that was previously required for the summary computation.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18763
llvm-svn: 265941
Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).
Reviewers: atrick, regehr
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18684
llvm-svn: 265912
Summary:
The llvm cos intrinsic currently does not propagate undef's. This change
transforms cos(undef) to null value or 0.
There are 2 test cases added as well.
Patch by Anna Thomas!
Reviewers: sanjoy
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D18863
llvm-svn: 265825
This re-commits r265535 which was reverted in r265541 because it
broke the windows bots. The problem was that we had a PointerIntPair
which took a pointer to a struct allocated with new. The problem
was that new doesn't provide sufficient alignment guarantees.
This pattern was already present before r265535 and it just happened
to work. To fix this, we now separate the PointerToIntPair from the
ExitNotTakenInfo struct into a pointer and a bool.
Original commit message:
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.
However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.
In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.
We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.
Reviewers: anemet, mzolotukhin, hfinkel, sanjoy
Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17201
llvm-svn: 265786
Summary:
Fixes PR26774.
If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".
Motivation:
I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard. So transforming:
```
void f(unsigned x) {
unsigned t = 5 / x;
(void)t;
}
```
to
```
void f(unsigned x) { }
```
is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).
Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM. For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).
Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have. This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.
For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store. As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal. The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal. Such a
refined variant will look like it is `readonly`. However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.
Note: this is not just a problem with atomics or with linking
differently optimized object files. See PR26774 for more realistic
examples that involved neither.
This patch:
This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time. It then changes a set of IPO passes to bail out if they see
such a function.
Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18634
llvm-svn: 265762
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.
The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).
This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.
As a follow-up I'll add:
bool operator<(AtomicOrdering, AtomicOrdering) = delete;
bool operator>(AtomicOrdering, AtomicOrdering) = delete;
bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.
Reviewers: jyknight, reames
Subscribers: jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D18775
llvm-svn: 265602