Summary:
Since r322087, glibc's finite lib calls are generated when possible.
However, glibc is not supported on Android. Therefore this change
enables llvm to finely distinguish between linux and Android for
unsupported library calls. The change also include some regression
tests.
Reviewers: srhines, pirama
Reviewed By: srhines
Subscribers: kongyi, chh, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42288
llvm-svn: 323187
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.
Reviewers: sanjoy, mkazantsev, anna, dorit
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42165
llvm-svn: 323077
Summary:
In ModRefInfo "Must" was introduced to track presence of MustAlias, but we still want to return NoModRef when there is neither Mod or Ref, even when MustAlias is found. Patch has small fixes to ensure this happens.
Minor cleanup to remove nesting for 2 if statements when calling getModRefInfo for 2 ImmutableCallSites.
Reviewers: sanjoy
Subscribers: jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42209
llvm-svn: 322932
Summary:
The class wraps a uint64_t and an enum to represent the type of profile
count (real and synthetic) with some helper methods.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41883
llvm-svn: 322771
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
An alternative (and probably better) fix would be that of
making `Scale` an APInt, and there's a patch floating around
to do this. As we're still discussing it, at least stop crashing
in the meanwhile (added bonus, we now have a regression test for
this situation).
Fixes PR35843.
Thanks to Eli for suggesting the fix and Simon for reporting and
reducing the bug.
llvm-svn: 322467
This doesn't handle the more complicated case in the bug report yet:
https://bugs.llvm.org/show_bug.cgi?id=35790
For that, we have to match / look through a cast.
llvm-svn: 322327
This was originally planned as the fix for:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but simpler transforms handled that case, so I implemented a
lesser solution. It turns out we need to handle the case with 'not'
ops too because the real code example that we are trying to solve:
https://bugs.llvm.org/show_bug.cgi?id=35875
...has extra uses of the intermediate values, so we can't rely on
smaller canonicalizations to get us to the goal.
As with rL321672, I've tried to show every possibility in the
codegen tests because that's the simplest way to prove we're doing
the right thing in the wide variety of permutations of this pattern.
We can also show an InstCombine win because we added a fold for
this case in:
rL321998 / D41603
An Alive proof for one variant of the pattern to show that the
InstCombine and codegen results are correct:
https://rise4fun.com/Alive/vd1
Name: min3_nots
%nx = xor i8 %x, -1
%ny = xor i8 %y, -1
%nz = xor i8 %z, -1
%cmpxz = icmp slt i8 %nx, %nz
%minxz = select i1 %cmpxz, i8 %nx, i8 %nz
%cmpyz = icmp slt i8 %ny, %nz
%minyz = select i1 %cmpyz, i8 %ny, i8 %nz
%cmpyx = icmp slt i8 %y, %x
%r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
%cmpxyz = icmp slt i8 %minxz, %ny
%r = select i1 %cmpxyz, i8 %minxz, i8 %ny
Name: min3_nots_alt
%nx = xor i8 %x, -1
%ny = xor i8 %y, -1
%nz = xor i8 %z, -1
%cmpxz = icmp slt i8 %nx, %nz
%minxz = select i1 %cmpxz, i8 %nx, i8 %nz
%cmpyz = icmp slt i8 %ny, %nz
%minyz = select i1 %cmpyz, i8 %ny, i8 %nz
%cmpyx = icmp slt i8 %y, %x
%r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
%xz = icmp sgt i8 %x, %z
%maxxz = select i1 %xz, i8 %x, i8 %z
%xyz = icmp sgt i8 %maxxz, %y
%maxxyz = select i1 %xyz, i8 %maxxz, i8 %y
%r = xor i8 %maxxyz, -1
llvm-svn: 322283
Summary:
After teaching InlineCost more about address spaces ()
another fault was detected in the inliner. If an argument has
the byval attribute the parameter might be copied to an alloca.
That part seems to work fine even if the argument has a different
address space than the alloca address space. However, if the
address spaces differ, then the inlined function still might
refer to the parameter using the original address space (the
inliner does not handle that situation very well).
This patch avoids the problem by simply disallowing inlining
when there are byval arguments with address space that differs
from the alloca address space.
I'm not really sure how to transform the code if we want to
get inlining for this situation. I assume that it never has
been working, and that the fixes in r321809 just exposed an
old problem.
Fault found by skatkov (Serguei Katkov). It is mentioned in
follow up comments to https://reviews.llvm.org/D40455.
Reviewers: skatkov
Reviewed By: skatkov
Subscribers: uabelho, eraman, llvm-commits, haicheng
Differential Revision: https://reviews.llvm.org/D41898
llvm-svn: 322181
Summary:
This pass synthesizes function entry counts by traversing the callgraph
and using the relative block frequencies of the callsites. The intended
use of these counts is in inlining to determine hot/cold callsites in
the absence of profile information.
The pass is split into two files with the code that propagates the
counts in a callgraph in a Utils file. I plan to add support for
propagation in the thinlto link phase and the propagation code will be
shared and hence this split. I did not add support to the old PM since
hot callsite determination in inlining is not possible in old PM
(although we could use hot callee heuristic with synthetic counts in the
old PM it is not worth the effort tuning it)
Reviewers: davidxl, silvas
Subscribers: mgorny, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D41604
llvm-svn: 322110
SCEV tracks the correspondence of created SCEV to original instruction.
However during creation of SCEV it is possible that nuw/nsw/exact flags are
lost.
As a result during expansion of the SCEV the instruction with nuw/nsw/exact
will be used where it was expected and we produce poison incorreclty.
Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41578
llvm-svn: 322058
This patch was part of:
https://reviews.llvm.org/D41338
...but we can expose the bug in IR via constant propagation
as shown in the test. Unless the triple includes 'linux', we
should not fold these because the functions don't exist on
other platforms (yet?).
llvm-svn: 322010
If the varargs are not accessed by a function, we can inline the
function.
Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41335
llvm-svn: 321940
Summary:
I basically copied this patch from here:
https://reviews.llvm.org/D1251
But I skipped some of the refactoring to make the patch more clean.
The new outer3/inner3 test case in ptr-diff.ll triggers the
following assert without this patch:
lib/IR/Constants.cpp:1834: static llvm::Constant *llvm::ConstantExpr::getCompare(unsigned short, llvm::Constant *, llvm::Constant *, bool): Assertion `C1->getType() == C2->getType() && "Op types should be identical!"' failed.
The other new test cases makes sure that there is code coverage
for all modifications in InlineCost.cpp (getting different values
due to not fetching sizes for address space zero). I only guarantee
code coverage for those tests. The tests are not written in a way
that they would break if not having the corrections in
InlineCost.cpp. I found it quite hard to fine tune the tests into
getting different results based on the pointer sizes (except for
the test case where we hit an assert if not teaching InlineCost
about address spaces).
Reviewers: chandlerc, arsenm, haicheng
Reviewed By: arsenm
Subscribers: wdng, eraman, llvm-commits, haicheng
Differential Revision: https://reviews.llvm.org/D40455
llvm-svn: 321809
In one case, we were handling out of bounds, but not undef indices. In the other, we were handling undef (with the comment making the analogy to out of bounds), but not out of bounds. Be consistent and treat both undef and constant out of bounds indices as producing undefined results.
As a side effect, this also protects instcombine from having to handle large constant indices as we always simplify first.
llvm-svn: 321575
This reverts r321138. It seems there are still underlying issues with
memdep. PR35519 seems to still be present if debug info is enabled. We
end up losing a memcpy. Somehow during store to memset merging, we
insert the memset after the memcpy or fail to update the memdep analysis
to account for the newly inserted memset of a pair.
Reduced test case:
#include <assert.h>
#include <stdio.h>
#include <string>
#include <utility>
#include <vector>
void do_push_back(
std::vector<std::pair<std::string, std::vector<std::string>>>* crls) {
crls->push_back(std::make_pair(std::string(), std::vector<std::string>()));
}
int __attribute__((optnone)) main() {
// Put some data in the vector and then remove it so we take the push_back
// fast path.
std::vector<std::pair<std::string, std::vector<std::string>>> crl_set;
crl_set.push_back({"asdf", {}});
crl_set.pop_back();
printf("first word in vector storage: %p\n", *(void**)crl_set.data());
// Do the push_back which may fail to initialize the data.
do_push_back(&crl_set);
auto* first = &crl_set.back().first;
printf("first word in vector storage (should be zero): %p\n",
*(void**)crl_set.data());
assert(first->empty());
puts("ok");
}
Compile with libc++, enable optimizations, and enable debug info:
$ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib
This program will assert with this change.
llvm-svn: 321510
Summary:
When using byval, the data is effectively copied as part of the call
anyway, so we aren't actually passing the pointer and thus there is no
reason to issue a warning.
Reviewers: rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40118
llvm-svn: 321478
InsertBinop tries to find an appropriate instruction instead of
creating a new instruction. When it checks whether instruction is
the same as we need to create it ignores nuw/nsw/exact flags.
It leads to invalid behavior when poison instruction can be used
when it was not expected. Specifically, for example Expander
expands the SCEV built for instruction
%a = add i32 %v, 1
It is possible that InsertBinop can find an instruction
% b = add nuw nsw i32 %v, 1
and will use it instead of version w/o nuw nsw.
It is incorrect.
The patch conservatively ignores all instructions with any of
poison flags installed.
Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41576
llvm-svn: 321475
This is fix for the crash caused by ScalarEvolution::getTruncateExpr.
It expects that if it checked the condition that SCEV is not in UniqueSCEVs cache in
the beginning that it will not be there inside this method.
However during recursion and transformation/simplification for sub expression,
it is possible that these modifications will end up with the same SCEV as we started from.
So we must always check whether SCEV is in cache and do not insert item if it is already there.
Reviewers: sanjoy, mkazantsev, craig.topper
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41380
llvm-svn: 321472
This is a preliminary step for the patch discussed in D41136 (and denoted here with the FIXME comment).
When we match an FP min/max that is cast to integer, any intermediate difference between +0.0 or -0.0
should be muted in the result by the conversion (either fptosi or fptoui) of the result. Thus, we can
enable 'nsz' for the purpose of matching fmin/fmax.
Note that there's probably room to generalize this more, possibly by fixing the current calls to the
weak version of isKnownNonZero() in matchSelectPattern() to the more powerful recursive version.
Differential Revision: https://reviews.llvm.org/D41333
llvm-svn: 321456
Summary:
Make MemorySSA allow reordering of two loads that may alias, when one is volatile.
This makes MemorySSA less conservative and behaving the same as the AliasSetTracker.
For more context, see D16875.
LLVM language reference: "The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior."
Reviewers: george.burgess.iv, dberlin
Subscribers: sanjoy, reames, hfinkel, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D41525
llvm-svn: 321382
If after if-conversion, most of the instructions in this new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate, because the control dependence is transformed into data dependence, and control dependence can be speculated, and thus, the second part can execute in parallel with the first part on modern OOO processor.
This patch checks for the long dependence chain, and give up if-conversion if find one.
Differential Revision: https://reviews.llvm.org/D39352
llvm-svn: 321377
Currently, inline cost model considers a binary operator as free only if both
its operands are constants. Some simple cases are missing such as a + 0, a - a,
etc. This patch modifies visitBinaryOperator() to call SimplifyBinOp() without
going through simplifyInstruction() to get rid of the constant restriction.
Thus, visitAnd() and visitOr() are not needed.
Differential Revision: https://reviews.llvm.org/D41494
llvm-svn: 321366
The penalty is currently getting applied in a bunch of places where it
doesn't make sense, like bitcasts (which are free) and calls (which
were getting the call penalty applied twice). Instead, just apply the
penalty to binary operators and floating-point casts.
While I'm here, also fix getFPOpCost() to do the right thing in more
cases, so we don't have to dig into function attributes.
Differential Revision: https://reviews.llvm.org/D41522
llvm-svn: 321332
Summary:
This replaces calls to getEntryCount().hasValue() with hasProfileData
that does the same thing. This refactoring is useful to do before adding
synthetic function entry counts but also a useful cleanup IMO even
otherwise. I have used hasProfileData instead of hasRealProfileData as
David had earlier suggested since I think profile implies "real" and I
use the phrase "synthetic entry count" and not "synthetic profile count"
but I am fine calling it hasRealProfileData if you prefer.
Reviewers: davidxl, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41461
llvm-svn: 321331
Summary:
Add an additional bit to ModRefInfo, ModRefInfo::Must, to be cleared for known must aliases.
Shift existing Mod/Ref/ModRef values to include an additional most
significant bit. Update wrappers that modify ModRefInfo values to
reflect the change.
Notes:
* ModRefInfo::Must is almost entirely cleared in the AAResults methods, the remaining changes are trying to preserve it.
* Only some small changes to make custom AA passes set ModRefInfo::Must (BasicAA).
* GlobalsModRef already declares a bit, who's meaning overlaps with the most significant bit in ModRefInfo (MayReadAnyGlobal). No changes to shift the value of MayReadAnyGlobal (see AlignedMap). FunctionInfo.getModRef() ajusts most significant bit so correctness is preserved, but the Must info is lost.
* There are cases where the ModRefInfo::Must is not set, e.g. 2 calls that only read will return ModRefInfo::NoModRef, though they may read from exactly the same location.
Reviewers: dberlin, hfinkel, george.burgess.iv
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38862
llvm-svn: 321309
Summary:
The function section prefix for PGO based layout (e.g. hot/unlikely)
should look at the hotness of all blocks not just the entry BB.
A function with a cold entry but a very hot loop should be placed in the
hot section, for example, so that it is located close to other hot
functions it may call. For SamplePGO it was already looking at the
branch weights on calls, and I made that code conditional on whether
this is SamplePGO since it was essentially a noop for instrumentation
PGO anyway.
Reviewers: davidxl
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D41395
llvm-svn: 321197
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
This is r319482 and r319483, along with fixes for PR35519: fix the
optimization that merges stores into memsets to preserve cached memdep
info, and fix memdep's non-local caching strategy to not assume that larger
queries are always more conservative than smaller ones.
Fixes PR28958 and PR35519.
Differential Revision: https://reviews.llvm.org/D40802
llvm-svn: 321138
There are cases when two tags with different base types denote
accesses to the same direct or indirect member of a structure
type. Currently, merging of such tags results in a tag that
represents an access to an object that has the type of that
member. This patch changes this so that if one of the accesses
encloses the other, then the generic tag is the one of the
enclosed access.
Differential Revision: https://reviews.llvm.org/D39557
llvm-svn: 321019
Enhance LVI to analyze the ‘ashr’ binary operation. This leverages the infrastructure in ConstantRange for the ashr operation.
Patch by Surya Kumari Jangala!
Differential Revision: https://reviews.llvm.org/D40886
llvm-svn: 320983
When unsafe algerbra is allowed calls to cabs(r) can be replaced by:
sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))
Patch by Paul Walker, thanks!
Differential Revision: https://reviews.llvm.org/D40069
llvm-svn: 320901
SROA analysis of InlineCost can figure out that some stores can be removed
after inlining and then the repeated loads clobbered by these stores are also
free. This patch finds these clobbered loads and adjust the inline cost
accordingly.
Differential Revision: https://reviews.llvm.org/D33946
llvm-svn: 320814
We cannot move the insertion point to header if SCEV contains div/rem
operations due to they may go over check for zero denominator.
Reviewers: sanjoy, mkazantsev, sebpop
Reviewed By: sebpop
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41229
llvm-svn: 320789
Most of the -Wsign-compare warnings are due to the fact that
enums are signed by default in the MS ABI, while the
tautological comparison warnings trigger on x86 builds where
sizeof(size_t) is 4 bytes, so N > numeric_limits<unsigned>::max()
is always false.
Differential Revision: https://reviews.llvm.org/D41256
llvm-svn: 320750
Summary:
The function is meant to recurse until it comes upon the
phi it's looking for. However, with the current condition,
it will recurse until it finds anything _but_ the phi.
The function will even fail for simple cases like:
%i = phi i32 [ %inc, %loop ], ...
...
%inc = add i32 %i, 1
because the base condition will not happen when the phi
is recursed to, and the recursion will end with a 'false'
result since the previous instruction is a phi.
Reviewers: sanjoy, atrick
Reviewed By: sanjoy
Subscribers: Ka-Ka, bjope, llvm-commits
Committing on behalf of: Bevin Hansson (bevinh)
Differential Revision: https://reviews.llvm.org/D40946
llvm-svn: 320700
This patch fix this FIXME in visitPHI()
FIXME: We should potentially be tracking values through phi nodes,
especially when they collapse to a single value due to deleted CFG edges
during inlining.
Differential Revision: https://reviews.llvm.org/D38594
llvm-svn: 320699
D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose
update chain involves casts; PSCEV can now build an AddRecurrence for some
forms of such phi nodes, under the proper runtime overflow test. This means
that we can identify such phi nodes as an induction, and the loop-vectorizer
can now vectorize such inductions, however inefficiently. The vectorizer
doesn't know that it can ignore the casts, and so it vectorizes them.
This patch records the casts in the InductionDescriptor, so that they could
be marked to be ignored for cost calculation (we use VecValuesToIgnore for
that) and ignored for vectorization/widening/scalarization (i.e. treated as
TriviallyDead).
In addition to marking all these casts to be ignored, we also need to make
sure that each cast is mapped to the right vector value in the vector loop body
(be it a widened, vectorized, or scalarized induction). So whenever an
induction phi is mapped to a vector value (during vectorization/widening/
scalarization), we also map the respective cast instruction (if exists) to that
vector value. (If the phi-update sequence of an induction involves more than one
cast, then the above mapping to vector value is relevant only for the last cast
of the sequence as we allow only the "last cast" to be used outside the
induction update chain itself).
This is the last step in addressing PR30654.
llvm-svn: 320672
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mgrang, dcaballe, hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 320548
CreateAddRecFromPHIWithCastsImpl() adds an IncrementNUSW overflow predicate
which allows the PSCEV rewriter to rewrite this scev expression:
(zext i8 {0, + , (trunc i32 step to i8)} to i32)
into
{0, +, (sext i8 (trunc i32 step to i8) to i32)}
But then it adds the wrong Equal predicate:
%step == (zext i8 (trunc i32 %step to i8) to i32).
instead of:
%step == (sext i8 (trunc i32 %step to i8) to i32)
This is fixed here.
Differential Revision: https://reviews.llvm.org/D40641
llvm-svn: 320298
When the lowest bits of the operands to an integer multiply are known, the low bits of the result are deducible.
Code to deduce known-zero bottom bits already existed, but this change improves on that by deducing known-ones.
Patch by: Pedro Ferreira
Reviewers: craig.topper, sanjoy, efriedma
Differential Revision: https://reviews.llvm.org/D34029
llvm-svn: 320269
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumenation adds a global constructor + runtime callbacks
for every load and store.
HWASan comes with its own IR attribute.
A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).
Reviewers: kcc, pcc, alekseyshl
Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40932
llvm-svn: 320217
Causes unexpected memory issue with New PM this time.
The new PM invalidates BPI but not BFI, leaving the
reference to BPI from BFI invalid.
Abandon this patch. There is a more general solution
which also handles runtime infinite loop (but not statically).
llvm-svn: 320180
In this method, we invoke `SimplifyICmpOperands` which takes the `Cond` predicate
by reference and may change it along with `LHS` and `RHS` SCEVs. But then we invoke
`computeShiftCompareExitLimit` with Values from which the SCEVs have been derived,
these Values have not been modified while `Cond` could be.
One of possible outcomes of this is that we may falsely prove that an infinite loop ends
within some finite number of iterations.
In this patch, we save the original `Cond` and pass it along with original operands.
This logic may be removed in future once `computeShiftCompareExitLimit` works
with SCEVs instead of value operands.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D40953
llvm-svn: 320142
Summary:
Make enum ModRefInfo an enum class. Changes to ModRefInfo values should
be done using inline wrappers.
This should prevent future bit-wise opearations from being added, which can be more error-prone.
Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40933
llvm-svn: 320107
Summary:
An undef extract index can be arbitrarily chosen to be an
out-of-range index value, which would result in the instruction being undef.
This change closes a gap identified while working on lowering vector permute intrinsics
with variable index vectors to pure LLVM IR.
Reviewers: arsenm, spatel, majnemer
Reviewed By: arsenm, spatel
Subscribers: fhahn, nhaehnle, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D40231
llvm-svn: 319910
Lexicographical comparison of SCEV trees is potentially expensive for big
expression trees. We can define ordering between them for AddRecs and
N-ary operations by SCEV NoWrap flags to make non-equality check
cheaper.
This change does not prevent grouping eqivalent SCEVs together and is
not supposed to have any meaningful impact on behavior of any transforms.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D40645
llvm-svn: 319889
Current implementation of `compareSCEVComplexity` is being unreasonable with `SCEVUnknown`s:
every time it sees one, it creates a new value cache and tries to prove equality of two values using it.
This cache reallocates and gets lost from SCEV to SCEV.
This patch changes this behavior: now we create one cache for all values and share it between SCEVs.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D40597
llvm-svn: 319880
This caused PR35519.
> [memcpyopt] Teach memcpyopt to optimize across basic blocks
>
> This teaches memcpyopt to make a non-local memdep query when a local query
> indicates that the dependency is non-local. This notably allows it to
> eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
>
> Fixes PR28958.
>
> Differential Revision: https://reviews.llvm.org/D38374
>
> [memcpyopt] Commit file missed in r319482.
>
> This change was meant to be included with r319482 but was accidentally
> omitted.
llvm-svn: 319873
Summary:
The aim is to make ModRefInfo checks and changes more intuitive
and less error prone using inline methods that abstract the bit operations.
Ideally ModRefInfo would become an enum class, but that change will require
a wider set of changes into FunctionModRefBehavior.
Reviewers: sanjoy, george.burgess.iv, dberlin, hfinkel
Subscribers: nlopes, llvm-commits
Differential Revision: https://reviews.llvm.org/D40749
llvm-svn: 319821
Summary:
I don't think rL309080 is the right fix for PR33494 -- caching ExitLimit only
hides the problem[0]. The real issue is that because of how we forget SCEV
expressions ScalarEvolution::getBackedgeTakenInfo, in the test case for PR33494
computing the backedge for any loop invalidates the trip count for every other
loop. This effectively makes the SCEV cache useless.
I've instead made the SCEV expression invalidation in
ScalarEvolution::getBackedgeTakenInfo less aggressive to fix this issue.
[0]: One way to think about this is that rL309080 essentially augmented the
backedge-taken-count cache with another equivalent exit-limit cache. The bug
went away because we were explicitly not clearing the exit-limit cache in
getBackedgeTakenInfo. But instead of doing all of that, we can just avoid
clearing the backedge-taken-count cache.
Reviewers: mkazantsev, mzolotukhin
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D39361
llvm-svn: 319678
These are blocks that haven't not been executed during training. For large
projects this could make a significant difference. For the project, I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
Differential Revision: https://reviews.llvm.org/D40678
Re-commit after fixing the failing testcase in rL319576, rL319577 and
rL319578.
llvm-svn: 319581
Summary:
Adding support for -print-module-scope similar to how it is
being done for function passes. This option causes loop-pass printer
to emit a whole-module IR instead of just a loop itself.
Reviewers: sanjoy, silvas, weimingz
Reviewed By: sanjoy
Subscribers: apilipenko, skatkov, llvm-commits
Differential Revision: https://reviews.llvm.org/D40247
llvm-svn: 319566
These are blocks that haven't not been executed during training. For large
projects this could make a significant difference. For the project, I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
Differential Revision: https://reviews.llvm.org/D40678
llvm-svn: 319556
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.
This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.
Differential Revision: https://reviews.llvm.org/D40674
llvm-svn: 319505
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
Fixes PR28958.
Differential Revision: https://reviews.llvm.org/D38374
llvm-svn: 319482
Currently, we use a set of pairs to cache responces like `CompareValueComplexity(X, Y) == 0`. If we had
proved that `CompareValueComplexity(S1, S2) == 0` and `CompareValueComplexity(S2, S3) == 0`,
this cache does not allow us to prove that `CompareValueComplexity(S1, S3)` is also `0`.
This patch replaces this set with `EquivalenceClasses` that merges Values into equivalence sets so that
any two values from the same set are equal from point of `CompareValueComplexity`. This, in particular,
allows us to prove the fact from example above.
Differential Revision: https://reviews.llvm.org/D40429
llvm-svn: 319153
Currently, we use a set of pairs to cache responces like `CompareSCEVComplexity(X, Y) == 0`. If we had
proved that `CompareSCEVComplexity(S1, S2) == 0` and `CompareSCEVComplexity(S2, S3) == 0`,
this cache does not allow us to prove that `CompareSCEVComplexity(S1, S3)` is also `0`.
This patch replaces this set with `EquivalenceClasses` any two values from the same set are equal from
point of `CompareSCEVComplexity`. This, in particular, allows us to prove the fact from example above.
Differential Revision: https://reviews.llvm.org/D40428
llvm-svn: 319149
Summary:
For a given loop, getLoopLatch returns a non-null value
when a loop has only one latch block. In the modified
context adding an assertion to check that both the outgoing branches of
a terminator instruction (of latch) does not target same header.
+
few minor code reorganization.
Reviewers: jbhateja
Reviewed By: jbhateja
Subscribers: sanjoy
Differential Revision: https://reviews.llvm.org/D40460
llvm-svn: 318997
Summary:
For a given loop, getLoopLatch returns a non-null value
when a loop has only one latch block. In the modified
context a check on both the outgoing branches of a terminator instruction (of latch) to same header is redundant.
Reviewers: jbhateja
Reviewed By: jbhateja
Subscribers: sanjoy
Differential Revision: https://reviews.llvm.org/D40460
llvm-svn: 318991
Summary:
Loop-pass printing is somewhat deficient since it does not provide the
context around the loop (e.g. preheader). This context information becomes
pretty essential when analyzing transformations that move stuff out of the loop.
Extending printLoop to cover preheader and exit blocks (if any).
Reviewers: sanjoy, silvas, weimingz
Reviewed By: sanjoy
Subscribers: apilipenko, skatkov, llvm-commits
Differential Revision: https://reviews.llvm.org/D40246
llvm-svn: 318878
Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively.
When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if
`L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are
consecutive sibling loops within the parent loop. In this case, `AR2` is also varying
w.r.t. `L1`, but we don't correctly identify it.
It can lead, for exaple, to attempt of incorrect folding. Consider:
AR1 = {a,+,b}<L1>
AR2 = {c,+,d}<L2>
EXAR2 = sext(AR1)
MUL = mul AR1, EXAR2
If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to
construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect
because `AR2` is not available on entrance of `L1`.
Both situations "`L1` contains `L2`" and "`L1` preceeds sibling loop `L2`" can be handled
with one check: "header of `L1` dominates header of `L2`". This patch replaces the old
insufficient check with this one.
Differential Revision: https://reviews.llvm.org/D39453
llvm-svn: 318819
Summary:
First step in adding MemorySSA as dependency for loop pass manager.
Adding the dependency under a flag.
New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null.
Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default.
Reviewers: sanjoy, davide, gberry
Subscribers: mehdi_amini, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D40274
llvm-svn: 318772
The 'ord' and 'uno' predicates have a logic operation for NAN built into their definitions:
FCMP_ORD = 7, ///< 0 1 1 1 True if ordered (no nans)
FCMP_UNO = 8, ///< 1 0 0 0 True if unordered: isnan(X) | isnan(Y)
So we can simplify patterns like this:
(fcmp ord (known NNAN), X) && (fcmp ord X, Y) --> fcmp ord X, Y
(fcmp uno (known NNAN), X) || (fcmp uno X, Y) --> fcmp uno X, Y
It might be better to split this into (X uno 0) | (Y uno 0) as a canonicalization, but that
would be another patch.
Differential Revision: https://reviews.llvm.org/D40130
llvm-svn: 318627
making it no longer even remotely simple.
The pass will now be more of a "full loop unswitching" pass rather than
anything substantively simpler than any other approach. I plan to rename
it accordingly once the dust settles.
The key ideas of the new loop unswitcher are carried over for
non-trivial unswitching:
1) Fully unswitch a branch or switch instruction from inside of a loop to
outside of it.
2) Update the CFG and IR. This avoids needing to "remember" the
unswitched branches as well as avoiding excessively cloning and
reliance on complex parts of simplify-cfg to cleanup the cfg.
3) Update the analyses (where we can) rather than just blowing them away
or relying on something else updating them.
Sadly, #3 is somewhat compromised here as the dominator tree updates
were too complex for me to want to reason about. I will need to make
another attempt to do this now that we have a nice dynamic update API
for dominators. However, we do adhere to #3 w.r.t. LoopInfo.
This approach also adds an important principls specific to non-trivial
unswitching: not *all* of the loop will be duplicated when unswitching.
This fact allows us to compute the cost in terms of how much *duplicate*
code is inserted rather than just on raw size. Unswitching conditions
which essentialy partition loops will work regardless of the total loop
size.
Some remaining issues that I will be addressing in subsequent commits:
- Handling unstructured control flow.
- Unswitching 'switch' cases instead of just branches.
- Moving to the dynamic update API for dominators.
Some high-level, interesting limitationsV that folks might want to push
on as follow-ups but that I don't have any immediate plans around:
- We could be much more clever about not cloning things that will be
deleted. In fact, we should be able to delete *nothing* and do
a minimal number of clones.
- There are many more interesting selection criteria for which branch to
unswitch that we might want to look at. One that I'm interested in
particularly are a set of conditions which all exit the loop and which
can be merged into a single unswitched test of them.
Differential revision: https://reviews.llvm.org/D34200
llvm-svn: 318549
The assertion was introduced in r317853 but there are cases when a call
isn't handled either as direct or indirect. In this case we add a
reference graph edge but not a call graph edge.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, inglorion, eraman, hiraditya, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D40056
llvm-svn: 318540
This function checks that:
1) It is safe to expand a SCEV;
2) It is OK to materialize it at the specified location.
For example, attempt to expand a loop's AddRec to the same loop's preheader should fail.
Differential Revision: https://reviews.llvm.org/D39236
llvm-svn: 318377
Summary:
This fixes PR35241.
When using byval, the data is effectively copied as part of the call
anyway, so the pointer returned by the alloca will not be leaked to the
callee and thus there is no reason to issue a warning.
Reviewers: rnk
Reviewed By: rnk
Subscribers: Ka-Ka, llvm-commits
Differential Revision: https://reviews.llvm.org/D40009
llvm-svn: 318279
I don't believe this was a problem in practice, as it's likely that the
boolean wasn't checked unless the backend condition was non-null.
llvm-svn: 318073
Summary:
If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: sanjoy, junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 318050
This change doesn't fix the root cause of the miscompile PR34966 as the root
cause is in the linker ld64. This change makes call graph more complete
allowing to have better module imports/exports.
rdar://problem/35344706
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39356
llvm-svn: 317853
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].
Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.
As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.
[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html
Differential Revision: https://reviews.llvm.org/D38336
llvm-svn: 317729
There are cases when we have to merge TBAA access tags with the
same base access type, but different final access types. For
example, accesses to different members of the same structure may
be vectorized into a single load or store instruction. Since we
currently assume that the tags to merge always share the same
final access type, we incorrectly return a tag that describes an
access to one of the original final access types as the generic
tag. This patch fixes that by producing generic tags for the
common type and not the final access types of the original tags.
Resolves:
PR35225: Wrong tbaa metadata after load store vectorizer due to
recent change
https://bugs.llvm.org/show_bug.cgi?id=35225
Differential Revision: https://reviews.llvm.org/D39732
llvm-svn: 317682
My fix is conservative and will make us return may-alias instead.
The test case is:
check(gep(x, 0), n, gep(x, n), -1) with n == sizeof(x)
Here, the first value accesses the whole object, but the second access
doesn't access anything. The semantics of -1 is read until the end of the
object, which in this case means read nothing.
No test case, since isn't trivial to exploit this one, but I've proved it correct.
llvm-svn: 317680
As discussed in D39204, this is effectively a revert of rL265521 which required nnan
to vectorize sqrt libcalls based on the old LangRef definition of llvm.sqrt. Now that
the definition has been updated so the libcall and intrinsic have the same semantics
apart from potentially setting errno, we can remove the nnan requirement.
We have the right check to know that errno is not set:
if (!ICS.onlyReadsMemory())
...ahead of the switch.
This will solve https://bugs.llvm.org/show_bug.cgi?id=27435 assuming that's being
built for a target with -fno-math-errno.
Differential Revision: https://reviews.llvm.org/D39642
llvm-svn: 317519
single-iteration loop
This fixes PR34681. Avoid adding the "Stride == 1" predicate when we know that
Stride >= Trip-Count. Such a predicate will effectively optimize a single
or zero iteration loop, as Trip-Count <= Stride == 1.
Differential Revision: https://reviews.llvm.org/D38785
llvm-svn: 317438
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Originally commited as r317374, but reverted in r317395 to update some missed
tests.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317408
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317374
This patch combines the code that matches and merges TBAA access
tags. The aim is to simplify future changes and making sure that
these operations produce consistent results.
Differential Revision: https://reviews.llvm.org/D39463
llvm-svn: 317311
Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.
The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.
This patch is a basic support for this.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39028
llvm-svn: 317278
Summary:
Compute the strongly connected components of the CFG and fall back to
use these for blocks that are in loops that are not detected by
LoopInfo when computing loop back-edge and exit branch probabilities.
Reviewers: dexonsmith, davidxl
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D39385
llvm-svn: 317094
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725
If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.
llvm-svn: 316967
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For examples, this is not the
case for x86(32bit)+sse2.
Fixes PR34887.
llvm-svn: 316905
Summary:
ValueTracking was recognizing not all variations of clamp. Swapping of
true value and false value of select was added to fix this problem. The
first patch was reverted because it caused miscompile in NVPTX target.
Added corresponding test cases.
Reviewers: spatel, majnemer, efriedma, reames
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D39240
llvm-svn: 316795
Max backedge taken count is always expected to be a constant; and this is
usually true by construction -- it is a SCEV expression with constant inputs.
However, if the max backedge expression ends up being computed to be a udiv with
a constant zero denominator[0], SCEV does not fold the result to a constant
since there is no constant it can fold it to (SCEV has no representation for
"infinity" or "undef").
However, in computeMaxBECountForLT we already know the denominator is positive,
and thus at least 1; and we can use this fact to avoid dividing by zero.
[0]: We can end up with a constant zero denominator if the signed range of the
stride is more precise than the unsigned range.
llvm-svn: 316615
This patch allows SCEVFindUnsafe algorithm to tread division by any non-positive
value as safe. Previously, it could only recognize non-zero constants.
Differential Revision: https://reviews.llvm.org/D39228
llvm-svn: 316568
Summary:
Memory dependence analysis no longer counts DbgInfoIntrinsics towards the
limit where to abort the analysis. Before, a bunch of calls to dbg.value
could affect the generated code, meaning that with -g we could generate
different code than without.
Reviewers: chandlerc, Prazek, davide, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39181
llvm-svn: 316551
If particular target supports volatile memory access operations, we can
avoid AS casting to generic AS. Currently it's only enabled in NVPTX for
loads and stores that access global & shared AS.
Differential Revision: https://reviews.llvm.org/D39026
llvm-svn: 316495
Neither of these cases really require a temporary APInt outside the loop. For the ConstantDataSequential case the APInt will never be larger than 64-bits so its fine to just call getElementAsAPInt. For ConstantVector we can get the APInt by reference and only make a copy where the inversion is needed.
llvm-svn: 316265
(recommit #2 after checking for timeout issue).
The original patch was an improvement to IR ValueTracking on
non-negative integers. It has been checked in to trunk (D18777,
r284022). But was disabled by default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
Reviewers: reames, hfinkel
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 316208
Added check that type of CmpConst and source type of trunc are equal
for correct matching of the case when we can set widened C constant
equal to CmpConstant.
%cond = cmp iN %x, CmpConst
%tr = trunc iN %x to iK
%narrowsel = select i1 %cond, iK %t, iK C
Patch by: Gainullin, Artur <artur.gainullin@intel.com>
llvm-svn: 316082
Summary:
When we have the following case:
%cond = cmp iN %x, CmpConst
%tr = trunc iN %x to iK
%narrowsel = select i1 %cond, iK %t, iK C
We could possibly match only min/max pattern after looking through cast.
So it is more profitable if widened C constant will be equal CmpConst.
That is why just set widened C constant equal to CmpConst, because there
is a further check in this function that trunc CmpConst == C.
Also description for lookTroughCast function was added.
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38536
Patch by: Artur Gainullin <artur.gainullin@intel.com>
llvm-svn: 316070
Summary:
If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
Currently scope of evaluation is limited to SCEV computation for
PHI nodes.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 316054
Summary:
ValueTracking was recognizing not all variations of clamp. Swapping of
true value and false value of select was added to fix this problem. This
change breaks the canonical form of cmp inside the matchMinMax function,
that is why additional checks for compare predicates is needed. Added
corresponding test cases.
Reviewers: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38531
Patch by: Artur Gainullin <artur.gainullin@intel.com>
llvm-svn: 315992
This reverts commit r315713. It causes PR34968.
I think I know what the problem is, but I don't think I'll have time to fix it
this week.
llvm-svn: 315962
This avoid code duplication and allow us to add the disable unroll metadata elsewhere.
Differential Revision: https://reviews.llvm.org/D38928
llvm-svn: 315850
This patch moves some common utility functions out of IPSCCP and makes them
available globally. The functions determine if interprocedural data-flow
analyses can propagate information through function returns, arguments, and
global variables.
Differential Revision: https://reviews.llvm.org/D37638
llvm-svn: 315719
Summary:
This change uses the loop use list added in the previous change to remember the
loops that appear in the trip count expressions of other loops; and uses it in
forgetLoop. This lets us not scan every loop in the function on a forgetLoop
call.
With this change we no longer invalidate clear out backedge taken counts on
forgetValue. I think this is fine -- the contract is that SCEV users must call
forgetLoop(L) if their change to the IR could have changed the trip count of L;
solely calling forgetValue on a value feeding into the backedge condition of L
is not enough. Moreover, I don't think we can strengthen forgetValue to be
sufficient for invalidating trip counts without significantly re-architecting
SCEV. For instance, if we have the loop:
I = *Ptr;
E = I + 10;
do {
// ...
} while (++I != E);
then the backedge taken count of the loop is 9, and it has no reference to
either I or E, i.e. there is no way in SCEV today to re-discover the dependency
of the loop's trip count on E or I. So a SCEV client cannot change E to (say)
"I + 20", call forgetValue(E) and expect the loop's trip count to be updated.
Reviewers: atrick, sunfish, mkazantsev
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38435
llvm-svn: 315713
Summary:
This patch teaches SCEV to calculate the maxBECount when the end bound
of the loop can vary. Note that we cannot calculate the exactBECount.
This will only be done when both conditions are satisfied:
1. the loop termination condition is strictly LT.
2. the IV is proven to not overflow.
This provides more information to users of SCEV and can be used to
improve identification of finite loops.
Reviewers: sanjoy, mkazantsev, silviu.baranga, atrick
Reviewed by: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38825
llvm-svn: 315683
Significantly reduces performancei (~30%) of gipfeli
(https://github.com/google/gipfeli)
I have not yet managed to reproduce this regression with the open-source
version of the benchmark on github, but will work with others to get a
reproducer to you later today.
llvm-svn: 315680
Summary:
Currently we do not correctly invalidate memoized results for add recurrences
that were created directly (i.e. they were not created from a `Value`). This
change fixes this by keeping loop use lists and using the loop use lists to
determine which SCEV expressions to invalidate.
Here are some statistics on the number of uses of in the use lists of all loops
on a clang bootstrap (config: release, no asserts):
Count: 731310
Min: 1
Mean: 8.555150
50th %time: 4
95th %tile: 25
99th %tile: 53
Max: 433
Reviewers: atrick, sunfish, mkazantsev
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38434
llvm-svn: 315672
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.
Differential Revision: https://reviews.llvm.org/D38406
llvm-svn: 315590
This patch fixes the bug introduced in https://reviews.llvm.org/D35907; the bug is reported by http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20171002/491452.html.
Before D35907, when GetUnderlyingObjects fails to find an identifiable object, allMMOsOkay lambda in getUnderlyingObjectsForInstr returns false and Objects vector is cleared. This behavior is unintentionally changed by D35907.
This patch makes the behavior for such case same as the previous behavior.
Since D35907 introduced a wrapper function getUnderlyingObjectsForCodeGen around GetUnderlyingObjects, getUnderlyingObjectsForCodeGen is modified to return a boolean value to ask the caller to clear the Objects vector.
Differential Revision: https://reviews.llvm.org/D38735
llvm-svn: 315565
Summary:
This patch fixes an error in the patch to ScalarEvolution::createAddRecFromPHIWithCastsImpl
made in D37265. In that patch we handle the cases where the either the start or accum values can be
zero after truncation. But, we assume that the start value must be a constant if the accum is
zero. This is clearly an erroneous assumption. This change removes that assumption.
Reviewers: sanjoy, dorit, mkazantsev
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38814
llvm-svn: 315491
parameterized emit() calls
Summary: This is not functional change to adopt new emit() API added in r313691.
Reviewed By: anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38285
llvm-svn: 315476
AbstractLatticeFunction and SparseSolver are class templates parameterized by a
lattice value, so we need to move these member functions over to the header.
Differential Revision: https://reviews.llvm.org/D38561
llvm-svn: 314996
Recommitting r314517 with the fix for handling ConstantExpr.
Original commit message:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing
mode in the target. However, since it doesn't check its actual users, it will
return FREE even in cases where the GEP cannot be folded away as a part of
actual addressing mode. For example, if an user of the GEP is a call
instruction taking the GEP as a parameter, then the GEP may not be folded in
isel.
llvm-svn: 314923
Before the patch this was in Analysis. Moving it to IR and making it implicit
part of LLVMContext::diagnose allows the full opt-remark facility to be used
outside passes e.g. the pass manager. Jessica is planning to use this to
report function size after each pass. The same could be used for time
reports.
Tested with BUILD_SHARED_LIBS=On.
llvm-svn: 314909
Test needs some slight adjustment because we no longer check the existence of
BFI but rather that the actual hotness is set on the remark. If entry_count
is not set getBlockProfileCount returns None.
llvm-svn: 314874
All the buildbots are red, e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/2436/
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask' of
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
>
> Reviewed By: Ayal
>
> Subscribers: hans, mzolotukhin
>
> Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314824
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314806
The code responsible for analysis of inbounds GEPs is extracted into a separate
function: CallAnalyzer::canFoldInboundsGEP. With the patch SROA
enabling/disabling code is localized at one place instead of spreading across
the code of CallAnalyzer::visitGetElementPtr.
Differential Revision: https://reviews.llvm.org/D38233
llvm-svn: 314787
Summary:
When checking if a constant expression is a noop cast we fetched the
IntPtrType by doing DL->getIntPtrType(V->getType())). However, there can
be cases where V doesn't return a pointer, and then getIntPtrType()
triggers an assertion.
Now we pass DataLayout to isNoopCast so the method itself can determine
what the IntPtrType is.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D37894
llvm-svn: 314763
Call ConstantFoldSelectInstruction() to fold cases like below
select <2 x i1><i1 true, i1 false>, <2 x i8> <i8 0, i8 1>, <2 x i8> <i8 2, i8 3>
All operands are constants and the condition has mixed true and false conditions.
Differential Revision: https://reviews.llvm.org/D38369
llvm-svn: 314741
Summary:
This avoids using void * as the type of the lattice value and ugly casts needed to make that happen.
(If folks want to use references, etc, they can use a reference_wrapper).
Reviewers: davide, mssimpso
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38476
llvm-svn: 314734
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng
Reviewed By: hfinkel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38085
llvm-svn: 314517
Summary:
This allows sharing the lattice value code between LVI and SCCP (D36656).
It also adds a `satisfiesPredicate` function, used by D36656.
Reviewers: davide, sanjoy, efriedma
Reviewed By: sanjoy
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D37591
llvm-svn: 314411
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
llvm-svn: 314375
InlineCost can understand Select IR now. This patch finds free Select IRs and
continue the propagation of SimplifiedValues, ConstantOffsetPtrs, and
SROAArgValues.
Differential Revision: https://reviews.llvm.org/D37198
llvm-svn: 314307
Usually the frontend communicates the size of wchar_t via metadata and
we can optimize wcslen (and possibly other calls in the future). In
cases without the wchar_size metadata we would previously try to guess
the correct size based on the target triple; however this is fragile to
keep up to date and may miss users manually changing the size via flags.
Better be safe and stop guessing and optimizing if the frontend didn't
communicate the size.
Differential Revision: https://reviews.llvm.org/D38106
llvm-svn: 314185
Summary:
Right now there are two functions with the same name, one does the work
and the other one returns true if expansion is needed. Rename
TargetTransformInfo::expandMemCmp to make it more consistent with other
members of TargetTransformInfo.
Remove the unused Instruction* parameter.
Differential Revision: https://reviews.llvm.org/D38165
llvm-svn: 314096
Summary:
A SCEV such as:
{%v2,+,((-1 * (trunc i64 (-1 * %v1) to i32)) + (-1 * (trunc i64 %v1 to i32)))}<%loop>
can be folded into, simply, {%v2,+,0}. However, the current code in ::getAddExpr()
will not try to apply the simplification m*trunc(x)+n*trunc(y) -> trunc(trunc(m)*x+trunc(n)*y)
because it only keys off having a non-multiplied trunc as the first term in the simplification.
This patch generalizes this code to try to do a more generic fold of these trunc
expressions.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37888
llvm-svn: 313988
This broke the buildbots, e.g.
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask'
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Subscribers: mzolotukhin
>
> Reviewed By: ayal
>
> Differential Revision: https://reviews.llvm.org/D36130
>
> Review comments updated accordingly
>
> Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
>
> Added a TODO for sortLoadAccesses API
>
> Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
>
> Modified the TODO for sortLoadAccesses API
>
> Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
>
> Review comment update for using OpdNum to insert the mask in respective location
>
> Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
>
> Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
>
> Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313781
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask'
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Subscribers: mzolotukhin
Reviewed By: ayal
Differential Revision: https://reviews.llvm.org/D36130
Review comments updated accordingly
Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
Added a TODO for sortLoadAccesses API
Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
Modified the TODO for sortLoadAccesses API
Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
Review comment update for using OpdNum to insert the mask in respective location
Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313771
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
Commit after rebase for patch D36130
Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab
llvm-svn: 313736
Summary:
With this change:
- Methods in LoopBase trip an assert if the receiver has been invalidated
- LoopBase::clear frees up the memory held the LoopBase instance
This change also shuffles things around as necessary to work with this stricter invariant.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38055
llvm-svn: 313708
Summary:
See comment for why I think this is a good idea.
This change also:
- Removes an SCEV test case. The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop.
- Updates the loop pass manager test case to deal with a private ~Loop::Loop.
- Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37996
llvm-svn: 313695
This should bring signed div/rem analysis up to the same level as unsigned.
We use icmp simplification to determine when the divisor is known greater than the dividend.
Each positive test is followed by a negative test to show that we're not overstepping the boundaries of the known bits.
There are extra tests for the signed-min-value special cases.
Alive proofs:
http://rise4fun.com/Alive/WI5
Differential Revision: https://reviews.llvm.org/D37713
llvm-svn: 313264
The idea to make an 'isDivZero' helper was suggested for the signed case in D37713:
https://reviews.llvm.org/D37713
This clean-up makes it clear that D37713 is just filling the gap for signed div/rem,
removes unnecessary code, and allows us to remove a bit of duplicated code from the
planned improvement in D37713.
llvm-svn: 313261
invalidated SCCs even when we do not have an updated SCC to redirect
towards.
This comes up in a fairly subtle and surprising circumstance: we need to
have a connected but internal node in the call graph which later becomes
a disconnected island, and then gets deleted. All of this needs to
happen mid-CGSCC walk. Because it is disconnected, we have no way of
computing a new "current" SCC when it gets deleted. Instead, we need to
explicitly check for a deleted "current" SCC and bail out of the current
CGSCC step. This will bubble all the way up to the post-order walk and
then resume correctly.
I've included minimal tests for this bug. The specific behavior
matches something we've seen in the wild with the new PM combined with
ThinLTO and sample PGO, but I've not yet confirmed whether this is the
only issue there.
llvm-svn: 313242
This patch fixes pr34283, which exposed that the computation of
maximum legal width for vectorization was wrong, because it relied
on MaxInterleaveFactor to obtain the maximum stride used in the loop,
however not all strided accesses in the loop have an interleave-group
associated with them.
Instead of recording the maximum stride in the loop, which can be over
conservative (e.g. if the access with the maximum stride is not involved
in the dependence limitation), this patch tracks the actual maximum legal
width imposed by accesses that are involved in dependencies.
Differential Revision: https://reviews.llvm.org/D37507
llvm-svn: 313237
Summary:
Full inline cost is computed when -inline-cost-full is true or ORE is
non-null. This patch adds another way to compute full inline cost by
adding a field to InlineParams. This will be used by SampleProfileLoader
to check legality of inlining a callee that it wants to inline.
Reviewers: danielcdh, haicheng
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37819
llvm-svn: 313185
Summary:
Added text options to -pgo-view-counts and -pgo-view-raw-counts that dump block frequency and branch probability info in text.
This is useful when the graph is very large and complex (the dot command crashes, lines/edges too close to tell apart, hard to navigate without textual search) or simply when text is preferred.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37776
llvm-svn: 313159
Summary: References should only be on the aliasee.
Reviewers: pcc
Subscribers: llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D37814
llvm-svn: 313158
Summary:
LAA can only emit run-time alias checks for pointers with affine AddRec
SCEV expressions. However, non-AddRecExprs can be now be converted to
affine AddRecExprs using SCEV predicates.
This change tries to add the minimal set of SCEV predicates in order
to enable run-time alias checking.
Reviewers: anemet, mzolotukhin, mkuper, sanjoy, hfinkel
Reviewed By: hfinkel
Subscribers: mssimpso, Ayal, dorit, roman.shirokiy, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D17080
llvm-svn: 313012
forgetLoop() has pretty bad performance because it goes over
the same instructions over and over again in particular when
nested loop are involved.
The refactoring changes the function to a not-recursive function
and reusing the allocation for data-structures and the Visited
set.
NFCI
Differential Revision: https://reviews.llvm.org/D37659
llvm-svn: 312920
I'm trying to refactor some shared code for integer div/rem,
but I keep having to scroll through fdiv. The FP ops have
nothing in common with the integer ops, so I'm moving FP
below everything else.
While here, improve a couple of comments and fix some formatting.
llvm-svn: 312913
This removes some duplicated code and makes it easier to support signed div/rem
in a similar way if we want to do that. Note that the existing comments were not
accurate - we don't need a constant divisor to simplify; icmp simplification does
more than that. But as the added tests show, it could go even further.
llvm-svn: 312885
It now knows the tricks of both functions.
Also, fix a bug that considered allocas of non-zero address space to be always non null
Differential Revision: https://reviews.llvm.org/D37628
llvm-svn: 312869
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented
as an independent pass, so there's no stretching of scope and feature creep for an existing pass.
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028
The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.
Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
Differential Revision: https://reviews.llvm.org/D37121
llvm-svn: 312862
Current TargetTransformInfo can support throughput cost model and code size model, but sometimes we also need instruction latency cost model in different optimizations. Hal suggested we need a single public interface to query the different cost of an instruction. So I proposed following interface:
enum TargetCostKind {
TCK_RecipThroughput, ///< Reciprocal throughput.
TCK_Latency, ///< The latency of instruction.
TCK_CodeSize ///< Instruction code size.
};
int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const;
All clients should mainly use this function to query the cost of an instruction, parameter <kind> specifies the desired cost model.
This patch also provides a simple default implementation of getInstructionLatency.
The default getInstructionLatency provides latency numbers for only small number of instruction classes, those latency numbers are only reasonable for modern OOO processors. It can be extended in following ways:
Add more detail into this function.
Add getXXXLatency function and call it from here.
Implement target specific getInstructionLatency function.
Differential Revision: https://reviews.llvm.org/D37170
llvm-svn: 312832
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.
Differential revision: https://reviews.llvm.org/D27846
llvm-svn: 312791
The current code that handles personality functions when creating a
module summary does not correctly handle the case where a function's
personality function operand refers to the function indirectly
(e.g. via a bitcast). This patch handles such cases by treating
personality function references like any other reference, i.e. by
adding them to the function's reference list. This has the minor side
benefit of allowing personality functions to participate in early
dead stripping.
We do this by calling findRefEdges on the function itself. This way
we also end up handling other function operands (specifically prefix
data and prologue data) for free.
Differential Revision: https://reviews.llvm.org/D37553
llvm-svn: 312698
Remove code that assumed that a nullptr of address space != 0 couldnt alias with a non-null pointer. This is incorrect, since nothing can be concluded about a null pointer in an address space != 0.
This code was written before address spaces were introduced
Differential Revision: https://reviews.llvm.org/D37518
llvm-svn: 312648
This is a preliminary step towards solving the remaining part of PR27145 - IR for isfinite():
https://bugs.llvm.org/show_bug.cgi?id=27145
In order to solve that one more generally, we need to add matching for and/or of fcmp ord/uno
with a constant operand.
But while looking at those patterns, I realized we were missing a canonicalization for nonzero
constants. Rather than limiting to just folds for constants, we're adding a general value
tracking method for this based on an existing DAG helper.
By transforming everything to 0.0, we can simplify the existing code in foldLogicOfFCmps()
and pick up missing vector folds.
Differential Revision: https://reviews.llvm.org/D37427
llvm-svn: 312591
Summary:
When constructing the predicate P1 in ScalarEvolution::createAddRecFromPHIWithCastsImpl() it is possible
for the PHISCEV from which the predicate is constructed to be a SCEVConstant instead of a SCEVAddRec. If
this happens, then the cast<SCEVAddRec>(PHISCEV) in the code will assert.
Such a PHISCEV is possible if either the start value or the accumulator value is a constant value
that not equal to its truncated value, and if the truncated value is zero.
This patch adds tests that demonstrate the cast<> assertion, and fixes this problem by checking
whether the PHISCEV is a constant before constructing the P1 predicate; if it is, then P1 is
equivalent to one of P2 or P3. Additionally, if we know that the start value or accumulator
value are constants then we check whether the P2 and/or P3 predicates are known false at compile
time; if either is, then we bail out of constructing the AddRec.
Reviewers: sanjoy, mkazantsev, silviu.baranga
Reviewed By: mkazantsev
Subscribers: mkazantsev, llvm-commits
Differential Revision: https://reviews.llvm.org/D37265
llvm-svn: 312568
This patch teaches decomposeBitTestICmp to look through truncate instructions on the input to the compare. If a truncate is found it will now return the pre-truncated Value and appropriately extend the APInt mask.
This allows some code to be removed from InstSimplify that was doing this functionality.
This allows InstCombine's bit test combining code to match a pre-truncate Value with the same Value appear with an 'and' on another icmp. Or it allows us to combine a truncate to i16 and a truncate to i8. This also required removing the type check from the beginning of getMaskedTypeForICmpPair, but I believe that's ok because we still have to find two values from the input to each icmp that are equal before we'll do any transformation. So the type check was really just serving as an early out.
There was one user of decomposeBitTestICmp that didn't want to look through truncates, so I've added a flag to prevent that behavior when necessary.
Differential Revision: https://reviews.llvm.org/D37158
llvm-svn: 312382
If a function contains inline asm and the module-level inline asm
contains the definition of a local symbol, prevent the function from
being imported in case the function-level inline asm refers to a
symbol in the module-level inline asm.
Differential Revision: https://reviews.llvm.org/D37370
llvm-svn: 312332
In LLVM IR the following code:
%r = urem <ty> %t, %b
is equivalent to
%q = udiv <ty> %t, %b
%s = mul <ty> nuw %q, %b
%r = sub <ty> nuw %t, %q ; (t / b) * b + (t % b) = t
As UDiv, Mul and Sub are already supported by SCEV, URem can be implemented
with minimal effort using that relation:
%r --> (-%b * (%t /u %b)) + %t
We implement two special cases:
- if %b is 1, the result is always 0
- if %b is a power-of-two, we produce a zext/trunc based expression instead
That is, the following code:
%r = urem i32 %t, 65536
Produces:
%r --> (zext i16 (trunc i32 %a to i16) to i32)
Note that while this helps get a tighter bound on the range analysis and the
known-bits analysis, this exposes some normalization shortcoming of SCEVs:
%div = udim i32 %a, 65536
%mul = mul i32 %div, 65536
%rem = urem i32 %a, 65536
%add = add i32 %mul, %rem
Will usually not be reduced.
llvm-svn: 312329
Summary:
Remove redundant explicit template instantiation.
This was reported by Andrew Kelley building release_50 with gcc7.2.0 on MacOS: duplicate symbol llvm::DominatorTreeBase.
Reviewers: kuhar, andrewrk, davide, hans
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37185
llvm-svn: 311835
Summary:
Add options -print-bfi/-print-bpi that dump block frequency and branch
probability info like -view-block-freq-propagation-dags and
-view-machine-block-freq-propagation-dags do but in text.
This is useful when the graph is very large and complex (the dot command
crashes, lines/edges too close to tell apart, hard to navigate without textual
search) or simply when text is preferred.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37165
llvm-svn: 311822
Change the early exit condition from Cost > Threshold to Cost >= Threshold
because the inline condition is Cost < Threshold.
Differential Revision: https://reviews.llvm.org/D37087
llvm-svn: 311791
Summary: We need to have accurate-sample-profile in function attribute so that it works with LTO.
Reviewers: davidxl, rsmith
Reviewed By: davidxl
Subscribers: sanjoy, mehdi_amini, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D37113
llvm-svn: 311706
Summary:
We add the precise cache sizes and associativity for the following Intel
architectures:
- Penry
- Nehalem
- Westmere
- Sandy Bridge
- Ivy Bridge
- Haswell
- Broadwell
- Skylake
- Kabylake
Polly uses since several months a performance model for BLAS computations that
derives optimal cache and register tile sizes from cache and latency
information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS", by Tze Meng Low published at TOMS 2016).
While bootstrapping this model, these target values have been kept in Polly.
However, as our implementation is now rather mature, it seems time to teach
LLVM itself about cache sizes.
Interestingly, L1 and L2 cache sizes are pretty constant across
micro-architectures, hence a set of architecture specific default values
seems like a good start. They can be expanded to more target specific values,
in case certain newer architectures require different values. For now a set
of Intel architectures are provided.
Just as a little teaser, for a simple gemm kernel this model allows us to
improve performance from 1.2s to 0.27s. For gemm kernels with less optimal
memory layouts even larger speedups can be reported.
Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb
Reviewed By: fhahn, asb
Subscribers: lsaba, asb, pollydev, llvm-commits
Differential Revision: https://reviews.llvm.org/D37051
llvm-svn: 311647
Current PGO only annotates the edge weight for branch and switch instructions
with profile counts. We should also annotate the indirectbr instruction as
all the information is there. This patch enables the annotating for indirectbr
instructions. Also uses this annotation in branch probability analysis.
Differential Revision: https://reviews.llvm.org/D37074
llvm-svn: 311604
This is PR33245.
Case I am fixing is next:
Imagine we have 2 BC files, one defines and uses personality routine,
second has only declaration and also uses it.
Previously algorithm computing dead symbols (llvm::computeDeadSymbols) did
not know about personality routines and leaved them dead even if function that
has routine was live.
As a result thinLTOInternalizeAndPromoteGUID() method changed binding for
such symbol to local. Later when LLD tried to link these objects it failed
because one object had undefined global symbol for routine and second
object contained local definition instead of global.
Patch set the live root flag on the corresponding FunctionSummary
for personality routines when we build the per-module summaries
during the compile step.
Differential revision: https://reviews.llvm.org/D36834
llvm-svn: 311432
The function does an equality check later to terminate the recursion, but that won't work if its starts out too high. Similar assert already exists in computeKnownBits.
llvm-svn: 311400
Currently, the inline cost model will bail once the inline cost exceeds the
inline threshold in order to avoid unnecessary compile-time. However, when
debugging it is useful to compute the full cost, so this command line option
is added to override the default behavior.
I took over this work from Chad Rosier (mcrosier@codeaurora.org).
Differential Revision: https://reviews.llvm.org/D35850
llvm-svn: 311371
This adds support non-canonical compare predicates. InstSimplify can't rely on canonicalization to have occurred.
Differential Revision: https://reviews.llvm.org/D36646
llvm-svn: 310893
This recommits r310869, with the moved files and no extra changes.
Original commit message:
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310889
localized to the code that uses those analyses.
Technically, this can change behavior as we no longer require the
existence of the ProfileSummaryInfo analysis to use local profile
information via BFI. We didn't actually require the PSI to have an
interesting profile though, so this only really impacts the behavior in
non-default pass pipelines.
IMO, this makes it substantially less surprising how everything works --
before an analysis that wasn't actually used had to exist to trigger
*any* profile aware inlining. I think the new organization makes it more
obvious where various checks for profile signals happen.
Differential Revision: https://reviews.llvm.org/D36710
llvm-svn: 310888
Failed to add the two files that moved. And then added an extra change I didn't mean to while trying to fix that. Reverting everything.
llvm-svn: 310873
This addresses a fixme in InstSimplify about using decomposeBitTest. This also fixes InstSimplify to handle ugt and ult compares too.
I've modified the interface a little to return only the APInt version of the mask that InstSimplify needs. InstCombine now has a small wrapper routine to create a Constant out of it. I've also dropped the returning of 0 since InstSimplify doesn't need that. So InstCombine creates a zero constant itself.
I also had to make decomposeBitTest support vectors since InstSimplify needs that.
As InstSimplify can't use something from the Transforms library, I've moved the CmpInstAnalysis code to the Analysis library.
Differential Revision: https://reviews.llvm.org/D36593
llvm-svn: 310869
ValueTracking has to strike a balance when attempting to propagate information
backwards from assumes, because if the information is trivially propagated
backwards, it can appear to LLVM that the assumption is known to be true, and
therefore can be removed.
This is sound (because an assumption has no semantic effect except for causing
UB), but prevents the assume from allowing further optimizations.
The isEphemeralValueOf check exists to try and prevent this issue by not
removing the source of an assumption. This tries to make it a little bit more
general to handle the case of side-effectful instructions, such as in
%0 = call i1 @get_val()
%1 = xor i1 %0, true
call void @llvm.assume(i1 %1)
Patch by Ariel Ben-Yehuda, thanks!
Differential Revision: https://reviews.llvm.org/D36590
llvm-svn: 310859
causing compile time issues.
Moreover, the patch *deleted* the flag in addition to changing the
default, and links to a code review that doesn't even discuss the flag
and just has an update to a Clang test case.
I've followed up on the commit thread to ask for numbers on compile time
at this point, leaving the flag in place until things stabilize, and
pointing at specific code that seems to exhibit excessive compile time
with this patch.
Original commit message for r310583:
"""
[ValueTracking] Enabling ValueTracking patch by default (recommit). Part 2.
The original patch was an improvement to IR ValueTracking on
non-negative integers. It has been checked in to trunk (D18777,
r284022). But was disabled by default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
""""
llvm-svn: 310816
printing techniques with a DEBUG_TYPE controlling them.
It was a mistake to start re-purposing the pass manager `DebugLogging`
variable for generic debug printing -- those logs are intended to be
very minimal and primarily used for testing. More detailed and
comprehensive logging doesn't make sense there (it would only make for
brittle tests).
Moreover, we kept forgetting to propagate the `DebugLogging` variable to
various places making it also ineffective and/or unavailable. Switching
to `DEBUG_TYPE` makes this a non-issue.
llvm-svn: 310695
The original patch was an improvement to IR ValueTracking on non-negative
integers. It has been checked in to trunk (D18777, r284022). But was disabled by
default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
Reviewers: reames, hfinkel
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 310583
of the returned value.
Checking the returned value from inside of a scoped exit isn't actually
valid. It happens to work when NRVO fires and the stars align, which
they reliably do with Clang but don't, for example, on MSVC builds.
llvm-svn: 310547
Summary:
Avoid checking each operand and calling getValueFromCondition() before calling
constantFoldUser() when the instruction type isn't supported by
constantFoldUser().
This fixes a large compile time regression in an internal build.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36552
llvm-svn: 310545
must-alias(p, sz_p, p, sz_q) irrespective of access sizes sz_p, sz_q
As discussed a couple of weeks ago on the ML.
This makes the behavior consistent with that of BasicAA.
AA clients already check the obj size themselves and may not require the
obj size to match exactly the access size (e.g., in case of store forwarding)
llvm-svn: 310495
The recently improved support for `icmp` in ValueTracking
(r307304) exposes the fact that `isImplied` condition doesn't
really bail out if we hit the recursion limit (and calls
`computeKnownBits` which increases the depth and asserts).
Differential Revision: https://reviews.llvm.org/D36512
llvm-svn: 310481
isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.
The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().
The isFoldableMemAccess() hook has been removed everywhere.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933
llvm-svn: 310463
to Nodes when removing ref edges from a RefSCC.
This map based association turns out to be pretty expensive for large
RefSCCs and pointless as we already have embedded data members inside
nodes that we use to track the DFS state. We can reuse one of those and
the map becomes unnecessary.
This also fuses the update of those numbers into the scan across the
pending stack of nodes so that we don't walk the nodes twice during the
DFS.
With this I expect the new PM to be faster than the old PM for the test
case I have been optimizing. That said, it also seems simpler and more
direct in many ways. The side storage was always pretty awkward.
The last remaining hot-spot in the profile of the LCG once this is done
will be the edge iterator walk in the DFS. I'll take a look at improving
that next.
llvm-svn: 310456
that RefSCC still connected.
This is common and can be handled much more efficiently. As soon as we
know we've covered every node in the RefSCC with the DFS, we can simply
reset our state and return. This avoids numerous data structure updates
and other complexity.
On top of other changes, this appears to get new PM back to parity with
the old PM for a large protocol buffer message source code. The dense
map updates are very hot in this function.
llvm-svn: 310451
limited batch updates.
Specifically, allow removing multiple reference edges starting from
a common source node. There are a few constraints that play into
supporting this form of batching:
1) The way updates occur during the CGSCC walk, about the most we can
functionally batch together are those with a common source node. This
also makes the batching simpler to implement, so it seems
a worthwhile restriction.
2) The far and away hottest function for large C++ files I measured
(generated code for protocol buffers) showed a huge amount of time
was spent removing ref edges specifically, so it seems worth focusing
there.
3) The algorithm for removing ref edges is very amenable to this
restricted batching. There are just both API and implementation
special casing for the non-batch case that gets in the way. Once
removed, supporting batches is nearly trivial.
This does modify the API in an interesting way -- now, we only preserve
the target RefSCC when the RefSCC structure is unchanged. In the face of
any splits, we create brand new RefSCC objects. However, all of the
users were OK with it that I could find. Only the unittest needed
interesting updates here.
How much does batching these updates help? I instrumented the compiler
when run over a very large generated source file for a protocol buffer
and found that the majority of updates are intrinsically updating one
function at a time. However, nearly 40% of the total ref edges removed
are removed as part of a batch of removals greater than one, so these
are the cases batching can help with.
When compiling the IR for this file with 'opt' and 'O3', this patch
reduces the total time by 8-9%.
Differential Revision: https://reviews.llvm.org/D36352
llvm-svn: 310450
Summary: Currently, ICP checks the count against a fixed value to see if it is hot enough to be promoted. This does not work for SamplePGO because sampled count may be much smaller. This patch uses PSI to check if the count is hot enough to be promoted.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: davidxl
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36341
llvm-svn: 310416
I want to reuse this code in SimplifyDemandedBits handling of Add/Sub. This will make that easier.
Wonder if we should use it in SelectionDAG's computeKnownBits too.
Differential Revision: https://reviews.llvm.org/D36433
llvm-svn: 310378
This was just a bad oversight on my part. The code in question should
never have worked without this fix. But it turns out, there are
relatively few places that involve libfunctions that participate in
a single SCC, and unless they do, this happens to not matter.
The effect of not having this correct is that each time through this
routine, the edge from write_wrapper to write was toggled between a call
edge and a ref edge. First time through, it becomes a demoted call edge
and is turned into a ref edge. Next time it is a promoted call edge from
a ref edge. On, and on it goes forever.
I've added the asserts which should have always been here to catch silly
mistakes like this in the future as well as a test case that will
actually infloop without the fix.
The other (much scarier) infinite-inlining issue I think didn't actually
occur in practice, and I simply misdiagnosed this minor issue as that
much more scary issue. The other issue *is* still a real issue, but I'm
somewhat relieved that so far it hasn't happened in real-world code
yet...
llvm-svn: 310342
After the previous series of patches, this is now trivial and deletes
a pretty astonishing amount of complexity. This has been a long time
coming, as the move toward a PO sequence of RefSCCs started eroding the
underlying use cases for this half of the data structure.
Among the biggest advantages here is that now there aren't two
independent data structures that need to stay in sync.
Some of my profiling has also indicated that updating the parent sets
was among the most expensive parts of the lazy call graph. Eliminating
it whole sale is likely to be a nice win in terms of compile time.
Last but not least, I had discussed with some folks previously keeping
it around for asserts and other correctness checking, but once the
fundamentals of the parent and child checking were implemented without
the parent sets their value in correctness checking was tiny and no
where near worth the cost of the complexity required to keep everything
up-to-date.
llvm-svn: 310171
isDescendantOf methods on RefSCCs in terms of the forward edges rather
than the parent sets.
This is technically slower, but probably not interestingly slower, and
all of these routines were already so expensive that they're guarded
behind both !NDEBUG and EXPENSIVE_CHECKS.
This removes another non-critical usage of parent sets.
I've also added some comments to try and help clarify to any potential
users the costs of these routines. They're mostly useful for debugging,
asserts, or other queries.
llvm-svn: 310170
walk over the parent set.
When removing a single function from the call graph, we previously would
walk the entire RefSCC's parent set and then walk every outgoing edge
just to find the ones to remove. In addition to this being quite high
complexity in theory, it is also the last fundamental use of the parent
sets.
With this change, when we remove a function we transform the node
containing it to be recognizably "dead" and then teach the edge
iterators to recognize edges to such nodes and skip them the same way
they skip null edges.
We can't move fully to using "dead" nodes -- when disconnecting two live
nodes we need to null out the edge. But the complexity this adds to the
edge sequence isn't too bad and the simplification of lazily handling
this seems like a significant win.
llvm-svn: 310169
The definition of 'false' here was already pretty vague and debatable,
and I'm about to add another potential 'false' that would actually make
much more sense in a bool operator. Especially given how rarely this is
used, a nicely named method seems better.
llvm-svn: 310165
structures, actually null out the graph pointers as well. We won't ever
update these, and we certainly shouldn't be calling any methods on them,
so it seems good to defensively nuke them.
llvm-svn: 310164
pointers in node objects, just walk the map from function to node.
It doesn't have stable ordering, but works just as well and is much
simpler. We don't need ordering when just updating internal pointers.
llvm-svn: 310163
merging RefSCCs.
The logic to directly use the reference edges is simpler and not
substantially slower (despite the comments to the contrary) because this
is not actually an especially hot part of LCG in practice.
llvm-svn: 310161
Pushes the sext onto the operands of a Sub if NSW is present.
Also adds support for propagating the nowrap flags of the
llvm.ssub.with.overflow intrinsic during analysis.
Differential Revision: https://reviews.llvm.org/D35256
llvm-svn: 310117
Summary: We originally set the hotness threshold as 99.9% to be consistent with gcc FDO. But because the inline heuristic is different between 2 compilers: llvm uses bottom-up algorithm while gcc uses priority based. The LLVM algorithm tends to inline too much early that prevents hot callsites from further inlined into its caller. Due to this restriction, we think it is reasonable to lower the hotness threshold to give priority to those that are really hot. Our experiments show that this change would improve performance on large applications. Note that the inline heuristic has great room for further tuning. Once the inline heuristics are refined, we could adjust this threshold to allow inlining for less hot callsites.
Reviewers: davidxl, tejohnson, eraman
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D36317
llvm-svn: 310065
Adds function attributes to index: ReadNone, ReadOnly, NoRecurse, NoAlias. This attributes will be used for future ThinLTO optimizations that will propagate function attributes across modules.
llvm-svn: 310061
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.
This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.
Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D33186
llvm-svn: 310054
Summary:
Detect when the working set size of a profiled application is huge,
by comparing the number of counts required to reach the hot percentile
in the profile summary to a large threshold*.
When the working set size is determined to be huge, disable peeling
to avoid bloating the working set further.
*Note that the selected threshold (15K) is significantly larger than the
largest working set value in SPEC cpu2006 (which is gcc at around 11K).
Reviewers: davidxl
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36288
llvm-svn: 310005
Summary:
This increases the inlining threshold for hot callsites. Hotness is
defined in terms of block frequency of the callsite relative to the
caller's entry block's frequency. Since this requires BFI in the
inliner, this only affects the new PM pipeline. This is enabled by
default at -O3.
This improves the performance of some internal benchmarks. Notably, an
internal benchmark for Gipfeli compression
(https://github.com/google/gipfeli) improves by ~7%. Povray in SPEC2006
improves by ~2.5%. I am running more experiments and will update the
thread if other benchmarks show improvement/regression.
In terms of text size, LLVM test-suite shows an 1.22% text size
increase. Diving into the results, 13 of the benchmarks in the
test-suite increases by > 10%. Most of these are small, but
Adobe-C++/loop_unroll (17.6% increases) and tramp3d(20.7% size increase)
have >250K text size. On a large application, the text size increases by
2%
Reviewers: chandlerc, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36199
llvm-svn: 309994
Summary:
(This is a second attempt as https://reviews.llvm.org/D34822 was reverted.)
LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges.
But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur.
This patch adds a small logic to handle such a case in getEdgeValueLocal().
This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary.
With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36247
llvm-svn: 309986
Summary: For SamplePGO, we already record the callsite count in the call instruction itself. So we do not want to use BFI to get profile count as it is less accurate.
Reviewers: tejohnson, davidxl, eraman
Reviewed By: eraman
Subscribers: sanjoy, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36025
llvm-svn: 309964
The patch rL309080 was reverted because it did not clean up the cache on "forgetValue"
method call. This patch re-enables this change, adds the missing check and introduces
two new unit tests that make sure that the cache is cleaned properly.
Differential Revision: https://reviews.llvm.org/D36087
llvm-svn: 309925
This patch is update after the first patch (https://reviews.llvm.org/rL309651) based on the post-commit comments.
Stack coloring pass need to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp
// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.
But, TBAA has been already enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.
This patch makes the stack coloring pass maintains AliasAnalysis information when merging multiple stack slots.
This patch fixes PR33928.
llvm-svn: 309849
If SCEV can prove that the backedge taken count for a loop is zero, it does not
need to "understand" a recursive PHI to compute its exiting value.
This should fix PR33885.
llvm-svn: 309758
This causes assertion failures in (a somewhat old version of) SpiderMonkey.
I have already forwarded reproduction instructions to the patch author.
llvm-svn: 309659
Stack coloring pass need to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp
// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.
But, TBAA has been already enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.
This patch makes the stack coloring pass maintains AliasAnalysis information when merging multiple stack slots.
llvm-svn: 309651
Summary:
Adding part of the changes in D30369 (needed to make progress):
Current patch updates AliasAnalysis and MemoryLocation, but does _not_ clean up MemorySSA.
Original summary from D30369, by dberlin:
Currently, we have instructions which affect memory but have no memory
location. If you call, for example, MemoryLocation::get on a fence,
it asserts. This means things specifically have to avoid that. It
also means we end up with a copy of each API, one taking a memory
location, one not.
This starts to fix that.
We add MemoryLocation::getOrNone as a new call, and reimplement the
old asserting version in terms of it.
We make MemoryLocation optional in the (Instruction, MemoryLocation)
version of getModRefInfo, and kill the old one argument version in
favor of passing None (it had one caller). Now both can handle fences
because you can just use MemoryLocation::getOrNone on an instruction
and it will return a correct answer.
We use all this to clean up part of MemorySSA that had to handle this difference.
Note that literally every actual getModRefInfo interface we have could be made private and replaced with:
getModRefInfo(Instruction, Optional<MemoryLocation>)
and
getModRefInfo(Instruction, Optional<MemoryLocation>, Instruction, Optional<MemoryLocation>)
and delegating to the right ones, if we wanted to.
I have not attempted to do this yet.
Reviewers: dberlin, davide, dblaikie
Subscribers: sanjoy, hfinkel, chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D35441
llvm-svn: 309641
Summary:
Inlining threshold is increased by application of bonuses when the
callee has a single reachable basic block or is rich in vector
instructions. Similarly, inlining cost is reduced by applying a large
bonus when the last call to a static function is considered for
inlining. This patch disables the application of these bonuses when the
callsite or the callee is cold. The intention here is to prevent a large
cold callsite from being inlined to a non-cold caller that could prevent
the caller from being inlined. This is especially important when the
cold callsite is a last call to a static since the associated bonus is
very high.
Reviewers: chandlerc, davidxl
Subscribers: danielcdh, llvm-commits
Differential Revision: https://reviews.llvm.org/D35823
llvm-svn: 309441
Summary:
LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges.
But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur.
This patch adds a small logic to handle such a case in getEdgeValueLocal().
This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary.
With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%.
Reviewers: wmi, dberlin, sanjoy
Reviewed By: sanjoy
Subscribers: davide, davidxl, llvm-commits
Differential Revision: https://reviews.llvm.org/D34822
llvm-svn: 309415
This patch reworks the function that searches constants in Add and Mul SCEV expression
chains so that now it does not visit a node more than once, and also renames this function
for better correspondence between its implementation and semantics.
Differential Revision: https://reviews.llvm.org/D35931
llvm-svn: 309367
This reverts commit r309080. The patch needs to clear out the
ScalarEvolution::ExitLimits cache in forgetMemoizedResults.
I've replied on the commit thread for the patch with more details.
llvm-svn: 309357
Summary: In performance tuning, we see performance benefits when enlarge the maximum num promotion targets to 3. This is safe as soon as we have total percentage threshold properly setup (https://reviews.llvm.org/D35962)
Reviewers: davidxl, tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D35966
llvm-svn: 309346
Summary: In the current implementation, isPromotionProfitable only checks if the call count to a direct target is no less than a certain percentage threshold of the remaining call counts that have not been promoted. This causes code size problems when the target count is small but greater than a large portion of remaining counts. E.g. target1 takes 99.9%, while target2 takes 0.1%. Both targets will be promoted and inlined, makes the function size too large, which potentially prevents it from further inlining into its callers. This patch adds another percentage threshold against the total indirect call count. If the target count needs to be no less than both thresholds in order to be promoted speculatively.
Reviewers: davidxl, tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D35962
llvm-svn: 309345
Currently CallAnalyzer::isGEPFree uses TTI::getGEPCost to check if GEP is free.
TTI::getGEPCost cannot handle cases when GEPs participate in Def-Use dependencies
(see https://reviews.llvm.org/D31186 for example).
There is TTI::getUserCost which can calculate the cost more accurately by
taking dependencies into account.
Differential Revision: https://reviews.llvm.org/D33685
llvm-svn: 309268
This patch adds a cache for computeExitLimit to save compilation time. A lot of examples of
tests that take extensive time to compile are attached to the bug 33494.
Differential Revision: https://reviews.llvm.org/D35827
llvm-svn: 309080
`SCEVUnknown::allUsesReplacedWith` does not need to call `forgetMemoizedResults`
since RAUW does a value-equivalent replacement by assumption. If this
assumption was false then the later setValPtr(New) call would be incorrect too.
This is a non-trivial performance optimization for functions with a large number
of loops since `forgetMemoizedResults` walks all loop backedge taken counts to
see if any of them use the SCEVUnknown being RAUWed. However, this improvement
is difficult to demonstrate without checking in an excessively large IR file.
llvm-svn: 309072
When SCEV calculates product of two SCEVAddRecs from the same loop, it
tries to combine them into one big AddRecExpr. If the sizes of the initial
SCEVs were `S1` and `S2`, the size of their product is `S1 + S2 - 1`, and every
operand of the resulting SCEV is combined from operands of initial SCEV and
has much higher complexity than they have.
As result, if we try to calculate something like:
%x1 = {a,+,b}
%x2 = mul i32 %x1, %x1
%x3 = mul i32 %x2, %x1
%x4 = mul i32 %x3, %x2
...
The size of such SCEVs grows as `2^N`, and the arguments
become more and more complex as we go forth. This leads
to long compilation and huge memory consumption.
This patch sets a limit after which we don't try to combine two
`SCEVAddRecExpr`s into one. By default, max allowed size of the
resulting AddRecExpr is set to 16.
Differential Revision: https://reviews.llvm.org/D35664
llvm-svn: 308847
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
functions.
In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.
While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.
We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.
This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.
llvm-svn: 308417
using runtime checks
Extend the SCEVPredicateRewriter to work a bit harder when it encounters an
UnknownSCEV for a Phi node; Try to build an AddRecurrence also for Phi nodes
whose update chain involves casts that can be ignored under the proper runtime
overflow test. This is one step towards addressing PR30654.
Differential revision: http://reviews.llvm.org/D30041
llvm-svn: 308299
Summary:
Previously, we counted TotalMemInst by reading certain instruction counters before and after calling visit and then finding the difference. But that wouldn't be thread safe if this same pass was being ran on multiple threads.
This list of "memory instructions" doesn't make sense to me as it includes call/invoke and is missing atomics.
This patch removes the counter all together.
Reviewers: hfinkel, chandlerc, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D33608
llvm-svn: 308260
function to every defined function known to LLVM as a library function.
LLVM can introduce calls to these functions either by replacing other
library calls or by recognizing patterns (such as memset_pattern or
vector math patterns) and replacing those with calls. When these library
functions are actually defined in the module, we need to have reference
edges to them initially so that we visit them during the CGSCC walk in
the right order and can effectively rebuild the call graph afterward.
This was discovered when building code with Fortify enabled as that is
a common case of both inline definitions of library calls and
simplifications of code into calling them.
This can in extreme cases of LTO-ing with libc introduce *many* more
reference edges. I discussed a bunch of different options with folks but
all of them are unsatisfying. They either make the graph operations
substantially more complex even when there are *no* defined libfuncs, or
they introduce some other complexity into the callgraph. So this patch
goes with the simplest possible solution of actual synthetic reference
edges. If this proves to be a memory problem, I'm happy to implement one
of the clever techniques to save memory here.
llvm-svn: 308088
Now, getUserCost() only checks the src and dst types of EXT to decide it is free
or not. This change first checks the types, then calls isExtFreeImpl(), and
check if EXT can form ExtLoad at last. Currently, only AArch64 has customized
implementation of isExtFreeImpl() to check if EXT can be folded into its use.
Differential Revision: https://reviews.llvm.org/D34458
llvm-svn: 308076
Summary:
DominatorTreeBase used to have IsPostDominators (bool) member to indicate if the tree is a dominator or a postdominator tree. This made it possible to switch between the two 'modes' at runtime, but it isn't used in practice anywhere.
This patch makes IsPostDominator a template argument. This way, it is easier to switch between different algorithms at compile-time based on this argument and design external utilities around it. It also makes it impossible to incidentally assign a postdominator tree to a dominator tree (and vice versa), and to further simplify template code in GenericDominatorTreeConstruction.
Reviewers: dberlin, sanjoy, davide, grosser
Reviewed By: dberlin
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35315
llvm-svn: 308040
I used the wrong variable to update. This was even covered by a unittest
I wrote, and the comments for the unittest were correct (if confusing)
but the test itself just matched the buggy behavior. =[
llvm-svn: 307764
Summary:
Solves PR33689.
If the pointer size is less than the size of the type used for the array
size in an alloca (the <ty> type below) then we could trigger the assert in
the PR. In that example we have pointer size i16 and <ty> is i32.
<result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>]
Handle the situation by allowing truncation as well as zero extension in
ObjectSizeOffsetVisitor::visitAllocaInst().
Also, we now detect overflow in visitAllocaInst(), similar to how it was
already done in visitCallSite().
Reviewers: craig.topper, rnk, george.burgess.iv
Reviewed By: george.burgess.iv
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D35003
llvm-svn: 307754
invalidation of analyses when merging SCCs.
While I've added a bunch of testing of this, it takes something much
more like the inliner to really trigger this as you need to have
partially-analyzed SCCs with updates at just the right time. So I've
added a direct test for this using the inliner and verifying the
domtree. Without the changes here, this test ends up finding a stale
dominator tree.
However, to handle this properly, we need to invalidate analyses
*before* merging the SCCs. After talking to Philip and Sanjoy about this
they convinced me this was the right approach. To do this, we need
a callback mechanism when merging SCCs so we can observe the cycle that
will be merged before the merge happens. This API update ended up being
surprisingly easy.
With this commit, the new PM passes the test-suite again. It hadn't
since MemorySSA was enabled for EarlyCSE as that also will find this bug
very quickly.
llvm-svn: 307498
dependencies between analyses.
This uncovers even more issues with the proxies and the splitting apart
of SCCs which are fixed in this patch. I discovered this while trying to
add more rigorous testing for a change I'm making to the call graph
update invalidation logic.
llvm-svn: 307497
the invalidation propagation logic from an SCC to a Function.
I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.
However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.
Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.
Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.
llvm-svn: 307487
Summary: For interative sample-pgo, if a hot call site is inlined in the profiling binary, we should inline it in before profile annotation in the backend. Before that, the compile phase first collects all GUIDs that needs to be imported and creates virtual "hot" call edge in the summary. However, "hot" is not good enough to guarantee the callsites get inlined. This patch introduces "critical" call edge, and assign much higher importing threshold for those edges.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D35096
llvm-svn: 307439
Prior to this commit both of the added test cases were passing. However, in the
latter case (test7) we were doing a lot more work to arrive at the same answer
(i.e., we were using isImpliedCondMatchingOperands() to determine the
implication.).
llvm-svn: 307400
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.
Differential revision: https://reviews.llvm.org/D32536
llvm-svn: 307346
This patch adds support for handling some forms of ands and ors in
ValueTracking's isImpliedCondition API.
PR33611
https://reviews.llvm.org/D34901
llvm-svn: 307304
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.
llvm-svn: 307292
The dependence analysis was returning incorrect information when using the GEPs
to compute dependences. The analysis uses the GEP indices under certain
conditions, but was doing it incorrectly when the base objects of the GEP are
aliases, but pointing to different locations in the same array.
This patch adds another check for the base objects. If the base pointer SCEVs
are not equal, then the dependence analysis should fall back on the path
that uses the whole SCEV for the dependence check. This fixes PR33567.
Differential Revision: https://reviews.llvm.org/D34702
llvm-svn: 307203
This reverts commit r306907 and reapplies the patches in the title.
The patches used to make one of the
CodeGen/ARM/2011-02-07-AntidepClobber.ll test to fail because of a
missing null check.
llvm-svn: 306919
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html
Reviewers: anemet, davidxl, hfinkel
Reviewed By: anemet
Subscribers: qcolombet, javed.absar, fhahn, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34867
llvm-svn: 306912
This reverts commit r306894.
Revert "[Dominators] Add NearestCommonDominator verification"
This reverts commit r306893.
Revert "[Dominators] Keep tree level in DomTreeNode and use it to find NCD and answer dominance queries"
This reverts commit r306892.
llvm-svn: 306907
Summary: This patch teaches IteratedDominanceFrontier to use the level information stored in DomTreeNodes instead of calculating it manually.
Reviewers: dberlin, sanjoy, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34703
llvm-svn: 306894
Summary:
To enable profile hotness information in diagnostics output, Clang takes
the option `-fdiagnostics-show-hotness` -- that's "diagnostics", with an
"s" at the end. Clang also defines `CodeGenOptions::DiagnosticsWithHotness`.
LLVM, on the other hand, defines
`LLVMContext::getDiagnosticHotnessRequested` -- that's "diagnostic", not
"diagnostics". It's a small difference, but it's confusing, typo-inducing, and
frustrating.
Add a new method with the spelling "diagnostics", and "deprecate" the
old spelling.
Reviewers: anemet, davidxl
Reviewed By: anemet
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D34864
llvm-svn: 306848
In rL300494 there was an attempt to deal with excessive compile time on
invocations of getSign/ZeroExtExpr using local caching. This approach only
helps if we request the same SCEV multiple times throughout recursion. But
in the bug PR33431 we see a case where we request different values all the time,
so caching does not help and the size of the cache grows enormously.
In this patch we remove the local cache for this methods and add the recursion
depth limit instead, as we do for arithmetics. This gives us a guarantee that the
invocation sequence is limited and reasonably short.
Differential Revision: https://reviews.llvm.org/D34273
llvm-svn: 306785
In LLVM IR the following code:
%r = urem <ty> %t, %b
is equivalent to:
%q = udiv <ty> %t, %b
%s = mul <ty> nuw %q, %b
%r = sub <ty> nuw %t, %q ; (t / b) * b + (t % b) = t
As UDiv, Mul and Sub are already supported by SCEV, URem can be
implemented with minimal effort this way.
Note: While SRem and SDiv are also related this way, SCEV does not
provides SDiv yet.
llvm-svn: 306695
The changes are a result of discussion of https://reviews.llvm.org/D33685.
It solves the following problem:
1. We can inform getGEPCost about simplified indices to help it with
calculating the cost. But getGEPCost does not take into account the
context which GEPs are used in.
2. We have getUserCost which can take the context into account but we cannot
inform about simplified indices.
With the changes getUserCost will have access to additional information
as getGEPCost has.
The one parameter getUserCost is also provided.
Differential Revision: https://reviews.llvm.org/D34057
llvm-svn: 306674
The original patch was an improvement to IR ValueTracking on non-negative
integers. It has been checked in to trunk (D18777, r284022). But was disabled by
default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
Reviewers: reames
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
llvm-svn: 306528
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.
This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.
Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D33186
llvm-svn: 306525
Using Optional<> here doesn't seem to be terribly valuable, but
this is not the main point of this change. The change enables
us to merge the (now) two identical copies of parentFunctionOfValue()
that Steensgaard's and Andersens' provide.
llvm-svn: 306351
Summary:
Make sure we are comparing the unknown instructions in the alias set and the instruction interested in.
I believe this is clearly a bug (missed opportunity). I can also add some test cases if desired.
Reviewers: hfinkel, davide, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34597
llvm-svn: 306241
Summary:
This patch changes getRange to getRangeRef and returns a reference to the ConstantRange object stored inside the DenseMap caches. We then take advantage of that to add new helper methods that can return min/max value of a signed or unsigned ConstantRange using that reference without first copying the ConstantRange.
getRangeRef calls itself recursively and I believe the reference return is fine for those calls.
I've left getSignedRange and getUnsignedRange returning a ConstantRange object so they will make a copy now. This is to ensure safety since the reference will be invalidated if the DenseMap changes.
I'm sure there are still more places that can take advantage of the reference and I'll submit future patches as I find them.
Reviewers: sanjoy, davide
Reviewed By: sanjoy
Subscribers: zzheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D32978
llvm-svn: 306229
Summary:
m_CombineOr isn't very efficient. The code using it is also quite verbose.
This patch adds m_Shift and m_BitwiseLogic matchers to make the using code more concise and improve the match efficiency.
Reviewers: spatel, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34593
llvm-svn: 306206
Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX , we also use a valid upperbound cost when overflow occurs in Cost.
Reviewers: hans, echristo, dmgreen
Reviewed By: dmgreen
Subscribers: mcrosier, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D34436
llvm-svn: 306118
Currently JumpThreading can use LazyValueInfo to analyze an 'and' or 'or' of compare if the compare is fed by a livein of a basic block. This can be used to to prove the condition can't be met for some predecessor and the jump from that predecessor can be moved to the false path of the condition.
But if the compare is something that InstCombine turns into an add and a single compare, it can't be analyzed because the livein is now an input to the add and not the compare.
This patch adds a new method to LVI to get a ConstantRange on an edge. Then we teach jump threading to detect the add livein feeding a compare and to get the ConstantRange and propagate it.
Differential Revision: https://reviews.llvm.org/D33262
llvm-svn: 306085
Summary: LVI can reason about an AND of icmps on the true dest of a branch. I believe we can do similar for the false dest of ORs. This allows us to get the same answer for the demorganed versions of some of the AND test cases as you can see.
Reviewers: anna, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34431
llvm-svn: 306076
This matches the checks done at the beginning of isKnownNonEqual that this code is partially emulating.
Without this we can get assertion failures due to the bit widths of the KnownBits not matching.
llvm-svn: 306044
Using various methods, BasicAA tries to determine whether two
GetElementPtr memory locations alias when its base pointers are known
to be equal. When none of its heuristics are applicable, it falls back
to PartialAlias to, according to a comment, protect TBAA making a wrong
decision in case of unions and malloc. PartialAlias is not correct,
because a PartialAlias result implies that some, but not all, bytes
overlap which is not necessarily the case here.
AAResults returns the first analysis result that is not MayAlias.
BasicAA is always the first alias analysis. When it returns
PartialAlias, no other analysis is queried to give a more exact result
(which was the intention of returning PartialAlias instead of MayAlias).
For instance, ScopedAA could return a more accurate result.
The PartialAlias hack was introduced in r131781 (and re-applied in
r132632 after some reverts) to fix llvm.org/PR9971 where TBAA returns a
wrong NoAlias result due to a union. A test case for the malloc case
mentioned in the comment was not provided and I don't think it is
affected since it returns an omnipotent char anyway.
Since r303851 (https://reviews.llvm.org/D33328) clang does emit specific
TBAA for unions anymore (but "omnipotent char" instead). Hence, the
PartialAlias workaround is not required anymore.
This patch passes the test-suite and check-llvm/check-clang of a
self-hoisted build on x64.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D34318
llvm-svn: 305938
MulOpsInlineThreshold option of SCEV is defaulted to 1000, which is inadequately high.
When constructing SCEVs of expressions like:
x1 = a * a
x2 = x1 * x1
x3 = x2 * x2
...
We actually have huge SCEVs with max allowed amount of operands inlined.
Such expressions are easy to get from unrolling of loops looking like
x = a
for (i = 0; i < n; i++)
x = x * x
Or more tricky cases where big powers are involved. If some non-linear analysis
tries to work with a SCEV that has 1000 operands, it may lead to excessively long
compilation. The attached test does not pass within 1 minute with default threshold.
This patch decreases its default value to 32, which looks much more reasonable if we
use analyzes with complexity O(N^2) or O(N^3) working with SCEV.
Differential Revision: https://reviews.llvm.org/D34397
llvm-svn: 305882