The description of this option was copy-pasted from another one and does not
correspond to reality.
Differential Revision: https://reviews.llvm.org/D34390
llvm-svn: 305782
Current implementation of SCEVExpander demonstrates a very naive behavior when
it deals with power calculation. For example, a SCEV for x^8 looks like
(x * x * x * x * x * x * x * x)
If we try to expand it, it generates a very straightforward sequence of muls, like:
x2 = mul x, x
x3 = mul x2, x
x4 = mul x3, x
...
x8 = mul x7, x
This is a non-efficient way of doing that. A better way is to generate a sequence of
binary power calculation. In this case the expanded calculation will look like:
x2 = mul x, x
x4 = mul x2, x2
x8 = mul x4, x4
In some cases the code size reduction for such SCEVs is dramatic. If we had a loop:
x = a;
for (int i = 0; i < 3; i++)
x = x * x;
And this loop have been fully unrolled, we have something like:
x = a;
x2 = x * x;
x4 = x2 * x2;
x8 = x4 * x4;
The SCEV for x8 is the same as in example above, and if we for some reason
want to expand it, we will generate naively 7 multiplications instead of 3.
The BinPow expansion algorithm here allows to keep code size reasonable.
This patch teaches SCEV Expander to generate a sequence of BinPow multiplications
if we have repeating arguments in SCEVMulExpressions.
Differential Revision: https://reviews.llvm.org/D34025
llvm-svn: 305663
This is a fix for the test case in PR32314.
Basic Alias Analysis can ask if two nodes are known non-equal after looking through a phi node to find a GEP. isAddOfNonZero saw an add of a constant from the same phi and said that its output couldn't be equal. But Basic Alias Analysis was really asking about the value from the previous loop iteration.
This patch at least makes that case not happen anymore, I'm not sure if there were still other ways this can fail. As was discussed in the bug, it looks like fixing BasicAA would be difficult so this patch seemed like a possible workaround
Differential Revision: https://reviews.llvm.org/D33136
llvm-svn: 305481
This is a fix for PR33292 that shows a case of extremely long compilation
of a single .c file with clang, with most time spent within SCEV.
We have a mechanism of limiting recursion depth for getAddExpr to avoid
long analysis in SCEV. However, there are calls from getAddExpr to getMulExpr
and back that do not propagate the info about depth. As result of this, a chain
getAddExpr -> ... .> getAddExpr -> getMulExpr -> getAddExpr -> ... -> getAddExpr
can be extremely long, with every segment of getAddExpr's being up to max depth long.
This leads either to long compilation or crash by stack overflow. We face this situation while
analyzing big SCEVs in the test of PR33292.
This patch applies the same limit on max expression depth for getAddExpr and getMulExpr.
Differential Revision: https://reviews.llvm.org/D33984
llvm-svn: 305463
There's an early out that's trying to detect when we don't know any bits that make up the legal range of a shift. The code subtracts one from BitWidth which creates a mask in the lower bits for power of 2 bit widths. This is then ANDed with the known bits to see if any of those bits are known. If the bit width isn't a power of 2 this creates a non-sensical mask.
This patch corrects this by rounding up to a power of 2 before doing the subtract and mask.
Differential Revision: https://reviews.llvm.org/D34165
llvm-svn: 305400
Previously it was non-const reference named Result which would tend to make someone think that it was an outparam when really its an input.
llvm-svn: 305114
Summary:
Unless I'm mistaken, the special handling for EQ/NE should cover everything and there is no reason to fallthrough to the more complex code. For that matter I'm not sure there's any reason to special case EQ/NE other than avoiding creating temporary ConstantRanges.
This patch moves the complex code into an else so we only do it when we are handling a predicate other than EQ/NE.
Reviewers: anna, reames, resistor, Farhana
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34000
llvm-svn: 305086
This is to prepare to allow for dead stripping of globals in the
merged modules.
Differential Revision: https://reviews.llvm.org/D33921
llvm-svn: 305027
The zero heuristic assumes that integers are more likely positive than negative,
but this also has the effect of assuming that strcmp return values are more
likely positive than negative. Given that for nonzero strcmp return values it's
the ordering of arguments that determines the sign of the result there's no
reason to assume that's true.
Fix this by inspecting the LHS of the compare and using TargetLibraryInfo to
decide if it's strcmp-like, and if so only assume that nonzero is more likely
than zero i.e. strings are more often different than the same. This causes a
slight code generation change in the spec2006 benchmark 403.gcc, but with no
noticeable performance impact. The intent of this patch is to allow better
optimisation of dhrystone on Cortex-M cpus, but currently it won't as there are
also some changes that need to be made to if-conversion.
Differential Revision: https://reviews.llvm.org/D33934
llvm-svn: 304970
Summary:
Check that the first access before one being tested is valid.
Before this patch, if there was no definition prior to the Use being tested,
the first time Iter was deferenced, it hit the sentinel.
Reviewers: dberlin, gbiv
Subscribers: sanjoy, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D33950
llvm-svn: 304926
Seems like at least one reasonable interpretation of optnone is that the
optimizer never "looks inside" a function. This fix is consistent with
that interpretation.
Specifically this came up in the situation:
f3 calls f2 calls f1
f2 is always_inline
f1 is optnone
The application of readnone to f1 (& thus to f2) caused the inliner to
kill the call to f2 as being trivially dead (without even checking the
cost function, as it happens - not sure if that's also a bug).
llvm-svn: 304833
Summary:
LVIPrinter pass was previously relying on the LVICache. We now directly call the
the LVI functions which solves the value if the LVI information is not already
available in the cache. This has 2 benefits over the printing of LVI cache:
1. higher coverage (i.e. catches errors) in LVI code when cache value is
invalidated.
2. relies on the core functions, and not dependent on the LVI cache (which may
be scrapped at some point).
It would still catch any cache invalidation errors, since we first go through
the cache.
Reviewers: reames, dberlin, sanjoy
Reviewed by: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32135
llvm-svn: 304819
Summary:
Expanding the loop idiom test for memcpy to also recognize
unordered atomic memcpy. The only difference for recognizing
an unordered atomic memcpy and instead of a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
Patch by Daniel Neilson!
Reviewers: reames, anna, skatkov
Reviewed By: reames, anna
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33243
llvm-svn: 304806
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
isKnownNonEqual is called a little earlier in this function and can handle the case that we were checking here as well as more complex cases.
llvm-svn: 304775
This will be used by another commit to remove some code from InstSimplify that is redundant for scalars, but was needed for vectors due to this issue.
llvm-svn: 304774
This is actually NFC because the next case starts with the same if statement as this case did. So the result will be the same and it will fallthrough to the end of the switch. But there's no reason to rely on that so we should just break.
llvm-svn: 304680
Summary:
This is to enable the new switch inline cost heuristic (r301649) by removing the
old heuristic as well as the flag itself.
In my experiment for LLVM test suite and spec2000/2006, +17.82% performance and
8% code size reduce was observed in spec2000/vertex with O3 LTO in AArch64.
No significant code size / performance regression was found in O3/O2/Os. No
significant complain was reported from the llvm-dev thread.
Reviewers: hans, chandlerc, eraman, haicheng, mcrosier, bmakam, eastig, ddibyend, echristo
Reviewed By: echristo
Subscribers: javed.absar, kristof.beyls, echristo, aemerson, rengolin, mehdi_amini
Differential Revision: https://reviews.llvm.org/D32653
llvm-svn: 304594
Summary:
The constant folding code currently assumes that the constant expression will always be on the left and the simple null will be on the right. But that's not true at least on the path from InstSimplify.
This patch adds support to ConstantFolding to detect the reversed case.
Reviewers: spatel, dberlin, majnemer, davide, joey
Reviewed By: joey
Subscribers: joey, llvm-commits
Differential Revision: https://reviews.llvm.org/D33801
llvm-svn: 304559
Summary:
Reduce min percent required for indirect call promotion from 33% to 30%,
which matches gcc's threshold and catches the same hot opportunities.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33798
llvm-svn: 304469
Replace GVFlags::LiveRoot with GVFlags::Live and use that instead of
all the DeadSymbols sets. This is refactoring in order to make
liveness information available in the RegularLTO pipeline.
llvm-svn: 304466
This patch does an inline expansion of memcmp.
It changes the memcmp library call into an inline expansion when the size is
known at compile time and is under a target specified threshold.
This expansion is implemented in CodeGenPrepare and expands into straight line
code. The target specifies a maximum load size and the expansion works by using
this size to load the two sources, compare, and exit early if a difference is
found. It also has a special case when the memcmp result is used in a compare
to zero equality.
Differential Revision: https://reviews.llvm.org/D28637
llvm-svn: 304313
Thanks to Galina Kistanova for finding the missing break!
When trying to make a test for this, I realized our logic for handling
extractvalue/insertvalue/... is somewhat broken. This makes constructing
a test-case for this missing break nontrivial.
llvm-svn: 304275
Params DT and LI are redundant, because these values are contained in fields anyways.
Differential Revision: https://reviews.llvm.org/D33668
llvm-svn: 304204
The optimistic delinearization implemented in LLVM detects array sizes by
looking for non-linear products between parameters and induction variables.
In OpenCL code, such products often look like:
A[get_global_id(0) * N + get_global_id(1)]
Hence, the IV is hidden in the get_global_id() call and consequently
delinearization would fail as no induction variable is available that helps
us to identify N as array size parameter.
We now use a very simple heuristic to change this. We assume that each parameter
that comes directly from a function call is a hidden induction variable. As
a result, we can delinearize the access above to:
A[get_global_id(0)][get_global_id(1]
llvm-svn: 304073
Summary:
This fixes introduction of an incorrect inttoptr/ptrtoint pair in
the included test case which makes use of non-integral pointers. I
suspect there are more cases like this left, but this takes care of
the one I was seeing at the moment.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33129
llvm-svn: 304058
Previously, we called simplifyPossiblyCastedAndOrOfICmps twice with the operands commuted, but the call to simplifyAndOrOfICmpsWithConstants further down already handles commuting and doesn't need to be called both ways.
This patch pushes double calls further down to just the individual routines that need to be called twice.
Differential Revision: https://reviews.llvm.org/D33603
llvm-svn: 304044
This code was replicated two additional times to handle commuted cases, but I think a commutable matcher can take care of it.
Differential Revision: https://reviews.llvm.org/D33585
llvm-svn: 304022
The tests here are have operands commuted to provide more coverage. I also commuted one of the instructions in the scalar tests so the 4 tests cover the 4 commuted variations
Differential Revision: https://reviews.llvm.org/D33599
llvm-svn: 304021
The patch rL303730 was reverted because test lsr-expand-quadratic.ll failed on
many non-X86 configs with this patch. The reason of this is that the patch
makes a correctless fix that changes optimizer's behavior for this test.
Without the change, LSR was making an overconfident simplification basing on a
wrong SCEV. Apparently it did not need the IV analysis to do this. With the
change, it chose a different way to simplify (that wasn't so confident), and
this way required the IV analysis. Now, following the right execution path,
LSR tries to make a transformation relying on IV Users analysis. This analysis
is target-dependent due to this code:
// LSR is not APInt clean, do not touch integers bigger than 64-bits.
// Also avoid creating IVs of non-native types. For example, we don't want a
// 64-bit IV in 32-bit code just because the loop has one 64-bit cast.
uint64_t Width = SE->getTypeSizeInBits(I->getType());
if (Width > 64 || !DL.isLegalInteger(Width))
return false;
To make a proper transformation in this test case, the type i32 needs to be
legal for the specified data layout. When the test runs on some non-X86
configuration (e.g. pure ARM 64), opt gets confused by the specified target
and does not use it, rejecting the specified data layout as well. Instead,
it uses some default layout that does not treat i32 as a legal type
(currently the layout that is used when it is not specified does not have
legal types at all). As result, the transformation we expect to happen does
not happen for this test.
This re-enabling patch does not have any source code changes compared to the
original patch rL303730. The only difference is that the failing test is
moved to X86 directory and now has requirement of running on x86 only to comply
with the specified target triple and data layout.
Differential Revision: https://reviews.llvm.org/D33543
llvm-svn: 303971
having it internally allocate the loop.
This is a much more flexible API and necessary in the new loop unswitch
to reasonably support both new and old PMs in common code. It also just
seems like a cleaner separation of concerns.
NFC, this should just be a pure refactoring.
Differential Revision: https://reviews.llvm.org/D33528
llvm-svn: 303834
Summary: This code was migrated from InstCombine a few years ago. InstCombine had nearby code that would move Constants to the RHS for these, but InstSimplify doesn't have such code on this path.
Reviewers: spatel, majnemer, davide
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33473
llvm-svn: 303774
This continues the changes started when computeSignBit was replaced with this new version of computeKnowBits.
Differential Revision: https://reviews.llvm.org/D33431
llvm-svn: 303773
The loop vectorizer usually vectorizes any instruction it can and then
extracts the elements for a scalarized use. On SystemZ, all elements
containing addresses must be extracted into address registers (GRs). Since
this extraction is not free, it is better to have the address in a suitable
register to begin with. By forcing address arithmetic instructions and loads
of addresses to be scalar after vectorization, two benefits result:
* No need to extract the register
* LSR optimizations trigger (LSR isn't handling vector addresses currently)
Benchmarking show improvements on SystemZ with this new behaviour.
Any other target could try this by returning false in the new hook
prefersVectorizedAddressing().
Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand
https://reviews.llvm.org/D32422
llvm-svn: 303744
When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that
the loop of our base recurrency is the bottom-lost in terms of domination. This assumption
may be broken by an expression which is treated as invariant, and which depends on a complex
Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than
the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence.
Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike
other SCEVs, SCEVUnknown are sometimes position-bound. For example, here:
for (...) { // loop
phi = {A,+,B}
}
X = load ...
Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot
exist while we are iterating in loop (this memory can be even not allocated and not filled by this moment).
It is only valid to make such folding if X is defined before the loop. In this case the recurrence {A+X,+,B}<loop>
may be existant.
This patch prohibits folding of SCEVUnknown (and those who use them) into the start value of an AddRecExpr,
if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that
relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer
expect such behavior.
llvm-svn: 303730
When presented with an icmp/select pair, we can end up asking what would happen
if we replaced one constant with another in an instruction. This is a mistake,
while non-constant Values could become a constant, constants cannot change and
trying to do so can lead to completely invalid IR (a GEP referencing a
non-existant field in the original case).
llvm-svn: 303580
This is a re-application of a r303497 that was reverted in r303498.
I thought it had broken a bot when it had not (the breakage did not
go away with the revert).
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303531
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303497
Summary: This allows pthread_self to be pulled out of a loop by LICM.
Reviewers: hfinkel, arsenm, davide
Reviewed By: davide
Subscribers: davide, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D32782
llvm-svn: 303495
Refactor the strlen optimization code to work for both strlen and wcslen.
This especially helps with programs in the wild where people pass
L"string"s to const std::wstring& function parameters and the wstring
constructor gets inlined.
This also fixes a lingerind API problem/bug in getConstantStringInfo()
where zeroinitializers would always give you an empty string (without a
length) back regardless of the actual length of the initializer which
did not work well in the TrimAtNul==false causing the PR mentioned
below.
Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG
memcpy lowering and may lead to some cases for out-of-bounds
zeroinitializer accesses not getting optimized anymore. So some code
with UB may produce out of bound memory reads now instead of just
producing zeros.
The refactoring "accidentally" fixes http://llvm.org/PR32124
Differential Revision: https://reviews.llvm.org/D32839
llvm-svn: 303461
Summary:
Implements PR889
Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:
https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing
This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal. However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue. I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.
I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.
Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv
Reviewed By: chandlerc
Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D31261
llvm-svn: 303362
Replace two places that duplicate the code of isLoopInvariant method with
the invocation of this method.
Differential Revision: https://reviews.llvm.org/D33313
llvm-svn: 303336
The probability of edge coming to unreachable block should be as low as possible.
The change reduces the probability to minimal value greater than zero.
The bug https://bugs.llvm.org/show_bug.cgi?id=32214 show the example when
the probability of edge coming to unreachable block is greater than for edge
coming to out of the loop and it causes incorrect loop rotation.
Please note that with this change the behavior of unreachable heuristic is a bit different
than others. Specifically, before this change the sum of probabilities
coming to unreachable blocks have the same weight for all branches
(it was just split over all edges of this block coming to unreachable blocks).
With this change it might be slightly different but not to much due to probability of
taken branch to unreachable block is really small.
Reviewers: chandlerc, sanjoy, vsk, congh, junbuml, davidxl, dexonsmith
Reviewed By: chandlerc, dexonsmith
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D30633
llvm-svn: 303327
Summary:
There are several places in the codebase that try to calculate a maximum value in a Statistic object. We currently do this in one of two ways:
MaxNumFoo = std::max(MaxNumFoo, NumFoo);
or
MaxNumFoo = (MaxNumFoo > NumFoo) ? MaxNumFoo : NumFoo;
The first version reads from MaxNumFoo one time and uncontionally rwrites to it. The second version possibly reads it twice depending on the result of the first compare. But we have no way of knowing if the value was changed by another thread between the reads and the writes.
This patch adds a method to the Statistic object that can ensure that we only store if our value is the max and the previous max didn't change after we read it. If it changed we'll recheck if our value should still be the max or not and try again.
This spawned from an audit I'm trying to do of all places we uses the implicit conversion to unsigned on the Statistics objects. See my previous thread on llvm-dev https://groups.google.com/forum/#!topic/llvm-dev/yfvxiorKrDQ
Reviewers: dberlin, chandlerc, hfinkel, dblaikie
Reviewed By: chandlerc
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D33301
llvm-svn: 303318
We already handled all of the new tests identically, but several
of those went through a lot of unnecessary processing before
getting folded.
Another motivation for grouping these cases together is that
InstCombine needs a similar fold. Currently, it handles the
'not' cases inefficiently which can lead to bugs as described
in the post-commit comments of:
https://reviews.llvm.org/D32143
llvm-svn: 303295
Sorting of AddRecExprs by loop nesting does not make sense since we only invoke
the CompareSCEVComplexity for AddRecExprs that are used by one SCEV. This
guarantees that there is always a dominance relationship between them. This
patch removes the sorting by nesting which is a dead code in current usage of
this function.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D33228
llvm-svn: 303235
We would eventually catch these via demanded bits and computing known bits in InstCombine,
but I think it's better to handle the simple cases as soon as possible as a matter of efficiency.
This fold allows further simplifications based on distributed ops transforms. eg:
%a = lshr i8 %x, 7
%b = or i8 %a, 2
%c = and i8 %b, 1
InstSimplify can directly fold this now:
%a = lshr i8 %x, 7
Differential Revision: https://reviews.llvm.org/D33221
llvm-svn: 303213
Update threshold based on callee's hotness only when BFI is not available.
Otherwise use only callsite's hotness. This makes it easier to reason about
hotness related threshold updates.
Differential revision: https://reviews.llvm.org/D33157
llvm-svn: 303210
ProfileSummaryInfo already checks whether the module has sample profile
in determining profile counts. This will also be useful in inliner to
clean up threshold updates.
llvm-svn: 303204
The existing sorting order in defined CompareSCEVComplexity sorts AddRecExprs
by loop depth, but does not pay attention to dominance of loops. This can
lead us to the following buggy situation:
for (...) { // loop1
op1 = {A,+,B}
}
for (...) { // loop2
op2 = {A,+,B}
S = add op1, op2
}
In this case there is no guarantee that in operand list of S the op2 comes
before op1 (loop depth is the same, so they will be sorted just
lexicographically), so we can incorrectly treat S as a recurrence of loop1,
which is wrong.
This patch changes the sorting logic so that it places the dominated recs
before the dominating recs. This ensures that when we pick the first recurrency
in the operands order, it will be the bottom-most in terms of domination tree.
The attached test set includes some tests that produce incorrect SCEV
estimations and crashes with oldlogic.
Reviewers: sanjoy, reames, apilipenko, anna
Reviewed By: sanjoy
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33121
llvm-svn: 303148
This function gives the wrong answer on some non-ELF platforms in some
cases. The function that does the right thing lives in Mangler.h. To try to
discourage people from using this function, give it a different name.
Differential Revision: https://reviews.llvm.org/D33162
llvm-svn: 303134
ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
*** Performance Analysis
This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.
The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.
** SPEC
* CFP2006/482.sphinx3 -3.34%
A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%. There are also other problems with the codegen in
this loop so there is further room for improvement.
** LLVM testsuite
* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
There are multiple small SLP vectorizations outside the hot code. It's a bit
surprising that it adds up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
The opt-viewer screenshot can be seen at F3218284. We start at a colder store
but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
Noise, binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
There is an additional SLP in the cold code. The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
Differential Revision: https://reviews.llvm.org/D31965
llvm-svn: 303116
Summary:
Merge overflow computation for signed add,
appearing both in InstCombine and ValueTracking.
As part of the merge,
cleanup the interface for overflow checks in InstCombine.
Patch by Yoav Ben-Shalom.
Reviewers: craig.topper, majnemer
Reviewed By: craig.topper
Subscribers: takuto.ikuta, llvm-commits
Differential Revision: https://reviews.llvm.org/D32946
llvm-svn: 303029
This patch adds min/max population count, leading/trailing zero/one bit counting methods.
The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.
Differential Revision: https://reviews.llvm.org/D32931
llvm-svn: 302925
This is a follow up patch for https://reviews.llvm.org/rL300440
to address a comment.
To make implementation to be consistent with other cases we just
ignore the remainder after distribution of remaining probability between
reachable edges.
If we reduced the probability of some edges coming to unreachable
blocks we should distribute the remaining part across other edges
coming to reachable blocks to satisfy the condition that sum of all
probabilities should be equal to one. If this remaining part is not
divided by number of "reachable" edges then we get this remainder.
This remainder probability should be pretty small. Other cases just ignore
if the sum of probabilities is not equal to one so we do the same.
Reviewers: chandlerc, sanjoy, vsk, junbuml, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D32124
llvm-svn: 302883
Summary:
Don't use the metadata on call instructions for determining hotness
unless we are in sample PGO mode, where it is needed because profile
counts are not accurate. In instrumentation mode this is not necessary
and does more harm than good when calls have VP metadata that hasn't
been properly scaled after transformations or dropped after constant
prop based devirtualization (both should be fixed, but we don't need
to do this in the first place for instrumentation PGO).
This required adjusting a number of tests to distinguish between sample
and instrumentation PGO handling, and to add in profile summary metadata
so that getProfileCount can get the summary.
Reviewers: davidxl, danielcdh
Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D32877
llvm-svn: 302844
I ran the test-suite (including SPEC 2006) in PGO mode comparing cold
thresholds of 225 and 45. Here are some stats on the text size:
Out of 904 tests that ran, 197 see a change in text size. The average
text size reduction (of all the 904 binaries) is 1.07%. Of the 197
binaries, 19 see a text size increase, as high as 18%, but most of them
are small single source benchmarks. There are 3 multisource benchmarks
with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On
the other side of the spectrum, 31 benchmarks see >10% size reduction
and 6 of them are MultiSource.
I haven't run the test-suite with other values of inlinecold-threshold.
Since we have a cold callsite threshold of 45, I picked this value.
Differential revision: https://reviews.llvm.org/D33106
llvm-svn: 302829
This fixes a ubsan bot failure after r302597, which made getProfileCount
non-static, but ended up invoking it on a null ProfileSummaryInfo object
in some cases from buildModuleSummaryIndex.
Most testing passed because the non-static getProfileCount currently
doesn't access any member variables, but I found this when testing a
follow on patch (D32877) that adds a member variable access.
llvm-svn: 302705
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shuffevector sequence.
Differential Revision: https://reviews.llvm.org/D32245
llvm-svn: 302631
This change is required because the notion of count is different for
sample profiling and getProfileCount will need to determine the
underlying profile type.
Differential revision: https://reviews.llvm.org/D33012
llvm-svn: 302597
- This change allows targets to opt-in to using them instead of the log2
shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
factored out into LoopUtils, and now have a unified interface for generating
reductions regardless of the preference of the target. LoopUtils now uses TTI
to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.
Differential Revision: https://reviews.llvm.org/D30086
llvm-svn: 302514
This patch uses KnownOnes of the input of ctlz/cttz to bound the value that can be returned from these intrinsics. This makes these intrinsics more similar to the handling for ctpop which already uses known bits to produce a similar bound.
Differential Revision: https://reviews.llvm.org/D32521
llvm-svn: 302444
This introduces a new interface for computeKnownBits that returns the KnownBits object instead of requiring it to be pre-constructed and passed in by reference.
This is a much more convenient interface as it doesn't require the caller to figure out the BitWidth to pre-construct the object. It's so convenient that I believe we can use this interface to remove the special ComputeSignBit flavor of computeKnownBits.
As a step towards that idea, this patch replaces all of the internal usages of ComputeSignBit with this new interface. As you can see from the patch there were a couple places where we called ComputeSignBit which really called computeKnownBits, and then called computeKnownBits again directly. I've reduced those places to only making one call to computeKnownBits. I bet there are probably external users that do it too.
A future patch will update the external users and remove the ComputeSignBit interface. I'll also working on moving more locations to the KnownBits returning interface for computeKnownBits.
Differential Revision: https://reviews.llvm.org/D32848
llvm-svn: 302437
Summary:
Minor refactoring of foldIdentityShuffles() which allows the removal of a
ConstantDataVector::get() in SimplifyShuffleVectorInstruction.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32955
Conflicts:
lib/Analysis/InstructionSimplify.cpp
llvm-svn: 302433
Summary:
Following up on Sanjay's suggetion in D32955, move this functionality
into ShuffleVectornstruction.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32956
llvm-svn: 302420
Summary:
Re-applying r301766 with a fix to a typo and a regression test.
The log message for r301766 was:
==================================================================================
InstructionSimplify: Canonicalize shuffle operands. NFC-ish.
Summary:
Apply canonicalization rules:
1. Input vectors with no elements selected from can be replaced with undef.
2. If only one input vector is constant it shall be the second one.
This allows constant-folding to cover more ad-hoc simplifications that
were in place and avoid duplication for RHS and LHS checks.
There are more rules we may want to add in the future when we see a
justification. e.g. mask elements that select undef elements can be
replaced with undef.
==================================================================================
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32863
llvm-svn: 302373
Summary: This makes setRange take ConstantRange by rvalue reference since most callers were passing an unnamed temporary ConstantRange. We can then move that ConstantRange into the DenseMap caches. For the callers that weren't passing a temporary, I've added std::move to to the local variable being passed.
Reviewers: sanjoy, mzolotukhin, efriedma
Reviewed By: sanjoy
Subscribers: takuto.ikuta, llvm-commits
Differential Revision: https://reviews.llvm.org/D32943
llvm-svn: 302371
We can simplify (or (icmp X, C1), (icmp X, C2)) to 'true' or one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything seems
to check out. Eg, the deleted code in instcombine was completely ignoring predicates with
mismatched signedness.
This is a follow-up to:
https://reviews.llvm.org/rL301260https://reviews.llvm.org/D32143
llvm-svn: 302370
This changes one parameter to be a const APInt& since we only read from it. Use std::move on local APInts once they are no longer needed so we can reuse their allocations. Lastly, use operator+=(uint64_t) instead of adding 1 to an APInt twice creating a new APInt each time.
llvm-svn: 302335
Summary:
ConstantRange contains two APInts which can allocate memory if their width is larger than 64-bits. So we shouldn't copy it when we can avoid it.
This changes LVILatticeVal::getConstantRange() to return its internal ConstantRange by reference. This allows many places that just need a ConstantRange reference to avoid making a copy.
Several places now capture the return value of getConstantRange() by reference so they can call methods on it that don't need a new object.
Lastly it adds std::move in one place to capture to move a local ConstantRange into an LVILatticeVal.
Reviewers: reames, dberlin, sanjoy, anna
Reviewed By: reames
Subscribers: grandinj, llvm-commits
Differential Revision: https://reviews.llvm.org/D32884
llvm-svn: 302331
wcslen is part of the C99 and C++98 standards.
- This introduces the function to TargetLibraryInfo.
- Also set attributes for wcslen in llvm::inferLibFuncAttributes().
Differential Revision: https://reviews.llvm.org/D32837
llvm-svn: 302278
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.
Differential Revision: https://reviews.llvm.org/D32637
llvm-svn: 302262
The sibling folds for 'and' with casts were added with https://reviews.llvm.org/rL273200.
This is a preliminary step for adding the 'or' variants for the folds added with https://reviews.llvm.org/rL301260.
The reason for the strange form with constant LHS in the 1st test is because there's another missing fold in that
case for the inverted predicate. That should be fixed when we add the ConstantRange functionality for 'or-of-icmps'
that already exists for 'and-of-icmps'.
I'm hoping to share more code for the and/or cases, so we won't have these differences. This will allow us to remove
code from InstCombine. It's also possible that we can remove some code here in InstSimplify. I think we have some
duplicated folds because patterns are not matched in a general way.
Differential Revision: https://reviews.llvm.org/D32876
llvm-svn: 302189
Putting these next to each other should make it easier to see
what's missing from each side. Patch to plug one of those holes
should be posted soon.
llvm-svn: 302178
When profiling a no-op incremental link of Chromium I found that the functions
computeImportForFunction and computeDeadSymbols were consuming roughly 10% of
the profile. The goal of this change is to improve the performance of those
functions by changing the map lookups that they were previously doing into
pointer dereferences.
This is achieved by changing the ValueInfo data structure to be a pointer to
an element of the global value map owned by ModuleSummaryIndex, and changing
reference lists in the GlobalValueSummary to hold ValueInfos instead of GUIDs.
This means that a ValueInfo will take a client directly to the summary list
for a given GUID.
Differential Revision: https://reviews.llvm.org/D32471
llvm-svn: 302108
Summary:
The existing implementation creates a symbolic SCEV expression every
time we analyze a phi node and then has to remove it, when the analysis
is finished. This is very expensive, and in most of the cases it's also
unnecessary. According to the data I collected, ~60-70% of analyzed phi
nodes (measured on SPEC) have the following form:
PN = phi(Start, OP(Self, Constant))
Handling such cases separately significantly speeds this up.
Reviewers: sanjoy, pete
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32663
llvm-svn: 302096
This patch adds isConstant and getConstant for determining if KnownBits represents a constant value and to retrieve the value. Use them to simplify code.
Differential Revision: https://reviews.llvm.org/D32785
llvm-svn: 302091
I don't believe its possible to have non-zero values here since DataLayout became required. The APInt constructor inside of the KnownBits object will assert if this ever happens.
llvm-svn: 302089
This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible.
Differential Revision: https://reviews.llvm.org/D32784
llvm-svn: 302088
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
take attribute list indices. Most of these were only used from
BuildLibCalls, and doesNotAlias was only used to test or set if the
return value is malloc-like.
I'm happy to split the patch, but I think they are probably easier to
review when taken together.
This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
0: func attrs
1: retattrs
2...: arg attrs
Reviewers: chandlerc, pete, javed.absar
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D32811
llvm-svn: 302060
We should always expect values to be named before running the module summary
analysis (see NameAnonGlobals pass), so it's fine if we crash in that case.
llvm-svn: 301991
Turns out this wasn't NFC-ish at all because there's a bug processing shuffles
that change the size of their input vectors (that case always seems to trip us
up).
This should fix PR32872 while we investigate how it failed and reduce a testcase:
https://bugs.llvm.org/show_bug.cgi?id=32872
llvm-svn: 301977
This change caused buildbot failures, apparently because we're not
passing around types that InstSimplify is used to seeing. I'm not overly
familiar with InstSimplify, so I'm reverting this until I can figure out
what exactly is wrong.
llvm-svn: 301885
In particular (since it wouldn't fit nicely in the summary):
(select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V)
Differential Revision: https://reviews.llvm.org/D31435
llvm-svn: 301880
Summary:
programUndefinedIfPoison makes more sense, given what the function
does; and I'm about to add a function with a name similar to
isKnownNotFullPoison (so do the rename to avoid confusion).
Reviewers: broune, majnemer, bjarke.roune
Reviewed By: broune
Subscribers: mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30444
llvm-svn: 301776
Summary:
Apply canonicalization rules:
1. Input vectors with no elements selected from can be replaced with undef.
2. If only one input vector is constant it shall be the second one.
This allows constant-folding to cover more ad-hoc simplifications that
were in place and avoid duplication for RHS and LHS checks.
There are more rules we may want to add in the future when we see a
justification. e.g. mask elements that select undef elements can be
replaced with undef.
Reviewers: spatel, RKSimon, andreadb, davide
Reviewed By: spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32338
llvm-svn: 301766
Summary: This patch adds isNegative, isNonNegative for querying whether the sign bit is known. It also adds makeNegative and makeNonNegative for controlling the sign bit.
Reviewers: RKSimon, spatel, davide
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32651
llvm-svn: 301747
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.
NFC
llvm-svn: 301666
Summary:
The motivation example is like below which has 13 cases but only 2 distinct targets
```
lor.lhs.false2: ; preds = %if.then
switch i32 %Status, label %if.then27 [
i32 -7012, label %if.end35
i32 -10008, label %if.end35
i32 -10016, label %if.end35
i32 15000, label %if.end35
i32 14013, label %if.end35
i32 10114, label %if.end35
i32 10107, label %if.end35
i32 10105, label %if.end35
i32 10013, label %if.end35
i32 10011, label %if.end35
i32 7008, label %if.end35
i32 7007, label %if.end35
i32 5002, label %if.end35
]
```
which is compiled into a balanced binary tree like this on AArch64 (similar on X86)
```
.LBB853_9: // %lor.lhs.false2
mov w8, #10012
cmp w19, w8
b.gt .LBB853_14
// BB#10: // %lor.lhs.false2
mov w8, #5001
cmp w19, w8
b.gt .LBB853_18
// BB#11: // %lor.lhs.false2
mov w8, #-10016
cmp w19, w8
b.eq .LBB853_23
// BB#12: // %lor.lhs.false2
mov w8, #-10008
cmp w19, w8
b.eq .LBB853_23
// BB#13: // %lor.lhs.false2
mov w8, #-7012
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_14: // %lor.lhs.false2
mov w8, #14012
cmp w19, w8
b.gt .LBB853_21
// BB#15: // %lor.lhs.false2
mov w8, #-10105
add w8, w19, w8
cmp w8, #9 // =9
b.hi .LBB853_17
// BB#16: // %lor.lhs.false2
orr w9, wzr, #0x1
lsl w8, w9, w8
mov w9, #517
and w8, w8, w9
cbnz w8, .LBB853_23
.LBB853_17: // %lor.lhs.false2
mov w8, #10013
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_18: // %lor.lhs.false2
mov w8, #-7007
add w8, w19, w8
cmp w8, #2 // =2
b.lo .LBB853_23
// BB#19: // %lor.lhs.false2
mov w8, #5002
cmp w19, w8
b.eq .LBB853_23
// BB#20: // %lor.lhs.false2
mov w8, #10011
cmp w19, w8
b.eq .LBB853_23
b .LBB853_3
.LBB853_21: // %lor.lhs.false2
mov w8, #14013
cmp w19, w8
b.eq .LBB853_23
// BB#22: // %lor.lhs.false2
mov w8, #15000
cmp w19, w8
b.ne .LBB853_3
```
However, the inline cost model estimates the cost to be linear with the number
of distinct targets and the cost of the above switch is just 2 InstrCosts.
The function containing this switch is then inlined about 900 times.
This change use the general way of switch lowering for the inline heuristic. It
etimate the number of case clusters with the suitability check for a jump table
or bit test. Considering the binary search tree built for the clusters, this
change modifies the model to be linear with the size of the balanced binary
tree. The model is off by default for now :
-inline-generic-switch-cost=false
This change was originally proposed by Haicheng in D29870.
Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier
Reviewed By: hans
Subscribers: joerg, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D31085
llvm-svn: 301649
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.
Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.
I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.
Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.
Differential Revision: https://reviews.llvm.org/D32376
llvm-svn: 301432
Commits were:
"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"
The changes assumed pointers are 8 byte aligned on all architectures.
llvm-svn: 301429
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.
Reviewers: dblaikie, davide
Reviewed By: davide
Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D32266
llvm-svn: 301424
Summary:
Expose the internal query structure, start using it.
Note: This is the most minimal change possible i could create. I have
trivial followups, like fixing the one use of const FastMathFlags &,
the renaming of CtxI to be consistent, etc.
This should be NFC.
Reviewers: majnemer, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32448
llvm-svn: 301379
This patch uses various APInt methods to reduce temporary APInt creation.
This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking.
I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time.
Differential Revision: https://reviews.llvm.org/D32495
llvm-svn: 301338
The code Sanjay Patel moved over from InstCombine doesn't work properly if the 'and' has both inputs as nots because we used a commuted op matcher on the 'and' first. But this will bind to the first 'not' on 'and' when there could be two 'not's. InstCombine could rely on DeMorgan to ensure the 'and' wouldn't have two 'not's eventually, but InstSimplify can't rely on that.
This patch matches the xor first then checks for the ands and allows a not of either operand of the xor.
Differential Revision: https://reviews.llvm.org/D32458
llvm-svn: 301329
This is a pre-commit for a patch I'm working on to turn KnownZero/One into a struct. Once I do that the type here will be less obvious.
llvm-svn: 301324
This is a pre-commit for a patch that I'm working on to merge KnownZero/KnownOne into a KnownBits struct which would have had to touch this line.
llvm-svn: 301323
Summary:
In a previous change I changed SCEV's normalization / denormalization
to work with non-affine add recs. So the bailout in IVUsers can be
removed.
Reviewers: atrick, efriedma
Reviewed By: atrick
Subscribers: davide, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D32105
llvm-svn: 301298
Summary:
Before this change, SCEV Normalization would incorrectly normalize
non-affine add recurrences. To work around this there was (still is)
a check in place to make sure we only tried to normalize affine add
recurrences.
We recently found a bug in aforementioned check to bail out of
normalizing non-affine add recurrences. However, instead of fixing
the bailout, I have decided to teach SCEV normalization to work
correctly with non-affine add recurrences, making the bailout
unnecessary (I'll remove it in a subsequent change).
I've also added some unit tests (which would have failed before this
change).
Reviewers: atrick, sunfish, efriedma
Reviewed By: atrick
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32104
llvm-svn: 301281
We can simplify (and (icmp X, C1), (icmp X, C2)) to one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything
seems to check out. Eg, the code in instcombine was completely ignoring predicates with
mismatched signedness.
Handling or-of-icmps would be a follow-up step.
Differential Revision: https://reviews.llvm.org/D32143
llvm-svn: 301260
Summary:
llvm.invariant.group.barrier returns pointer that mustalias
pointer it takes. It can't be marked with `returned` attribute,
because it would be remove easily. The other reason is that
only Alias Analysis can know about this, because if any other
pass would know it, then the result would be replaced with it's
argument, which would be invalid.
We can think about returned pointer as something that mustalias, but
it doesn't have to be bitwise the same as the argument.
Reviewers: dberlin, chandlerc, hfinkel, sanjoy
Subscribers: reames, nlewycky, rsmith, anna, amharc
Differential Revision: https://reviews.llvm.org/D31585
llvm-svn: 301227
This is a straight cut and paste, but there's a bigger problem: if this
fold exists for simplifyOr, there should be a DeMorganized version for
simplifyAnd. But more than that, we have a patchwork of ad hoc logic
optimizations in InstCombine. There should be some structure to ensure
that we're not missing sibling folds across and/or/xor.
llvm-svn: 301213
This change reboots SCEV's current (off by default) verification logic
to avoid false failures. Instead of stringifying trip counts, it maps
old and new trip counts to the same ScalarEvolution "universe" and
asks ScalarEvolution to compute the difference between them. If the
difference comes out to be a non-zero constant, then (barring some
corner cases) we *know* we messed up.
I've not yet enabled this by default since it hits an exponential time
issue in SCEV, but once I fix that, I'll flip it on by default in
EXPENSIVE_CHECKS builds.
llvm-svn: 301146
- Mark an internal function static
- Remove the llvm namespace (just holding on to the `using namespace
llvm;` Works on My Machine(TM))
llvm-svn: 300947
There have been multiple reports of this causing problems: a
compile-time explosion on the LLVM testsuite, and a stack
overflow for an opencl kernel.
llvm-svn: 300928
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.
Differential Revision: https://reviews.llvm.org/D32108
llvm-svn: 300856
This is preparation for a clang change to improve the [[nodiscard]] warning to not be ignored on methods that return a class marked [[nodiscard]] that are defined in the class itself. See D32207.
We should consider adding wrapper methods to APInt that return the overflow flag directly and discard the APInt result. This would eliminate the void casts and the need to create a bool before the call to pass to the out param.
llvm-svn: 300758
Use haveNoCommonBitsSet to figure out whether an "or" instruction
is equivalent to addition. This handles more cases than just
checking for a constant on the RHS.
Differential Revision: https://reviews.llvm.org/D32239
llvm-svn: 300746
This patch simplifies the examples from D31509 and D31927 (PR30630) and catches
the basic identity shuffle tests that Zvi recently added.
I'm not sure if we have something like this in DAGCombiner, but we should?
It's worth noting that "MaxRecurse / RecursionLimit" is only 3 on entry at the moment.
We might want to bump that up if there are longer shuffle chains like this in the wild.
For now, we're ignoring shuffles that have undef mask elements because it's not
clear how those should be handled.
Differential Revision: https://reviews.llvm.org/D31960
llvm-svn: 300714
InstSimplify returned the wrong type when simplifying a vector GEP
and we ended up crashing when trying to replace all uses with the
new value. Fixes PR32697.
Differential Revision: https://reviews.llvm.org/D32180
llvm-svn: 300693
BasicAA wants to know if a function is either a malloc or calloc like function. Currently we have to check both separately. This means both calls check if its an intrinsic, query TLI, check the nobuiltin attribute, scan the AllocationFnData, etc.
This patch adds a isMallocOrCallocLikeFn so we can go through all of the checks once per call.
This also changes the one other location I saw that called both together.
Differential Revision: https://reviews.llvm.org/D32188
llvm-svn: 300608
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.
This adds an lshrInPlace(const APInt &) version as well.
Differential Revision: https://reviews.llvm.org/D32155
llvm-svn: 300566
the exponential behavior.
The patch is to fix PR32043. Functions getZeroExtendExpr and getSignExtendExpr
may call themselves recursively more than once. This is potentially a 2^N
complexity behavior. The exponential behavior was not commonly exposed before
because of existing global cache mechnism like UniqueSCEVs or some early return
mechanism when flags FlagNSW or FlagNUW are seen. However, we still have case
which can expose the exponential behavior, like the case in PR32043, so we add
a local cache in getZeroExtendExpr and getSignExtendExpr. If the input of the
functions -- SCEV and type pair have been seen before, we can find the extended
expression directly in the local cache.
Differential Revision: https://reviews.llvm.org/D30350
llvm-svn: 300494
This is non-functional change to re-order if statements to bail out earlier
from unreachable and ColdCall heuristics.
Reviewers: sanjoy, reames, junbuml, vsk, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31704
llvm-svn: 300442
Metadata potentially is more precise than any heuristics we use, so
it makes sense to use first metadata info if it is available. However it makes
sense to examine it against other strong heuristics like unreachable one.
If edge coming to unreachable block has higher probability then it is expected
by unreachable heuristic then we use heuristic and remaining probability is
distributed among other reachable blocks equally.
An example where metadata might be more strong then unreachable heuristic is
as follows: it is possible that there are two branches and for the branch A
metadata says that its probability is (0, 2^25). For the branch B
the probability is (1, 2^25).
So the expectation is that first edge of B is hotter than first edge of A
because first edge of A did not executed at least once.
If first edge of A points to the unreachable block then using the unreachable
heuristics we'll set the probability for A to (1, 2^20) and now edge of A
becomes hotter than edge of B.
This is unexpected behavior.
This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214
Reviewers: sanjoy, junbuml, vsk, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits, reames, davidxl
Differential Revision: https://reviews.llvm.org/D30631
llvm-svn: 300440
If we already called computeKnownBits for the RHS being a constant power of 2, we've already computed everything we can and should just stop. I think previously we would still recurse if we had determined the result was negative or had not determined the sign bit at all.
llvm-svn: 300432
The ConstantInt version has the same assert, and using null/allOnes is likely less efficient.
The only advantage of these local variants (and there's probably a better way to achieve this?)
is to save typing "ConstantInt::" over and over.
llvm-svn: 300426
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.
Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.
llvm-svn: 300367
It won't compile after the recent changes I've made, and I think
keeping it in provides very little value.
Instead I've added (in an earlier commit) a C++ unit test to check the
Denormalize(Normalized(X)) == X property for specific instances of X,
which is what the assert was trying to do anyway.
llvm-svn: 300339
The PostIncTransform class was not pulling its weight, so delete it
and use free functions instead.
This also makes the use of `function_ref` more idiomatic. We were
storing an instance of function_ref in the PostIncTransform class
before, which was fine in that specific case, but the usage after this
change is more obviously okay.
llvm-svn: 300338
It is cleaner to have a callback based system where the logic of
whether an add recurrence is normalized or not lives on IVUsers.
This is one step in a multi-step cleanup.
llvm-svn: 300330
The APInt was created from an 'unsigned' and we just wanted to know how many bits the value needed to represent it. We can just use Log2_32 from MathExtras.h to get the info.
llvm-svn: 300309
We call it unconditionally on the operands of the select. Then decide if its a min/max and call it on the min/max operands or on the select operands again. Either of those second calls will overwrite the results of the initial call so we can just delete the first call.
llvm-svn: 300256
Summary:
* Add a bitreverse case in the demanded bits analysis pass.
* Add tests for the bitreverse (and bswap) intrinsic in the
demanded bits pass.
* Add a test case to the BDCE tests: that manipulations to
high-order bits are eliminated once the bits are reversed
and then right-shifted.
Reviewers: mkuper, jmolloy, hfinkel, trentxintong
Reviewed By: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31857
llvm-svn: 300215
Previously it tried to call SimplifyInstruction which doesn't know anything about alloca so defers to constant folding which also doesn't do anything with alloca. This results in wasted cycles making calls that won't do anything. Given the frequency with which this function is called this time adds up.
llvm-svn: 300118
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.
This patch lets Target specify that it supports such instructions by means of
a new TTI hook that defaults to false.
The use for this is in the LoopVectorizer getScalarizationOverhead() method,
which will with this patch produce a smaller sum for a vector load/store on
SystemZ.
New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll
Review: Adam Nemet
https://reviews.llvm.org/D30680
llvm-svn: 300056
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.
All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.
Doing this exposed some missing facilities in the iterator facade that
I've fixed and required some work to the actual iterator to fully
support the necessary API.
Differential Revision: https://reviews.llvm.org/D31548
llvm-svn: 300032
Collection of PostDominatedByUnreachable and PostDominatedByColdCall have been
split out of heuristics itself. Update of the data happens now for each basic
block (before update for PostDominatedByColdCall might be skipped if
unreachable or matadata heuristic handled this basic block).
This separation allows re-ordering of heuristics without loosing
the post-domination information.
Reviewers: sanjoy, junbuml, vsk, chandlerc, reames
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31701
llvm-svn: 300029
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.
llvm-svn: 299980
Summary:
getModRefInfo is meant to answer the question "what impact does this
instruction have on a given memory location" (not even another
instruction).
Long debate on this on IRC comes to the conclusion the answer should be "nothing special".
That is, a noalias volatile store does not affect a memory location
just by being volatile. Note: DSE and GVN and memdep currently
believe this, because memdep just goes behind AA's back after it says
"modref" right now.
see line 635 of memdep. Prior to this patch we would get modref there, then check aliasing,
and if it said noalias, we would continue.
getModRefInfo *already* has this same AA check, it just wasn't being used because volatile was
lumped in with ordering.
(I am separately testing whether this code in memdep is now dead except for the invariant load case)
Reviewers: jyknight, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31726
llvm-svn: 299741
We have dedicated handlers for every opcode so nothing can get here anymore. The switch doesn't get detected as fully covered because Opcode is an unsigned. Casting to Instruction::BinaryOps still doesn't detect it because BinaryOpsEnd is in the enum and 1 past the last opcode.
llvm-svn: 299687
This is a latent bug that's been hanging around for a while. For a loop-invariant
pointer, expandBounds would return the range {Ptr, Ptr}, but this was interpreted
as a half-open range, not a closed range. So we ended up planting incorrect
bounds checks. Even worse, they were tautological, so we ended up incorrectly
executing the optimized loop.
llvm-svn: 299526
Summary:
Add a hook for simplification of shufflevector's with the following rules:
- Constant folding - NFC, as it was already being done by the default handler.
- If only one of the operands is constant, constant fold the shuffle if the
mask does not select elements from the variable operand - to show the hook is firing and affecting the test-cases.
Reviewers: RKSimon, craig.topper, spatel, sanjoy, nlopes, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31525
llvm-svn: 299393
Summary:
Move the aarch64-type-promotion pass within the existing type promotion framework in CGP.
This change also support forking sexts when a new sext is required for promotion.
Note that change is based on D27853 and I am submitting this out early to provide a better idea on D27853.
Reviewers: jmolloy, mcrosier, javed.absar, qcolombet
Reviewed By: qcolombet
Subscribers: llvm-commits, aemerson, rengolin, mcrosier
Differential Revision: https://reviews.llvm.org/D28680
llvm-svn: 299379
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.
Differential Revision: https://reviews.llvm.org/D31565
llvm-svn: 299362
The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build.
The reason of the crash was type mismatch between either a or b and RHS in the following situation:
LHS = sext(a +nsw b) > RHS.
This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type.
But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that
can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we
reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this
situation we don't need to create any non-constant SCEVs.
This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not
go further into range analysis etc (because in some situations these analyzes succeed even when the passed
arguments have wrong types, what should not normally happen).
The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong
usage of predicates in recursive invocations.
The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll
Reviewers: reames, apilipenko, anna, sanjoy
Reviewed By: sanjoy
Subscribers: mzolotukhin, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D31238
llvm-svn: 299205
SimplifyDemandedUseBits for Add/Sub already recursed down LHS and RHS for simplifying bits. If that didn't provide any simplifications we fall back to calling computeKnownBits which will recurse again. Instead just take the known bits for LHS and RHS we already have and call into a new function in ValueTracking that can calculate the known bits given the LHS/RHS bits.
llvm-svn: 298711
The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build.
The reason of the crash was type mismatch between either a or b and RHS in the following situation:
LHS = sext(a +nsw b) > RHS.
This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type.
But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that
can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we
reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this
situation we don't need to create any non-constant SCEVs.
This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not
go further into range analysis etc (because in some situations these analyzes succeed even when the passed
arguments have wrong types, what should not normally happen).
The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong
usage of predicates in recursive invocations.
The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll
llvm-svn: 298690
Summary: The current prefix based function layout algorithm only looks at function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31225
llvm-svn: 298656
Using AssemblyAnnotationWriter for LVI printer prints
for instructions and basic blocks.
So, we explicitly need to print LVI info for the arguments of the function (these
are values and not instructions).
llvm-svn: 298640
Given below case:
%y = shl %x, n
%z = ashr %y, m
when n = m, SCEV models it as sext(trunc(x)). This patch tries to handle
the case where n > m by using sext(mul(trunc(x), 2^(n-m)))) as the SCEV
expression.
llvm-svn: 298631
Summary:
Adding a printer pass for printing the LVI cache values after transformations
that use LVI.
This will help us in identifying cases where LVI
invariants are violated, or transforms that leave LVI in an incorrect state.
Right now, I have added two test cases to show that the printer pass is working.
I will be adding more test cases in a later change, once this change is
checked in upstream.
Reviewers: reames, dberlin, sanjoy, apilipenko
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30790
llvm-svn: 298542
This patch allows SCEV predicate analysis to prove implication of some expression predicates
from context predicates related to arguments of those expressions.
It introduces three new rules:
For addition:
(A >X && B >= 0) || (B >= 0 && A > X) ===> (A + B) > X.
For division:
(A > X) && (0 < B <= X + 1) ===> (A / B > 0).
(A > X) && (-B <= X < 0) ===> (A / B >= 0).
Using these rules, SCEV is able to prove facts like "if X > 1 then X / 2 > 0".
They can also be combined with the same context, to prove more complex expressions like
"if X > 1 then X/2 + 1 > 1".
Diffirential Revision: https://reviews.llvm.org/D30887
Reviewed by: sanjoy
llvm-svn: 298481
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.
This fixes PR23277.
Differential Revision: https://reviews.llvm.org/D28494
llvm-svn: 298430
Summary: Because SamplePGO passes will be invoked twice in ThinLTO build: once at compile phase, the other at backend. We want to make sure the IR at the 2nd phase matches the hot part in profile, thus we do not want to inline hot callsites in the first phase.
Reviewers: tejohnson, eraman
Reviewed By: tejohnson
Subscribers: mehdi_amini, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D31201
llvm-svn: 298428
Summary: ModuleSummary should use the standard interface of ProfileSummary::getProfileCount.
Reviewers: eraman, tejohnson
Reviewed By: tejohnson
Subscribers: tejohnson, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D31154
llvm-svn: 298404
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
After the loop unroll threshold was increased in r295538, very
large constant expressions can be created. This prevents them
from having to be recursively scanned, leading to a compile
time blow-up.
Differential Revision: https://reviews.llvm.org/D30689
llvm-svn: 298356
If loop bound containing calculations like min(a,b), the Scalar
Evolution API getSmallConstantTripMultiple returns 4294967295 "-1"
as the trip multiple. The problem is that, SCEV use -1 * umax to
represent umin. The multiple constant -1 was returned, and the logic
of guarding against huge trip counts was skipped. Because -1 has 32
active bits.
The fix attempt to factor more general cases. First try to get the
greatest power of two divisor of trip count expression. In case
overflow happens, the trip count expression is still divisible by the
greatest power of two divisor returned. Returns 1 if not divisible by 2.
Patch by Huihui Zhang <huihuiz@codeaurora.org>
Differential Revision: https://reviews.llvm.org/D30840
llvm-svn: 298301
Summary:
Extract FindAvailablePtrLoadStore out of FindAvailableLoadedValue.
Prepare for upcoming change which will do phi-translation for load on
phi pointer in jump threading SimplifyPartiallyRedundantLoad.
This is in preparation for https://reviews.llvm.org/D30543
Reviewers: efriedma, sanjoy, davide, dberlin
Reviewed By: davide
Subscribers: junbuml, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D30524
llvm-svn: 298216
Summary:
The reverse of an artbitrary bitpattern is also an arbitrary
bitpattern.
Reviewers: trentxintong, arsenm, majnemer
Reviewed By: majnemer
Subscribers: majnemer, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D31118
llvm-svn: 298201
The code assigned to KnownZero, but later code unconditionally assigned over it. I'm pretty sure the later code can handle the same cases and more equally well.
llvm-svn: 298190
Summary:
This approach has two major advantages over the existing one:
1. We don't need to extend bitwidth in our computations. Extending
bitwidth is a big issue for compile time as we often end up working with
APInts wider than 64bit, which is a slow case for APInt.
2. When we zero extend a wrapped range, we lose some information (we
replace the range with [0, 1 << src bit width)). Thus, avoiding such
extensions better preserves information.
Correctness testing:
I ran 'ninja check' with assertions that the new implementation of
getRangeForAffineAR gives the same results as the old one (this
functionality is not present in this patch). There were several failures
- I inspected them manually and found out that they all are caused by
the fact that we're returning more accurate results now (see bullet (2)
above).
Without such assertions 'ninja check' works just fine, as well as
SPEC2006.
Compile time testing:
CTMark/Os:
- mafft/pairlocalalign -16.98%
- tramp3d-v4/tramp3d-v4 -12.72%
- lencod/lencod -11.51%
- Bullet/bullet -4.36%
- ClamAV/clamscan -3.66%
- 7zip/7zip-benchmark -3.19%
- sqlite3/sqlite3 -2.95%
- SPASS/SPASS -2.74%
- Average -5.81%
Performance testing:
The changes are expected to be neutral for runtime performance.
Reviewers: sanjoy, atrick, pete
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30477
llvm-svn: 297992
If it is possible for the RHS of a shift operation to be greater than or equal
to the bit-width, then the result might be undef, and we can't report any known
bits.
In some cases, this was allowing a transformation in instcombine which widened
an undef value from i1 to i32, increasing the range of values that a function
could return.
Differential revision: https://reviews.llvm.org/D30781
llvm-svn: 297724
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.
Tests updates:
Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll
The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.
Review: Hal Finkel
https://reviews.llvm.org/D29540
llvm-svn: 297705
Summary:
This change solves the same problem as D30726, except that this only
throws out the bathwater.
AST was not correctly tracking and deleting UnknownInstructions via
handles. The existing code only tracks "pointers" in its
`ASTCallbackVH`, so an UnknownInstruction (that isn't also def'ing a
pointer used by another memory instruction) never gets a
`ASTCallbackVH`.
There are two other ways to solve this problem:
- Use the `PointerRec` scheme for both known and unknown instructions.
- Use a `CallbackVH` that erases the offending Instruction from the
UnknownInstruction list.
Both of the above changes seemed to be significantly (and unnecessarily
IMO) more complex than this.
Reviewers: chandlerc, dberlin, hfinkel, reames
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D30849
llvm-svn: 297539
Summary: There is no need to check profile count as only CallInst will have metadata attached.
Reviewers: eraman
Reviewed By: eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30799
llvm-svn: 297500
This reverts r293386, r294027, r294029 and r296411.
Turns out the SLP tree isn't actually a "tree" and we don't handle
accessing the same packet of loads in several different orders well,
causing miscompiles.
Revert until we can fix this properly.
llvm-svn: 297493
Summary: We should not use that to check basic block hotness as optimization may mess it up.
Reviewers: eraman
Reviewed By: eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30800
llvm-svn: 297437
Summary:
In a .symver assembler directive like:
.symver name, name2@@nodename
"name2@@nodename" should get the same symbol binding as "name".
While the ELF object writer is updating the symbol binding for .symver
aliases before emitting the object file, not doing so when the module
inline assembly is handled by the RecordStreamer is causing the wrong
behavior in *LTO mode.
E.g. when "name" is global, "name2@@nodename" must also be marked as
global. Otherwise, the symbol is skipped when iterating over the LTO
InputFile symbols (InputFile::Symbol::shouldSkip). So, for example,
when performing any *LTO via the gold-plugin, the versioned symbol
definition is not recorded by the plugin and passed back to the
linker. If the object was in an archive, and there were no other symbols
needed from that object, the object would not be included in the final
link and references to the versioned symbol are undefined.
The llvm-lto2 tests added will give an error about an unused symbol
resolution without the fix.
Reviewers: rafael, pcc
Reviewed By: pcc
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D30485
llvm-svn: 297332
A block with an UnreachableInst does not transfer execution to a successor.
The problem was exposed by GVN-hoist. This patch fixes bug 32153.
Patch by Aditya Kumar.
Differential Revision: https://reviews.llvm.org/D30667
llvm-svn: 297254
Fixes PR32142.
r287232 accidentally increased the recursion threshold for
CompareValueComplexity from 2 to 32. This change reverses that change
by introducing a separate flag for CompareValueComplexity's threshold.
llvm-svn: 296992
for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR.
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree
instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for
external use of vectorizable scalars based on shuffle mask.
Reviewers: mkuper
Differential Revision: https://reviews.llvm.org/D30159
Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d
llvm-svn: 296863
for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR.
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree
instead of actual vectorization of vectorizable tree.
Reviewers: mkuper
Differential Revision: https://reviews.llvm.org/D30159
Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d
llvm-svn: 296575
Summary: For SamplePGO, the profile may contain cross-module inline stacks. As we need to make sure the profile annotation happens when all the hot inline stacks are expanded, we need to pass this info to the module importer so that it can import proper functions if necessary. This patch implemented this feature by emitting cross-module targets as part of function entry metadata. In the module-summary phase, the metadata is used to build call edges that points to functions need to be imported.
Reviewers: mehdi_amini, tejohnson
Reviewed By: tejohnson
Subscribers: davidxl, llvm-commits
Differential Revision: https://reviews.llvm.org/D30053
llvm-svn: 296498
Summary:
Previously we used to return a bogus result, 0, for IR like `ashr %val,
-1`.
I've also added an assert checking that `ComputeNumSignBits` at least
returns 1. That assert found an already checked in test case where we
were returning a bad result for `ashr %val, -1`.
Fixes PR32045.
Reviewers: spatel, majnemer
Reviewed By: spatel, majnemer
Subscribers: efriedma, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D30311
llvm-svn: 296273
Last use was killed in my previous patch. The preferred way is now to
construct the remark, pipe things to it and pass it to ORE.emit.
llvm-svn: 296019
This needed a const_cast for the dominator tree recalculation in
OptimizationRemarkEmitter, but we do that all over the place already
and it's safe.
llvm-svn: 295812
Summary:
Motivation: fix PR31181 without regression (the actual fix is still in
progress). However, the actual content of PR31181 is not relevant
here.
This change makes poison propagation more aggressive in the following
cases:
1. poision * Val == poison, for any Val. In particular, this changes
existing intentional and documented behavior in these two cases:
a. Val is 0
b. Val is 2^k * N
2. poison << Val == poison, for any Val
3. getelementptr is poison if any input is poison
I think all of these are justified (and are axiomatically true in the
new poison / undef model):
1a: we need poison * 0 to be poison to allow transforms like these:
A * (B + C) ==> A * B + A * C
If poison * 0 were 0 then the above transform could not be allowed
since e.g. we could have A = poison, B = 1, C = -1, making the LHS
poison * (1 + -1) = poison * 0 = 0
and the RHS
poison * 1 + poison * -1 = poison + poison = poison
1b: we need e.g. poison * 4 to be poison since we want to allow
A * 4 ==> A + A + A + A
If poison * 4 were a value with all of their bits poison except the
last four; then we'd not be able to do this transform since then if A
were poison the LHS would only be "partially" poison while the RHS
would be "full" poison.
2: Same reasoning as (1b), we'd like have the following kinds
transforms be legal:
A << 1 ==> A + A
Reviewers: majnemer, efriedma
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D30185
llvm-svn: 295809
The change to InstCombine in:
https://reviews.llvm.org/D29729
...exposes this missing fold in InstSimplify, so adding this
first to avoid a regression.
llvm-svn: 295573
Several visitors check if operands to the instruction are constants,
either as it is or after looking up SimplifiedValues, check if the
result is a constant and update the SimplifiedValues map. This
refactoring splits it into a common function that does the checking of
whether the operands are constants and updating of the SimplifiedValues
table, and an instruction specific part that is implemented by each
instruction visitor as a lambda and passed to the common function.
Differential revision: https://reviews.llvm.org/D30104
llvm-svn: 295552
This creates and uses a DiagnosticLocation type rather than using
DebugLoc for this purpose in the backend diagnostics. This is NFC for
now, but will allow us to create locations for diagnostics without
having to create new metadata nodes when we don't have a DILocation.
llvm-svn: 295519
This is a short term solution to the problem that many passes currently fail
to update the assumption cache. In the long term the verifier should not
be controllable with a flag. We should either fix all passes to correctly
update the assumption cache and enable the verifier unconditionally or
somehow arrange for the assumption list to be updated automatically by passes.
Differential Revision: https://reviews.llvm.org/D30003
llvm-svn: 295236
proven larger than the loop-count
This fixes PR31098: Try to resolve statically data-dependences whose
compile-time-unknown distance can be proven larger than the loop-count,
instead of resorting to runtime dependence checking (which are not always
possible).
For vectorization it is sufficient to prove that the dependence distance
is >= VF; But in some cases we can prune unknown dependence distances early,
and even before selecting the VF, and without a runtime test, by comparing
the distance against the loop iteration count. Since the vectorized code
will be executed only if LoopCount >= VF, proving distance >= LoopCount
also guarantees that distance >= VF. This check is also equivalent to the
Strong SIV Test.
Reviewers: mkuper, anemet, sanjoy
Differential Revision: https://reviews.llvm.org/D28044
llvm-svn: 294892
The summary information includes all uses of llvm.type.test and
llvm.type.checked.load intrinsics that can be used to devirtualize calls,
including any constant arguments for virtual constant propagation.
Differential Revision: https://reviews.llvm.org/D29734
llvm-svn: 294795
Somewhat amazingly, this only requires teaching it to clean them up when
deleting a dead function from the graph. And we already have exactly the
necessary data structures to do that in the parent RefSCCs.
This allows ArgPromote to work in a much simpler way be merely letting
reference edges linger in the graph after the causing IR is deleted. We
will clean up these edges when we run any function pass over the IR, but
don't remove them eagerly.
This avoids all of the quadratic update issues both in the current pass
manager and in my previous attempt with the new pass manager.
Differential Revision: https://reviews.llvm.org/D29579
llvm-svn: 294663
disturbing the graph or having to update edges.
This is motivated by porting argument promotion to the new pass manager.
Because of how LLVM IR Function objects work, in order to change their
signature a new object needs to be created. This is efficient and
straight forward in the IR but previously was very hard to implement in
LCG. We could easily replace the function a node in the graph
represents. The challenging part is how to handle updating the edges in
the graph.
LCG previously used an edge to a raw function to represent a node that
had not yet been scanned for calls and references. This was the core
of its laziness. However, that model causes this kind of update to be
very hard:
1) The keys to lookup an edge need to be `Function*`s that would all
need to be updated when we update the node.
2) There will be some unknown number of edges that haven't transitioned
from `Function*` edges to `Node*` edges.
All of this complexity isn't necessary. Instead, we can always build
a node around any function, always pointing edges at it and always using
it as the key to lookup an edge. To maintain the laziness, we need to
sink the *edges* of a node into a secondary object and explicitly model
transitioning a node from empty to populated by scanning the function.
This design seems much cleaner in a number of ways, but importantly
there is now exactly *one* place where the `Function*` has to be
updated!
Some other cleanups that fall out of this include having something to
model the *entry* edges more accurately. Rather than hand rolling parts
of the node in the graph itself, we have an explicit `EdgeSequence`
object that gives us exactly the functionality needed. We also have
a consistent place to define the edge iterators and can use them for
both the entry edges and the internal edges of the graph.
The API used to model the separation between a node and its edges is
intentionally very thin as most clients are expected to deal with nodes
that have populated edges. We model this exactly as an optional does
with an additional method to populate the edges when that is
a reasonable thing for a client to do. This is based on API design
suggestions from Richard Smith and David Blaikie, credit goes to them
for helping pick how to model this without it being either too explicit
or too implicit.
The patch is somewhat noisy due to shifting around iterator types and
new syntax for walking the edges of a node, but most of the
functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.
Differential Revision: https://reviews.llvm.org/D29577
llvm-svn: 294653
Summary:
Convert all obvious node_begin/node_end and child_begin/child_end
pairs to range based for.
Sending for review in case someone has a good idea how to make
graph_children able to be inferred. It looks like it would require
changing GraphTraits to be two argument or something. I presume
inference does not happen because it would have to check every
GraphTraits in the world to see if the noderef types matched.
Note: This change was 3-staged with clang as well, which uses
Dominators/etc from LLVM.
Reviewers: chandlerc, tstellarAMD, dblaikie, rsmith
Subscribers: arsenm, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D29767
llvm-svn: 294620
Summary:
LVI is now depth first, which is optimal for iteration strategy in
terms of work per call. However, the way the results get cached means
it can still go very badly N^2 or worse right now. The overdefined
cache is per-block, because LVI wants to try to get different results
for the same name in different blocks (IE solve the problem
PredicateInfo solves). This means even if we discover a value is
overdefined after going very deep, it doesn't cache this information,
causing it to end up trying to rediscover it again and again. The
same is true for values along the way. In practice, overdefined
anywhere should mean overdefined everywhere (this is how, for example,
SCCP works).
Until we get around to reworking the overdefined cache, we need to
limit the worklist size we process. Note that permanently reverting
the DFS strategy exploration seems the wrong strategy (temporarily
seems fine if we really want). BFS is clearly the wrong approach, it
just gets luckier on some testcases. It's also very hard to design
an effective throttle for BFS. For DFS, the throttle is directly related
to the depth of the CFG. So really deep CFGs will get cutoff, smaller
ones will not. As the CFG simplifies, you get better results.
In BFS, the limit is it's related to the fan-out times average block size,
which is harder to reason about or make good choices for.
Bug being filed about the overdefined cache, but it will require major
surgery to fix it (plumbing predicateinfo through CVP or LVI).
Note: I did not make this number configurable because i'm not sure
anyone really needs to tweak this knob. We run CVP 3 times. On the
testcases i have the slow ones happen in the middle, where CVP is
doing cleanup work other things are effective at. Over the course of
3 runs, we don't see to have any real loss of performance.
I haven't gotten a minimized testcase yet, but just imagine in your
head a testcase where, going *up* the CFG, you have branches, one of
which leads 50000 blocks deep, and the other, to something where the
answer is overdefined immediately. BFS would discover the overdefined
faster than DFS, but do more work to do so. In practice, the right
answer is "once DFS discovers overdefined for a value, stop trying to
get more info about that value" (and so, DFS would normally cache the
overdefined results for every value it passed through in those 50k
blocks, and never do that work again. But it don't, because of the
naming problem)
Reviewers: chandlerc, djasper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29715
llvm-svn: 294463
The patch committed in r293017, as discussed on the list, doesn't really
make sense but was causing an actual issue to go away.
The issue turns out to be that in one place the extra template arguments
were dropped from the OuterAnalysisManagerProxy. This in turn caused the
types used in one set of places to access the key to be completely
different from the types used in another set of places for both Loop and
CGSCC cases where there are extra arguments.
I have literally no idea how anything seemed to work with this bug in
place. It blows my mind. But it did except for mingw64 in a DLL build.
I've added a really handy static assert that helps ensure we don't break
this in the future. It immediately diagnoses the issue with a compile
failure and a very clear error message. Much better that staring at
backtraces on a build bot. =]
llvm-svn: 294267
This patch changes the order in which LVI explores previously unexplored paths.
Previously, the code used an BFS strategy where each unexplored input was added to the search queue before any of them were explored. This has the effect of causing all inputs to be explored before returning to re-evaluate the merge point (non-local or phi node). This has the unfortunate property of doing redundant work if one of the inputs to the merge is found to be overdefined (i.e. unanalysable). If any input is overdefined, the result of the merge will be too; regardless of the values of other inputs.
The new code uses a DFS strategy where we re-evaluate the merge after evaluating each input. If we discover an overdefined input, we immediately return without exploring other inputs.
We have reports of large (4-10x) improvements of compile time with this patch and some reports of more precise analysis results as well. See the review discussion for details. The original motivating case was pr10584.
Differential Revision: https://reviews.llvm.org/D28190
llvm-svn: 294264
iteration.
The lazy formation of RefSCCs isn't really the most important part of
the laziness here -- that has to do with walking the functions
themselves -- and isn't essential to maintain. Originally, there were
incremental update algorithms that relied on updates happening
predominantly near the most recent RefSCC formed, but those have been
replaced with ones that have much tighter general case bounds at this
point. We do still perform asserts that only scale well due to this
incrementality, but those are easy to place behind EXPENSIVE_CHECKS.
Removing this simplifies the entire analysis by having a single up-front
step that builds all of the RefSCCs in a direct Tarjan walk. We can even
easily replace this with other or better algorithms at will and with
much less confusion now that there is no iterator-based incremental
logic involved. This removes a lot of complexity from LCG.
Another advantage of moving in this direction is that it simplifies
testing the system substantially as we no longer have to worry about
observing and mutating the graph half-way through the RefSCC formation.
We still need a somewhat special iterator for RefSCCs because we want
the iterator to remain stable in the face of graph updates. However,
this now merely involves relative indexing to the current RefSCC's
position in the sequence which isn't too hard.
Differential Revision: https://reviews.llvm.org/D29381
llvm-svn: 294227
for a quite big function with source like
%add = add nsw i32 %mul, %conv
%mul1 = mul nsw i32 %add, %conv
%add2 = add nsw i32 %mul1, %add
%mul3 = mul nsw i32 %add2, %add
; repeat couple of thousands times
that can be produced by loop unroll, getAddExpr() tries to recursively construct SCEV and runs almost infinite time.
Added recursion depth restriction (with new parameter to set it)
Reviewers: sanjoy
Subscribers: hfinkel, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D28158
llvm-svn: 294181
This generalizes memory access sorting to use differences between SCEVs,
instead of relying on constant offsets. That allows us to properly do
SLP vectorization of non-sequentially ordered loads within loops bodies.
Differential Revision: https://reviews.llvm.org/D29425
llvm-svn: 294027
This reverts commit r293970.
After more discussion, this belongs to the linker side and
there is no added value to do it at this level.
llvm-svn: 293993
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.
This is a recommit of r293912 after fixing build failures,
and a recommit of r293918 after fixing LLD tests.
Differential Revision: https://reviews.llvm.org/D28978
llvm-svn: 293970
1. Added comments for options
2. Added missing option cl::desc field
3. Uniified function filter option for graph viewing.
Now PGO count/raw-counts share the same
filter option: -view-bfi-func-name=.
llvm-svn: 293938
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.
This is a recommit of r293912 after fixing build failures.
Differential Revision: https://reviews.llvm.org/D28978
llvm-svn: 293918
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.
Differential Revision: https://reviews.llvm.org/D28978
llvm-svn: 293912
Summary: While scanning predecessors to find an available loaded value, if the predecessor has a single predecessor, we can continue scanning through the single predecessor.
Reviewers: mcrosier, rengolin, reames, davidxl, haicheng
Reviewed By: rengolin
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D29200
llvm-svn: 293896
This patch moves some helper functions related to interleaved access
vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like
to use these functions in a follow-on patch that improves interleaved load and
store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was
already duplicated there and has been removed.
Differential Revision: https://reviews.llvm.org/D29398
llvm-svn: 293788
A program may contain llvm.assume info that disagrees with other analysis.
This may be caused by UB in the program, so we must not crash because of that.
As noted in the code comments:
https://llvm.org/bugs/show_bug.cgi?id=31809
...we can do better, but this at least avoids the assert/crash in the bug report.
Differential Revision: https://reviews.llvm.org/D29395
llvm-svn: 293773
Make SolveLinEquationWithOverflow take the start as a SCEV, so we can
solve more cases. With that implemented, get rid of the special case
for powers of two.
The additional functionality probably isn't particularly useful,
but it might help a little for certain cases involving pointer
arithmetic.
Differential Revision: https://reviews.llvm.org/D28884
llvm-svn: 293576
The jumbled scalar loads will be sorted while building the tree and these accesses will be marked to generate shufflevector after the vectorized load with proper mask.
Reviewers: hfinkel, mssimpso, mkuper
Differential Revision: https://reviews.llvm.org/D26905
Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad
llvm-svn: 293386
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html
For reference:
- Public headers should just declare the dump() method but not use
LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void MyClass::dump() {
// print stuff to dbgs()...
}
#endif
llvm-svn: 293359
This is fixing pr31761: BasicAA is deducing NoAlias
on the result of the GEP if the base pointer is itself NoAlias.
This is possible only if the NoAlias on the base pointer is
deduced with a non-sized query: this should guarantee that
the pointers are belonging to different memory allocation
and that the GEP can't legally jump from one to another.
Differential Revision: https://reviews.llvm.org/D29216
llvm-svn: 293293
Summary:
CannotBeOrderedLessThanZero(powi(x, exp)) returns true if
CannotBeOrderedLessThanZero(x). But powi(-0, exp) is negative if exp is
odd, so we actually want to return SignBitMustBeZero(x).
Except that also isn't right, because we want to return true if x is
NaN, even if x has a negative sign bit.
What we really need in order to fix this is a consistent approach in
this function to handling the sign bit of NaNs. Without this it's very
difficult to say what the correct behavior here is.
Reviewers: hfinkel, efriedma, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28927
llvm-svn: 293243
Inlining in getAddExpr() can cause abnormal computational time in some cases.
New parameter -scev-addops-inline-threshold is intruduced with default value 500.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D28812
llvm-svn: 293176
with it.
This code was dereferencing the PoisoningVH which isn't allowed once it
is poisoned. But the code itself really doesn't need to access the
pointer, it is just doing the safe stuff of clearing out data structures
keyed on the pointer value.
Change the code to use iterators to erase directly from a DenseMap. This
is also substantially more efficient as it avoids lots of hashing and
lookups to do the erasure. DenseMap supports iterating behind the
iteration which is fairly easy to implement.
Sadly, I don't have a test case here. I'm not even close and I don't
know that I ever will be. The issue is that several of the tricky
aspects of fixing this only show up when you cause the stack's
SmallVector to be in *EXACTLY* the right location. I only ever got
a reproduction for those with Clang, and only with *exactly* the right
command line flags. Any adjustment, even to seemingly unrelated flags,
would make partial and half-way solutions magically start to "work". In
good news, all of this was caught with the LLVM test suite. Also, there
is no *specific* code here that is untested, just that the old pattern
of code won't immediately fail on any test case I've managed to
contrive.
llvm-svn: 293160
Refactoring to remove duplications of this method.
New method getOperandsScalarizationOverhead() that looks at the present unique
operands and add extract costs for them. Old behaviour was to just add extract
costs for one operand of the type always, which still happens in
getArithmeticInstrCost() if no operands are provided by the caller.
This is a good start of improving on this, but there are more places
that can be improved by using getOperandsScalarizationOverhead().
Review: Hal Finkel
https://reviews.llvm.org/D29017
llvm-svn: 293155
Summary:
Previously we assumed that the result of sqrt(x) always had 0 as its
sign bit. But sqrt(-0) == -0.
Reviewers: hfinkel, efriedma, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28928
llvm-svn: 293115
This allows MIR passes to emit optimization remarks with the same level
of functionality that is available to IR passes.
It also hooks up the greedy register allocator to report spills. This
allows for interesting use cases like increasing interleaving on a loop
until spilling of registers is observed.
I still need to experiment whether reporting every spill scales but this
demonstrates for now that the functionality works from llc
using -pass-remarks*=<pass>.
Differential Revision: https://reviews.llvm.org/D29004
llvm-svn: 293110
Code region is the only part of this class that is IR-specific. Code
region is moved down in the inheritance tree to a new derived class,
called DiagnosticInfoIROptimization.
All the existing remarks are derived from this new class now.
This allows the new MIR pass-remark classes to be derived from
DiagnosticInfoOptimizationBase.
Also because we keep the name DiagnosticInfoOptimizationBase, the clang
parts don't need any adjustment.
Differential Revision: https://reviews.llvm.org/D29003
llvm-svn: 293109
Floating point intrinsics in LLVM are generally not speculatively
executed, since most of them are defined to behave the same as libm
functions, which set errno.
However, the @llvm.powi.* intrinsics do not correspond to any libm
function, and lacks any defined error handling semantics in LangRef.
It most certainly does not alter errno.
llvm-svn: 293041
I found root class should be instantiated for variadic tempate to instantiate static member explicitly.
This will fix failures in mingw DLL build.
llvm-svn: 293017
a lazy-asserting PoisoningVH.
AssertVH is fundamentally incompatible with cache-invalidation of
analysis results. The invaliadtion happens after the AssertingVH has
already fired. Instead, use a PoisoningVH that will assert if the
dangling handle is ever used rather than merely be assigned or
destroyed.
This patch also removes all of the (numerous) doomed attempts to work
around this fundamental incompatibility. It is a pretty significant
simplification IMO.
The most interesting change is in the Inliner where we still do some
clearing because we don't want to rely on the coarse grained
invalidation strategy of the containing pass manager. However, I prefer
the approach that contains this logic to the cleanup phase of the
Inliner, and I think we could enhance the CGSCC analysis management
layer to make this even better in the future if desired.
The rest is straight cleanup.
I've also added a test for one of the harder cases to work around: when
a *module analysis* contains many AssertingVHes pointing at functions.
Differential Revision: https://reviews.llvm.org/D29006
llvm-svn: 292928
Verifications of dominator tree and loop info are expensive operations
so they are disabled by default. They can be enabled by command line
options -verify-dom-info and -verify-loop-info. These options however
enable checks only in files Dominators.cpp and LoopInfo.cpp. If some
transformation changes dominaror tree and/or loop info, it would be
convenient to place similar checks to the files implementing the
transformation.
This change makes corresponding flags global, so they can be used in
any file to optionally turn verification on.
llvm-svn: 292889
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).
Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.
The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)
There are additional changes required in clang.
Reviewers: rsmith
Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28476
llvm-svn: 292848
become unavailable.
The AssumptionCache is now immutable but it still needs to respond to
DomTree invalidation if it ended up caching one.
This lets us remove one of the explicit invalidates of LVI but the
other one continues to avoid hitting a latent bug.
llvm-svn: 292769
This is similar to what the caller (matchSelectPattern()) does. In all
cases where we succeed in matching a min/max pattern, the values in
that pattern will be the values of the 'select', so hoist that and
remove a bunch of duplicated code.
llvm-svn: 292725