This fixes a layering violation:
Analysis/IVDescrtors.cpp can't include Transforms/Utils/BasicBlockUtils.h,
since TransformUtils depends on Analysis.
llvm-svn: 342024
Summary:
The InductionDescriptor and RecurrenceDescriptor classes basically analyze the IR to identify the respective IVs. So, it is better to have them in the "Analysis" directory instead of the "Transforms" directory.
The rationale for this is to make the Induction and Recurrence descriptor classes available for analysis passes. Currently including them in an analysis pass produces link error (http://lists.llvm.org/pipermail/llvm-dev/2018-July/124456.html).
Induction and Recurrence descriptors are moved from Transforms/Utils/LoopUtils.h|cpp to Analysis/IVDescriptors.h|cpp.
Reviewers: dmgreen, llvm-commits, hfinkel
Reviewed By: dmgreen
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D51153
llvm-svn: 342016
Fix for https://bugs.llvm.org/show_bug.cgi?id=38807, which occurred
while compiling SemaTemplateInstantiate.cpp with clang and GVNHoist
enabled. In the following example:
1=def(entry)
/ \
2=def(1) 4=def(1)
3=def(2) 5=def(4)
When removing the MemoryDef 2=def(1) from its basic block, and just
before adding it to the end of the parent basic block, we first
replace all its uses with the defining memory access:
3=def(2) -> 3=def(1)
Then we call insertDef for adding 2=def(1) to the parent basic block,
where we replace the uses of 1=def(entry) with 2=def(1). Doing so we
create a self reference:
2=def(1) -> 2=def(2) (bad)
3=def(1) -> 3=def(2) (ok)
4=def(1) -> 4=def(2) (ok)
Differential Revision: https://reviews.llvm.org/D51801
llvm-svn: 341947
The previous implementation traversed all loop blocks and bailed if one
was not a latch block. Since we are only interested in latch blocks, we
should only traverse those.
llvm-svn: 341926
This patch does the following things:
1. update SymbolicallyEvaluateGEP so that it bails out if it cannot preserve inrange arribute;
2. update llvm/test/Analysis/ConstantFolding/gep.ll to remove UB in it;
3. remove inaccurate comment above ConstantFoldInstOperandsImpl in llvm/lib/Analysis/ConstantFolding.cpp;
4. add a new regression test that makes sure that no optimizations change an inrange GEP in an unexpected way.
Patch by Zhaomo Yang!
Differential Revision: https://reviews.llvm.org/D51698
llvm-svn: 341888
Summary:
End goal is to update MemorySSA in all loop passes. LoopUnswitch clones all blocks in a loop. SimpleLoopUnswitch clones some blocks. LoopRotate clones some instructions.
Some of these loop passes also make CFG changes.
This is an API based on what I found needed in LoopUnswitch, SimpleLoopUnswitch, LoopRotate, LoopInstSimplify, LoopSimplifyCFG.
Adding dependent patches using this API for context.
Reviewers: george.burgess.iv, dberlin
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D45299
llvm-svn: 341855
The only point to this change is the test diffs. When I remove this code entirely (in favor of the recently added generic handling), I don't want there to be any confusion due to spurious test diffs.
As an aside, the fact out tests are AST construction order dependent is not great. I thought about fixing that, but the reasonable schemes I might want (e.g. sort by name) need the test diffs anyways.
Philip
llvm-svn: 341841
AliasSetTracker has special case handling for memset, memcpy and memmove which pre-existed argmemonly on functions and readonly and writeonly on arguments. This patch generalizes it using the AA infrastructure to any call correctly annotated.
The motivation here is to cut down on confusion, not performance per se. For most instructions, there is a direct mapping to alias set. However, this is not guaranteed by the interface and was not in fact true for these three intrinsics *and only these three intrinsics*. I kept getting myself confused about this invariant, so I figured it would be good to clearly distinguish between a instructions and alias sets. Calls happened to be an easy target.
The nice side effect is that custom implementations of memset/memcpy/memmove - including wrappers discovered by IPO - can now be optimized the same as builts by LICM.
Note: The actual removal of the memset/memtransfer specific handling will happen in a follow on NFC patch. It was originally part of this one, but separate for ease of review and rebase.
Differential Revision: https://reviews.llvm.org/D50730
llvm-svn: 341713
Summary:
Block splitting is done with either identical edges being merged, or not.
Only critical edges can be split without merging identical edges based on an option.
Teach the memoryssa updater to take this into account: for the same edge between two blocks only move one entry from the Phi in Old to the new Phi in New.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D51563
llvm-svn: 341709
This patch adds per-function size information remarks. Previously, passing
-Rpass-analysis=size-info would only give you per-module changes. By adding
the ability to do this per-function, it's easier to see which functions
contributed the most to size changes.
https://reviews.llvm.org/D51467
llvm-svn: 341588
Currently it has a set KnownBlocks that marks blocks as having cached
answers and a map FirstSpecialInsts that maps these blocks to first
special instructions in them. The value in the map is always non-null,
and for blocks that are known to have no special instructions the map
does not have an instance.
This patch removes KnownBlocks as obsolete. Instead, for blocks that
are known to have no special instructions, we just put a nullptr value.
This makes the code much easier to read.
llvm-svn: 341531
This validation patch has been reverted as rL341147 because of conserns raised by
@reames. This revision returns it as is to raise a discussion and address the concerns.
Differential Revision: https://reviews.llvm.org/D51523
Reviewed By: reames
llvm-svn: 341526
In basic block, loop, and function passes, we already have a function that
we can use to emit optimization remarks. We can use that instead of searching
the module for the first suitable function (that is, one that contains at
least one basic block.)
llvm-svn: 341253
Instead of counting the size of the entire module every time we run a pass,
pass along a delta instead and use that to emit the remark.
This means we only have to use (on average) smaller IR units to calculate
instruction counts. E.g, in a BB pass, we only need to look at the delta of
the BB instead of the delta of the entire module.
6/6
(This improved compile time for size remarks on sqlite3 + O2 significantly)
llvm-svn: 341250
Same vein as the previous commits. Pre-calculate the size of
the module and use that to decide if we're going to emit a
remark.
This one comes with a FIXME and TODO. First off, CallGraphSCC
and CallGraphNode don't have a getInstructionCount function. So,
for now, we do the same thing as in a module pass.
Second off, we're not really saving anything here yet, because
as before, I need to change emitInstrCountChangedRemark to take
in a delta. Keeping the patches small though, so that's coming up
next.
5/6
llvm-svn: 341249
Another commit reducing compile time in size remarks.
Cache the size of the module and loop, and update values based
off of deltas instead. Avoid recalculating the size of the
whole module whenever possible.
3/6
llvm-svn: 341247
Summary:
This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).
The purpose of this patch is to free up the name DivergenceAnalysis for the new generic
implementation. The generic implementation class will be shared by specialized
divergence analysis classes.
Patch by: Simon Moll
Reviewed By: nhaehnle
Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50434
Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb
llvm-svn: 341071
These classes don't make any changes to IR and have no reason to be in
Transform/Utils. This patch moves them to Analysis folder. This will allow
us reusing these classes in some analyzes, like MustExecute.
llvm-svn: 341015
rL340921 has been reverted by rL340923 due to linkage dependency
from Transform/Utils to Analysis which is not allowed. In this patch
this has been fixed, a new utility function moved to Analysis.
Differential Revision: https://reviews.llvm.org/D51152
llvm-svn: 341014
Teach LICM to hoist stores out of loops when the store writes to a location otherwise unused in the loop, writes a value which is invariant, and is guaranteed to execute if the loop is entered.
Worth noting is that this transformation is partially overlapping with the existing promotion transformation. Reasons this is worthwhile anyway include:
* For multi-exit loops, this doesn't require duplication of the store.
* It kicks in for case where we can't prove we exit through a normal exit (i.e. we may throw), but can prove the store executes before that possible side exit.
Differential Revision: https://reviews.llvm.org/D50925
llvm-svn: 340974
We have multiple places in code where we try to identify whether or not
some instruction is a guard. This patch factors out this logic into a separate
utility function which works uniformly in all places.
Differential Revision: https://reviews.llvm.org/D51152
Reviewed By: fedor.sergeev
llvm-svn: 340921
Moving PassTimingInfo from legacy pass manager code into a separate header.
Making it suitable for both legacy and new pass manager.
Adding a test on -time-passes main functionality.
llvm-svn: 340872
Summary:
Correct to use set like behaviour of AllocType. Should check for
subset, not precise value.
Reviewers: theraven
Reviewed By: theraven
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50959
llvm-svn: 340807
verify*() methods are intended to have no side-effects (unless we detect
broken MSSA, in which case they assert()), and all of the other verify
methods are wrapped by `#ifndef NDEBUG`.
llvm-svn: 340793
This reverts r319889.
Unfortunately, wrapping flags are not a part of SCEV's identity (they
do not participate in computing a hash value or in equality
comparisons) and in fact they could be assigned after the fact w/o
rebuilding a SCEV.
Grep for const_cast's to see quite a few of examples, apparently all
for AddRec's at the moment.
So, if 2 expressions get built in 2 slightly different ways: one with
flags set in the beginning, the other with the flags attached later
on, we may end up with 2 expressions which are exactly the same but
have their operands swapped in one of the commutative N-ary
expressions, and at least one of them will have "sorted by complexity"
invariant broken.
2 identical SCEV's won't compare equal by pointer comparison as they
are supposed to.
A real-world reproducer is added as a regression test: the issue
described causes 2 identical SCEV expressions to have different order
of operands and therefore compare not equal, which in its turn
prevents LoadStoreVectorizer from vectorizing a pair of consecutive
loads.
On a larger example (the source of the test attached, which is a
bugpoint) I have seen even weirder behavior: adding a constant to an
existing SCEV changes the order of the existing terms, for instance,
getAddExpr(1, ((A * B) + (C * D))) returns (1 + (C * D) + (A * B)).
Differential Revision: https://reviews.llvm.org/D40645
llvm-svn: 340777
This is a bit awkward in a handful of places where we didn't even have
an instruction and now we have to see if we can build one. But on the
whole, this seems like a win and at worst a reasonable cost for removing
`TerminatorInst`.
All of this is part of the removal of `TerminatorInst` from the
`Instruction` type hierarchy.
llvm-svn: 340701
The core get and set routines move to the `Instruction` class. These
routines are only valid to call on instructions which are terminators.
The iterator and *generic* range based access move to `CFG.h` where all
the other generic successor and predecessor access lives. While moving
the iterator here, simplify it using the iterator utilities LLVM
provides and updates coding style as much as reasonable. The APIs remain
pointer-heavy when they could better use references, and retain the odd
behavior of `operator*` and `operator->` that is common in LLVM
iterators. Adjusting this API, if desired, should be a follow-up step.
Non-generic range iteration is added for the two instructions where
there is an especially easy mechanism and where there was code
attempting to use the range accessor from a specific subclass:
`indirectbr` and `br`. In both cases, the successors are contiguous
operands and can be easily iterated via the operand list.
This is the first major patch in removing the `TerminatorInst` type from
the IR's instruction type hierarchy. This change was discussed in an RFC
here and was pretty clearly positive:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123407.html
There will be a series of much more mechanical changes following this
one to complete this move.
Differential Revision: https://reviews.llvm.org/D47467
llvm-svn: 340698
The way that PhiValues is integrated with BasicAA it is possible for a pass
which uses BasicAA to pick up an instance of BasicAA that uses PhiValues without
intending to, and then delete values from a function in a way that causes
PhiValues to return dangling pointers to these deleted values. Fix this by
having a set of callback value handles to invalidate values when they're
deleted.
llvm-svn: 340613
We need to allow ConstantExpr Selects in addition to SelectInst.
I'll try to put together a test case, but I wanted to fix the issues being reported.
Fixes PR38677
llvm-svn: 340546
If we have a min/max pair we can do a better job of counting sign bits if we look at them together. This is similar to what is done in the SelectionDAG version of computeNumSignBits for ISD::SMAX/SMIN.
Differential Revision: https://reviews.llvm.org/D51112
llvm-svn: 340480
We're currently getting this behavior implicitly, since we determine if
a Def's optimization is valid based on the ID of its defining access.
This is incorrect, though I wouldn't be surprised if this was masked in
part by that we're using a WeakVH to track what Defs are optimized to.
(Not to mention that we don't move Defs super often, AFAICT). I'll
submit a patch to fix this shortly.
This also includes a minor refactor to reduce duplication a bit.
No test is included, since like said, this already happens to be our
behavior. I'll add a test for this with my fix to the other bug
mentioned above.
llvm-svn: 340461
There's no need to track a seperate variable for argmemonly aliasing. This falls out naturally of the modinfo union. Note that we may return earlier than we would have earlier if all arguments are explicitly readnone. The overall result doesn't change, just how we get there.
llvm-svn: 340443
We're calling these functions quite a bit from outside of MemorySSA.cpp
now. Given that they're relatively simple one-liners, I think the style
preference is to have them inline.
llvm-svn: 340430
Volatility is not an aliasing property. We used to model volatile as if it had extremely conservative aliasing implications, but that hasn't been true for several years now. So, it doesn't make sense to be in AliasSet.
It also turns out the code is entirely a noop. Outside of the AST code to update it, there was only one user: load store promotion in LICM. L/S promotion doesn't need the check since it walks all the users of the address anyway. It already checks each load or store via !isUnordered which causes us to bail for volatile accesses. (Look at the lines immediately following the two remove asserts.)
There is the possibility of some small compile time impact here, but the only case which will get noticeably slower is a loop with a large number of loads and stores to the same address where only the last one we inspect is volatile. This is sufficiently rare it's not worth optimizing for..
llvm-svn: 340312
Remove duplicate tests from InstCombine that were added with
D50582. I left negative tests there to verify that nothing
in InstCombine tries to go overboard. If isKnownNeverNaN is
improved to handle the FP binops or other cases, we should
have coverage under InstSimplify, so we could remove more
duplicate tests from InstCombine at that time.
llvm-svn: 340279
These intrinsics are modelled as writing for control flow purposes, but they don't actually write to any location. Marking these - as we did for guards - allows LICM to hoist loads out of loops containing invariant.starts.
Differential Revision: https://reviews.llvm.org/D50861
llvm-svn: 340245
Summary:
Create the ability to compute IDF using a CFG View.
For this, we'll need a new DT created using a list of Updates (to be refactored later to a GraphDiff), and the GraphTraits based on the same GraphDiff.
Reviewers: kuhar, george.burgess.iv, mzolotukhin
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D50675
llvm-svn: 340052
NewGVN uses InstructionSimplify for simplifications of leaders of
congruence classes. It is not guaranteed that the metadata or other
flags/keywords (like nsw or exact) of the leader is available for all members
in a congruence class, so we cannot use it for simplification.
This patch adds a InstrInfoQuery struct with a boolean field
UseInstrInfo (which defaults to true to keep the current behavior as
default) and a set of helper methods to get metadata/keywords for a
given instruction, if UseInstrInfo is true. The whole thing might need a
better name, to avoid confusion with TargetInstrInfo but I am not sure
what a better name would be.
The current patch threads through InstrInfoQuery to the required
places, which is messier then it would need to be, if
InstructionSimplify and ValueTracking would share the same Query struct.
The reason I added it as a separate struct is that it can be shared
between InstructionSimplify and ValueTracking's query objects. Also,
some places do not need a full query object, just the InstrInfoQuery.
It also updates some interfaces that do not take a Query object, but a
set of optional parameters to take an additional boolean UseInstrInfo.
See https://bugs.llvm.org/show_bug.cgi?id=37540.
Reviewers: dberlin, davide, efriedma, sebpop, hiraditya
Reviewed By: hiraditya
Differential Revision: https://reviews.llvm.org/D47143
llvm-svn: 340031
This is another step towards being able to canonicalize to the funnel shift
intrinsics in IR (see D49242 for the initial patch).
We should not have any loss of simplification power in IR between these and
the equivalent IR constructs.
Differential Revision: https://reviews.llvm.org/D50848
llvm-svn: 340022
The description of `isGuaranteedToExecute` does not correspond to its implementation.
According to description, it should return `true` if an instruction is executed under the
assumption that its loop is *entered*. However there is a sophisticated alrogithm inside
that tries to prove that the instruction is executed if the loop is *exited*, which is not the
same thing for infinite loops. There is an attempt to protect from dealing with infinite loops
by prohibiting loops without exit blocks, however an infinite loop can have exit blocks.
As result of that, MustExecute can falsely consider some blocks that are never entered as
mustexec, and LICM can hoist dangerous instructions out of them basing on this fact.
This may introduce UB to programs which did not contain it initially.
This patch removes the problematic algorithm and replaced it with a one which tries to
prove what is required in description.
Differential Revision: https://reviews.llvm.org/D50558
Reviewed By: reames
llvm-svn: 339984
The fix is fairly simple, but is says something unpleasant about the usage and testing of invariant.start/end scopes that this went undetected. To put this in perspective, *any* invariant.end in a loop flowing through LICM crashed. I haven't bothered to figure out just how far back this goes, but it's not caused by any of the recent changes. We're probably talking months if not years.
llvm-svn: 339936
Main value is just simplifying code. I'll further simply the argument handling case in a bit, but that involved a slightly orthogonal change so I went with the mildy ugly intermediate for this patch.
Note that the isSized check in the old LICM code was not carried across. It turns out that check was dead. a) no test exercised it, and b) langref and verifier had been updated to disallow unsized types used in loads.
llvm-svn: 339930
Summary:
Profile count of a block is computed by multiplying its block frequency
by entry count and dividing the result by entry block frequency. Do
rounded division in the last step and update test cases appropriately.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50822
llvm-svn: 339835
Summary: Expose VerifyMemorySSA as a debug option. If set, passes will call the MSSA->verifyMemorySSA() after calling into the updater's APIs when MemorySSA should be valid.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D50749
llvm-svn: 339795
The `experimental_guard` intrinsic has memory write semantics to model the thread-exiting
logic, but does not do any actual writes to memory. Currently, `AliasSetTracker` treats it as a
normal memory write. As result, a loop-invariant load cannot be hoisted out of loop because
the guard may possibly alias with it.
This patch makes `AliasSetTracker` so that it doesn't treat guards as memory writes.
Differential Revision: https://reviews.llvm.org/D50497
Reviewed By: reames
llvm-svn: 339753
Summary:
Calls marked 'tail' cannot read or write allocas from the current frame
because the current frame might be destroyed by the time they run.
However, a tail call may use an alloca with byval. Calling with byval
copies the contents of the alloca into argument registers or stack
slots, so there is no lifetime issue. Tail calls never modify allocas,
so we can return just ModRefInfo::Ref.
Fixes PR38466, a longstanding bug.
Reviewers: hfinkel, nlewycky, gbiv, george.burgess.iv
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50679
llvm-svn: 339636
Summary:
We've supported constant folding for sse versions for many years. This patch adds support for the avx512 versions including unsigned with the default rounding mode. We could probably do more with other roundings modes and SAE in the future.
The test cases are largely based on the sse.ll test cases. But I did add some test cases to ensure the unsigned versions don't accept negative values. Also checked the bounds of f64->i32 conversions to make sure unsigned has a larger positive range than signed.
Reviewers: RKSimon, spatel, chandlerc
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50553
llvm-svn: 339529
MemorySSA currently creates MemoryAccesses for lifetime intrinsics, and
sometimes treats them as clobbers. This may/may not be the best way
forward, but while we're doing it, we should consider
MayAlias/PartialAlias to be clobbers.
The ideal fix here is probably to remove all of this reasoning about
lifetimes from MemorySSA + put it into the passes that need to care. But
that's a wayyy broader fix that needs some consensus, and we have
miscompiles + a release branch today, and this should solve the
miscompiles just as well.
differential revision is D43269. Landing without an explicit LGTM (and
without using the special please-autoclose-this syntax) so we can still
use that revision as a place to decide what the right fix here is.
llvm-svn: 339411
getOrCompHotCountThreshold/getOrCompColdCountThreshold introduced in
https://reviews.llvm.org/D45377 contain a bad mistake and will only return 1 or 0
instead of the true hot/cold cutoff value. The patch fixes the mistake. But the
mistake seems not causing big performance difference according to internal server
benchmarks testing.
Differential Revision: https://reviews.llvm.org/D50370
llvm-svn: 339162
The patch was reverted because of bug detected by sanitizer. The bug is fixed,
respective tests added.
Differential Revision: https://reviews.llvm.org/D50172
llvm-svn: 339005
Multiple failues reported by sanitizer-x86_64-linux, seem to be caused by this
patch. Reverting to see if they sustain without it.
Differential Revision: https://reviews.llvm.org/D50172
llvm-svn: 338994
`isKnownNonNullFromDominatingCondition` is able to prove non-null basing on `br` or `guard`
by `%p != null` condition, but is unable to do so basing on `(%p != null) && %other_cond`.
This patch allows it to do so.
Differential Revision: https://reviews.llvm.org/D50172
Reviewed By: reames
llvm-svn: 338990
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338969
This is the second patch of the series which intends to enable jump threading for an inlined method whose return type is std::pair<int, bool> or std::pair<bool, int>.
The first patch is https://reviews.llvm.org/rL338485.
This patch handles code sequences that merges two values using `shl` and `or`, then extracts one value using `and`.
Differential Revision: https://reviews.llvm.org/D49981
llvm-svn: 338817
This adds the NAN checks suggested in PR37776:
https://bugs.llvm.org/show_bug.cgi?id=37776
If both operands to maxnum are NAN, that should get constant folded, so we don't
have to handle that case. This is the same assumption as other FP ops in this
function. Returning 'false' is always conservatively correct.
Copying from the bug report:
Currently, we have this for "when is cannotBeOrderedLessThanZero
(mustBePositiveOrNaN) true for maxnum":
L
-------------------
| Pos | Neg | NaN |
------------------------
|Pos | x | x | x |
------------------------
R |Neg | x | | x |
------------------------
|NaN | x | x | x |
------------------------
The cases with (Neg & NaN) are wrong. We should have:
L
-------------------
| Pos | Neg | NaN |
------------------------
|Pos | x | x | x |
------------------------
R |Neg | x | | |
------------------------
|NaN | x | | x |
------------------------
Differential Revision: https://reviews.llvm.org/D50081
llvm-svn: 338716
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338494
This patch intends to enable jump threading when a method whose return type is std::pair<int, bool> or std::pair<bool, int> is inlined.
For example, jump threading does not happen for the if statement in func.
std::pair<int, bool> callee(int v) {
int a = dummy(v);
if (a) return std::make_pair(dummy(v), true);
else return std::make_pair(v, v < 0);
}
int func(int v) {
std::pair<int, bool> rc = callee(v);
if (rc.second) {
// do something
}
SROA executed before the method inlining replaces std::pair by i64 without splitting in both callee and func since at this point no access to the individual fields is seen to SROA.
After inlining, jump threading fails to identify that the incoming value is a constant due to additional instructions (like or, and, trunc).
This series of patch add patterns in InstructionSimplify to fold extraction of members of std::pair. To help jump threading, actually we need to optimize the code sequence spanning multiple BBs.
These patches does not handle phi by itself, but these additional patterns help NewGVN pass, which calls instsimplify to check opportunities for simplifying instructions over phi, apply phi-of-ops optimization to result in successful jump threading.
SimplifyDemandedBits in InstCombine, can do more general optimization but this patch aims to provide opportunities for other optimizers by supporting a simple but common case in InstSimplify.
This first patch in the series handles code sequences that merges two values using shl and or and then extracts one value using lshr.
Differential Revision: https://reviews.llvm.org/D48828
llvm-svn: 338485
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338387
This is being done in order to make GVN able to better optimize certain inputs.
MemDep doesn't use PhiValues directly, but does need to notifiy it when things
get invalidated.
Differential Revision: https://reviews.llvm.org/D48489
llvm-svn: 338384
By using PhiValuesAnalysis we can get all the values reachable from a phi, so
we can be more precise instead of giving up when a phi has phi operands. We
can't make BaseicAA directly use PhiValuesAnalysis though, as the user of
BasicAA may modify the function in ways that PhiValuesAnalysis can't cope with.
For this optional usage to work correctly BasicAAWrapperPass now needs to be not
marked as CFG-only (i.e. it is now invalidated even when CFG is preserved) due
to how the legacy pass manager handles dependent passes being invalidated,
namely the depending pass still has a pointer to the now-dead dependent pass.
Differential Revision: https://reviews.llvm.org/D44564
llvm-svn: 338242
Summary:
In non-integral address spaces, we're not allowed to introduce inttoptr/ptrtoint
intrinsics. Instead, we need to expand any pointer arithmetic as geps on the
base pointer. Luckily this is a common task for SCEV, so all we have to do here
is hook up the corresponding helper function and add test case.
Fixes PR38290
Reviewers: sanjoy
Differential Revision: https://reviews.llvm.org/D49832
llvm-svn: 338073
Only wanting to pass a single SCEV operand to use as the offset of
the GEP is a common operation. Right now this requires creating a
temporary stack array at every call site. Add an overload
that encapsulates that pattern and simplify the call sites.
Suggested-By: sanjoy (in https://reviews.llvm.org/D49832)
llvm-svn: 338072
as well as sext(C + x + ...) -> (D + sext(C-D + x + ...))<nuw><nsw>
similar to the equivalent transformation for zext's
if the top level addition in (D + (C-D + x * n)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x * n), ensuring homogeneous behaviour of the
transformation and better canonicalization of such AddRec's
(indeed, there are 2^(2w) different expressions in `B1 + ext(B2 + Y)` form for
the same Y, but only 2^(2w - k) different expressions in the resulting `B3 +
ext((B4 * 2^k) + Y)` form, where w is the bit width of the integral type)
This patch generalizes sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformations added in
r209568 relaxing the requirements the following way:
1. C2 doesn't have to be a power of 2, it's enough if it's divisible by 2
a sufficient number of times;
2. C1 doesn't have to be less than C2, instead of extracting the entire
C1 we can split it into 2 terms: (00...0XXX + YY...Y000), keep the
second one that may cause wrapping within the extension operator, and
move the first one that doesn't affect wrapping out of the extension
operator, enabling further simplifications;
3. C1 and C2 don't have to be positive, splitting C1 like shown above
produces a sum that is guaranteed to not wrap, signed or unsigned;
4. in AddExpr case there could be more than 2 terms, and in case of
AddExpr the 2nd and following terms and in case of AddRecExpr the
Step component don't have to be in the C2*X form or constant
(respectively), they just need to have enough trailing zeros,
which in turn could be guaranteed by means other than arithmetics,
e.g. by a pointer alignment;
5. the extension operator doesn't have to be a sext, the same
transformation works and profitable for zext's as well.
Apparently, optimizations like SLPVectorizer currently fail to
vectorize even rather trivial cases like the following:
double bar(double *a, unsigned n) {
double x = 0.0;
double y = 0.0;
for (unsigned i = 0; i < n; i += 2) {
x += a[i];
y += a[i + 1];
}
return x * y;
}
If compiled with `clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o - -emit-llvm`
(!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"})
it produces scalar code with the loop not unrolled with the unsigned `n` and
`i` (like shown above), but vectorized and unrolled loop with signed `n` and
`i`. With the changes made in this commit the unsigned version will be
vectorized (though not unrolled for unclear reasons).
How it all works:
Let say we have an AddExpr that looks like (C + x + y + ...), where C
is a constant and x, y, ... are arbitrary SCEVs. Let's compute the
minimum number of trailing zeroes guaranteed of that sum w/o the
constant term: (x + y + ...). If, for example, those terms look like
follows:
i
XXXX...X000
YYYY...YY00
...
ZZZZ...0000
then the rightmost non-guaranteed-zero bit (a potential one at i-th
position above) can change the bits of the sum to the left (and at
i-th position itself), but it can not possibly change the bits to the
right. So we can compute the number of trailing zeroes by taking a
minimum between the numbers of trailing zeroes of the terms.
Now let's say that our original sum with the constant is effectively
just C + X, where X = x + y + .... Let's also say that we've got 2
guaranteed trailing zeros for X:
j
CCCC...CCCC
XXXX...XX00 // this is X = (x + y + ...)
Any bit of C to the left of j may in the end cause the C + X sum to
wrap, but the rightmost 2 bits of C (at positions j and j - 1) do not
affect wrapping in any way. If the upper bits cause a wrap, it will be
a wrap regardless of the values of the 2 least significant bits of C.
If the upper bits do not cause a wrap, it won't be a wrap regardless
of the values of the 2 bits on the right (again).
So let's split C to 2 constants like follows:
0000...00CC = D
CCCC...CC00 = (C - D)
and represent the whole sum as D + (C - D + X). The second term of
this new sum looks like this:
CCCC...CC00
XXXX...XX00
----------- // let's add them up
YYYY...YY00
The sum above (let's call it Y)) may or may not wrap, we don't know,
so we need to keep it under a sext/zext. Adding D to that sum though
will never wrap, signed or unsigned, if performed on the original bit
width or the extended one, because all that that final add does is
setting the 2 least significant bits of Y to the bits of D:
YYYY...YY00 = Y
0000...00CC = D
----------- <nuw><nsw>
YYYY...YYCC
Which means we can safely move that D out of the sext or zext and
claim that the top-level sum neither sign wraps nor unsigned wraps.
Let's run an example, let's say we're working in i8's and the original
expression (zext's or sext's operand) is 21 + 12x + 8y. So it goes
like this:
0001 0101 // 21
XXXX XX00 // 12x
YYYY Y000 // 8y
0001 0101 // 21
ZZZZ ZZ00 // 12x + 8y
0000 0001 // D
0001 0100 // 21 - D = 20
ZZZZ ZZ00 // 12x + 8y
0000 0001 // D
WWWW WW00 // 21 - D + 12x + 8y = 20 + 12x + 8y
therefore zext(21 + 12x + 8y) = (1 + zext(20 + 12x + 8y))<nuw><nsw>
This approach could be improved if we move away from using trailing
zeroes and use KnownBits instead. For instance, with KnownBits we could
have the following picture:
i
10 1110...0011 // this is C
XX X1XX...XX00 // this is X = (x + y + ...)
Notice that some of the bits of X are known ones, also notice that
known bits of X are interspersed with unknown bits and not grouped on
the rigth or left.
We can see at the position i that C(i) and X(i) are both known ones,
therefore the (i + 1)th carry bit is guaranteed to be 1 regardless of
the bits of C to the right of i. For instance, the C(i - 1) bit only
affects the bits of the sum at positions i - 1 and i, and does not
influence if the sum is going to wrap or not. Therefore we could split
the constant C the following way:
i
00 0010...0011 = D
10 1100...0000 = (C - D)
Let's compute the KnownBits of (C - D) + X:
XX1 1 = carry bit, blanks stand for known zeroes
10 1100...0000 = (C - D)
XX X1XX...XX00 = X
--- -----------
XX X0XX...XX00
Will this add wrap or not essentially depends on bits of X. Adding D
to this sum, however, is guaranteed to not to wrap:
0 X
00 0010...0011 = D
sX X0XX...XX00 = (C - D) + X
--- -----------
sX XXXX XX11
As could be seen above, adding D preserves the sign bit of (C - D) +
X, if any, and has a guaranteed 0 carry out, as expected.
The more bits of (C - D) we constrain, the better the transformations
introduced here canonicalize expressions as it leaves less freedom to
what values the constant part of ((C - D) + x + y + ...) can take.
Reviewed By: mzolotukhin, efriedma
Differential Revision: https://reviews.llvm.org/D48853
llvm-svn: 337943
Currently ComputeNumSignBits does early exit while processing some
of the operations (add, sub, mul, and select). This prevents the
function from using AssumptionCacheTracker if passed.
Differential Revision: https://reviews.llvm.org/D49759
llvm-svn: 337936
if the top level addition in (D + (C-D + x + ...)) could be proven to
not wrap, where the choice of D also maximizes the number of trailing
zeroes of (C-D + x + ...), ensuring homogeneous behaviour of the
transformation and better canonicalization of such expressions.
This enables better canonicalization of expressions like
1 + zext(5 + 20 * %x + 24 * %y) and
zext(6 + 20 * %x + 24 * %y)
which get both transformed to
2 + zext(4 + 20 * %x + 24 * %y)
This pattern is common in address arithmetics and the transformation
makes it easier for passes like LoadStoreVectorizer to prove that 2 or
more memory accesses are consecutive and optimize (vectorize) them.
Reviewed By: mzolotukhin
Differential Revision: https://reviews.llvm.org/D48853
llvm-svn: 337859
Summary:
Check if the parent basic block and caller exists
before calling CS.getCaller when constant folding
strip.invariant.group instrinsic.
This avoids a crash when the function containing the intrinsic
is being inlined. The instruction is checked for any simplifiction
but has not yet been added to a basic block.
Reviewers: Prazek, rsmith, efriedma
Reviewed By: efriedma
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D49690
llvm-svn: 337742
Bug fix for PR37445. The underlying problem and its fix are similar to PR37808.
The bug lies in MemorySSAUpdater::getPreviousDefRecursive(), where PhiOps is
computed before the call to tryRemoveTrivialPhi() and it ends up being out of
date, pointing to stale data. We have now turned each of the PhiOps into a
TrackingVH<MemoryAccess>.
Differential Revision: https://reviews.llvm.org/D49425
llvm-svn: 337680
Summary:
This takes 22ms out of ~20s compiling sqlite3.c because we call it
for every unit of compilation and every pass.
Reviewers: paquette, anemet
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D49586
llvm-svn: 337654
Summary:
When splitting predecessors in BasicBlockUtils, we create a new block as an immediate predecessor of the original BB, then we connect a given set of predecessors to the new block.
The API in this patch will be used to update MemoryPhis for this CFG change.
If all predecessors are being moved, we move the MemoryPhi directly. Otherwise we create a new MemoryPhi in the NewBB and populate its incoming values, while deleting them from BB's Phi.
[Split from D45299 for easier review]
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D49156
llvm-svn: 337581
SCEV tries to constant-fold arguments of trunc operands in SCEVAddExpr, and when it does
that, it passes wrong flags into the recursion. It is only valid to pass flags that are proved for
narrow type into a computation in wider type if we can prove that trunc instruction doesn't
actually change the value. If it did lose some meaningful bits, we may end up proving wrong
no-wrap flags for sum of arguments of trunc.
In the provided test we end up with `nuw` where it shouldn't be because of this bug.
The solution is to conservatively pass `SCEV::FlagAnyWrap` which is always a valid thing to do.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D49471
llvm-svn: 337435
Bug fix for PR37808. The regression test is a reduced version of the
original reproducer attached to the bug report. As stated in the report,
the problem was that InsertedPHIs was keeping dangling pointers to
deleted Memory-Phis. MemoryPhis are created eagerly and sometimes get
zapped shortly afterwards. I've used WeakVH instead of an expensive
removal operation from the active workset.
Differential Revision: https://reviews.llvm.org/D48372
llvm-svn: 337149
This fold is repeated/misplaced in instcombine, but I'm
not sure if it's safe to remove that yet because some
other folds appear to be asserting that the transform
has occurred within instcombine itself.
This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776
We have another test to demonstrate the more general bug.
llvm-svn: 337127
This reverts commit r336419: use-after-free on CallGraph::FunctionMap elements
due to the use of a stale iterator in CGPassManager::runOnModule.
The iterator may be invalidated if a pass removes a function, ex.:
llvm::LegacyInlinerBase::inlineCalls
inlineCallsImpl
llvm::CallGraph::removeFunctionFromModule
llvm-svn: 337018
Summary:
This commit does two things:
1. modified the existing DivergenceAnalysis::dump() so it dumps the
whole function with added DIVERGENT: annotations;
2. added code to do that dump if the appropriate -debug-only option is
on.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47700
Change-Id: Id97b605aab1fc6f5a11a20c58a99bbe8c565bf83
llvm-svn: 336998
Summary:
The move APIs added in this patch will be used to update MemorySSA when CFG changes merge or split blocks, by moving memory accesses accordingly in MemorySSA's internal data structures.
[Split from D45299 for easier review]
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D48897
llvm-svn: 336860
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
In non-zero address spaces, we were reporting that an object at `null`
always occupies zero bytes. This is incorrect in many cases, so just
return `unknown` in those cases for now.
Differential Revision: https://reviews.llvm.org/D48860
llvm-svn: 336611
This patch ports hasDedicatedExits, getUniqueExitBlocks and
getUniqueExitBlock in Loop to LoopBase so that they can be used
from other LoopBase sub-classes.
Reviewers: chandlerc, sanjoy, hfinkel, fhahn
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D48817
llvm-svn: 336572
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.
I used Python's re.sub with this regex to update users:
r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'
llvm-svn: 336462
Previously we only iterated over functions reachable from the set of
external functions in the module. But since some of the passes under
this (notably the always-inliner and coroutine lowerer) are required for
correctness, they need to run over everything.
This just adds an extra layer of iteration over the CallGraph to keep
track of which functions we've already visited and get the next batch of
SCCs.
Should fix PR38029.
llvm-svn: 336419
Summary:
Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change
SCEV is able to prove that the loop doesn't wrap-self (due to zext i16
to i64), disabling the entire loop versioning pass. Removed the zext and
just use i64.
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D48409
llvm-svn: 336140
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
Summary:
MemoryPhis now have APIs analogous to BB Phis to remove an incoming value/block.
The MemorySSAUpdater uses the above APIs when updating MemorySSA given a set of dead blocks about to be deleted.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D48396
llvm-svn: 336015
Extends the CFGPrinter and CallPrinter with heat colors based on heuristics or
profiling information. The colors are enabled by default and can be toggled
on/off for CFGPrinter by using the option -cfg-heat-colors for both
-dot-cfg[-only] and -view-cfg[-only]. Similarly, the colors can be toggled
on/off for CallPrinter by using the option -callgraph-heat-colors for both
-dot-callgraph and -view-callgraph.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D40425
llvm-svn: 335996
We can have AddRec with loops having many predecessors.
This changes an assert to an early return.
Differential Revision: https://reviews.llvm.org/D48766
llvm-svn: 335965
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].
The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
with an type assertion in cast, because `%dec` is no longer an `Instruction`,
If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.
So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.
Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi
Reviewed By: mkazantsev
Subscribers: evstupac, javed.absar, spatel, llvm-commits
Differential Revision: https://reviews.llvm.org/D48599
llvm-svn: 335950
This pass is being added in order to make the information available to BasicAA,
which can't do caching of this information itself, but possibly this information
may be useful for other passes.
Incorporates code based on Daniel Berlin's implementation of Tarjan's algorithm.
Differential Revision: https://reviews.llvm.org/D47893
llvm-svn: 335857
Summary:
AliasSet::print uses `I->printAsOperand` to print UnknownInstructions. The problem is that not all UnknownInstructions have names (e.g. call instructions). When such instructions are printed, they appear as `<badref>` in AliasSets, which is very confusing, as the values are perfectly valid.
This patch fixes that by printing UnknownInstructions without a name using `print` instead of `printAsOperand`.
Reviewers: asbirlea, chandlerc, sanjoy, grosser
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48609
llvm-svn: 335751
Summary:
Adds a string saver to the ModuleSummaryIndex so it can store value
names in the case of adding a ValueInfo for a GUID when we don't
have the name stored in a Module string table. This is motivated
by the upcoming summary parser patch, where we will read value names
from the summary entry and want to store them, even when a Module
is not available.
Currently this allows us to store the name in the legacy bitcode case,
and I have added a test to show that.
Reviewers: pcc, dexonsmith
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits
Differential Revision: https://reviews.llvm.org/D47842
llvm-svn: 335570
Summary:
I discovered when writing the summary parsing support that the
per-module index builder and writer are computing the GUID from the
value name alone (ignoring the linkage type). This was ok since those
GUID were not emitted in the bitcode, and there are never multiple
conflicting names in a single module.
However, I don't see a reason for making the GUID computation different
for the per-module case. It also makes things simpler on the parsing
side to have the GUID computation consistent. So this patch changes the
summary analysis phase and the per-module summary writer to compute the
GUID using the facility on the GlobalValue.
Reviewers: pcc, dexonsmith
Subscribers: llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D47844
llvm-svn: 335560
This avoids creating unnecessary casts if the IP used to be a dbg info
intrinsic. Fixes PR37727.
Reviewers: vsk, aprantl, sanjoy, efriedma
Reviewed By: vsk, efriedma
Differential Revision: https://reviews.llvm.org/D47874
llvm-svn: 335513
There are quite a few if statements that enumerate all these cases. It gets
even worse in our fork of LLVM where we also have a Triple::cheri (which
is mips64 + CHERI instructions) and we had to update all if statements that
check for Triple::mips64 to also handle Triple::cheri. This patch helps to
reduce our diff to upstream and should also make some checks more readable.
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D48548
llvm-svn: 335493
We can prove that some delinearized subscripts do not wrap around to become
negative by the fact that they are from inbound geps of load/store locations.
This helps improve the delinearisation in cases where we can't prove that they
are non-negative from SCEV alone.
Differential Revision: https://reviews.llvm.org/D48481
llvm-svn: 335481
It's easy for domination numbers to get out-of-date, and this is no more
costly than any of the other verifiers we already have, so it seems nice
to have.
A stage3 build with this Works On My Machine, so this hasn't caught any
bugs... yet. :)
llvm-svn: 335444
clear out deleted loops from the current queue beyond just the current
loop.
This is important because SimpleLoopUnswitch will now enqueue the same
loop to be re-processed. When it does this with the legacy PM, we don't
have a way of canceling the rest of the pipeline and so we can end up
deleting the loop before we reprocess it. =/
This change also makes it easy to support deleting other loops in the
queue to process, although I don't have any use cases for that.
Differential Revision: https://reviews.llvm.org/D48470
llvm-svn: 335317
Summary:
This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338).
I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not educated, but just patterns that match the output.
All LLVM files are already reviewed in D48338.
Reviewers: jdoerfert, bollu, efriedma
Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia
Differential Revision: https://reviews.llvm.org/D48453
llvm-svn: 335292
This enables da-delinearize in Dependence Analysis for delinearizing array
accesses into multiple dimensions. This can help to increase the power of
Dependence analysis on multi-dimensional arrays and prevent having to fall
back to the slower and less accurate MIV tests. It adds static checks on the
bounds of the arrays to ensure that one dimension doesn't overflow into
another, and brings our code in line with our tests.
Differential Revision: https://reviews.llvm.org/D45872
llvm-svn: 335217
Summary:
Try to match udiv and urem patterns, and sink zext down to the leaves.
I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right.
Reviewers: sanjoy
Subscribers: jlebar, hiraditya, bixia, llvm-commits
Differential Revision: https://reviews.llvm.org/D48338
llvm-svn: 335197
Summary: Make the MemorySSA verify also check that all Phi incoming blocks are block predecessors.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D48333
llvm-svn: 335174
For both operands are unsigned, the following optimizations are valid, and missing:
1. X > Y && X != 0 --> X > Y
2. X > Y || X != 0 --> X != 0
3. X <= Y || X != 0 --> true
4. X <= Y || X == 0 --> X <= Y
5. X > Y && X == 0 --> false
unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; }
should fold to x > y, but I found we haven't done it right now.
besides, unsigned foo(unsigned x, unsigned y) { return x < y && y != 0; }
Has been folded to x < y, so there may be a bug.
Patch by: Li Jia He!
Differential Revision: https://reviews.llvm.org/D47922
llvm-svn: 335129
The optimizer is getting smarter (eg, D47986) about differentiating shuffles
based on its mask values, so we should make queries on the mask constant
operand generally available to avoid code duplication.
We'll probably use this soon in the vectorizers and instcombine (D48023 and
https://bugs.llvm.org/show_bug.cgi?id=37806).
We might clean up TTI a bit more once all of its current 'SK_*' options are
covered.
Differential Revision: https://reviews.llvm.org/D48236
llvm-svn: 335067
This reverts r334428. It incorrectly marks some multiplications as nuw. Tim
Shen is working on a proper fix.
Original commit message:
[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe.
Summary:
Previously we would add them for adds, but not multiplies.
llvm-svn: 335016
Summary:
Sending for presubmit review out of an abundance of caution; it would be
bad to mess this up.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48238
llvm-svn: 334875
Summary:
Obviates the need for mask/clear/setFlags helpers.
There are some expressions here which can be simplified, but to keep
this easy to review, I have not simplified them in this patch.
No functional change.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48237
llvm-svn: 334874
In particular, when asked to print a MemoryAccess, we'll now print where
defs are optimized to, and we'll print optimized access types.
This patch also introduces an operator<< to make printing AliasResults
easier.
Patch by Juneyoung Lee!
Differential Revision: https://reviews.llvm.org/D47860
llvm-svn: 334760
Summary:
Specifically, we transform
zext(2^K * (trunc X to iN)) to iM ->
2^K * (zext(trunc X to i{N-K}) to iM)<nuw>
This is helpful because pulling the 2^K out of the zext allows further
optimizations.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits, timshen
Differential Revision: https://reviews.llvm.org/D48158
llvm-svn: 334737
Summary:
Previously we would do this simplification only if it did not introduce
any new truncs (excepting new truncs which replace other cast ops).
This change weakens this condition: If the number of truncs stays the
same, but we're able to transform trunc(X + Y) to X + trunc(Y), that's
still simpler, and it may open up additional transformations.
While we're here, also clean up some duplicated code.
Reviewers: sanjoy
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48160
llvm-svn: 334736
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:
e.g. v4f32: <0,5,2,7> or <4,1,6,3>
This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:
e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.
This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.
Differential Revision: https://reviews.llvm.org/D47985
llvm-svn: 334513
As discussed on D47985, identity shuffle masks should probably be free.
I've limited this to the case where the input and output types all match - but we could probably accept all cases.
Differential Revision: https://reviews.llvm.org/D47986
llvm-svn: 334506
Summary:
Previously we would add them for adds, but not multiplies.
Reviewers: sanjoy
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D48038
llvm-svn: 334428
An expression like
(zext i2 {(trunc i32 (1 + %B) to i2),+,1}<%while.body> to i32)
will become zero exactly when the nested value becomes zero in its type.
Strip injective operations from the input value in howFarToZero to make
the value simpler.
Differential Revision: https://reviews.llvm.org/D47951
llvm-svn: 334318
Summary:
`%ret = add nuw i8 %x, C`
From [[ https://llvm.org/docs/LangRef.html#add-instruction | langref ]]:
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”,
respectively. If the nuw and/or nsw keywords are present,
the result value of the add is a poison value if unsigned
and/or signed overflow, respectively, occurs.
So if `C` is `-1`, `%x` can only be `0`, and the result is always `-1`.
I'm not sure we want to use `KnownBits`/`LVI` here, because there is
exactly one possible value (all bits set, `-1`), so some other pass
should take care of replacing the known-all-ones with constant `-1`.
The `test/Transforms/InstCombine/set-lowbits-mask-canonicalize.ll` change *is* confusing.
What happening is, before this: (omitting `nuw` for simplicity)
1. First, InstCombine D47428/rL334127 folds `shl i32 1, %NBits`) to `shl nuw i32 -1, %NBits`
2. Then, InstSimplify D47883/rL334222 folds `shl nuw i32 -1, %NBits` to `-1`,
3. `-1` is inverted to `0`.
But now:
1. *This* InstSimplify fold `%ret = add nuw i32 %setbit, -1` -> `-1` happens first,
before InstCombine D47428/rL334127 fold could happen.
Thus we now end up with the opposite constant,
and it is all good: https://rise4fun.com/Alive/OA9https://rise4fun.com/Alive/sldC
Was mentioned in D47428 review.
Follow-up for D47883.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47908
llvm-svn: 334298
Currently the loop branch heuristic is applied before the invoke heuristic which makes us overestimate the probability of the unwind destination of invokes inside loops. This in turn makes us grossly underestimate the frequencies of loops with invokes.
Reviewed By: skatkov, vsk
Differential Revision: https://reviews.llvm.org/D47371
llvm-svn: 334285
Summary:
`%r = shl nuw i8 C, %x`
As per langref:
```
If the nuw keyword is present, then the shift produces
a poison value if it shifts out any non-zero bits.
```
Thus, if the sign bit is set on `C`, then `%x` can only be `0`,
which means that `%r` can only be `C`.
Or in other words, set sign bit means that the signed value
is negative, so the constant is `<= 0`.
https://rise4fun.com/Alive/WMkhttps://rise4fun.com/Alive/udv
Was mentioned in D47428 review.
We already handle the `0` constant, https://godbolt.org/g/UZq1sJ, so this only handles negative constants.
Could use computeKnownBits() / LazyValueInfo,
but the cost-benefit analysis (https://reviews.llvm.org/D47891)
suggests it isn't worth it.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47883
llvm-svn: 334222
These weren't included in D19544 - probably just an oversight.
D40044 made it more likely that we'll have LLVM math intrinsics rather
than libcalls, so this bug was more easily exposed.
As the tests/code show, we already have the complete mappings for pow/exp/log.
I don't have any experience with SVML, so I don't know if anything else is
missing. It's also not clear to me that we should be doing this transform in
IR rather than DAG/isel, but that's a separate issue.
Differential Revision: https://reviews.llvm.org/D47610
llvm-svn: 334211
With the upcoming patch to add summary parsing support, IsAnalysis would
be true in contexts where we are not performing module summary analysis.
Rename to the more specific and approprate HaveGVs, which is essentially
what this flag is indicating.
llvm-svn: 334140
When checking a select to see if it matches an abs, allow the true/false values
to be a sign-extension of the comparison value instead of requiring that they're
directly the comparison value, as all the comparison cares about is the sign of
the value.
This fixes a regression due to r333702, where we were no longer generating ctlz
due to isKnownNonNegative failing to match such a pattern.
Differential Revision: https://reviews.llvm.org/D47631
llvm-svn: 333927
Both weakZeroSrcSIV and weakZeroDstSIV are currently giving the same
direction vectors. Fix weakZeroSrcSIVtest by flipping the directions
it gives.
Differential Revision: https://reviews.llvm.org/D46678
llvm-svn: 333658
Summary:
The isKnownNonZero() function have checks that abort the recursion when
it reaches the specified max depth. However one of the recursive calls
was placed before the max depth check was done, resulting in a endless
recursion that eventually triggered a segmentation fault.
Fixed the problem by moving the max depth check above the first
recursive call.
Reviewers: Prazek, nlopes, spatel, craig.topper, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, bjope, llvm-commits
Differential Revision: https://reviews.llvm.org/D47531
llvm-svn: 333557
Summary:
The atomic variants of the memcpy/memmove/memset intrinsics can be treated
the same was as the regular forms, with respect to aliasing. Update the
AliasSetTracker to treat the atomic forms the same was as the regular forms.
llvm-svn: 333551
Summary:
A simple change to derive mod/ref info from the atomic memcpy
intrinsic in the same way as from the regular memcpy intrinsic.
llvm-svn: 333454
Style guide says `else`s after returns are iffy, and I agree. I also
don't know what broke the comments here and in CFLAA, but *shrug*.
llvm-svn: 333332
The uint64_ts that we pass around AA to represent MemoryLocation sizes
are logically an Optional<uint64_t>. In D44748, we want to add an extra
'imprecise' bit to this Optional<uint64_t> to represent whether a given
MemoryLocation size is an upper-bound or an exact size. For more context
on why, please see D44748.
That patch is quite large, but reviewers seem to be OK with the
approach. In D45581 (my first attempt to split 'noise' out of D44748),
reames asked that I land a precursor that is solely replacing uint64_t
with LocationSize, which starts out as `using LocationSize = uint64_t;`.
He also gave me the OK to submit this rename without further review.
llvm-svn: 333314
Libfuzzer tests have been fixed to prevent being optimized.
Original commit message:
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. N
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333300
Summary:
Look past debug intrinsics when querying whether an instruction is the
first instruction in the header block. The commit includes a reproducer
for a case where LICM would not hoist an instruction, due to the presence
of the intrinsic.
A caveat with this commit is that the check will not work properly if
the instruction at hand is a debug intrinsic. I assume that no one
depends on isGuaranteedToExecute() to return true for debug intrinsics
for these cases (and that this might be an indication of another debug
invariant issue), so I thought that it was not worth adding that extra
bit of complexity.
Reviewers: reames, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47197
llvm-svn: 333274
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. Need to check alive to make sure there are no corner cases.
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333226
Summary: This patch adds a PDT constructor from Function and lets codes previously using a local class to do this use PostDominatorTree class directly.
Reviewers: davide, kuhar, grosser, dberlin
Reviewed By: kuhar
Author: NutshellySima
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46709
llvm-svn: 333102
Summary:
Patch for capture tracking broke
bootstrap of clang with -fstict-vtable-pointers
which resulted in debbugging nightmare. It was fixed
https://reviews.llvm.org/D46900 but as it turned
out, there were other parts like inliner (computing of
noalias metadata) that I found after bootstraping with enabled
assertions.
Reviewers: hfinkel, rsmith, chandlerc, amharc, kuhar
Subscribers: JDevlieghere, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47088
llvm-svn: 333070
Summary: Previous patch does not care if a value is changed between calloc and strlen. This needs to be removed from InstCombine and maybe moved to DSE later after some rework.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47218
llvm-svn: 333022
This enables us to detect more fast path sdiv cases under cost analysis.
This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.
Found while working on D46276
Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.
Differential Revision: https://reviews.llvm.org/D46637
llvm-svn: 332969
Change matchSelectPattern to return X and -X for ABS/NABS in a well defined order. Adjust EarlyCSE to account for this. Ensure the SPF result is some kind of min/max and not abs/nabs in one place in InstCombine that made me nervous.
Prevously we returned the two operands of the compare part of the abs pattern. The RHS is always going to be a 0i, 1 or -1 constant. This isn't a very meaningful thing to return for any one. There's also some freedom in the abs pattern as to what happens when the value is equal to 0. This freedom led to early cse failing to match when different constants were used in otherwise equivalent operations. By returning the input and its negation in a defined order we can ensure an exact match. This also makes sure both patterns use the exact same subtract instruction for the negation. I believe CSE should evebntually make this happen and properly merge the nsw/nuw flags. But I'm not familiar with CSE and what order it does things in so it seemed like it might be good to really enforce that they were the same.
Differential Revision: https://reviews.llvm.org/D47037
llvm-svn: 332865
Summary:
invariant.group.launder should not stop propagation
of nonnull and dereferenceable, because e.g. we would not be
able to hoist loads speculatively.
Reviewers: rsmith, amharc, kuhar, xbolva00, hfinkel
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D46972
llvm-svn: 332788
Summary:
This feature is not needed, but it might be usefull in the future
to use metadata to mark what which function should support it
(and strip it when not).
Reviewers: rsmith, sanjoy, amharc, kuhar
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45419
llvm-svn: 332787
Summary:
Memdep had funny bug related to invariant.groups - because it did not
invalidated cache, in some very rare cases it was possible to show memory
dependence of the instruction that was deleted, but because other
instruction took it's place it resulted in call to vtable!
Thanks @amharc for repro!.
Reviewers: dberlin, kuhar, amharc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45320
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 332781
This patch adds a remark which tells the user when a pass changes the number of
IR instructions in a module.
It can be enabled by using -Rpass-analysis=size-info.
The point of this is to make it easier to collect statistics on how passes
modify programs in terms of code size. This is similar in concept to timing
reports, but using a remark-based interface makes it easy to diff changes over
multiple compilations of the same program.
By adding functionality like this, we can see
* Which passes impact code size the most
* How passes impact code size at different optimization levels
* Which pass might have contributed the most to an overall code size
regression
The patch lives in the legacy pass manager, but since it's simply emitting
remarks, it shouldn't be too difficult to adapt the functionality to the new
pass manager as well. This can also be adapted to handle MachineInstr counts in
code gen passes.
https://reviews.llvm.org/D38768
llvm-svn: 332739
CanProveNotTakenFirstIteration utility does not handle the case when
condition of the branch is a constant. Add its handling.
Reviewers: reames, anna, mkazantsev
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46996
llvm-svn: 332695
Summary:
- Add wasm personality function
- Re-categorize the existing `isFuncletEHPersonality()` function into
two different functions: `isFuncletEHPersonality()` and
`isScopedEHPersonality(). This becomes necessary as wasm EH uses scoped
EH instructions (catchswitch, catchpad/ret, and cleanuppad/ret) but not
outlined funclets.
- Changed some callsites of `isFuncletEHPersonality()` to
`isScopedEHPersonality()` if they are related to scoped EH IR-level
stuff.
Reviewers: majnemer, dschuff, rnk
Subscribers: jfb, sbc100, jgravelle-google, eraman, JDevlieghere, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45559
llvm-svn: 332667
Summary:
There was some unfinished work started for offset tracking in CFLGraph by the author of implementation of Andersen algorithm. This work was completed and support for field sensitivity was added to the core of Andersen algorithm.
The performance results seem promising.
SPEC2006 int_base score was increased by 1.1 % (I compared clang 6.0 with clang 6.0 with this patch). The avergae compile time was increased by +- 1 % according my measures with small and medium C/C++ projects (I did not tested it on the large projects with milions of lines of code)
Reviewers: chandlerc, george.burgess.iv, rja
Reviewed By: rja
Subscribers: rja, llvm-commits
Differential Revision: https://reviews.llvm.org/D46282
llvm-svn: 332657
Summary:
Require DominatorTree when requiring/preserving LoopInfo in the old pass manager
BreakCriticalEdges tries to keep LoopInfo and DominatorTree updated if they
exist. However, since commit r321653 and r321805, to update LoopInfo we
must have a DominatorTree, or we will hit an assert.
To fix this we now make a couple of passes that only required/preserved
LoopInfo also require DominatorTree.
This solves PR37334.
Reviewers: eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D46829
llvm-svn: 332583
The existing comment said that the functions were available only
on GNU/Linux (and on certain Android versions), but only checked
T.isGNUEnvironment() which also is true on MinGW (for arch-windows-gnu
triplets), which doesn't have such functions.
Existing checks in the initialize function in TargetLibraryInfo.cpp
also use only T.isOSLinux() to check for glibc features.
This fixes use of stdio on MinGW.
Differential Revision: https://reviews.llvm.org/D47002
llvm-svn: 332581
r332057 introduced distance() for ranges. Based on post-commit feedback,
this renames distance() to size(). The new size() is also only enabled
when the operation is O(1).
Differential Revision: https://reviews.llvm.org/D46976
llvm-svn: 332551
Summary:
A recent patch ([[ https://reviews.llvm.org/rL331587 | rL331587 ]]) to Capture Tracking taught it that the `launder_invariant_group` intrinsic captures its argument only by returning it. Unfortunately, BasicAA still considered every call instruction as a possible escape source and hence concluded that the result of a `launder_invariant_group` call cannot alias any local non-escaping value. This led to [[ https://bugs.llvm.org/show_bug.cgi?id=37458 | bug 37458 ]].
This patch updates the relevant check for escape sources in BasicAA.
Reviewers: Prazek, kuhar, rsmith, hfinkel, sanjoy, xbolva00
Reviewed By: hfinkel, xbolva00
Subscribers: JDevlieghere, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D46900
llvm-svn: 332466
Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed,
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer, lebedev.ri, rja
Reviewed By: rja
Subscribers: rja, srhines, efriedma, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 332452
Summary:
After r332167 we started to sort the IDF blocks inside IDF calculation, so
there is no need to re-sort them on the user site. The test changes are due to
a slightly different order we're using now (originally we used DFSInNumber and
now the blocks are sorted by a pair (LevelFromRoot, DFSInNumber)).
Reviewers: dberlin, mgrang
Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D46899
llvm-svn: 332385
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Summary:
Currently the order of blocks returned by `IDF::calculate` can be
non-deterministic. This was discovered in several attempts to enable
SSAUpdaterBulk for JumpThreading (which led to miscompare in bootstrap between
stage 3 and stage4). Originally, the blocks were put into a priority queue with
a depth level as their key, and this patch adds a DFSIn number as a second key
to specify a deterministic order across blocks from one level.
The solution was suggested by Daniel Berlin.
Reviewers: dberlin, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46646
llvm-svn: 332167
If the sprintf function is static (as on mingw-w64, where many stdio
functions are static inline wrappers), earlier optimization passes
could optimize out the return value altogether, and make it void,
which could break optimizations of this libcall that touch the
return value.
This fixes the issue discussed in PR37408 for the sprintf function.
Differential Revision: https://reviews.llvm.org/D46752
llvm-svn: 332106
We found current sampleFDO had a performance issue when triaging a regression.
For a callsite with inline instance in the profile, even if hot callsite inliner
cannot inline it, it may still execute enough times and should not be treated as
cold in regular inliner later. However, currently if such callsite is not inlined
by hot callsite inliner, and the BB where the callsite locates doesn't get
samples from other instructions inside of it, the callsite will have no profile
metadata annotated. In regular inliner cost analysis, if the callsite has no
profile annotated and its caller has profile information, it will be treated as
cold.
The fix changes the isCallsiteHot check and chooses to compare
CallsiteTotalSamples with hot cutoff value computed by ProfileSummaryInfo.
Differential Revision: https://reviews.llvm.org/D45377
llvm-svn: 332058
This commit adds a wrapper for std::distance() which works with ranges.
As it would be a common case to write `distance(predecessors(BB))`, this
also introduces `pred_size()` and `succ_size()` helpers to make that
easier to write.
Differential Revision: https://reviews.llvm.org/D46668
llvm-svn: 332057
Summary:
Operand 0 is the condition, not the true value.
Use op 1 and op 2 as the correct values.
Reviewers: george.burgess.iv, nlopes, efriedma
Reviewed By: george.burgess.iv
Subscribers: craig.topper, rjmccall, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D46343
llvm-svn: 331976
During simplification umax we trigger isKnownPredicate twice. As a first attempt it
tries the induction. To do that it tries to get post increment of SCEV.
Re-writing the SCEV may result in simplification of umax. If the SCEV contains a lot
of umax operations this recursion becomes very slow.
The added test demonstrates the slow behavior.
To resolve this we use only simple ways to check whether the predicate is known.
Reviewers: sanjoy, mkazantsev
Reviewed By: sanjoy
Subscribers: lebedev.ri, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46046
llvm-svn: 331949
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is
!DILabel(scope: !1, name: "foo", file: !2, line: 3)
We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is
llvm.dbg.label(metadata !1)
It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.
We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.
Differential Revision: https://reviews.llvm.org/D45024
Patch by Hsiangkai Wang.
llvm-svn: 331841
Summary:
launder.invariant.group has the same rules of capturing as
bitcast, gep, etc - the original value is not captured
if the returned pointer is not captured.
With this patch, we mark 40% more functions as noalias when compiling with -fstrict-vtable-pointers;
1078 vs 1778 (39.37%)
Reviewers: sanjoy, davide, nlewycky, majnemer, mehdi_amini
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D32673
llvm-svn: 331587
This patch was temporarily reverted because it has exposed bug 37229 on
PowerPC platform. The bug is unrelated to the patch and was just a general
bug in the optimization done for PowerPC platform only. The bug was fixed
by the patch rL331410.
This patch returns the disabled commit since the bug was fixed.
llvm-svn: 331427
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
f = f.strip()
fl = open(f).readlines()
found = False
for i in xrange(len(fl)):
p = '#include "llvm/'
if not fl[i].startswith(p):
continue
if fl[i][len(p):] > 'Config':
fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
found = True
break
if not found:
print 'not found', f
else:
open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
The invocation of getExact in ScalarEvolution::getBackedgeTakenInfo is used
only for getting statistic and for assert.
Even if statistics is disabled, the code related to it will be eliminated
the invocation to getExact itself will not be eliminated
because it may have side-effects like creation of new SCEVs.
So do invocation only when we collect statistics or executes asserts.
Reviewers: mkazantsev, sanjoy, javed.absar
Reviewed By: javed.absar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46178
llvm-svn: 331099
Summary:
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the `LHS` and `RHS` matchers:
1. match `RHS` matcher to the `first` operand of binary operator,
2. and then match `LHS` matcher to the `second` operand of binary operator.
This works ok.
But it complicates writing of commutative matchers, where one would like to match
(`m_Value()`) the value on one side, and use (`m_Specific()`) it on the other side.
This is additionally complicated by the fact that `m_Specific()` stores the `Value *`,
not `Value **`, so it won't work at all out of the box.
The last problem is trivially solved by adding a new `m_c_Specific()` that stores the
`Value **`, not `Value *`. I'm choosing to add a new matcher, not change the existing
one because i guess all the current users are ok with existing behavior,
and this additional pointer indirection may have performance drawbacks.
Also, i'm storing pointer, not reference, because for some mysterious-to-me reason
it did not work with the reference.
The first one appears trivial, too.
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the ~~`LHS` and `RHS` matchers~~ **operands**:
1. match ~~`RHS`~~ **`LHS`** matcher to the ~~`first`~~ **`second`** operand of binary operator,
2. and then match ~~`LHS`~~ **`RHS`** matcher to the ~~`second`~ **`first`** operand of binary operator.
Surprisingly, `$ ninja check-llvm` still passes with this.
But i expect the bots will disagree..
The motivational unittest is included.
I'd like to use this in D45664.
Reviewers: spatel, craig.topper, arsenm, RKSimon
Reviewed By: craig.topper
Subscribers: xbolva00, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45828
llvm-svn: 331085
We currently have a hard to solve analysis problem around the order of instructions within a potentially throwing block. We can't cheaply determine whether a given instruction is before the first potential throw in the block. While we're working on that in the background, special case the first instruction within the header.
why this particular special case? Well, headers are guaranteed to execute if the loop does, and it turns out we tend to produce this form in practice.
In a follow on patch, I tend to extend LICM with an alternate approach which works for any instruction in the header before the first throw, but this is the best I can come up with other users of the analysis (such as store promotion.)
Note: I can't show the difference in the analysis result since we're ORing in the expensive instruction walk used by SCEV. Using the full walk is not suitable for a general solution.
llvm-svn: 331079
Add new umin creation method which accepts a list of operands.
SCEV does not represents umin which is required in getExact, so
it transforms umin to umax with not. As a result the transformation of
tree of max to max with several operands does not work.
We just use the new introduced method for creation umin from several operands.
Reviewers: sanjoy, mkazantsev
Reviewed By: sanjoy
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46047
llvm-svn: 331015
Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed,
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 331002
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.
Differential Revision: https://reviews.llvm.org/D45982
llvm-svn: 330941
This reverts commit 023c8be90980e0180766196cba86f81608b35d38.
This patch triggers miscompile of zlib on PowerPC platform. Most likely it is
caused by some pre-backend PPC-specific pass, but we don't clearly know the
reason yet. So we temporally revert this patch with intention to return it
once the problem is resolved. See bug 37229 for details.
llvm-svn: 330893
Summary:
The PointerMayBeCapturedBefore function's DomTree arg should be
const instead of non-const. There are no non-const uses of it
in the function.
llvm-svn: 330769
I was reminded today that this patch got reverted in r301885. I can no
longer reproduce the failure that caused the revert locally (...almost
one year later), and the patch applied pretty cleanly, so I guess we'll
see if the bots still get angry about it.
The original breakage was InstSimplify complaining (in "assertion
failed" form) about getting passed some crazy IR when running `ninja
check-sanitizer`. I'm unable to find traces of what, exactly, said crazy
IR was. I suppose we'll find out pretty soon if that's still the case.
:)
Original commit:
Author: gbiv
Date: Mon May 1 18:12:08 2017
New Revision: 301880
URL: http://llvm.org/viewvc/llvm-project?rev=301880&view=rev
Log:
[InstSimplify] Handle selects of GEPs with 0 offset
In particular (since it wouldn't fit nicely in the summary):
(select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V)
Differential Revision: https://reviews.llvm.org/D31435
llvm-svn: 330667
Summary:
This change teaches DSE that the atomic memory intrinsics are stores
that can be eliminated, and can allow other stores to be eliminated.
This change specifically does not teach DSE that these intrinsics
can be partially eliminated (i.e. length reduced, and dest/src changed);
that will be handled in another change.
Reviewers: mkazantsev, skatkov, apilipenko, efriedma, rsmith
Reviewed By: efriedma
Subscribers: dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D45535
llvm-svn: 330629
In the function `simplifyOneLoop` we optimistically assume that changes in the
inner loop only affect this very loop and have no impact on its parents. In fact,
after rL329047 has been merged, we can now calculate exit counts for outer
loops which may depend on inner loops. Thus, we need to invalidate all parents
when we do something to a loop.
There is an evidence of incorrect behavior of `simplifyOneLoop`: when we insert
`SE->verify()` check in the end of this funciton, it fails on a bunch of existing
test, in particular:
LLVM :: Transforms/LoopUnroll/peel-loop-not-forced.ll
LLVM :: Transforms/LoopUnroll/peel-loop-pgo.ll
LLVM :: Transforms/LoopUnroll/peel-loop.ll
LLVM :: Transforms/LoopUnroll/peel-loop2.ll
Note that previously we have fixed issues of this variety, see rL328483.
This patch makes this function invalidate the outermost loop properly.
Differential Revision: https://reviews.llvm.org/D45937
Reviewed By: chandlerc
llvm-svn: 330576
This is the last step in getting constant pattern matchers to allow
undef elements in constant vectors.
I'm adding a dedicated m_ZeroInt() function and building m_Zero() from
that. In most cases, calling code can be updated to use m_ZeroInt()
directly when there's no need to match pointers, but I'm leaving that
efficiency optimization as a follow-up step because it's not always
clear when that's ok.
There are just enough icmp folds in InstSimplify that can be used for
integer or pointer types, that we probably still want a generic m_Zero()
for those cases. Otherwise, we could eliminate it (and possibly add a
m_NullPtr() as an alias for isa<ConstantPointerNull>()).
We're conservatively returning a full zero vector (zeroinitializer) in
InstSimplify/InstCombine on some of these folds (see diffs in InstSimplify),
but I'm not sure if that's actually necessary in all cases. We may be
able to propagate an undef lane instead. One test where this happens is
marked with 'TODO'.
llvm-svn: 330550
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple straight-forward things.
Let's start with the `and` -> `or` simplification.
The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so i've left that for later.
https://rise4fun.com/Alive/Pkmg
Reviewers: spatel, craig.topper, eli.friedman, jingyue
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45631
llvm-svn: 330101
The function getMinimumVF(ElemWidth) will return the minimum VF for
a vector with elements of size ElemWidth bits. This value will only
apply to targets for which TTI::shouldMaximizeVectorBandwidth returns
true. The value of 0 indicates that there is no minimum VF.
Differential Revision: https://reviews.llvm.org/D45271
llvm-svn: 330062
Improve the alias analysis to account for cases where we
know that src/dst pairs cannot alias due to things like
TBAA. As we know they are noalias, we know no dependency
can occur. Also fixes issues around the size parameter
to AA being incorrect.
Differential Revision: https://reviews.llvm.org/D42381
llvm-svn: 329692
The caching walker used to hold its own caches, which made its `reset()`
function meaningful. Since caching has been moved out of it, there's no
reason to continue to have these cache-related methods.
Similarly, the EXPENSIVE_CHECKS block that's getting removed used to
rerun the query with caching disabled. Since that's how we always do
queries now, it's redundant.
llvm-svn: 329638
Fix PR36484, as suggested:
<quote>
during moves, mark the direct users of the erased things that were phis as "not to be optimized"
<quote>
llvm-svn: 329621
Summary: @llvm.icall.branch.funnel is musttail with variable number of
arguments. After inlining current backend can't separate call targets from call
arguments.
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45116
llvm-svn: 329235
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329218
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329215
This patch teaches SCEV how to prove implications for SCEVUnknown nodes that are Phis.
If we need to prove `Pred` for `LHS, RHS`, and `LHS` is a Phi with possible incoming values
`L1, L2, ..., LN`, then if we prove `Pred` for `(L1, RHS), (L2, RHS), ..., (LN, RHS)` then we can also
prove it for `(LHS, RHS)`. If both `LHS` and `RHS` are Phis from the same block, it is sufficient
to prove the predicate for values that come from the same predecessor block.
The typical case that it handles is that we sometimes need to prove that `Phi(Len, Len - 1) >= 0`
given that `Len > 0`. The new logic was added to `isImpliedViaOperations` and only uses it and
non-recursive reasoning to prove the facts we need, so it should not hurt compile time a lot.
Differential Revision: https://reviews.llvm.org/D44001
Reviewed By: anna
llvm-svn: 329150
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Patch does not support reordering of the repeated instruction, this must
be handled in the separate patch.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
llvm-svn: 329085
The patch changes the usage of dominate to properlyDominate
to satisfy the condition !(a < a) while using std::max.
It is actually NFC due to set data structure is used to keep
the Loops and no two identical loops can be in collection.
So in reality there is no difference between usage of
dominate and properlyDominate in this particular case.
However it might be changed so it is better to fix it.
llvm-svn: 329051
Current implementation of `computeExitLimit` has a big piece of code
the only purpose of which is to prove that after the execution of this
block the latch will be executed. What it currently checks is actually a
subset of situations where the exiting block dominates latch.
This patch replaces all these checks for simple particular cases with
domination check over loop's latch which is the only necessary condition
of taking the exiting block into consideration. This change allows to
calculate exact loop taken count for simple loops like
for (int i = 0; i < 100; i++) {
if (cond) {...} else {...}
if (i > 50) break;
. . .
}
Differential Revision: https://reviews.llvm.org/D44677
Reviewed By: efriedma
llvm-svn: 329047
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
llvm-svn: 328980
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer D44363 for a list of all the required patches.
Reviewers: sanjoy, dexonsmith, hfinkel, RKSimon
Reviewed By: dexonsmith
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D44944
llvm-svn: 328925
Summary:
Useful to selectively disable importing into specific modules for
debugging/triaging/workarounds.
Reviewers: eraman
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45062
llvm-svn: 328909
Eli pointed out that variadic functions are totally a thing, so this
assert is incorrect.
No test-case is provided, since the only way this assert fires is if a
specific DenseMap falls back to doing `isEqual` checks, and that seems
fairly brittle (and requires a pyramid of growing
`call void (i8, ...) @varargs(i8 0)`).
llvm-svn: 328755
We use a `DenseMap<MemoryLocOrCall, MemlocStackInfo>` to keep track of
prior work when optimizing uses in MemorySSA. Because we weren't
accounting for callsite arguments in either the hash code or equality
tests for `MemoryLocOrCall`s, we optimized uses too aggressively in
some rare cases.
Fix by Daniel Berlin.
Should fix PR36883.
llvm-svn: 328748
Summary:
This is an NFC refactoring of the OptBisect class to split it into an optional pass gate interface used by LLVMContext and the Optional Pass Bisector (OptBisect) used for debugging of optional passes.
This refactoring is needed for D44464, which introduces setOptPassGate() method to allow implementations other than OptBisect.
Patch by Yevgeny Rouban.
Reviewers: andrew.w.kaylor, fedor.sergeev, vsk, dberlin, Eugene.Zelenko, reames, skatkov
Reviewed By: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D44821
llvm-svn: 328637
Currently, `getExact` fails if it sees two exit counts in different blocks. There is
no solid reason to do so, given that we only calculate exact non-taken count
for exiting blocks that dominate latch. Using this fact, we can simply take min
out of all exits of all blocks to get the exact taken count.
This patch makes the calculation more optimistic with enforcing our assumption
with asserts. It allows us to calculate exact backedge taken count in trivial loops
like
for (int i = 0; i < 100; i++) {
if (i > 50) break;
. . .
}
Differential Revision: https://reviews.llvm.org/D44676
Reviewed By: fhahn
llvm-svn: 328611
This patch teaches `computeConstantDifference` handle calculation of constant
difference between `(X + C1)` and `(X + C2)` which is `(C2 - C1)`.
Differential Revision: https://reviews.llvm.org/D43759
Reviewed By: anna
llvm-svn: 328609
MemorySSAUpdater::getPreviousDefRecursive is a recursive algorithm, for
each block, it computes the previous definition for each predecessor,
then takes those definitions and combines them. But currently it doesn't
remember results which it already computed; this means it can visit the
same block multiple times, which adds up to exponential time overall.
To fix this, this patch adds a cache. If we computed the result for a
block already, we don't need to visit it again because we'll come up
with the same result. Well, unless we RAUW a MemoryPHI; in that case,
the TrackingVH will be updated automatically.
This matches the original source paper for this algorithm.
The testcase isn't really a test for the bug, but it adds coverage for
the case where tryRemoveTrivialPhi erases an existing PHI node. (It's
hard to write a good regression test for a performance issue.)
Differential Revision: https://reviews.llvm.org/D44715
llvm-svn: 328577
Implement TTI interface for targets to indicate that the LSR should give
priority to post-incrementing addressing modes.
Combination of patches by Sebastian Pop and Brendon Cahoon.
Differential Revision: https://reviews.llvm.org/D44758
llvm-svn: 328490
Summary:
Revert r325687 workaround for PR36032 since
a fix was committed in r326154.
Reviewers: sbaranga
Differential Revision: http://reviews.llvm.org/D44768
From: Evgeny Stupachenko <evstupac@gmail.com>
<evgeny.v.stupachenko@intel.com>
llvm-svn: 328257
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.
Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.
llvm-svn: 328165
Most basic possible test for the logic used by LICM.
Also contains a speculative build fix for compiles which complain about a definition of a stuct K; followed by a declaration as class K;
llvm-svn: 328058
As suggested in the original review (https://reviews.llvm.org/D44524), use an annotation style printer instead.
Note: The switch from -analyze to -disable-output in tests was driven by the fact that seems to be the idiomatic style used in annoation passes. I tried to keep both working, but the old style pass API for printers really doesn't make this easy. It invokes (runOnFunction, print(Module)) repeatedly. I decided the extra state wasn't worth it given the old pass manager is going away soonish anyway.
llvm-svn: 328015
Many of our loop passes make use of so called "must execute" or "guaranteed to execute" facts to prove the legality of code motion. The basic notion is that we know (by assumption) an instruction didn't fault at it's original location, so if the location we move it to is strictly post dominated by the original, then we can't have introduced a new fault.
At the moment, the testing for this logic is somewhat adhoc and done mostly through LICM. Since I'm working on that code, I want to improve the testing. This patch is the first step in that direction. It doesn't actually test the variant used by the loop passes - I need to move that to the Analysis library first - but instead exercises an alternate implementation used by SCEV. (I plan on merging both implementations.)
Note: I'll be replacing the printing logic within this with an annotation based version in the near future. Anna suggested this in review, and it seems like a strictly better format.
Differential Revision: https://reviews.llvm.org/D44524
llvm-svn: 328004
This is re-land of https://reviews.llvm.org/rL327362 with a fix
and regression test.
The crash was due to it is possible that for found MDL loop,
LHS or RHS may contain an invariant unknown SCEV which
does not dominate the MDL. Please see regression
test for an example.
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44553
llvm-svn: 327822
As shown in the code comment, we don't need all of 'fast',
but we do need reassoc + nsz + nnan.
Differential Revision: https://reviews.llvm.org/D43765
llvm-svn: 327796
a long time.
The key thing is that we need to create value handles for every function
that we create a `FunctionInfo` object around. Without this, when that
function is deleted we can end up creating a new function that collides
with its address and look up a stale AA result. With that AA result we
can in turn miscompile code in ways that break.
This is seriously one of the most absurd miscompiles I've seen. It only
reproduced for us recently and only when building a very large server
with both ThinLTO and PGO.
A *HUGE* shout out to Wei Mi who tracked all of this down and came up
with this patch. I'm just landing it because I happened to still by at
a computer.
He or I can work on crafting a test case to hit this (now that we know
what to target) but it'll take a while, and we've been chasing this for
a long time and need it fix Right Now.
llvm-svn: 327761
This matcher implementation appears to be slightly more efficient than
the generic constant check that it is replacing because every use was
for matching FP patterns, but the previous code would check int and
pointer type nulls too.
llvm-svn: 327627
From the LangRef definition for frem:
"The value produced is the floating-point remainder of the two operands.
This is the same output as a libm ‘fmod‘ function, but without any
possibility of setting errno. The remainder has the same sign as the
dividend. This instruction is assumed to execute in the default
floating-point environment."
llvm-svn: 327626
Methods `computeExitLimitFromCondCached` and `computeExitLimitFromCondImpl` take
true and false branches as parameters and only use them for asserts and for identifying
whether true/false branch belongs to the loop (which can be done once earlier). This fact
complicates generalization of exit limit computation logic on guards because the guards
don't have blocks to which they go in case of failure explicitly.
The motivation of this patch is that currently this part of SCEV knows nothing about guards
and only works with explicit branches. As result, it fails to prove that a loop
for (i = 0; i < 100; i++)
guard(i < 10);
exits after 10th iteration, while in the equivalent example
for (i = 0; i < 100; i++)
if (i >= 10) break;
SCEV easily proves this fact. We are going to change it in near future, and this is why
we need to make these methods operate on more abstract level.
This patch refactors this code to get rid of these parameters as meaningless and prepare
ground for teaching these methods to work with guards as well as they work with explicit
branching instructions.
Differential Revision: https://reviews.llvm.org/D44419
llvm-svn: 327615
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=27151
...the existing fold could miscompile when X is NaN.
The fold was also dependent on 'ninf' but that's not necessary.
From IEEE-754 (with default rounding which we can assume for these opcodes):
"When the sum of two operands with opposite signs (or the difference of two
operands with like signs) is exactly zero, the sign of that sum (or difference)
shall be +0...However, x + x = x − (−x) retains the same sign as x even when
x is zero."
llvm-svn: 327575
Summary:
It is possible for LVI to encounter instructions that are not in valid
SSA form and reference themselves. One example is the following:
%tmp4 = and i1 %tmp4, undef
Before this patch LVI would recurse until running out of stack memory
and crashed. This patch marks these self-referential instructions as
Overdefined and aborts analysis on the instruction.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33357
Reviewers: craig.topper, anna, efriedma, dberlin, sebpop, kuhar
Reviewed by: dberlin
Subscribers: uabelho, spatel, a.elovikov, fhahn, eli.friedman, mzolotukhin, spop, evandro, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34135
llvm-svn: 327432
isAvailableAtLoopEntry duplicates logic of `properlyDominates` after checking invariance.
This patch replaces this logic with invocation of this method which is more profitable
because it supports caching.
Differential Revision: https://reviews.llvm.org/D43997
llvm-svn: 327373
It is a revert of rL327362 which causes build bot failures with assert like
Assertion `isAvailableAtLoopEntry(RHS, L) && "RHS is not available at Loop Entry"' failed.
llvm-svn: 327363
IsKnownPredicate is updated to implement the following algorithm
proposed by @sanjoy and @mkazantsev :
isKnownPredicate(Pred, LHS, RHS) {
Collect set S all loops on which either LHS or RHS depend.
If S is non-empty
a. Let PD be the element of S which is dominated by all other elements of S
b. Let E(LHS) be value of LHS on entry of PD.
To get E(LHS), we should just take LHS and replace all AddRecs that
are attached to PD on with their entry values.
Define E(RHS) in the same way.
c. Let B(LHS) be value of L on backedge of PD.
To get B(LHS), we should just take LHS and replace all AddRecs that
are attached to PD on with their backedge values.
Define B(RHS) in the same way.
d. Note that E(LHS) and E(RHS) are automatically available on entry of PD,
so we can assert on that.
e. Return true if isLoopEntryGuardedByCond(Pred, E(LHS), E(RHS)) &&
isLoopBackedgeGuardedByCond(Pred, B(LHS), B(RHS))
Return true if Pred, L, R is known from ranges, splitting etc.
}
This is follow-up for https://reviews.llvm.org/D42417.
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43507
llvm-svn: 327362
Summary:
If there's a callees metadata attached to the indirect call instruction, add CallGraphEdges to the callees mentioned in the metadata when computing FunctionSummary.
* Why this is necessary:
Consider following code example:
```
(foo.c)
static int f1(int x) {...}
static int f2(int x);
static int (*fptr)(int) = f2;
static int f2(int x) {
if (x) fptr=f1; return f1(x);
}
int foo(int x) {
(*fptr)(x); // !callees metadata of !{i32 (i32)* @f1, i32 (i32)* @f2} would be attached to this call.
}
(bar.c)
int bar(int x) {
return foo(x);
}
```
At LTO time when `foo.o` is imported into `bar.o`, function `foo` might be inlined into `bar` and PGO-guided indirect call promotion will run after that. If the profile data tells that the promotion of `@f1` or `@f2` is beneficial, the optimizer will check if the "promoted" `@f1` or `@f2` (such as `@f1.llvm.0` or `@f2.llvm.0`) is available. Without this patch, importing `!callees` metadata would only add promoted declarations of `@f1` and `@f2` to the `bar.o`, but still the optimizer will assume that the function is available and perform the promotion. The result of that is link failure with `undefined reference to @f1.llvm.0`.
This patch fixes this problem by adding callees in the `!callees` metadata to CallGraphEdges so that their definition would be properly imported into.
One may ask that there already is a logic to add indirect call promotion targets to be added to CallGraphEdges. However, if profile data says "indirect call promotion is only beneficial under a certain inline context", the logic wouldn't work. In the code example above, if profile data is like
```
bar:1000000:100000
1:100000
1: foo:100000
1: 100000 f1:100000
```
, Computing FunctionSummary for `foo.o` wouldn't add `foo->f1` to CallGraphEdges. (Also, it is at least "possible" that one can provide profile data to only link step but not to compilation step).
Reviewers: tejohnson, mehdi_amini, pcc
Reviewed By: tejohnson
Subscribers: inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D44399
llvm-svn: 327358
There are six separate instances of getPointerOperand() utility.
LoopVectorize.cpp has one of them,
and I don't want to create a 7th one while I'm trying to move
LoopVectorizationLegality into a separate file
(eventual objective is to move it to Analysis tree).
See http://lists.llvm.org/pipermail/llvm-dev/2018-February/120999.html
for llvm-dev discussions
Closes D43323.
Patch by Hideki Saito <hideki.saito@intel.com>.
llvm-svn: 327173
Summary:
Building MemorySSA gathers alias information for Defs/Uses.
Store and expose this information when optimizing uses (when building MemorySSA),
and when optimizing defs or updating uses (getClobberingMemoryAccess).
Current patch does not propagate alias information through MemoryPhis.
Reviewers: gbiv, dberlin
Subscribers: Prazek, sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38569
llvm-svn: 327035
It's been quite some time the Dependence Analysis (DA) is broken,
as it uses the GEP representation to "identify" multi-dimensional arrays.
It even wrongly detects multi-dimensional arrays in single nested loops:
from test/Analysis/DependenceAnalysis/Coupled.ll, example @couple6
;; for (long int i = 0; i < 50; i++) {
;; A[i][3*i - 6] = i;
;; *B++ = A[i][i];
DA used to detect two subscripts, which makes no sense in the LLVM IR
or in C/C++ semantics, as there are no guarantees as in Fortran of
subscripts not overlapping into a next array dimension:
maximum nesting levels = 1
SrcPtrSCEV = %A
DstPtrSCEV = %A
using GEPs
subscript 0
src = {0,+,1}<nuw><nsw><%for.body>
dst = {0,+,1}<nuw><nsw><%for.body>
class = 1
loops = {1}
subscript 1
src = {-6,+,3}<nsw><%for.body>
dst = {0,+,1}<nuw><nsw><%for.body>
class = 1
loops = {1}
Separable = {}
Coupled = {1}
With the current patch, DA will correctly work on only one dimension:
maximum nesting levels = 1
SrcSCEV = {(-2424 + %A)<nsw>,+,1212}<%for.body>
DstSCEV = {%A,+,404}<%for.body>
subscript 0
src = {(-2424 + %A)<nsw>,+,1212}<%for.body>
dst = {%A,+,404}<%for.body>
class = 1
loops = {1}
Separable = {0}
Coupled = {}
This change removes all uses of GEP from DA, and we now only rely
on the SCEV representation.
The patch does not turn on -da-delinearize by default, and so the DA analysis
will be more conservative in the case of multi-dimensional memory accesses in
nested loops.
I disabled some interchange tests, as the DA is not able to disambiguate
the dependence anymore. To make DA stronger, we may need to
compute a bound on the number of iterations based on the access functions
and array dimensions.
The patch cleans up all the CHECKs in test/Transforms/LoopInterchange/*.ll to
avoid checking for snippets of LLVM IR: this form of checking is very hard to
maintain. Instead, we now check for output of the pass that are more meaningful
than dozens of lines of LLVM IR. Some tests now require -debug messages and thus
only enabled with asserts.
Patch written by Sebastian Pop and Aditya Kumar.
Differential Revision: https://reviews.llvm.org/D35430
llvm-svn: 326837
Most of the folds based on SelectPatternResult belong in InstSimplify rather than
InstCombine, so the helper code should be available to other passes/analysis.
llvm-svn: 326812
The 'hasOneUse' check is a giveaway that something's not right.
We never need to check that in InstSimplify because we don't
create new instructions here.
These are all handled as icmp simplifies which then trigger
existing select simplifies, so there's no need to duplicate
a composite fold of the two.
llvm-svn: 326750
This is NFC for the moment (and independent of any potential NaN semantic
controversy). Besides making the code in InstSimplify easier to read, the
motivation is to eventually allow undef elements in vector constants to
match too. A proposal to add the base logic for that is in D43792.
llvm-svn: 326600
The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN`
can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D43810
llvm-svn: 326418
Removes verifyDomTree, using assert(verify()) everywhere instead, and
changes verify a little to always run IsSameAsFreshTree first in order
to print good output when we find errors. Also adds verifyAnalysis for
PostDomTrees, which will allow checking of PostDomTrees it the same way
we check DomTrees and MachineDomTrees.
Differential Revision: https://reviews.llvm.org/D41298
llvm-svn: 326315
This is similar to what's done in computeKnownBits and computeSignBits. Don't do anything fancy just collect information valid for any element.
Differential Revision: https://reviews.llvm.org/D43789
llvm-svn: 326237
It appears that there were many cases where we were directly (through
templates) calling the dtor of MemoryAccess, which is conceptually an
abstract class.
This hasn't been a problem, since the data members of all of the
subclasses of MemoryAccess have been POD. I'm planning on changing that.
:)
llvm-svn: 326175
Set default value for IgnoreOtherLoops of SCEVInitRewriter::rewrite to true
to be consistent with SCEVPostIncRewriter which does not have this parameter
but behaves as it would be true.
This is follow up for rL326067.
llvm-svn: 326174
The patch introduces the new function in ScalarEvolution to get
all loops used in specified SCEV.
This is a preparation for re-writing isKnownPredicate utility as
described in https://reviews.llvm.org/D42417.
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43504
llvm-svn: 326072
The patch introduces the SCEVPostIncRewriter rewriter which
is similar to SCEVInitRewriter but rewrites AddRec with post increment
value of this AddRec.
This is a preparation for re-writing isKnownPredicate utility as
described in https://reviews.llvm.org/D42417.
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43499
llvm-svn: 326071
The patch introduces an additional parameter IgnoreOtherLoops to SCEVInitRewriter.
if it is equal to true then rewriter will not invalidate result in case
SCEV depends on other loops then specified during creation.
The patch does not change the default behavior.
This is a preparation for re-writing isKnownPredicate utility as
described in https://reviews.llvm.org/D42417.
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43498
llvm-svn: 326067
If we have a loop like this:
int n = 0;
while (...) {
if (++n >= MAX) {
n = 0;
}
}
then the body of the 'if' statement will only be executed once every MAX
iterations. Detect this by looking for branches in loops where taking the branch
makes the branch condition evaluate to 'not taken' in the next iteration of the
loop, and reduce the probability of such branches.
This slightly improves EEMBC benchmarks on cortex-m4/cortex-m33 due to making
better choices in if-conversion, but has no effect on any other cpu/benchmark
that I could detect.
Differential Revision: https://reviews.llvm.org/D35804
llvm-svn: 325925
Summary:
The current integer representation of relative block frequency prevents
representing relative block frequencies below 1. This change uses a 8 of
the 29 bits to represent the decimal part by using a fixed scale of -8.
Reviewers: tejohnson, davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D43520
llvm-svn: 325823
SCEV has multiple occurences of code when we need to prove some predicate on
every iteration of a loop and do it with invocations of couple `isLoopEntryGuardedByCond`,
`isLoopBackedgeGuardedByCond`. This patch factors out these two calls into a separate
method. It is a preparation step to extend this logic: it is not the only way how we can prove
such conditions.
Differential Revision: https://reviews.llvm.org/D43373
llvm-svn: 325745
of turning SCEVUnknowns of PHIs into AddRecExprs.
This feature is now hidden behind the -scev-version-unknown flag.
Fixes PR36032 and PR35432.
llvm-svn: 325687
This is usually not a problem because this code's main purpose is
eliminating unused new/delete pairs. We got deletes of nullptr or
nobuiltin deletes of builtin new wrong though.
llvm-svn: 325630
Loosening the matcher definition reveals a subtle bug in InstSimplify (we should not
assume that because an operand constant matches that it's safe to return it as a result).
So I'm making that change here too (that diff could be independent, but I'm not sure how
to reveal it before the matcher change).
This also seems like a good reason to *not* include matchers that capture the value.
We don't want to encourage the potential misstep of propagating undef values when it's
not allowed/intended.
I didn't include the capture variant option here or in the related rL325437 (m_One),
but it already exists for other constant matchers.
llvm-svn: 325466
Summary:
The LazyValueInfo pass caches a copy of the DominatorTree when available.
Whenever there are pending DominatorTree updates within JumpThreading's
DeferredDominance object we cannot use the cached DT for LVI analysis.
This commit adds the new methods enableDT() and disableDT() to LVI.
JumpThreading also sets the appropriate usage model before calling LVI
analysis methods.
Fixes https://bugs.llvm.org/show_bug.cgi?id=36133
Reviewers: sebpop, dberlin, kuhar
Reviewed by: sebpop, kuhar
Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov
Differential Revision: https://reviews.llvm.org/D42717
llvm-svn: 325356
There is a more powerful but still simple function `isKnownViaSimpleReasoning ` that
does constant range check and few more additional checks. We use it some places (e.g.
when proving implications) and in some other places we only check constant ranges.
Currently, indvar simplifier fails to remove the check in following loop:
int inc = ...;
for (int i = inc, j = inc - 1; i < 200; ++i, ++j)
if (i > j) { ... }
This patch replaces all usages of `isKnownPredicateViaConstantRanges` with
`isKnownViaSimpleReasoning` to have smarter proofs. In particular, it fixes the
case above.
Reviewed-By: sanjoy
Differential Revision: https://reviews.llvm.org/D43175
llvm-svn: 325214
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.
Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html
I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.
Differential Revision: https://reviews.llvm.org/D42123
llvm-svn: 325102
These intrinsic folds were added with D41381, but only allowed with isFast().
That's more than necessary because FMF has 'reassoc' to apply to these
kinds of folds after D39304, and that's all we need in these cases.
Differential Revision: https://reviews.llvm.org/D43160
llvm-svn: 324967
The current implementation of `getPostIncExpr` invokes `getAddExpr` for two recurrencies
and expects that it always returns it a recurrency. But this is not guaranteed to happen if we
have reached max recursion depth or refused to make SCEV simplification for other reasons.
This patch changes its implementation so that now it always returns SCEVAddRec without
relying on `getAddExpr`.
Differential Revision: https://reviews.llvm.org/D42953
llvm-svn: 324866
The last assume in the test says that %B12 is 0.
The first assume says that %and1 is less than %B12.
Therefore, %and1 is unsigned less than 0...does not compute.
That means this line:
Known.Zero.setHighBits(RHSKnown.countMinLeadingZeros() + 1);
...tries to set more bits than exist.
Differential Revision: https://reviews.llvm.org/D43052
llvm-svn: 324610
The failures happened because of assert which was overconfident about
SCEV's proving capabilities and is generally not valid.
Differential Revision: https://reviews.llvm.org/D42835
llvm-svn: 324473
Sometimes `isLoopEntryGuardedByCond` cannot prove predicate `a > b` directly.
But it is a common situation when `a >= b` is known from ranges and `a != b` is
known from a dominating condition. Thia patch teaches SCEV to sum these facts
together and prove strict comparison via non-strict one.
Differential Revision: https://reviews.llvm.org/D42835
llvm-svn: 324453
Before r324429 we essentially didn't have a verification of LCSSA, so
no wonder that it has been broken: currently loop-sink breaks it (the
attached test illustrates the failure).
It was detected during a stage2 RA build, so to unbreak it I'm disabling
the check for now.
llvm-svn: 324445
Generalize existing constant matching to work with non-uniform constant vectors as well.
Differential Revision: https://reviews.llvm.org/D42818
llvm-svn: 324369
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.
To bugs additionally fixed:
assert is moved after the check whether loop is not a nullptr.
Usage of isLoopEntryGuardedByCond in ScalarEvolution::isImpliedCondOperandsViaNoOverflow
is guarded by isAvailableAtLoopEntry.
Reviewers: sanjoy, mkazantsev, anna, dorit, reames
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42417
llvm-svn: 324204
This patch implements analysis for new-format TBAA access tags
with aggregate types as their final access types.
Differential Revision: https://reviews.llvm.org/D41501
llvm-svn: 324092
Summary:
D42698 adds child_edge_{begin|end} and children_edges to GraphTraits
which are used here. The reason for this change is to make it easy to
use count propagation on ModulesummaryIndex. As it stands,
CallGraphTraits is in Analysis while ModuleSummaryIndex is in IR.
Reviewers: davidxl, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42703
llvm-svn: 323994
Since r322087, glibc's finite lib calls are generated when possible.
However, they are not supported on Android. This change also
disables other functions not available on Android.
Differential Revision: http://reviews.llvm.org/D42668
llvm-svn: 323898
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the Lint
analysis to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting
source & dest specific alignments through the new API.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 323886
candidates with coldcc attribute.
This recommits r322721 reverted due to sanitizer memory leak build bot failures.
Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 323778
This prevents functions accessing varargs from being inlined if they
have the alwaysinline attribute.
Reviewers: efriedma, rnk, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D42556
llvm-svn: 323619
Summary:
The intent of this is to allow the code to be used with ThinLTO. In
Thinlink phase, a traditional Callgraph can not be computed even though
all the necessary information (nodes and edges of a call graph) is
available. This is due to the fact that CallGraph class is closely tied
to the IR. This patch first extends GraphTraits to add a CallGraphTraits
graph. This is then used to implement a version of counts propagation
on a generic callgraph.
Reviewers: davidxl
Subscribers: mehdi_amini, tejohnson, llvm-commits
Differential Revision: https://reviews.llvm.org/D42311
llvm-svn: 323475
It was reverted after buildbot regressions.
Original commit message:
This allows relative block frequency of call edges to be passed
to the thinlink stage where it will be used to compute synthetic
entry counts of functions.
llvm-svn: 323460
Summary:
This allows relative block frequency of call edges to be passed to the
thinlink stage where it will be used to compute synthetic entry counts
of functions.
Reviewers: tejohnson, pcc
Subscribers: mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D42212
llvm-svn: 323349
Summary:
If any vector divisor element is undef, we can arbitrarily choose it be
zero which would make the div/rem an undef value by definition.
Reviewers: spatel, reames
Reviewed By: spatel
Subscribers: magabari, llvm-commits
Differential Revision: https://reviews.llvm.org/D42485
llvm-svn: 323343
We're getting bug reports:
https://bugs.llvm.org/show_bug.cgi?id=35807https://bugs.llvm.org/show_bug.cgi?id=35840https://bugs.llvm.org/show_bug.cgi?id=36045
...where we blow up the stack in value tracking because other passes are sending
in selects that have an operand that is itself the select.
We don't currently have a reliable way to avoid analyzing dead code that may take
non-standard forms, so bail out when things go too far.
This mimics the recursion depth limitations in other parts of value tracking.
Unfortunately, this pushes the underlying problems for other passes (jump-threading,
simplifycfg, correlated-propagation) into hiding. If someone wants to uncover those
again, the first draft of this patch on Phab would do that (it would assert rather
than bail out).
Differential Revision: https://reviews.llvm.org/D42442
llvm-svn: 323331
Summary:
Since r322087, glibc's finite lib calls are generated when possible.
However, glibc is not supported on Android. Therefore this change
enables llvm to finely distinguish between linux and Android for
unsupported library calls. The change also include some regression
tests.
Reviewers: srhines, pirama
Reviewed By: srhines
Subscribers: kongyi, chh, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42288
llvm-svn: 323187
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.
Reviewers: sanjoy, mkazantsev, anna, dorit
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42165
llvm-svn: 323077
Summary:
In ModRefInfo "Must" was introduced to track presence of MustAlias, but we still want to return NoModRef when there is neither Mod or Ref, even when MustAlias is found. Patch has small fixes to ensure this happens.
Minor cleanup to remove nesting for 2 if statements when calling getModRefInfo for 2 ImmutableCallSites.
Reviewers: sanjoy
Subscribers: jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42209
llvm-svn: 322932
Summary:
The class wraps a uint64_t and an enum to represent the type of profile
count (real and synthetic) with some helper methods.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41883
llvm-svn: 322771
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
An alternative (and probably better) fix would be that of
making `Scale` an APInt, and there's a patch floating around
to do this. As we're still discussing it, at least stop crashing
in the meanwhile (added bonus, we now have a regression test for
this situation).
Fixes PR35843.
Thanks to Eli for suggesting the fix and Simon for reporting and
reducing the bug.
llvm-svn: 322467
This doesn't handle the more complicated case in the bug report yet:
https://bugs.llvm.org/show_bug.cgi?id=35790
For that, we have to match / look through a cast.
llvm-svn: 322327
This was originally planned as the fix for:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but simpler transforms handled that case, so I implemented a
lesser solution. It turns out we need to handle the case with 'not'
ops too because the real code example that we are trying to solve:
https://bugs.llvm.org/show_bug.cgi?id=35875
...has extra uses of the intermediate values, so we can't rely on
smaller canonicalizations to get us to the goal.
As with rL321672, I've tried to show every possibility in the
codegen tests because that's the simplest way to prove we're doing
the right thing in the wide variety of permutations of this pattern.
We can also show an InstCombine win because we added a fold for
this case in:
rL321998 / D41603
An Alive proof for one variant of the pattern to show that the
InstCombine and codegen results are correct:
https://rise4fun.com/Alive/vd1
Name: min3_nots
%nx = xor i8 %x, -1
%ny = xor i8 %y, -1
%nz = xor i8 %z, -1
%cmpxz = icmp slt i8 %nx, %nz
%minxz = select i1 %cmpxz, i8 %nx, i8 %nz
%cmpyz = icmp slt i8 %ny, %nz
%minyz = select i1 %cmpyz, i8 %ny, i8 %nz
%cmpyx = icmp slt i8 %y, %x
%r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
%cmpxyz = icmp slt i8 %minxz, %ny
%r = select i1 %cmpxyz, i8 %minxz, i8 %ny
Name: min3_nots_alt
%nx = xor i8 %x, -1
%ny = xor i8 %y, -1
%nz = xor i8 %z, -1
%cmpxz = icmp slt i8 %nx, %nz
%minxz = select i1 %cmpxz, i8 %nx, i8 %nz
%cmpyz = icmp slt i8 %ny, %nz
%minyz = select i1 %cmpyz, i8 %ny, i8 %nz
%cmpyx = icmp slt i8 %y, %x
%r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
%xz = icmp sgt i8 %x, %z
%maxxz = select i1 %xz, i8 %x, i8 %z
%xyz = icmp sgt i8 %maxxz, %y
%maxxyz = select i1 %xyz, i8 %maxxz, i8 %y
%r = xor i8 %maxxyz, -1
llvm-svn: 322283
Summary:
After teaching InlineCost more about address spaces ()
another fault was detected in the inliner. If an argument has
the byval attribute the parameter might be copied to an alloca.
That part seems to work fine even if the argument has a different
address space than the alloca address space. However, if the
address spaces differ, then the inlined function still might
refer to the parameter using the original address space (the
inliner does not handle that situation very well).
This patch avoids the problem by simply disallowing inlining
when there are byval arguments with address space that differs
from the alloca address space.
I'm not really sure how to transform the code if we want to
get inlining for this situation. I assume that it never has
been working, and that the fixes in r321809 just exposed an
old problem.
Fault found by skatkov (Serguei Katkov). It is mentioned in
follow up comments to https://reviews.llvm.org/D40455.
Reviewers: skatkov
Reviewed By: skatkov
Subscribers: uabelho, eraman, llvm-commits, haicheng
Differential Revision: https://reviews.llvm.org/D41898
llvm-svn: 322181
Summary:
This pass synthesizes function entry counts by traversing the callgraph
and using the relative block frequencies of the callsites. The intended
use of these counts is in inlining to determine hot/cold callsites in
the absence of profile information.
The pass is split into two files with the code that propagates the
counts in a callgraph in a Utils file. I plan to add support for
propagation in the thinlto link phase and the propagation code will be
shared and hence this split. I did not add support to the old PM since
hot callsite determination in inlining is not possible in old PM
(although we could use hot callee heuristic with synthetic counts in the
old PM it is not worth the effort tuning it)
Reviewers: davidxl, silvas
Subscribers: mgorny, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D41604
llvm-svn: 322110
SCEV tracks the correspondence of created SCEV to original instruction.
However during creation of SCEV it is possible that nuw/nsw/exact flags are
lost.
As a result during expansion of the SCEV the instruction with nuw/nsw/exact
will be used where it was expected and we produce poison incorreclty.
Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41578
llvm-svn: 322058
This patch was part of:
https://reviews.llvm.org/D41338
...but we can expose the bug in IR via constant propagation
as shown in the test. Unless the triple includes 'linux', we
should not fold these because the functions don't exist on
other platforms (yet?).
llvm-svn: 322010
If the varargs are not accessed by a function, we can inline the
function.
Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41335
llvm-svn: 321940
Summary:
I basically copied this patch from here:
https://reviews.llvm.org/D1251
But I skipped some of the refactoring to make the patch more clean.
The new outer3/inner3 test case in ptr-diff.ll triggers the
following assert without this patch:
lib/IR/Constants.cpp:1834: static llvm::Constant *llvm::ConstantExpr::getCompare(unsigned short, llvm::Constant *, llvm::Constant *, bool): Assertion `C1->getType() == C2->getType() && "Op types should be identical!"' failed.
The other new test cases makes sure that there is code coverage
for all modifications in InlineCost.cpp (getting different values
due to not fetching sizes for address space zero). I only guarantee
code coverage for those tests. The tests are not written in a way
that they would break if not having the corrections in
InlineCost.cpp. I found it quite hard to fine tune the tests into
getting different results based on the pointer sizes (except for
the test case where we hit an assert if not teaching InlineCost
about address spaces).
Reviewers: chandlerc, arsenm, haicheng
Reviewed By: arsenm
Subscribers: wdng, eraman, llvm-commits, haicheng
Differential Revision: https://reviews.llvm.org/D40455
llvm-svn: 321809
In one case, we were handling out of bounds, but not undef indices. In the other, we were handling undef (with the comment making the analogy to out of bounds), but not out of bounds. Be consistent and treat both undef and constant out of bounds indices as producing undefined results.
As a side effect, this also protects instcombine from having to handle large constant indices as we always simplify first.
llvm-svn: 321575
This reverts r321138. It seems there are still underlying issues with
memdep. PR35519 seems to still be present if debug info is enabled. We
end up losing a memcpy. Somehow during store to memset merging, we
insert the memset after the memcpy or fail to update the memdep analysis
to account for the newly inserted memset of a pair.
Reduced test case:
#include <assert.h>
#include <stdio.h>
#include <string>
#include <utility>
#include <vector>
void do_push_back(
std::vector<std::pair<std::string, std::vector<std::string>>>* crls) {
crls->push_back(std::make_pair(std::string(), std::vector<std::string>()));
}
int __attribute__((optnone)) main() {
// Put some data in the vector and then remove it so we take the push_back
// fast path.
std::vector<std::pair<std::string, std::vector<std::string>>> crl_set;
crl_set.push_back({"asdf", {}});
crl_set.pop_back();
printf("first word in vector storage: %p\n", *(void**)crl_set.data());
// Do the push_back which may fail to initialize the data.
do_push_back(&crl_set);
auto* first = &crl_set.back().first;
printf("first word in vector storage (should be zero): %p\n",
*(void**)crl_set.data());
assert(first->empty());
puts("ok");
}
Compile with libc++, enable optimizations, and enable debug info:
$ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib
This program will assert with this change.
llvm-svn: 321510
Summary:
When using byval, the data is effectively copied as part of the call
anyway, so we aren't actually passing the pointer and thus there is no
reason to issue a warning.
Reviewers: rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40118
llvm-svn: 321478
InsertBinop tries to find an appropriate instruction instead of
creating a new instruction. When it checks whether instruction is
the same as we need to create it ignores nuw/nsw/exact flags.
It leads to invalid behavior when poison instruction can be used
when it was not expected. Specifically, for example Expander
expands the SCEV built for instruction
%a = add i32 %v, 1
It is possible that InsertBinop can find an instruction
% b = add nuw nsw i32 %v, 1
and will use it instead of version w/o nuw nsw.
It is incorrect.
The patch conservatively ignores all instructions with any of
poison flags installed.
Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41576
llvm-svn: 321475
This is fix for the crash caused by ScalarEvolution::getTruncateExpr.
It expects that if it checked the condition that SCEV is not in UniqueSCEVs cache in
the beginning that it will not be there inside this method.
However during recursion and transformation/simplification for sub expression,
it is possible that these modifications will end up with the same SCEV as we started from.
So we must always check whether SCEV is in cache and do not insert item if it is already there.
Reviewers: sanjoy, mkazantsev, craig.topper
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41380
llvm-svn: 321472
This is a preliminary step for the patch discussed in D41136 (and denoted here with the FIXME comment).
When we match an FP min/max that is cast to integer, any intermediate difference between +0.0 or -0.0
should be muted in the result by the conversion (either fptosi or fptoui) of the result. Thus, we can
enable 'nsz' for the purpose of matching fmin/fmax.
Note that there's probably room to generalize this more, possibly by fixing the current calls to the
weak version of isKnownNonZero() in matchSelectPattern() to the more powerful recursive version.
Differential Revision: https://reviews.llvm.org/D41333
llvm-svn: 321456
Summary:
Make MemorySSA allow reordering of two loads that may alias, when one is volatile.
This makes MemorySSA less conservative and behaving the same as the AliasSetTracker.
For more context, see D16875.
LLVM language reference: "The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior."
Reviewers: george.burgess.iv, dberlin
Subscribers: sanjoy, reames, hfinkel, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D41525
llvm-svn: 321382
If after if-conversion, most of the instructions in this new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate, because the control dependence is transformed into data dependence, and control dependence can be speculated, and thus, the second part can execute in parallel with the first part on modern OOO processor.
This patch checks for the long dependence chain, and give up if-conversion if find one.
Differential Revision: https://reviews.llvm.org/D39352
llvm-svn: 321377
Currently, inline cost model considers a binary operator as free only if both
its operands are constants. Some simple cases are missing such as a + 0, a - a,
etc. This patch modifies visitBinaryOperator() to call SimplifyBinOp() without
going through simplifyInstruction() to get rid of the constant restriction.
Thus, visitAnd() and visitOr() are not needed.
Differential Revision: https://reviews.llvm.org/D41494
llvm-svn: 321366
The penalty is currently getting applied in a bunch of places where it
doesn't make sense, like bitcasts (which are free) and calls (which
were getting the call penalty applied twice). Instead, just apply the
penalty to binary operators and floating-point casts.
While I'm here, also fix getFPOpCost() to do the right thing in more
cases, so we don't have to dig into function attributes.
Differential Revision: https://reviews.llvm.org/D41522
llvm-svn: 321332
Summary:
This replaces calls to getEntryCount().hasValue() with hasProfileData
that does the same thing. This refactoring is useful to do before adding
synthetic function entry counts but also a useful cleanup IMO even
otherwise. I have used hasProfileData instead of hasRealProfileData as
David had earlier suggested since I think profile implies "real" and I
use the phrase "synthetic entry count" and not "synthetic profile count"
but I am fine calling it hasRealProfileData if you prefer.
Reviewers: davidxl, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41461
llvm-svn: 321331
Summary:
Add an additional bit to ModRefInfo, ModRefInfo::Must, to be cleared for known must aliases.
Shift existing Mod/Ref/ModRef values to include an additional most
significant bit. Update wrappers that modify ModRefInfo values to
reflect the change.
Notes:
* ModRefInfo::Must is almost entirely cleared in the AAResults methods, the remaining changes are trying to preserve it.
* Only some small changes to make custom AA passes set ModRefInfo::Must (BasicAA).
* GlobalsModRef already declares a bit, who's meaning overlaps with the most significant bit in ModRefInfo (MayReadAnyGlobal). No changes to shift the value of MayReadAnyGlobal (see AlignedMap). FunctionInfo.getModRef() ajusts most significant bit so correctness is preserved, but the Must info is lost.
* There are cases where the ModRefInfo::Must is not set, e.g. 2 calls that only read will return ModRefInfo::NoModRef, though they may read from exactly the same location.
Reviewers: dberlin, hfinkel, george.burgess.iv
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38862
llvm-svn: 321309
Summary:
The function section prefix for PGO based layout (e.g. hot/unlikely)
should look at the hotness of all blocks not just the entry BB.
A function with a cold entry but a very hot loop should be placed in the
hot section, for example, so that it is located close to other hot
functions it may call. For SamplePGO it was already looking at the
branch weights on calls, and I made that code conditional on whether
this is SamplePGO since it was essentially a noop for instrumentation
PGO anyway.
Reviewers: davidxl
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D41395
llvm-svn: 321197
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
This is r319482 and r319483, along with fixes for PR35519: fix the
optimization that merges stores into memsets to preserve cached memdep
info, and fix memdep's non-local caching strategy to not assume that larger
queries are always more conservative than smaller ones.
Fixes PR28958 and PR35519.
Differential Revision: https://reviews.llvm.org/D40802
llvm-svn: 321138
There are cases when two tags with different base types denote
accesses to the same direct or indirect member of a structure
type. Currently, merging of such tags results in a tag that
represents an access to an object that has the type of that
member. This patch changes this so that if one of the accesses
encloses the other, then the generic tag is the one of the
enclosed access.
Differential Revision: https://reviews.llvm.org/D39557
llvm-svn: 321019
Enhance LVI to analyze the ‘ashr’ binary operation. This leverages the infrastructure in ConstantRange for the ashr operation.
Patch by Surya Kumari Jangala!
Differential Revision: https://reviews.llvm.org/D40886
llvm-svn: 320983
When unsafe algerbra is allowed calls to cabs(r) can be replaced by:
sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))
Patch by Paul Walker, thanks!
Differential Revision: https://reviews.llvm.org/D40069
llvm-svn: 320901
SROA analysis of InlineCost can figure out that some stores can be removed
after inlining and then the repeated loads clobbered by these stores are also
free. This patch finds these clobbered loads and adjust the inline cost
accordingly.
Differential Revision: https://reviews.llvm.org/D33946
llvm-svn: 320814
We cannot move the insertion point to header if SCEV contains div/rem
operations due to they may go over check for zero denominator.
Reviewers: sanjoy, mkazantsev, sebpop
Reviewed By: sebpop
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41229
llvm-svn: 320789
Most of the -Wsign-compare warnings are due to the fact that
enums are signed by default in the MS ABI, while the
tautological comparison warnings trigger on x86 builds where
sizeof(size_t) is 4 bytes, so N > numeric_limits<unsigned>::max()
is always false.
Differential Revision: https://reviews.llvm.org/D41256
llvm-svn: 320750
Summary:
The function is meant to recurse until it comes upon the
phi it's looking for. However, with the current condition,
it will recurse until it finds anything _but_ the phi.
The function will even fail for simple cases like:
%i = phi i32 [ %inc, %loop ], ...
...
%inc = add i32 %i, 1
because the base condition will not happen when the phi
is recursed to, and the recursion will end with a 'false'
result since the previous instruction is a phi.
Reviewers: sanjoy, atrick
Reviewed By: sanjoy
Subscribers: Ka-Ka, bjope, llvm-commits
Committing on behalf of: Bevin Hansson (bevinh)
Differential Revision: https://reviews.llvm.org/D40946
llvm-svn: 320700
This patch fix this FIXME in visitPHI()
FIXME: We should potentially be tracking values through phi nodes,
especially when they collapse to a single value due to deleted CFG edges
during inlining.
Differential Revision: https://reviews.llvm.org/D38594
llvm-svn: 320699
D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose
update chain involves casts; PSCEV can now build an AddRecurrence for some
forms of such phi nodes, under the proper runtime overflow test. This means
that we can identify such phi nodes as an induction, and the loop-vectorizer
can now vectorize such inductions, however inefficiently. The vectorizer
doesn't know that it can ignore the casts, and so it vectorizes them.
This patch records the casts in the InductionDescriptor, so that they could
be marked to be ignored for cost calculation (we use VecValuesToIgnore for
that) and ignored for vectorization/widening/scalarization (i.e. treated as
TriviallyDead).
In addition to marking all these casts to be ignored, we also need to make
sure that each cast is mapped to the right vector value in the vector loop body
(be it a widened, vectorized, or scalarized induction). So whenever an
induction phi is mapped to a vector value (during vectorization/widening/
scalarization), we also map the respective cast instruction (if exists) to that
vector value. (If the phi-update sequence of an induction involves more than one
cast, then the above mapping to vector value is relevant only for the last cast
of the sequence as we allow only the "last cast" to be used outside the
induction update chain itself).
This is the last step in addressing PR30654.
llvm-svn: 320672
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mgrang, dcaballe, hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 320548
CreateAddRecFromPHIWithCastsImpl() adds an IncrementNUSW overflow predicate
which allows the PSCEV rewriter to rewrite this scev expression:
(zext i8 {0, + , (trunc i32 step to i8)} to i32)
into
{0, +, (sext i8 (trunc i32 step to i8) to i32)}
But then it adds the wrong Equal predicate:
%step == (zext i8 (trunc i32 %step to i8) to i32).
instead of:
%step == (sext i8 (trunc i32 %step to i8) to i32)
This is fixed here.
Differential Revision: https://reviews.llvm.org/D40641
llvm-svn: 320298
When the lowest bits of the operands to an integer multiply are known, the low bits of the result are deducible.
Code to deduce known-zero bottom bits already existed, but this change improves on that by deducing known-ones.
Patch by: Pedro Ferreira
Reviewers: craig.topper, sanjoy, efriedma
Differential Revision: https://reviews.llvm.org/D34029
llvm-svn: 320269
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumenation adds a global constructor + runtime callbacks
for every load and store.
HWASan comes with its own IR attribute.
A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).
Reviewers: kcc, pcc, alekseyshl
Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40932
llvm-svn: 320217
Causes unexpected memory issue with New PM this time.
The new PM invalidates BPI but not BFI, leaving the
reference to BPI from BFI invalid.
Abandon this patch. There is a more general solution
which also handles runtime infinite loop (but not statically).
llvm-svn: 320180
In this method, we invoke `SimplifyICmpOperands` which takes the `Cond` predicate
by reference and may change it along with `LHS` and `RHS` SCEVs. But then we invoke
`computeShiftCompareExitLimit` with Values from which the SCEVs have been derived,
these Values have not been modified while `Cond` could be.
One of possible outcomes of this is that we may falsely prove that an infinite loop ends
within some finite number of iterations.
In this patch, we save the original `Cond` and pass it along with original operands.
This logic may be removed in future once `computeShiftCompareExitLimit` works
with SCEVs instead of value operands.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D40953
llvm-svn: 320142
Summary:
Make enum ModRefInfo an enum class. Changes to ModRefInfo values should
be done using inline wrappers.
This should prevent future bit-wise opearations from being added, which can be more error-prone.
Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40933
llvm-svn: 320107
Summary:
An undef extract index can be arbitrarily chosen to be an
out-of-range index value, which would result in the instruction being undef.
This change closes a gap identified while working on lowering vector permute intrinsics
with variable index vectors to pure LLVM IR.
Reviewers: arsenm, spatel, majnemer
Reviewed By: arsenm, spatel
Subscribers: fhahn, nhaehnle, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D40231
llvm-svn: 319910
Lexicographical comparison of SCEV trees is potentially expensive for big
expression trees. We can define ordering between them for AddRecs and
N-ary operations by SCEV NoWrap flags to make non-equality check
cheaper.
This change does not prevent grouping eqivalent SCEVs together and is
not supposed to have any meaningful impact on behavior of any transforms.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D40645
llvm-svn: 319889
Current implementation of `compareSCEVComplexity` is being unreasonable with `SCEVUnknown`s:
every time it sees one, it creates a new value cache and tries to prove equality of two values using it.
This cache reallocates and gets lost from SCEV to SCEV.
This patch changes this behavior: now we create one cache for all values and share it between SCEVs.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D40597
llvm-svn: 319880
This caused PR35519.
> [memcpyopt] Teach memcpyopt to optimize across basic blocks
>
> This teaches memcpyopt to make a non-local memdep query when a local query
> indicates that the dependency is non-local. This notably allows it to
> eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
>
> Fixes PR28958.
>
> Differential Revision: https://reviews.llvm.org/D38374
>
> [memcpyopt] Commit file missed in r319482.
>
> This change was meant to be included with r319482 but was accidentally
> omitted.
llvm-svn: 319873
Summary:
The aim is to make ModRefInfo checks and changes more intuitive
and less error prone using inline methods that abstract the bit operations.
Ideally ModRefInfo would become an enum class, but that change will require
a wider set of changes into FunctionModRefBehavior.
Reviewers: sanjoy, george.burgess.iv, dberlin, hfinkel
Subscribers: nlopes, llvm-commits
Differential Revision: https://reviews.llvm.org/D40749
llvm-svn: 319821
Summary:
I don't think rL309080 is the right fix for PR33494 -- caching ExitLimit only
hides the problem[0]. The real issue is that because of how we forget SCEV
expressions ScalarEvolution::getBackedgeTakenInfo, in the test case for PR33494
computing the backedge for any loop invalidates the trip count for every other
loop. This effectively makes the SCEV cache useless.
I've instead made the SCEV expression invalidation in
ScalarEvolution::getBackedgeTakenInfo less aggressive to fix this issue.
[0]: One way to think about this is that rL309080 essentially augmented the
backedge-taken-count cache with another equivalent exit-limit cache. The bug
went away because we were explicitly not clearing the exit-limit cache in
getBackedgeTakenInfo. But instead of doing all of that, we can just avoid
clearing the backedge-taken-count cache.
Reviewers: mkazantsev, mzolotukhin
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D39361
llvm-svn: 319678
These are blocks that haven't not been executed during training. For large
projects this could make a significant difference. For the project, I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
Differential Revision: https://reviews.llvm.org/D40678
Re-commit after fixing the failing testcase in rL319576, rL319577 and
rL319578.
llvm-svn: 319581
Summary:
Adding support for -print-module-scope similar to how it is
being done for function passes. This option causes loop-pass printer
to emit a whole-module IR instead of just a loop itself.
Reviewers: sanjoy, silvas, weimingz
Reviewed By: sanjoy
Subscribers: apilipenko, skatkov, llvm-commits
Differential Revision: https://reviews.llvm.org/D40247
llvm-svn: 319566
These are blocks that haven't not been executed during training. For large
projects this could make a significant difference. For the project, I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
Differential Revision: https://reviews.llvm.org/D40678
llvm-svn: 319556
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.
This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.
Differential Revision: https://reviews.llvm.org/D40674
llvm-svn: 319505
This teaches memcpyopt to make a non-local memdep query when a local query
indicates that the dependency is non-local. This notably allows it to
eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
Fixes PR28958.
Differential Revision: https://reviews.llvm.org/D38374
llvm-svn: 319482
Currently, we use a set of pairs to cache responces like `CompareValueComplexity(X, Y) == 0`. If we had
proved that `CompareValueComplexity(S1, S2) == 0` and `CompareValueComplexity(S2, S3) == 0`,
this cache does not allow us to prove that `CompareValueComplexity(S1, S3)` is also `0`.
This patch replaces this set with `EquivalenceClasses` that merges Values into equivalence sets so that
any two values from the same set are equal from point of `CompareValueComplexity`. This, in particular,
allows us to prove the fact from example above.
Differential Revision: https://reviews.llvm.org/D40429
llvm-svn: 319153
Currently, we use a set of pairs to cache responces like `CompareSCEVComplexity(X, Y) == 0`. If we had
proved that `CompareSCEVComplexity(S1, S2) == 0` and `CompareSCEVComplexity(S2, S3) == 0`,
this cache does not allow us to prove that `CompareSCEVComplexity(S1, S3)` is also `0`.
This patch replaces this set with `EquivalenceClasses` any two values from the same set are equal from
point of `CompareSCEVComplexity`. This, in particular, allows us to prove the fact from example above.
Differential Revision: https://reviews.llvm.org/D40428
llvm-svn: 319149
Summary:
For a given loop, getLoopLatch returns a non-null value
when a loop has only one latch block. In the modified
context adding an assertion to check that both the outgoing branches of
a terminator instruction (of latch) does not target same header.
+
few minor code reorganization.
Reviewers: jbhateja
Reviewed By: jbhateja
Subscribers: sanjoy
Differential Revision: https://reviews.llvm.org/D40460
llvm-svn: 318997
Summary:
For a given loop, getLoopLatch returns a non-null value
when a loop has only one latch block. In the modified
context a check on both the outgoing branches of a terminator instruction (of latch) to same header is redundant.
Reviewers: jbhateja
Reviewed By: jbhateja
Subscribers: sanjoy
Differential Revision: https://reviews.llvm.org/D40460
llvm-svn: 318991
Summary:
Loop-pass printing is somewhat deficient since it does not provide the
context around the loop (e.g. preheader). This context information becomes
pretty essential when analyzing transformations that move stuff out of the loop.
Extending printLoop to cover preheader and exit blocks (if any).
Reviewers: sanjoy, silvas, weimingz
Reviewed By: sanjoy
Subscribers: apilipenko, skatkov, llvm-commits
Differential Revision: https://reviews.llvm.org/D40246
llvm-svn: 318878
Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively.
When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if
`L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are
consecutive sibling loops within the parent loop. In this case, `AR2` is also varying
w.r.t. `L1`, but we don't correctly identify it.
It can lead, for exaple, to attempt of incorrect folding. Consider:
AR1 = {a,+,b}<L1>
AR2 = {c,+,d}<L2>
EXAR2 = sext(AR1)
MUL = mul AR1, EXAR2
If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to
construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect
because `AR2` is not available on entrance of `L1`.
Both situations "`L1` contains `L2`" and "`L1` preceeds sibling loop `L2`" can be handled
with one check: "header of `L1` dominates header of `L2`". This patch replaces the old
insufficient check with this one.
Differential Revision: https://reviews.llvm.org/D39453
llvm-svn: 318819
Summary:
First step in adding MemorySSA as dependency for loop pass manager.
Adding the dependency under a flag.
New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null.
Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default.
Reviewers: sanjoy, davide, gberry
Subscribers: mehdi_amini, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D40274
llvm-svn: 318772
The 'ord' and 'uno' predicates have a logic operation for NAN built into their definitions:
FCMP_ORD = 7, ///< 0 1 1 1 True if ordered (no nans)
FCMP_UNO = 8, ///< 1 0 0 0 True if unordered: isnan(X) | isnan(Y)
So we can simplify patterns like this:
(fcmp ord (known NNAN), X) && (fcmp ord X, Y) --> fcmp ord X, Y
(fcmp uno (known NNAN), X) || (fcmp uno X, Y) --> fcmp uno X, Y
It might be better to split this into (X uno 0) | (Y uno 0) as a canonicalization, but that
would be another patch.
Differential Revision: https://reviews.llvm.org/D40130
llvm-svn: 318627