This make it consistent with STATISTIC which it will often appears near.
While there move one DEBUG_COUNTER instance out of an anonymous namespace. It's already declaring a static variable so the namespace is unnecessary.
llvm-svn: 310637
isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.
The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().
The isFoldableMemAccess() hook has been removed everywhere.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933
llvm-svn: 310463
In the recursive call to isAMCompletelyFolded(), the passed offset should be
the sum of F.BaseOffset and Fixup.Offset.
Review: Quentin Colombet.
llvm-svn: 310462
When a new phi is generated for scalarpre of an expression, the phiTranslate cache
will become stale: Before PRE, the candidate expression must not be available in a
predecessor block, and phitranslate will cache the information. After PRE, the
expression will become available in all predecessor blocks, so the related entries
in phiTranslate cache becomes stale. The patch will simply remove the stale entries
so phiTranslate can be recomputed next time.
The stale entries in phitranslate cache will not affect correctness but will cause
missing PRE opportunity for later instructions.
Differential Revision: https://reviews.llvm.org/D36124
llvm-svn: 310421
results when a loop is completely removed.
This is very hard to manifest as a visible bug. You need to arrange for
there to be a subsequent allocation of a 'Loop' object which gets the
exact same address as the one which the unroll deleted, and you need the
LoopAccessAnalysis results to be significant in the way that they're
stale. And you need a million other things to align.
But when it does, you get a deeply mysterious crash due to actually
finding a stale analysis result. This fixes the issue and tests for it
by directly checking we successfully invalidate things. I have not been
able to get *any* test case to reliably trigger this. Changes to LLVM
itself caused the only test case I ever had to cease to crash.
I've looked pretty extensively at less brittle ways of fixing this and
they are actually very, very hard to do. This is a somewhat strange and
unusual case as we have a pass which is deleting an IR unit, but is not
running within that IR unit's pass framework (which is what handles this
cleanly for the normal loop unroll). And where there isn't a definitive
way to clear *all* of the stale cache entries. And where the pass *is*
updating the core analysis that provides the IR units!
For example, we don't have any of these problems with Function analyses
because it is easy to clear out function analyses when the functions
themselves may have been deleted -- we clear an entire module's worth!
But that is too heavy of a hammer down here in the LoopAnalysisManager
layer.
A better long-term solution IMO is to require that AnalysisManager's
make their keys durable to this kind of thing. Specifically, when
caching an analysis for one IR unit that is conceptually "owned" by
a higher level IR unit, the AnalysisManager should incorporate this into
its data structures so that we can reliably clear these results without
having to teach each and every pass to do so manually as we do here. But
that is a change for another day as it will be a fairly invasive change
to the AnalysisManager infrastructure. Until then, this fortunately
seems to be quite rare.
llvm-svn: 310333
The root cause of reverting was fixed - PR33514.
Summary:
The patch makes instruction count the highest priority for
LSR solution for X86 (previously registers had highest priority).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D30562
From: Evgeny Stupachenko <evstupac@gmail.com>
<evgeny.v.stupachenko@intel.com>
llvm-svn: 310289
While here, rename `i` to `Rank` as the latter is more
self-explanatory (and this code also uses `I` two lines below to
identify an Instruction).
llvm-svn: 310238
Summary:
The bug was uncovered after fix of PR23384 (part 3 of 3).
The patch restricts pointer multiplication in SCEV computaion for ICmpZero.
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D36170
From: Evgeny Stupachenko <evstupac@gmail.com>
<evgeny.v.stupachenko@intel.com>
llvm-svn: 310092
Summary:
This fixes PR31777.
If both stores' values are ConstantInt, we merge the two stores
(shifting the smaller store appropriately) and replace the earlier (and
larger) store with an updated constant.
In the future we should also support vectors of integers. And maybe
float/double if we can.
Reviewers: hfinkel, junbuml, jfb, RKSimon, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30703
llvm-svn: 310055
Summary:
Detect when the working set size of a profiled application is huge,
by comparing the number of counts required to reach the hot percentile
in the profile summary to a large threshold*.
When the working set size is determined to be huge, disable peeling
to avoid bloating the working set further.
*Note that the selected threshold (15K) is significantly larger than the
largest working set value in SPEC cpu2006 (which is gcc at around 11K).
Reviewers: davidxl
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36288
llvm-svn: 310005
Summary:
Peeling should not occur during the full unrolling invocation early
in the pipeline, but rather later with partial and runtime loop
unrolling. The later loop unrolling invocation will also eventually
utilize profile summary and branch frequency information, which
we would like to use to control peeling. And for ThinLTO we want
to delay peeling until the backend (post thin link) phase, just as
we do for most types of unrolling.
Ensure peeling doesn't occur during the full unrolling invocation
by adding a parameter to the shared implementation function, similar
to the way partial and runtime loop unrolling are disabled.
Performance results for ThinLTO suggest this has a neutral to positive
effect on some internal benchmarks.
Reviewers: chandlerc, davidxl
Subscribers: mzolotukhin, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36258
llvm-svn: 309966
Summary:
This is largely NFC*, in preparation for utilizing ProfileSummaryInfo
and BranchFrequencyInfo analyses. In this patch I am only doing the
splitting for the New PM, but I can do the same for the legacy PM as
a follow-on if this looks good.
*Not NFC since for partial unrolling we lose the updates done to the
loop traversal (adding new sibling and child loops) - according to
Chandler this is not very useful for partial unrolling, but it also
means that the debugging flag -unroll-revisit-child-loops no longer
works for partial unrolling.
Reviewers: chandlerc
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36157
llvm-svn: 309886
Summary:
This patch makes LoopDeletion use the incremental DominatorTree API.
We modify LoopDeletion to perform the deletion in 5 steps:
1. Create a new dummy edge from the preheader to the exit, by adding a conditional branch.
2. Inform the DomTree about the new edge.
3. Remove the conditional branch and replace it with an unconditional edge to the exit. This removes the edge to the loop header, making it unreachable.
4. Inform the DomTree about the deleted edge.
5. Remove the unreachable block from the function.
Creating the dummy conditional branch is necessary to perform incremental DomTree update.
We should consider using the batch updater when it's ready.
Reviewers: dberlin, davide, grosser, sanjoy
Reviewed By: dberlin, grosser
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35391
llvm-svn: 309850
Summary:
Adding part of the changes in D30369 (needed to make progress):
Current patch updates AliasAnalysis and MemoryLocation, but does _not_ clean up MemorySSA.
Original summary from D30369, by dberlin:
Currently, we have instructions which affect memory but have no memory
location. If you call, for example, MemoryLocation::get on a fence,
it asserts. This means things specifically have to avoid that. It
also means we end up with a copy of each API, one taking a memory
location, one not.
This starts to fix that.
We add MemoryLocation::getOrNone as a new call, and reimplement the
old asserting version in terms of it.
We make MemoryLocation optional in the (Instruction, MemoryLocation)
version of getModRefInfo, and kill the old one argument version in
favor of passing None (it had one caller). Now both can handle fences
because you can just use MemoryLocation::getOrNone on an instruction
and it will return a correct answer.
We use all this to clean up part of MemorySSA that had to handle this difference.
Note that literally every actual getModRefInfo interface we have could be made private and replaced with:
getModRefInfo(Instruction, Optional<MemoryLocation>)
and
getModRefInfo(Instruction, Optional<MemoryLocation>, Instruction, Optional<MemoryLocation>)
and delegating to the right ones, if we wanted to.
I have not attempted to do this yet.
Reviewers: dberlin, davide, dblaikie
Subscribers: sanjoy, hfinkel, chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D35441
llvm-svn: 309641
Summary:
Since r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.
Reviewers: MatzeB
Reviewed By: MatzeB
Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D35949
llvm-svn: 309553
Summary:
Without any information about the called function, we cannot be sure
that it is safe to interchange loops which contain function calls. For
example there could be dependences that prevent interchanging between
accesses in the called function and the loops. Even functions without any
parameters could cause problems, as they could access memory using
global pointers.
For now, I think it is only safe to interchange loops with calls marked
as readnone.
With this patch, the LLVM test suite passes with `-O3 -mllvm
-enable-loopinterchange` and LoopInterchangeProfitability::isProfitable
returning true for all loops. check-llvm and check-clang also pass when
bootstrapped in a similar fashion, although only 3 loops got
interchanged.
Reviewers: karthikthecool, blitz.opensource, hfinkel, mcrosier, mkuper
Reviewed By: mcrosier
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35489
llvm-svn: 309547
Recommit after workaround the bug PR31652.
Three bugs fixed in previous recommits: The first one is to use CurrentBlock
instead of PREInstr's Parent as param of performScalarPREInsertion because
the Parent of a clone instruction may be uninitialized. The second one is stop
PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst
is defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available. The third one is an out-of-bound
array access in a flipped if guard.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Differential Revision: https://reviews.llvm.org/D32252
llvm-svn: 309397
JumpThreading claims to preserve LVI, but it doesn't preserve
the analyses which LVI holds a reference to (e.g. the Dominator).
In the current pass manager infrastructure, after JT runs, the
PM frees these analyses (including DominatorTree) but preserves
LVI.
CorrelatedValuePropagation runs immediately after and queries
a corrupted domtree, causing weird miscompiles.
This commit disables the preservation of LVI for the time being.
Eventually, we should either move LVI to a proper dependency
tracking mechanism (i.e. an analyses shouldn't hold references
to other analyses and compute them on demand if needed), or
we should teach all the passes preserving LVI to preserve the
analyses LVI depends on.
The new pass manager has a mechanism to invalidate LVI in case
one of the analyses it depends on becomes invalid, so this problem
shouldn't exist (at least not in this immediate form), but handling
of analyses holding references is still a very delicate subject.
Fixes PR33917 (and rustc).
llvm-svn: 309355
Summary:
It is possible for some passes to materialize a call to a libcall (ex: ldexp, exp2, etc),
but these passes will not mark the call as a gc-leaf-function. All libcalls are
actually gc-leaf-functions, so we change llvm::callsGCLeafFunction() to tell us that
available libcalls are equivalent to gc-leaf-function calls.
Reviewers: sanjoy, anna, reames
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35840
llvm-svn: 309291
This is a workaround for the bug described in PR31652 and
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115497.html. The temporary
solution is to add a function EqualityPropUnSafe. In EqualityPropUnSafe, for
some simple patterns we can know the equality comparison may contains undef,
so we regard such comparison as unsafe and will not do loop-unswitching for
them. We also need to disable the select simplification when one of select
operand is undef and its result feeds into equality comparison.
The patch cannot clear the safety issue caused by the bug, but it can suppress
the issue from happening to some extent.
Differential Revision: https://reviews.llvm.org/D35811
llvm-svn: 309059
it when safe.
Very often the BE count is the trip count minus one, and the plus one
here should fold with that minus one. But because the BE count might in
theory be UINT_MAX or some such, adding one before we extend could in
some cases wrap to zero and break when we scale things.
This patch checks to see if it would be safe to add one because the
specific case that would cause this is guarded for prior to entering the
preheader. This should handle essentially all of the common loop idioms
coming out of C/C++ code once canonicalized by LLVM.
Before this patch, both forms of loop in the added test cases ended up
subtracting one from the size, extending it, scaling it up by 8 and then
adding 8 back onto it. This is really silly, and it turns out made it
all the way into generated code very often, so this is a surprisingly
important cleanup to do.
Many thanks to Sanjoy for showing me how to do this with SCEV.
Differential Revision: https://reviews.llvm.org/D35758
llvm-svn: 308968
Summary:
The remaining non range-based for loops do not iterate over full ranges,
so leave them as they are.
Reviewers: karthikthecool, blitz.opensource, mcrosier, mkuper, aemerson
Reviewed By: aemerson
Subscribers: aemerson, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35777
llvm-svn: 308872
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
Large CFGs can cause us to blow up the stack because we would have a
recursive step for each basic block in a region.
Instead, create a worklist and iterate it. This limits the stack usage
to something more manageable.
Differential Revision: https://reviews.llvm.org/D35609
llvm-svn: 308582
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. Especially when such a loop's
latch block is folded into loop header it results in additional backedges and
LoopSimplify turns it into a nested loop which prevent later optimizations
from being applied (e.g., loop unrolling and loop interleaving).
This patch augments the existing algorithm by further checking if the
destination of the branch belongs to a set of loop headers and defer
eliminating it if yes to LateSimplifyCFG.
Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: efriedma
Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35411
llvm-svn: 308422
Summary: Currently, when GVN creates a load and when InstCombine creates a new store for unreachable Load, the DebugLoc info gets lost.
Reviewers: dberlin, davide, aprantl
Reviewed By: aprantl
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34639
llvm-svn: 308404
In some particular cases eq/ne conditions can be turned into equivalent
slt/sgt conditions. This patch teaches parseLoopStructure to handle some
of these cases.
Differential Revision: https://reviews.llvm.org/D35010
llvm-svn: 308264
Summary:
When checking for memory dependencies between calls using MemorySSA,
handle cases where the calls have no MemoryAccess associated with them
because the AA analysis being used has determined that the call does not
read/write memory.
Fixes PR33756
Reviewers: dberlin, davide
Subscribers: mcrosier, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D35317
llvm-svn: 308051
Add the following pattern to TryToUnfoldSelectInCurrBB()
bb:
%p = phi [0, %bb1], [1, %bb2], [0, %bb3], [1, %bb4], ...
%c = cmp %p, 0
%s = select %c, trueval, falseval
The Select in the above pattern will be unfolded and then jump-threaded. The
current implementation does not allow CMP in the middle of PHI and Select.
Differential Revision: https://reviews.llvm.org/D34762
llvm-svn: 308050
When iterating through loop
for (int i = INT_MAX; i > 0; i--)
We fail to generate the pre-loop for it. It happens because we use the
overflown value in a comparison predicate when identifying whether or not
we need it.
In old logic, we used SLE predicate against Greatest value which exceeds all
seen values of the IV and might be overflown. Now we use the GreatestSeen
value of this IV with SLT predicate.
Also added a test that ensures that a pre-loop is generated for such loops.
Differential Revision: https://reviews.llvm.org/D35347
llvm-svn: 308001
Summary:
LoopRotate manually updates the DoomTree by iterating over all predecessors of a basic block and computing the Nearest Common Dominator.
When a predecessor happens to be unreachable, `DT.findNearestCommonDominator` returns nullptr.
This patch teaches LoopRotate to handle this case and fixes [[ https://bugs.llvm.org/show_bug.cgi?id=33701 | PR33701 ]].
In the future, LoopRotate should be taught to use the new incremental API for updating the DomTree.
Reviewers: dberlin, davide, uabelho, grosser
Subscribers: efriedma, mzolotukhin
Differential Revision: https://reviews.llvm.org/D35074
llvm-svn: 307828
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
This is fine as nothing in the code relies on leader and memory
leader being the same for a given congruency class. Ack'ed by
Dan.
Fixes PR33720.
llvm-svn: 307699
Summary:
As metioned in https://reviews.llvm.org/D34576, checkings in
`collectConstantCandidates` can be replaced by using
`llvm::canReplaceOperandWithVariable`.
The only special case is that `collectConstantCandidates` return false for
all `IntrinsicInst` but it is safe for us to collect constant candidates from
`IntrinsicInst`.
Reviewers: pirama, efriedma, srhines
Reviewed By: efriedma
Subscribers: llvm-commits, javed.absar
Differential Revision: https://reviews.llvm.org/D34921
llvm-svn: 307587
InferAddressSpaces does not check address space in collectFlatAddressExpressions,
which causes values with non flat address space put into Postorder and causes
assertion in cloneValueWithNewAddressSpace.
This patch fixes assertion in OpenCL 2.0 conformance test generic_address_space
subtest for amdgcn target.
Differential Revision: https://reviews.llvm.org/D34991
llvm-svn: 307349
Using profile information to guide consthoisting is generally helpful for
performance, so the patch turns it on by default. No compile time or perf
regression were found using spec2000 and spec2006 on x86. Some significant
improvement (>20%) was seen on internal benchmarks.
Differential Revision: https://reviews.llvm.org/D35063
llvm-svn: 307338
The patch is to adjust the strategy of frequency based consthoisting:
Previously when the candidate block has the same frequency with the existing
blocks containing a const, it will not hoist the const to the candidate block.
For that case, now we change the strategy to hoist the const if only existing
blocks have more than one block member. This is helpful for reducing code size.
Differential Revision: https://reviews.llvm.org/D35084
llvm-svn: 307328
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.
llvm-svn: 307292
When the formulae search space is huge, LSR uses a series of heuristic to keep
pruning the search space until the number of possible solutions are within
certain limit.
The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs,
which picks the register which is used by the most LSRUses and deletes the other
formulae which don't use the register. This is a effective way to prune the search
space, but quite often not a good way to keep the best solution. We saw cases before
that the heuristic pruned the best formula candidate out of search space.
To relieve the problem, we introduce a new heuristic called
NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to
reduce the search space while keeping the best formula, we want to keep as many
formulae with different Scale and ScaledReg as possible. That is because the central
idea of LSR is to choose a group of loop induction variables and use those induction
variables to represent LSRUses. An induction variable candidate is often represented
by the Scale and ScaledReg in a formula. If we have more formulae with different
ScaledReg and Scale to choose, we have better opportunity to find the best solution.
That is why we believe pruning search space by only keeping the best formula with the
same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use
two criteria to choose the best formula with the same Scale and ScaledReg. The first
criteria is to select the formula using less non shared registers, and the second
criteria is to select the formula with less cost got from RateFormula. The patch
implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the
last resort.
Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly
testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help
on the two improved internal cases we saw.
Differential Revision: https://reviews.llvm.org/D34583
llvm-svn: 307269
Summary: This makes it easier to find out which limitation prevented this pass from doing its work.
Reviewers: karthikthecool, mzolotukhin, efriedma, mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D34940
llvm-svn: 307035
This reverts commit r306313. This breaks selfhost at -O3 and PR33652.
Let me know if you need additional information on reproducing the issue.
llvm-svn: 307021
Summary:
Indices for GEPs that index into a struct type should always be
constants. This added more checks in `collectConstantCandidates:` which make
sure constants for GEP pointer type are not hoisted.
This fixed Bug https://bugs.llvm.org/show_bug.cgi?id=33538
Reviewers: ributzka, rnk
Reviewed By: ributzka
Subscribers: efriedma, llvm-commits, srhines, javed.absar, pirama
Differential Revision: https://reviews.llvm.org/D34576
llvm-svn: 306704
A slightly more efficient way to get constant, we avoid resolving in getSCEV and excessive
invocations, and we don't create a ConstantInt if 'true' branch is taken.
Differential Revision: https://reviews.llvm.org/D34672
llvm-svn: 306503
SROA assumes alloca address space is 0, which causes assertion. This patch fixes that.
Differential Revision: https://reviews.llvm.org/D34104
llvm-svn: 306440
This is based heavily on the work done ni D34285. I mostly wanted to do
test cleanup for the author to save them some time, but I had a really
hard time understanding why it was so hard to write better test cases
for these issues.
The problem is that because SROA does a second rewrite of the loads and
because we *don't* propagate !nonnull for non-pointer loads, we first
introduced invalid !nonnull metadata and then stripped it back off just
in time to avoid most ways of this PR manifesting. Moving to the more
careful utility only fixes this by changing the predicate to look at the
new load's type rather than the target type. However, that *does* fix
the bug, and the utility is much nicer including adding range metadata
to model the nonnull property after a conversion to an integer.
However, we have bigger problems because we don't actually propagate
*range* metadata, and the utility to do this extracted from instcombine
isn't really in good shape to do this currently. It *only* handles the
case of copying range metadata from an integer load to a pointer load.
It doesn't even handle the trivial cases of propagating from one integer
load to another when they are the same width! This utility will need to
be beefed up prior to using in this location to get the metadata to
fully survive.
And even then, we need to go and teach things to turn the range metadata
into an assume the way we do with nonnull so that when we *promote* an
integer we don't lose the information.
All of this will require a new test case that looks kind-of like
`preserve-nonnull.ll` does here but focuses on range metadata. It will
also likely require more testing because it needs to correctly handle
changes to the integer width, especially as SROA actively tries to
change the integer width!
Last but not least, I'm a little worried about hooking the range
metadata up here because the instcombine logic for converting from
a range metadata *to* a nonnull metadata node seems broken in the face
of non-zero address spaces where null is not mapped to the integer `0`.
So that probably needs to get fixed with test cases both in SROA and in
instcombine to cover it.
But this *does* extract the core PR fix from D34285 of preventing the
!nonnull metadata from being propagated in a broken state just long
enough to feed into promotion and crash value tracking.
On D34285 there is some discussion of zero-extend handling because it
isn't necessary. First, the new load size covers all of the non-undef
(ie, possibly initialized) bits. This may even extend past the original
alloca if loading those bits could produce valid data. The only way its
valid for us to zero-extend an integer load in SROA is if the original
code had a zero extend or those bits were undef. And we get to assume
things like undef *never* satifies nonnull, so non undef bits can
participate here. No need to special case the zero-extend handling, it
just falls out correctly.
The original credit goes to Ariel Ben-Yehuda! I'm mostly landing this to
save a few rounds of trivial edits fixing style issues and test case
formulation.
Differental Revision: D34285
llvm-svn: 306379
Summary:
EraseInst didn't report that it made IR changes through MadeChange.
It is essential that changes to the IR are reported correctly,
since for example ReassociatePass::run() will indicate that all
analyses are preserved otherwise.
And the CGPassManager determines if the CallGraph is up-to-date
based on status from InstructionCombiningPass::runOnFunction().
Reviewers: craig.topper, rnk, davide
Reviewed By: rnk, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34616
llvm-svn: 306368
The recommit fixes three bugs: The first one is to use CurrentBlock instead of
PREInstr's Parent as param of performScalarPREInsertion because the Parent
of a clone instruction may be uninitialized. The second one is stop PRE when
CurrentBlock to its predecessor is a backedge and an operand of CurInst is
defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available. The third one is an out-of-bound
array access in a flipped if guard.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
llvm-svn: 306313
Recommit NFC patch (rL306157) where I missed incrementing the basic block iterator,
which caused loop deletion tests to hang due to infinite loop.
Had reverted it in rL306162.
rL306157 commit message:
Currently, the implementation of delete dead loops has a special case
when the loop being deleted is never executed. This special case
(updating of exit block's incoming values for phis) can be
run as a prepass for non-executable loops before performing
the actual deletion.
llvm-svn: 306254
This reverts commit r306157.
It caused some timeouts in clang tests. Perhaps unreachable loops have
far too many phi nodes.
Reverting and investigating.
llvm-svn: 306162
Currently, the implementation of delete dead loops has a special case
when the loop being deleted is never executed. This special case
(updating of exit block's incoming values for phis) can be
run as a prepass for non-executable loops before performing
the actual deletion.
llvm-svn: 306157
Currently JumpThreading can use LazyValueInfo to analyze an 'and' or 'or' of compare if the compare is fed by a livein of a basic block. This can be used to to prove the condition can't be met for some predecessor and the jump from that predecessor can be moved to the false path of the condition.
But if the compare is something that InstCombine turns into an add and a single compare, it can't be analyzed because the livein is now an input to the add and not the compare.
This patch adds a new method to LVI to get a ConstantRange on an edge. Then we teach jump threading to detect the add livein feeding a compare and to get the ConstantRange and propagate it.
Differential Revision: https://reviews.llvm.org/D33262
llvm-svn: 306085
Summary:
Currently, we incorrectly update exit blocks of loops when there are multiple
edges from a single exiting block to the exit block. This can happen when we
have switches as the terminator of the exiting blocks.
The fix here is to correctly update the phi nodes in the exit block, and remove
all incoming values *except* for one which is from the preheader.
Note: Currently, this error can manifest only while deleting non-executed loops. However, it
is possible to trigger this error in invariant loops, once we enhance the logic
around the exit conditions for the loop check.
Reviewers: chandlerc, dberlin, sanjoy, efriedma
Reviewed by: efriedma
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34516
llvm-svn: 306048
We weren't actually checking for duplicated stores, as the condition
was always actually false. This was found by Coverity, and I have
no clue how to trigger this in real-world code (although I
tried for a bit).
llvm-svn: 305867
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.
> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 305720
Summary:
Currently we don't try to do anything with vector xors.
This patch adds support for removing duplicate pairs from a chain of vector xors as its pretty easy to support. We still dont' try to combine the xors with and/ors, but I might try that in a future patch.
Reviewers: mcrosier, davide, resistor
Reviewed By: mcrosier
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34338
llvm-svn: 305704
Summary:
After a single predecessor is merged into a basic block, we need to invalidate
the LVI information for the new merged block, when LVI is not provably true for
all of instructions in the new block.
The test cases added show the correct LVI information using the LVI printer
pass.
Reviewers: reames, dberlin, davide, sanjoy
Reviewed by: dberlin, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34108
llvm-svn: 305699
Summary: use AA to tell whether a load can be moved before a call that writes to memory.
Reviewers: dberlin, davide, sanjoy, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D34115
llvm-svn: 305698
The recommit fixes two bugs: The first one is to use CurrentBlock instead of
PREInstr's Parent as param of performScalarPREInsertion because the Parent
of a clone instruction may be uninitialized. The second one is stop PRE when
CurrentBlock to its predecessor is a backedge and an operand of CurInst is
defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Differential Revision: https://reviews.llvm.org/D32252
llvm-svn: 305578
Summary:
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.
Reviewers: reames, sanjoy, efriedma
Reviewed By: reames
Subscribers: mzolotukhin, anna, llvm-commits, skatkov
Differential Revision: https://reviews.llvm.org/D33240
llvm-svn: 305558
This way we end up not looking at PHI args already removed.
MemSSA now goes through the updater so we can prune
it to avoid having redundant MemoryPHI arguments, but that
doesn't quite work for the general case.
Discussed with Daniel Berlin, fixes PR33406.
llvm-svn: 305409
Summary:
After RS4GC, we should drop metadata that is no longer valid. These metadata
is used by optimizations scheduled after RS4GC, and can cause a miscompile.
One such metadata is invariant.load which is used by LICM sinking transform.
After rewriting statepoints, the address of a load maybe relocated. With
invariant.load metadata on a load instruction, LICM sinking assumes the
loaded value (from a dererenceable address) to be invariant, and
rematerializes the load operand and the load at the exit block.
This transforms the IR to have an unrelocated use of the
address after a statepoint, which is incorrect.
Other metadata we conservatively remove are related to
dereferenceability and noalias metadata.
This patch drops such metadata on store and load instructions after
rewriting statepoints.
Reviewers: reames, sanjoy, apilipenko
Reviewed by: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33756
llvm-svn: 305234
Currently there is a bug in SROA::presplitLoadsAndStores which causes assertion in
GEPOperator::accumulateConstantOffset.
Basically it does not consider the situation that the pointer operand of load or store
may be in a non-zero address space and its size may be different from the size of
a pointer in address space 0.
This patch fixes assertion when compiling Blender Cycles kernels for amdgpu backend.
Diffferential Revision: https://reviews.llvm.org/D33298
llvm-svn: 305107
Summary:
isSafeToSpeculativelyExecute is the wrong predicate to use here.
All that checks for is whether it is safe to hoist a value due to
unaligned/un-dereferencable accesses. However, not only are we doing
sinking rather than hoisting, our concern is that the location
we're loading from may have been modified. Instead forbid sinking
any load across a critical edge.
Reviewers: majnemer
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D33179
llvm-svn: 305102
This change adds an option disable-lftr to be able to disable Linear Function Test Replace optimization.
By default option is off so current behavior is not changed.
Reviewers: reames, sanjoy, wmi, andreadb, apilipenko
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33979
llvm-svn: 305055
The zero heuristic assumes that integers are more likely positive than negative,
but this also has the effect of assuming that strcmp return values are more
likely positive than negative. Given that for nonzero strcmp return values it's
the ordering of arguments that determines the sign of the result there's no
reason to assume that's true.
Fix this by inspecting the LHS of the compare and using TargetLibraryInfo to
decide if it's strcmp-like, and if so only assume that nonzero is more likely
than zero i.e. strings are more often different than the same. This causes a
slight code generation change in the spec2006 benchmark 403.gcc, but with no
noticeable performance impact. The intent of this patch is to allow better
optimisation of dhrystone on Cortex-M cpus, but currently it won't as there are
also some changes that need to be made to if-conversion.
Differential Revision: https://reviews.llvm.org/D33934
llvm-svn: 304970
Summary:
The patch makes instruction count the highest priority for
LSR solution for X86 (previously registers had highest priority).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D30562
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304824
1. When there is no perfect iteration order, we can't let phi nodes
put themselves in terms of things that come later in the iteration
order, or we will endlessly cycle (the normal RPO algorithm clears the
hashtable to avoid this issue).
2. We are sometimes erasing the wrong expression (causing pessimism)
because our equality says loads and stores are the same.
We introduce an exact equality function and use it when erasing to
make sure we erase only identical expressions, not equivalent ones.
llvm-svn: 304807
Summary:
Expanding the loop idiom test for memcpy to also recognize
unordered atomic memcpy. The only difference for recognizing
an unordered atomic memcpy and instead of a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
Patch by Daniel Neilson!
Reviewers: reames, anna, skatkov
Reviewed By: reames, anna
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33243
llvm-svn: 304806
Summary:
We were canonizalizing the pre loop (into loop-simplify form) before
the post loop blocks were added into parent loop. This is incorrect when IRCE is
done on a subloop. The post-loop blocks are created, but not yet added to the
parent loop. So, loop-simplification on the pre-loop incorrectly updates
LoopInfo.
This patch corrects the ordering so that pre and post loop blocks are added to
parent loop (if any), and then the loops are canonicalized to LCSSA and
LoopSimplifyForm.
Reviewers: reames, sanjoy, apilipenko
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33846
llvm-svn: 304800
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
Summary:
The patch guard all instruction cost calculations with InsnCosts (-lsr-insns-cost) option.
Currently even if the option set to false we calculate and print (in debug mode) instruction costs.
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D33914
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304746
We'd called this "vm state" in the early days, but have long since standardized on calling it "deopt" in line with the operand bundle tag. Fix a few cases we'd missed.
llvm-svn: 304607
Summary:
As shown in the test case, SROA was crashing when trying to split
stores (to the alloca) of loads (from anywhere), because it assumed
the pointer operand to the loads and stores had to have the same
address space. This isn't the case. Make sure to use the correct
pointer type for both the load and the store.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D32593
llvm-svn: 304585
builtin_expect applied on && or || expressions were not
handled properly before. With this patch, the problem is fixed.
Differential Revision: http://reviews.llvm.org/D33164
llvm-svn: 304517
The lowerer wrongly assumes the ICMP instruction
1) always has a constant operand;
2) the operand has value 0.
It also assumes the expected value can only be one, thus
other values other than one will be considered 'zero'.
This leads to wrong profile annotation when other integer values
are used other than 0, 1 in the comparison or in the expect intrinsic.
Also missing is handling of equal predicate.
This patch fixes all the above problems.
Differential Revision: http://reviews.llvm.org/D33757
llvm-svn: 304453
Summary:
Fairly straightforward patch to fill in some of the holes in the
attributes API with respect to accessing parameter/argument attributes.
The patch aims to step further towards encapsulating the
idx+FirstArgIndex pattern to access these attributes to within the
AttributeList.
Patch by Daniel Neilson!
Reviewers: rnk, chandlerc, pete, javed.absar, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33355
llvm-svn: 304329
This reverts commit r304310.
It caused build failures in polly and mingw
due to undefined reference to
llvm::RTLIB::getMEMCPY_ELEMENT_ATOMIC.
llvm-svn: 304315
Summary:
Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy.
The only difference for recognizing
an unordered atomic memcpy and instead of a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
Patch by Daniel Neilson!
Reviewers: reames, anna, skatkov
Reviewed By: reames
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33243
llvm-svn: 304310
The recommit is to fix a bug about ExtractValue and InsertValue ops. For those
ops, some varargs inside GVN::Expression are not value numbers but raw index
numbers. It is wrong to do phi-translate for raw index numbers, and the fix is
to stop doing that.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Differential Revision: https://reviews.llvm.org/D32252
llvm-svn: 304050
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Differential Revision: https://reviews.llvm.org/D32252
llvm-svn: 303923
This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for commiting but I've uploaded it to gather some initial thoughts.
This pass attempts to sink instructions into successors, reducing static
instruction count and enabling if-conversion.
We use a variant of global value numbering to decide what can be sunk.
Consider:
[ %a1 = add i32 %b, 1 ] [ %c1 = add i32 %d, 1 ]
[ %a2 = xor i32 %a1, 1 ] [ %c2 = xor i32 %c1, 1 ]
\ /
[ %e = phi i32 %a2, %c2 ]
[ add i32 %e, 4 ]
GVN would number %a1 and %c1 differently because they compute different
results - the VN of an instruction is a function of its opcode and the
transitive closure of its operands. This is the key property for hoisting
and CSE.
What we want when sinking however is for a numbering that is a function of
the *uses* of an instruction, which allows us to answer the question "if I
replace %a1 with %c1, will it contribute in an equivalent way to all
successive instructions?". The (new) PostValueTable class in GVN provides this
mapping.
This pass has some shown really impressive improvements especially for codesize already on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.
Differential revision: https://reviews.llvm.org/D24805
llvm-svn: 303850
pass.
The original logic only considered direct successors of the hoisted
domtree nodes, but that isn't really enough. If there are other basic
blocks that are completely within the subtree, their successors could
just as easily be impacted by the hoisting.
The more I think about it, the more I think the correct update here is
to hoist every block on the dominance frontier which has an idom in the
chain we hoist across. However, this is subtle enough that I'd
definitely appreciate some more eyes on it.
Sadly, if this is the correct algorithm, it requires computing a (highly
localized) dominance frontier. I've done this in the simplest (IE, least
code) way I could come up with, but that may be too naive. Suggestions
welcome here, dominance update algorithms are not an area I've studied
much, so I don't have strong opinions.
In good news, with this patch, turning on simple unswitch passes the
LLVM test suite for me with asserts enabled.
Differential Revision: https://reviews.llvm.org/D32740
llvm-svn: 303843
having it internally allocate the loop.
This is a much more flexible API and necessary in the new loop unswitch
to reasonably support both new and old PMs in common code. It also just
seems like a cleaner separation of concerns.
NFC, this should just be a pure refactoring.
Differential Revision: https://reviews.llvm.org/D33528
llvm-svn: 303834
This continues the changes started when computeSignBit was replaced with this new version of computeKnowBits.
Differential Revision: https://reviews.llvm.org/D33431
llvm-svn: 303773
Otherwise we don't revisit an instruction that could be simplified,
and when we verify, we discover there's something that changed, i.e.
what we had wasn't a maximal fixpoint.
Fixes PR32836.
llvm-svn: 303715
Instead of using the SCCP homegrown one. We should eventually
make the private SCCP version disappear, but that wont' be today.
PR33143 tracks this issue.
Add braces for consistency while here. No functional change intended.
llvm-svn: 303706
This patch builds over https://reviews.llvm.org/rL303349 and replaces
the use of the condition only if it is safe to do so.
We should not blindly RAUW the condition if experimental.guard or assume
is a use of that
condition. This is because LVI may have used the guard/assume to
identify the
value of the condition, and RUAWing will fold the guard/assume and uses
before the guards/assumes.
Reviewers: sanjoy, reames, trentxintong, mkazantsev
Reviewed by: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33257
llvm-svn: 303633
In the case where we have an operand defined by a lod of the
same memory location. Historically this was a VariableExpression
because we wanted to make sure they ended up in the same class,
but if we create the right expression, they end up in the same
class anyway.
Fixes PR32897. Thanks to Dan for the detailed discussion and the
fix suggestion.
llvm-svn: 303475
This was here because we don't want to switch leaders too much,
in order to avoid fixpoint(ing) issue, but it's not sure if it
matters in practice.
A first step towards fixing PR32897.
llvm-svn: 303473
This is a complicated bug involving two issues:
1. What do we do with phi nodes when we prove all arguments are not
live?
2. When is it safe to use value leaders to determine if we can ignore
an argumnet?
llvm-svn: 303453
Summary:
NewGVN: Handle equivalence between phi of ops and op of phis.
This makes our GVN mostly-complete. It would be complete, modulo some
deliberate choices we make. This means it detects roughly all herband
equivalences in polynomial time, including cases notoriously hard for
other GVN's to detect. It also detects a very large swath of the
cases we currently rely on instcombine to detect that involve folding
upwards through phis.
Fixes PR 31125, 31463, PR 31868
Reviewers: davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D32151
llvm-svn: 303444
Summary:
This NFC simply refactors the return value of LoopIdiomRecognize::isLegalStore() from bool to an enumeration, and
removes the return-through-parameter mechanism that the function was using. This function is constructed such that it will
only ever recognize a single store idiom (memset, memset_pattern, or memcpy), and never a combination of these. As such it
makes much more sense for the return value to be the single idiom that the store matches, rather than
having a separate argument-return for each idiom -- it's cleaner, and makes it clearer that
only a single idiom can be matched.
Patch by Daniel Neilson!
Reviewers: anna, sanjoy, davide, haicheng
Reviewed By: anna, haicheng
Subscribers: haicheng, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33359
llvm-svn: 303434
We can have cycles between PHIs and this causes singleReachablePhi()
to call itself indefintely (until we run out of stack). The proper
solution would be that of computing SCCs, but it's not worth for
now, so just keep a visited set and give up when we find a cycle.
Thanks to Dan for the discussion/help with this.
Fixes PR33014.
llvm-svn: 303393
Summary:
Implements PR889
Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:
https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing
This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal. However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue. I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.
I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.
Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv
Reviewed By: chandlerc
Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D31261
llvm-svn: 303362
The testcase in PR33077 generates a LSR Use Formula with two SCEVAddRecExprs for the same
loop. Such uncommon formula will become non-canonical after GenerateTruncates adds sign
extension to the ScaledReg of the Formula, and it will break the assertion that every
Formula to be inserted is canonical.
The fix is to call canonicalize for the raw Formula generated by GenerateTruncates
before inserting it.
llvm-svn: 303361
Summary:
We have a bug when RAUWing the condition if experimental.guard or assumes is a use of that
condition. This is because LazyValueInfo may have used the guards/assumes to identify the
value of the condition at the end of the block. RAUW replaces the uses
at the guard/assume as well as uses before the guard/assume. Both of
these are incorrect.
For now, disable RAUW for conditions and fix the logic as a next
step: https://reviews.llvm.org/D33257
Reviewers: sanjoy, reames, trentxintong
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33279
llvm-svn: 303349
Summary:
There are several places in the codebase that try to calculate a maximum value in a Statistic object. We currently do this in one of two ways:
MaxNumFoo = std::max(MaxNumFoo, NumFoo);
or
MaxNumFoo = (MaxNumFoo > NumFoo) ? MaxNumFoo : NumFoo;
The first version reads from MaxNumFoo one time and uncontionally rwrites to it. The second version possibly reads it twice depending on the result of the first compare. But we have no way of knowing if the value was changed by another thread between the reads and the writes.
This patch adds a method to the Statistic object that can ensure that we only store if our value is the max and the previous max didn't change after we read it. If it changed we'll recheck if our value should still be the max or not and try again.
This spawned from an audit I'm trying to do of all places we uses the implicit conversion to unsigned on the Statistics objects. See my previous thread on llvm-dev https://groups.google.com/forum/#!topic/llvm-dev/yfvxiorKrDQ
Reviewers: dberlin, chandlerc, hfinkel, dblaikie
Reviewed By: chandlerc
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D33301
llvm-svn: 303318
I believe this technically fixes a multithreaded race condition in this code. But my primary concern was as part of looking at removing the ability to treat Statistics like a plain unsigned. There are many weird operations on Statistics in the codebase.
llvm-svn: 303314
CTLZ idiom recognition (r303102).
Summary:
The following case:
i = 1;
if(n)
while (n >>= 1)
i++;
use(i);
Was converted to:
i = 1;
if(n)
i += builtin_ctlz(n >> 1, false);
use(i);
Which is not correct. The patch make it:
i = 1;
if(n)
i += builtin_ctlz(n >> 1, true);
use(i);
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 303212
Summary:
The following loops should be recognized:
i = 0;
while (n) {
n = n >> 1;
i++;
body();
}
use(i);
And replaced with builtin_ctlz(n) if body() is empty or
for CPUs that have CTLZ instruction converted to countable:
for (j = 0; j < builtin_ctlz(n); j++) {
n = n >> 1;
i++;
body();
}
use(builtin_ctlz(n));
Reviewers: rengolin, joerg
Differential Revision: http://reviews.llvm.org/D32605
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 303102
verifyMemoryCongruency() filters out trivially dead MemoryDef(s),
as we find them immediately dead, before moving from TOP to a new
congruence class.
This fixes the same problem for PHI(s) skipping MemoryPhis if all
the operands are dead.
Differential Revision: https://reviews.llvm.org/D33044
llvm-svn: 303100
This code was missing a check for stores, so we were thinking the
congruency class didn't have any memory members, and reset the
memory leader.
Differential Revision: https://reviews.llvm.org/D33056
llvm-svn: 302905
invariant PHI inputs and to rewrite PHI nodes during the actual
unswitching.
The checking is quite easy, but rewriting the PHI nodes is somewhat
surprisingly challenging. This should handle both branches and switches.
I think this is now a full featured trivial unswitcher, and more full
featured than the trivial cases in the old pass while still being (IMO)
somewhat simpler in how it works.
Next up is to verify its correctness in more widespread testing, and
then to add non-trivial unswitching.
Thanks to Davide and Sanjoy for the excellent review. There is one
remaining question that I may address in a follow-up patch (see the
review thread for details) but it isn't related to the functionality
specifically.
Differential Revision: https://reviews.llvm.org/D32699
llvm-svn: 302867
This pass doesn't correctly handle testing for when it is legal to hoist
arbitrary instructions. The whitelist happens to make it safe, so before
it is removed the pass's legality checks will need to be enhanced.
Details have been added to the code review thread for the patch.
llvm-svn: 302640
The way we currently define congruency for two PHIExpression(s) is:
1) The operands to the phi functions are congruent
2) The PHIs are defined in the same BasicBlock.
NewGVN works under the assumption that phi operands are in predecessor
order, or at least in some consistent order. OTOH, is valid IR:
patatino:
%meh = phi i16 [ %0, %winky ], [ %conv1, %tinky ]
%banana = phi i16 [ %0, %tinky ], [ %conv1, %winky ]
br label %end
and the in-memory representations of the two SSA registers have an
inconsistent order. This violation of NewGVN assumptions results into
two PHIs found congruent when they're not. While we think it's useful
to have always a consistent order enforced, let's fix this in NewGVN
sorting uses in predecessor order before creating a PHI expression.
Differential Revision: https://reviews.llvm.org/D32990
llvm-svn: 302552
Loop Idiom recognition was generating memset in a case that
would result generating a division operation to an unsafe location.
Differential Revision: https://reviews.llvm.org/D32674
llvm-svn: 302238
Compares always return a scalar integer or vector of integers. isIntegerTy returns false for vectors, but that's not completely obvious. So using isVectorTy is less confusing.
llvm-svn: 302198
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
take attribute list indices. Most of these were only used from
BuildLibCalls, and doesNotAlias was only used to test or set if the
return value is malloc-like.
I'm happy to split the patch, but I think they are probably easier to
review when taken together.
This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
0: func attrs
1: retattrs
2...: arg attrs
Reviewers: chandlerc, pete, javed.absar
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D32811
llvm-svn: 302060
Summary:
Currently, loop deletion deletes loop where the only values
that are used outside the loop are loop-invariant.
This patch adds logic to delete loops where the loop is proven to be
never executed (i.e. the only predecessor of the loop preheader has a
constant conditional branch as terminator, and the preheader is not the
taken target). This will remove loops that become dead after
loop-unswitching generates constant conditional branches.
The next steps are:
1. moving the loop deletion implementation to LoopUtils.
2. Add logic in loop-simplifyCFG which will support changing conditional
constant branches to unconditional branches. If loops become unreachable in this
process, they can be removed using `deleteDeadLoop` function.
Reviewers: chandlerc, efriedma, sanjoy, reames
Reviewed by: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32494
llvm-svn: 302015
In the testcase attached, we believe %tmp1 implies %tmp4.
where:
br i1 %tmp1, label %bb2, label %bb7
br i1 %tmp4, label %bb5, label %bb7
because Wwhile looking at PredicateInfo stuffs we end up calling
isImpliedTrueByMatchingCmp() with the arguments backwards.
Differential Revision: https://reviews.llvm.org/D32718
llvm-svn: 301849
We may not be able to rewrite indirect branch target, but we also want to take it into
account when folding, i.e. if it and all its successor's predecessors go to the same
destination, we can fold, i.e. no need to thread.
llvm-svn: 301816
Summary: [JumpThread] Do RAUW in case Cond folds to a constant in the CFG
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32407
llvm-svn: 301804
Summary:
programUndefinedIfPoison makes more sense, given what the function
does; and I'm about to add a function with a name similar to
isKnownNotFullPoison (so do the rename to avoid confusion).
Reviewers: broune, majnemer, bjarke.roune
Reviewed By: broune
Subscribers: mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30444
llvm-svn: 301776
I fixed my miscompile in r301722 and I hope I don't have to take
a look at this code again now that Chandler has a new LoopUnswitch
pass, but maybe this could be of use for somebody else in the
meanwhile.
llvm-svn: 301723
This broke the Clang build. (Clang-side patch missing?)
Original commit message:
> [IR] Make add/remove Attributes use AttrBuilder instead of
> AttributeList
>
> This change cleans up call sites and avoids creating temporary
> AttributeList objects.
>
> NFC
llvm-svn: 301712
While looking at pure addressing expressions, it's possible
for the value to appear later in Postorder.
I haven't been able to come up with a testcase where this
exhibits an actual issue, but if you insert a dump before
the value map lookup, a few testcases crash.
llvm-svn: 301705
Eliminates some more cases where some subset of the addressing
computation remains flat. Some cases with addrspacecasts
in nested constant expressions are still left behind however.
llvm-svn: 301704
While debugging a miscompile I realized loopunswitch doesn't
put newlines when printing the instruction being replacement.
Ending up with a single line with many instruction replaced isn't
the best for readability and/or mental sanity.
llvm-svn: 301692
The method is called "get *Param* Alignment", and is only used for
return values exactly once, so it should take argument indices, not
attribute indices.
Avoids confusing code like:
IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError);
Alignment = CS->getParamAlignment(ArgIdx + 1);
Add getRetAlignment to handle the one case in Value.cpp that wants the
return value alignment.
This is a potentially breaking change for out-of-tree backends that do
their own call lowering.
llvm-svn: 301682
EarlyCSE should not just ignore assumes. It should use the fact that its condition is true for all dominated instructions.
Reviewers: sanjoy, reames, apilipenko, anna, skatkov
Reviewed By: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32482
llvm-svn: 301625
If a condition is calculated only once, and there are multiple guards on this condition, we should be able
to remove all guards dominated by the first of them. This patch allows EarlyCSE to try to find the condition
of a guard among the known values, and if it is true, remove the guard. Otherwise we keep the guard and
mark its condition as 'true' for future consideration.
Reviewers: sanjoy, reames, apilipenko, skatkov, anna, dberlin
Reviewed By: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32476
llvm-svn: 301623
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
and instead incrementally updates it.
I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.
My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.
This pass makes two very significant assumptions that enable most of these
improvements:
1) Focus on *full* unswitching -- that is, completely removing whatever
control flow construct is being unswitched from the loop. In the case
of trivial unswitching, this means removing the trivial (exiting)
edge. In non-trivial unswitching, this means removing the branch or
switch itself. This is in opposition to *partial* unswitching where
some part of the unswitched control flow remains in the loop. Partial
unswitching only really applies to switches and to folded branches.
These are very similar to full unrolling and partial unrolling. The
full form is an effective canonicalization, the partial form needs
a complex cost model, cannot be iterated, isn't canonicalizing, and
should be a separate pass that runs very late (much like unrolling).
2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
dates from a time when a great deal of LLVM's loop infrastructure was
missing, ineffective, and/or unreliable. As a consequence, a lot of
complexity was added which we no longer need.
With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.
Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.
One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.
Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.
Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.
Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.
I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.
Differential Revision: https://reviews.llvm.org/D32409
llvm-svn: 301576
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.
Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.
I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.
Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.
Differential Revision: https://reviews.llvm.org/D32376
llvm-svn: 301432
Commits were:
"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"
The changes assumed pointers are 8 byte aligned on all architectures.
llvm-svn: 301429
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.
Reviewers: dblaikie, davide
Reviewed By: davide
Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D32266
llvm-svn: 301424
This reverts commit r300732. This breaks a few tests.
I think the problem is related to adding more uses of
the condition that don't yet exist at this point.
llvm-svn: 301242
Summary:
Instead of keeping a variable indicating whether there are early exits
in the loop. We keep all the early exits. This improves LICM's ability to
move instructions out of the loop based on is-guaranteed-to-execute.
I am going to update compilation time as well soon.
Reviewers: hfinkel, sanjoy, efriedma, mkuper
Reviewed By: hfinkel
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D32433
llvm-svn: 301196
Summary:
In case all predecessor go to a single successor of current BB. We want to fold (not thread).
I failed to update the phi nodes properly in the last patch https://reviews.llvm.org/rL300657.
Phi nodes values are per predecessor in LLVM.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32400
llvm-svn: 301139
Fixes leaving intermediate flat addressing computations
where a GEP instruction's source is a constant expression.
Still leaves behind a trivial addrspacecast + gep pair that
instcombine is able to handle, which ideally could be folded
here directly.
llvm-svn: 301044
places based on it.
Existing constant hoisting pass will merge a group of contants in a small range
and hoist the const materialization code to the common dominator of their uses.
However, if the uses are all in cold pathes, existing implementation may hoist
the materialization code from cold pathes to a hot place. This may hurt performance.
The patch introduces BFI to the pass and selects the best insertion places based
on it.
The change is controlled by an option consthoist-with-block-frequency which is
off by default for now.
Differential Revision: https://reviews.llvm.org/D28962
llvm-svn: 300989
The most common case for a branch condition is
a single use compare. Directly invert the branch
predicate rather than adding a lot of xor i1 true
which the DAG will have to fold later.
This produces nicer to read structurizer output.
This produces some random changes in codegen
due to the DAG swapping branch conditions itself,
and then does a poor job of dealing with those
inverts.
llvm-svn: 300732
Summary: In case all predecessor go to a single successor of current BB. We want to fold (not thread).
Reviewers: efriedma, sanjoy
Reviewed By: sanjoy
Subscribers: dberlin, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D30869
llvm-svn: 300657
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.
Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.
llvm-svn: 300367
It is cleaner to have a callback based system where the logic of
whether an add recurrence is normalized or not lives on IVUsers.
This is one step in a multi-step cleanup.
llvm-svn: 300330
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
Summary:
Dead basic blocks may be forming a loop, for which SSA form is
fulfilled, but with a circular def-use chain. LoadCombine could
enter an infinite loop when analysing such dead code. This patch
solves the problem by simply avoiding to analyse all basic blocks
that aren't forward reachable, from function entry, in LoadCombine.
Fixes https://bugs.llvm.org/show_bug.cgi?id=27065
Reviewers: mehdi_amini, chandlerc, grosser, Bigcheese, davide
Reviewed By: davide
Subscribers: dberlin, zzheng, bjope, grandinj, Ka-Ka, materi, jholewinski, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31032
llvm-svn: 300034
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.
All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.
Doing this exposed some missing facilities in the iterator facade that
I've fixed and required some work to the actual iterator to fully
support the necessary API.
Differential Revision: https://reviews.llvm.org/D31548
llvm-svn: 300032
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.
llvm-svn: 299980
From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments.
The variadic template is an obvious solution to both issues.
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299949
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
llvm-svn: 299925
When allowed, we can hoist a division out of a loop in favor of a
multiplication by the reciprocal. Fixes PR32157.
Patch by vit9696!
Differential Revision: https://reviews.llvm.org/D30819
llvm-svn: 299911
This re-lands r299875.
I introduced a bug in Clang code responsible for replacing K&R, no
prototype declarations with a real function definition with a prototype.
The bug was here:
// Collect any return attributes from the call.
- if (oldAttrs.hasAttributes(llvm::AttributeList::ReturnIndex))
- newAttrs.push_back(llvm::AttributeList::get(newFn->getContext(),
- oldAttrs.getRetAttributes()));
+ newAttrs.push_back(oldAttrs.getRetAttributes());
Previously getRetAttributes() carried AttributeList::ReturnIndex in its
AttributeList. Now that we return the AttributeSetNode* directly, it no
longer carries that index, and we call this overload with a single node:
AttributeList::get(LLVMContext&, ArrayRef<AttributeSetNode*>)
That aborted with an assertion on x86_32 targets. I added an explicit
triple to the test and added CHECKs to help find issues like this in the
future sooner.
llvm-svn: 299899
LLVM makes several assumptions about address space 0. However,
alloca is presently constrained to always return this address space.
There's no real way to avoid using alloca, so without this
there is no way to opt out of these assumptions.
The problematic assumptions include:
- That the pointer size used for the stack is the same size as
the code size pointer, which is also the maximum sized pointer.
- That 0 is an invalid, non-dereferencable pointer value.
These are problems for AMDGPU because alloca is used to
implement the private address space, which uses a 32-bit
index as the pointer value. Other pointers are 64-bit
and behave more like LLVM's notion of generic address
space. By changing the address space used for allocas,
we can change our generic pointer type to be LLVM's generic
pointer type which does have similar properties.
llvm-svn: 299888
Summary:
AttributeList::get(Fn|Ret|Param)Attributes no longer creates a temporary
AttributeList just to hide the AttributeSetNode type.
I've also added a factory method to create AttributeLists from a
parallel array of AttributeSetNodes. I think this simplifies
construction of AttributeLists when rewriting function prototypes.
Previously we would test if a particular index had attributes, and
conditionally add a temporary attribute list to a vector. Now the
attribute set vector is parallel to the argument vector already that
these passes already construct.
My long term vision is to wrap AttributeSetNode* inside an AttributeSet
type that holds the enum attributes, but that will come in a follow up
change.
I haven't done any performance measurements for this change because
profiling hasn't shown that any of the affected code is hot.
Reviewers: pete, chandlerc, sanjoy, hfinkel
Reviewed By: pete
Subscribers: jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31198
llvm-svn: 299875
Summary:
Resolve indirect branch target when possible.
This potentially eliminates more basicblocks and result in better evaluation for phi and other things.
Reviewers: davide, efriedma, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30322
llvm-svn: 299830
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299699
memorydefs, not just stores. Along the way, we audit and fixup issues
about how we were tracking memory leaders, and improve the verifier
to notice more memory congruency issues.
llvm-svn: 299682
Summary:
Depends on D30928.
This adds support for coercion of stores and memory instructions that do not require insertion to process.
Another few tests down.
I added the relevant tests from rle.ll
Reviewers: davide
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30929
llvm-svn: 299330
processing the congruence class of the store.
Because we use the stored value of a store as the def, it isn't dead
just because it appears as a def when it comes from a store.
Note: I have not hit any cases with the memory code as it is where
this breaks anything, just because of what memory congruences we
actually allow. In a followup that improves memory congruence,
this bug actually breaks real stuff (but the verifier catches it).
llvm-svn: 299300
Summary:
Triggered by commit r298620: "[LV] Vectorize GEPs".
If we encounter a vector GEP with scalar arguments, we splat the scalar
into a vector of appropriate size before we scatter the argument.
Reviewers: arsenm, mehdi_amini, bkramer
Reviewed By: arsenm
Subscribers: bjope, mssimpso, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D31416
llvm-svn: 299186
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.
The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.
Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results 2.6%
a smaller binary.
Differential Revision: https://reviews.llvm.org/D30333
llvm-svn: 298799
This moves it to the iterator facade utilities giving it full random
access semantics, etc. It can also now be used with standard algorithms
like std::all_of and std::any_of and range adaptors like llvm::reverse.
Also make the semantics of iterating match what every other iterator
uses and forbid decrementing past the begin iterator. This was used as
a hacky way to work around iterator invalidation. However, every
instance trying to do this failed to actually avoid touching invalid
iterators despite the clear documentation that the removed and all
subsequent iterators become invalid including the end iterator. So I've
added a return of the next iterator to removeCase and rewritten the
loops that were doing this to correctly follow the iterator pattern of
either incremneting or removing and assigning fresh values to the
iterator and the end.
In one case we were trying to go backwards to make this cleaner but it
doesn't actually work. I've made that code match the code we use
everywhere else to remove cases as we iterate. This changes the order of
cases in one test output and I moved that test to CHECK-DAG so it
wouldn't care -- the order isn't semantically meaningful anyways.
llvm-svn: 298791
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
NFCI.
Summary:
This is ground work for the changes to enable coercion in NewGVN.
GVN doesn't care if they end up constant because it eliminates as it goes.
NewGVN cares.
IRBuilder and ConstantFolder deliberately present the same interface,
so we use this to our advantage to templatize our functions to make
them either constant only or not.
Reviewers: davide
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30928
llvm-svn: 298262
Summary: This Idom check seems unnecessary. The immediate children of a node on the Dominator Tree should always be the IDom of its immediate children in this case.
Reviewers: hfinkel, majnemer, dberlin
Reviewed By: dberlin
Subscribers: dberlin, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D26954
llvm-svn: 298232
Summary:
In case we are loading on a phi-load in SimplifyPartiallyRedundantLoad.
Try to phi translate it into incoming values in the predecessors before
we search for available loads.
This needs https://reviews.llvm.org/D30524
Reviewers: davide, sanjoy, efriedma, dberlin, rengolin
Reviewed By: dberlin
Subscribers: junbuml, llvm-commits
Differential Revision: https://reviews.llvm.org/D30543
llvm-svn: 298217
Summary:
iterateOnFunction creates a ReversePostOrderTraversal object which does a post order traversal in its constructor and stores the results in an internal vector. Iteration over it just reads from the internal vector in reverse order.
The GVN code seems to be unaware of this and iterates over ReversePostOrderTraversal object and makes a copy of the vector into a local vector. (I think at one point in time we used a DFS here instead which would have required the local vector).
The net affect of this is that we have two vectors containing the basic block list. As I didn't want to expose the implementation detail of ReversePostOrderTraversal's constructor to GVN, I've changed the code to do an explicit post order traversal storing into the local vector and then reverse iterate over that.
I've also removed the reserve(256) since the ReversePostOrderTraversal wasn't doing that. I can add it back if we thinks it important. Though it seemed weird that it wasn't based on the size of the function.
Reviewers: davide, anemet, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31084
llvm-svn: 298191
Loop unswitching can be extremely harmful for a SIMT target. In case
if hoisted condition is not uniform a SIMT machine will execute both
clones of a loop sequentially. Therefor LoopUnswitch checks if the
condition is non-divergent.
Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
not needed for non-SIMT targets a new option is added to avoid unneded
analysis initialization. The method getAnalysisUsage is called when
TargetTransformInfo is not yet available and we cannot use it here.
For that reason a new field DivergentTarget is added to PassManagerBuilder
to control the behavior and set this field from a target.
Differential Revision: https://reviews.llvm.org/D30796
llvm-svn: 298104
We were not handling getelemenptr instructions of vector type before.
Since getelemenptr instructions for vector types follow the same rule as
getelementptr instructions for non-vector types, we can just handle them
in the same way.
llvm-svn: 298028
Summary:
In commit r289548 ([ADCE] Add code to remove dead branches) a redundant loop
nest was accidentally introduced, which implements exactly the same
functionality as has already been available right after. This redundancy has
been found when inspecting the ADCE code in the context of our recent
discussions on post-dominator modeling. This redundant code was also eliminated
by r296535 (which sparked the discussion), but only as part of a larger semantic
change of the post-dominance modeling. As this redundency in [ADCE] is really
just an oversight completely independent of the post-dominance changes under
discussion, we remove this redundancy independently.
Reviewers: dberlin, david2050
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31023
llvm-svn: 297929
Summary:
These are the functions used to determine when values of loads can be
extracted from stores, etc, and to perform the necessary insertions to
do this. There are no changes to the functions themselves except
reformatting, and one case where memdep was informed of a removed load
(which was pushed into the caller).
Reviewers: davide
Subscribers: mgorny, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30478
llvm-svn: 297438
Summary: Use AA when scanning to find an available load value.
Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin
Reviewed By: rengolin, dberlin
Subscribers: aemerson, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D30352
llvm-svn: 297284
Recommitting patch which was previously reverted in r297159. These
changes should address the casting issues.
The original patch enables dbg.value intrinsics to be attached to
newly inserted PHI nodes.
Differential Review: https://reviews.llvm.org/D30701
llvm-svn: 297269
Summary:
In current implementation the loop peeling happens after trip-count based partial unrolling and may
sometimes not happen at all due to it (for example, if trip count is known, but UP.Partial = false). This
is generally bad, the more than there are some situations where peeling is profitable even if the partial
unrolling is disabled.
This patch is a NFC which reorders peeling and partial unrolling application and prepares the code for
implementation of the said optimizations.
Patch by Max Kazantsev!
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Reviewed By: mkuper
Subscribers: mkuper, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30243
llvm-svn: 296897
and also "clang-format GenericDomTreeConstruction.h, since the current
formatting makes it look like their is a bug in the loop indentation, and there
is not"
This reverts commit r296535.
There are still some open design questions which I would like to discuss. I
revert this for Daniel (who gave the OK), as he is on vacation.
llvm-svn: 296812
Now that terminators can be EH pads, this code needs to iterate over the
immediate dominators of the EH pad to find a valid insertion point.
Fix for PR32107
Patch by Robert Olliff!
Differential Revision: https://reviews.llvm.org/D30511
llvm-svn: 296698
Summary:
Currently, our post-dom tree tries to ignore and remove the effects of
infinite loops. It fails miserably at this, because it tries to do it
ahead of time, and thus can only detect self-loops, and any other type
of infinite loop, it pretends doesn't exist at all.
This can, in a bunch of cases, lead to wrong answers and a completely
empty post-dom tree.
Wrong answer:
```
declare void foo()
define internal void @f() {
entry:
br i1 undef, label %bb35, label %bb3.i
bb3.i:
call void @foo()
br label %bb3.i
bb35.loopexit3:
br label %bb35
bb35:
ret void
}
```
We get:
```
Inorder PostDominator Tree:
[1] <<exit node>> {0,7}
[2] %bb35 {1,6}
[3] %bb35.loopexit3 {2,3}
[3] %entry {4,5}
```
This is a trivial modification of the testcase for PR 6047
Note that we pretend bb3.i doesn't exist.
We also pretend that bb35 post-dominates entry.
While it's true that it does not exit in a theoretical sense, it's not
really helpful to try to ignore the effect and pretend that bb35
post-dominates entry. Worse, we pretend the infinite loop does
nothing (it's usually considered a side-effect), and doesn't even
exist, even when it calls a function. Sadly, this makes it impossible
to use when you are trying to move code safely. All compilers also
create virtual or real single exit nodes (including us), and connect
infinite loops there (which this patch does). In fact, others have
worked around our behavior here, to the point of building their own
post-dom trees:
https://zneak.github.io/fcd/2016/02/17/structuring.html and pointing
out the region infrastructure is near-useless for them with postdom in
this state :(
Completely empty post-dom tree:
```
define void @spam() #0 {
bb:
br label %bb1
bb1: ; preds = %bb1, %bb
br label %bb1
bb2: ; No predecessors!
ret void
}
```
Printing analysis 'Post-Dominator Tree Construction' for function 'foo':
=============================--------------------------------
Inorder PostDominator Tree:
[1] <<exit node>> {0,1}
:(
(note that even if you ignore the effects of infinite loops, bb2
should be present as an exit node that post-dominates nothing).
This patch changes post-dom to properly handle infinite loops and does
root finding during calculation to prevent empty tress in such cases.
We match gcc's (and the canonical theoretical) behavior for infinite
loops (find the backedge, connect it to the exit block).
Testcases coming as soon as i finish running this on a ton of random graphs :)
Reviewers: chandlerc, davide
Subscribers: bryant, llvm-commits
Differential Revision: https://reviews.llvm.org/D29705
llvm-svn: 296535
This is a fix for a loop predication bug which resulted in malformed IR generation.
Loop invariant side of the widened condition is not guaranteed to be available in the preheader as is, so we need to expand it as well. See added unsigned_loop_0_to_n_hoist_length test for example.
Reviewed By: sanjoy, mkazantsev
Differential Revision: https://reviews.llvm.org/D30099
llvm-svn: 296345
Summary:
BranchInst, SwitchInst (with non-default case) with Undef as input is not
possible at this point. As we always default-fold terminator to one target in
ResolvedUndefsIn and set the input accordingly.
So we should only have constantint/blockaddress here.
If ConstantFoldTerminator fails, that could mean 2 things.
1. ConstantFoldTerminator is doing something unexpected, i.e. not folding on constantint
or blockaddress and not making blocks that should be dead dead.
2. This is not a terminator on constantint or blockaddress. Its on a constant or
overdefined, then this block should not be dead.
In both cases, we should assert.
Reviewers: davide, efriedma, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30381
llvm-svn: 296281
LoopUnswitch/simplify-with-nonvalness.ll is the test case for this.
The LIC has 2 users and deleting the 1st user when it can be simplified
invalidated the iterator for the 2nd user.
llvm-svn: 296069
Summary: In case we do not know what the condition is in an unswitched loop, but we know its definitely NOT a known constant. We can perform simplifcations based on this information.
Reviewers: sanjoy, hfinkel, chenli, efriedma
Reviewed By: efriedma
Subscribers: david2050, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D28968
llvm-svn: 296041
While not CVP's fault, this caused miscompiles (PR31181). Reverting
until those are resolved.
(This also reverts the follow-ups r288154 and r288161 which removed the
flag.)
llvm-svn: 296030
In OptimizeAdd, we scan the operand list to see if there are any common factors
between operands that can be factored out to reduce the number of multiplies
(e.g., 'A*A+A*B*C+D' -> 'A*(A+B*C)+D'). For each operand of the operand list, we
only consider unique factors (which is tracked by the Duplicate set). Now if we
find a factor that is a negative constant, we add the negated value as a factor
as well, because we can percolate the negate out. However, we mistakenly don't
add this negated constant to the Duplicates set.
Consider the expression A*2*-2 + B. Obviously, nothing to factor.
For the added value A*2*-2 we over count 2 as a factor without this change,
which causes the assert reported in PR30256. The problem is that this code is
assuming that all the multiply operands of the add are already reassociated.
This change avoids the issue by making OptimizeAdd tolerate multiplies which
haven't been completely optimized; this sort of works, but we're doing wasted
work: we'll end up revisiting the add later anyway.
Another possible approach would be to enforce RPO iteration order more strongly.
If we have RedoInsts, we process them immediately in RPO order, rather than
waiting until we've finished processing the whole function. Intuitively, it
seems like the natural approach: reassociation works on expression trees, so
the optimization only works in one direction. That said, I'm not sure how
practical that is given the current Reassociate; the "optimal" form for an
expression depends on its use list (see all the uses of "user_back()"), so
Reassociate is really an iterative optimization of sorts, so any changes here
would probably get messy.
PR30256
Differential Revision: https://reviews.llvm.org/D30228
llvm-svn: 296003
Summary:
Depends on D29606 and D29682
Makes us pass GVN's edge.ll (we also will pass a few other testcases
they just need cleaning up).
Thoughts on the Predicate* hiearchy of classes especially welcome :)
(it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something).
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29747
llvm-svn: 295889
Add updater to passes that now need it.
Move around code in MemorySSA to expose needed functions.
Summary: Mostly cleanup
Reviewers: george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30221
llvm-svn: 295887
After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs.
Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg.
But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg.
Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant
parts RegA and RegB are not grouped together and cannot be promoted outside of loop.
With this patch, it will ensure lsr.iv to be generated later in the expr:
RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop.
Differential Revision: https://reviews.llvm.org/D26781
llvm-svn: 295884
This enables peeling of loops with low dynamic iteration count by default,
when profile information is available.
Differential Revision: https://reviews.llvm.org/D27734
llvm-svn: 295796
The new method introduced under "-lsr-exp-narrow" option (currenlty set to true).
Summary:
The method is based on registers number mathematical expectation and should be
generally closer to optimal solution.
Please see details in comments to
"LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function
(in lib/Transforms/Scalar/LoopStrengthReduce.cpp).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D29862
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 295704
Summary: This begins using the predicateinfo pass in NewGVN.
Reviewers: davide
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D29682
llvm-svn: 295583
Summary:
JumpThreading for guards feature has been reverted at https://reviews.llvm.org/rL295200
due to the following problem: the feature used the following algorithm for detection of
diamond patters:
1. Find a block with 2 predecessors;
2. Check that these blocks have a common single parent;
3. Check that the parent's terminator is a branch instruction.
The problem is that these checks are insufficient. They may pass for a non-diamond
construction in case if those two predecessors are actually the same block. This may
happen if parent's terminator is a br (either conditional or unconditional) to a block
that ends with "switch" instruction with exactly two branches going to one block.
This patch re-enables the JumpThreading for guards and fixes this issue by adding the
check that those found predecessors are actually different blocks. This guarantees that
parent's terminator is a conditional branch with exactly 2 different successors, which
is now ensured by assertions. It also adds two more tests for this situation (with parent's
terminator being a conditional and an unconditional branch).
Patch by Max Kazantsev!
Reviewers: anna, sanjoy, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30036
llvm-svn: 295410
In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops
other than current loop. This is good for the case when induction variable
of outerloop being used in expr in innerloop. But it is very bad to allow
such Reg from sibling loop because we may need to add lsr.iv in other sibling
loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase
below, one loop can be inserted with a bunch of lsr.iv because of LSR for
other loops.
// The induction variable j from a loop in the middle will have initial
// value generated from previous sibling loop and exit value used by its
// next sibling loop.
void goo(long i, long j);
long cond;
void foo(long N) {
long i = 0;
long j = 0;
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
}
The fix is to only allow formula with SCEVAddRecExpr type of Reg from current
loop or its parents.
Differential Revision: https://reviews.llvm.org/D30021
llvm-svn: 295378
Summary:
Function isCompatibleIVType is already used as a guard before the call to
SE.getMinusSCEV(OperExpr, PrevExpr);
in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.
Reviewers: qcolombet, eli.friedman
Reviewed By: qcolombet
Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29885
llvm-svn: 295033
Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled.
Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct.
It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following:
while(b) {
load i128 ...
if (can lower i128 atomic)
store atomic i128 ...
else
store i128
}
It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically.
It's not clear we need to be this conservative - arguably the program above is brocken since it can't be lowered unless the branch is folded - but I didn't want to have to fix any fallout which might result.
Differential Revision: https://reviews.llvm.org/D15592
llvm-svn: 295015
it is dead or unreachable, as it should be.
This also makes the leader of INITIAL undef, enabling us to handle
irreducibility properly.
Summary:
This lets us verify, more than we do now, that we didn't screw up
value numbering.
Reviewers: davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D29842
llvm-svn: 294844
Summary:
The patch adds instructions number generated by a solution
to LSR cost under "-lsr-insns-cost" option.
Reviewers: qcolombet, hfinkel
Differential Revision: http://reviews.llvm.org/D28307
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 294821
The recommit includes some changes of testcases. No functional change to the patch.
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.
Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.
Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.
But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.
Differential Revision: https://reviews.llvm.org/D26429
llvm-svn: 294814
This was marking the loop for deletion after the loop was deleted. This
almost works, except that when we do any kind of debug logging it starts
reading the name of the loop from deleted memory or otherwise blowing
up. This can fail in a bunch of ways. I recently added a test that
*always* does this, and it started failing on the sanitizer bots.
The fix is to mark the loop as deleted in the loop PM infrastructure
before we remove the loop. We can do this by passing the updater into
the routine. That also lets us simplify a bunch of other interface
components here for a net win.
llvm-svn: 294810
Chandler mentioned at the last social that the need for BFI in the new pass manager was causing a slight hiccup for this pass. Given this code has been checked in, but off for over a year, it makes sense to just remove it for now.
Note that there's nothing wrong with the general idea - it's actually a quite good one - and once we have the infrastructure in place to implement this without the full recompuation on every loop, we absolutely should.
llvm-svn: 294715
Summary:
This patch allows JumpThreading also thread through guards.
Virtually, guard(cond) is equivalent to the following construction:
if (cond) { do something } else {deoptimize}
Yet it is not explicitly converted into IFs before lowering.
This patch enables early threading through guards in simple cases.
Currently it covers the following situation:
if (cond1) {
// code A
} else {
// code B
}
// code C
guard(cond2)
// code D
If there is implication cond1 => cond2 or !cond1 => cond2, we can transform
this construction into the following:
if (cond1) {
// code A
// code C
} else {
// code B
// code C
guard(cond2)
}
// code D
Thus, removing the guard from one of execution branches.
Patch by Max Kazantsev!
Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29620
llvm-svn: 294617
Summary:
After the DFS order change for LVI, i have a few testcases that now
take forever.
The TL;DR - This is mainly due to the overdefined cache, but that
requires predicateinfo to fix[1]
In order to maximize reuse of the LVI cache for now, change the order
we iterate in.
This reduces my testcase from 5 minutes to 4 seconds.
I have verified cases like gmic do not get slower.
I am playing with whether the order should be postorder or idf.
[1] In practice, overdefined anywhere should be overdefined
everywhere, so this cache should be global. That also fixes this bug.
The problem, however, is that LVI relies on this cache being filled in
per-block because it wants different values in different blocks due to
precisely the naming issue that predicateinfo fixes. With
predicateinfo, making the cache global works fine on individual
passes, and also resolves this issue.
Reviewers: davide, sanjoy, chandlerc
Subscribers: llvm-commits, djasper
Differential Revision: https://reviews.llvm.org/D29679
llvm-svn: 294398
Currently IRCE relies on the loops it transforms to be (semantically) of
the form:
for (i = START; i < END; i++)
...
or
for (i = START; i > END; i--)
...
However, we were not verifying the presence of the START < END entry
check (i.e. check before the first iteration). We were only verifying
that the backedge was guarded by (i + 1) < END.
Usually this would work "fine" since (especially in Java) most loops do
actually have the START < END check, but of course that is not
guaranteed.
llvm-svn: 294375
This reverts commit r294250. It caused PR31891.
Add a test case that shows that inlinable calls retain location
information with an accurate scope.
llvm-svn: 294317
Summary: While scanning predecessors to find an available loaded value, if the predecessor has a single predecessor, we can continue scanning through the single predecessor.
Reviewers: mcrosier, rengolin, reames, davidxl, haicheng
Reviewed By: rengolin
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D29200
llvm-svn: 293896
Summary:
We can hoist out loads that are dominated by invariant.start, to the preheader.
We conservatively assume the load is variant, if we see a corresponding
use of invariant.start (it could be an invariant.end or an escaping
call).
Reviewers: mkuper, sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29331
llvm-svn: 293887
Summary: No need to try to ease BB from LoopHeaders as we already know that BB is not in LoopHeaders.
Reviewers: hsung, majnemer, mcrosier, haicheng, rengolin
Reviewed By: rengolin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29232
llvm-svn: 293802
This tries to address what Hal defined (in the post-commit review of
r293727) a long-standing problem with noinline, where we end up
de facto inlining trivial functions e.g.
__attribute__((noinline)) int patatino(void) { return 5; }
because of return value propagation.
llvm-svn: 293799
For targets with different addressing modes in each address space,
if this is dropped querying isLegalAddressingMode later with this
will give a nonsense result, breaking the isLegalUse assertions.
This is a candidate for the 4.0 release branch.
llvm-svn: 293542
This reverts commit r293196
Besides making things look nicer, ATM, we'd like to preserve analysis
more than we'd like to destroy the CFG. We'll probably revisit in the future
llvm-svn: 293501
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html
For reference:
- Public headers should just declare the dump() method but not use
LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void MyClass::dump() {
// print stuff to dbgs()...
}
#endif
llvm-svn: 293359
In r292621, the recommit fixes a bug related with live interval update
after the partial redundent copy is moved.
This recommit solves an additional bug related to the lack of update of
subranges.
The original patch is to solve the performance problem described in
PR27827. Register coalescing sometimes cannot remove a copy because of
interference. But if we can find a reverse copy in one of the predecessor
block of the copy, the copy is partially redundent and we may remove the
copy partially by moving it to the predecessor block without the
reverse copy.
Differential Revision: https://reviews.llvm.org/D28585
Re-apply r292621
Revert "Revert rL292621. Caused some internal build bot failures in apple."
This reverts commit r292984.
Original patch: Wei Mi <wmi@google.com>
Subrange fix: Mostly Matthias Braun <matze@braunis.de>
llvm-svn: 293353
skip sub-subloops.
The logic to skip subloops dated from when this code was shared with the
cached case. Once it was factored out to only run in the case of
recomputed subloops it became a dangerous bug. If a subsubloop contained
an interfering instruction it would be silently skipped from the alias
sets for LICM.
With the old pass manager this was extremely hard to trigger as it would
require failing to visit these subloops with the LICM pass but then
visiting the outer loop somehow. I've not yet contrived any test case
that actually manages to trigger this.
But with the new pass manager we don't do the cross-loop caching hack
that the old PM does and so we recompute alias set information from
first principles. While this seems much cleaner and simpler it exposed
this bug and would subtly miscompile code due to failing to correctly
model the aliasing constraints of deeply nested loops.
llvm-svn: 293273
Summary:
This adds basic dead and redundant store elimination to
NewGVN. Unlike our current DSE, it will happily do cross-block DSE if
it meets our requirements.
We get a bunch of DSE's simple.ll cases, and some stuff it doesn't.
Unlike DSE, however, we only try to eliminate stores of the same value
to the same memory location, not just general stores to the same
memory location.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29149
llvm-svn: 293258
the main pipeline.
This is a very straight forward port. Nothing weird or surprising.
This brings the number of missing passes from the new PM's pipeline down
to three.
llvm-svn: 293249
Summary:
This does not actually fix the testcase in PR31761 (discussion is
ongoing on the testcase), but does fix a bug it exposes, where stores
were not properly clobbering loads.
We accomplish this by unifying the memory equivalence infratructure
back into the normal congruence infrastructure, and then properly
destroying congruence classes when memory state leaders disappear.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29195
llvm-svn: 293216
factory functions for the two modes the loop unroller is actually used
in in-tree: simplified full-unrolling and the entire thing including
partial unrolling.
I've also wired these up to nice names so you can express both of these
being in a pipeline easily. This is a precursor to actually enabling
these parts of the O2 pipeline.
Differential Revision: https://reviews.llvm.org/D28897
llvm-svn: 293136
Even when we don't create a remainder loop (that is, when we unroll by 2), we
may duplicate nested loops into the remainder. This is complicated by the fact
the remainder may itself be either inserted into an outer loop, or at the top
level. In the latter case, we may need to create new top-level loops.
Differential Revision: https://reviews.llvm.org/D29156
llvm-svn: 293124
This patch introduces guard based loop predication optimization. The new LoopPredication pass tries to convert loop variant range checks to loop invariant by widening checks across loop iterations. For example, it will convert
for (i = 0; i < n; i++) {
guard(i < len);
...
}
to
for (i = 0; i < n; i++) {
guard(n - 1 < len);
...
}
After this transformation the condition of the guard is loop invariant, so loop-unswitch can later unswitch the loop by this condition which basically predicates the loop by the widened condition:
if (n - 1 < len)
for (i = 0; i < n; i++) {
...
}
else
deoptimize
This patch relies on an NFC change to make ScalarEvolution::isMonotonicPredicate public (revision 293062).
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D29034
llvm-svn: 293064
loops.
We do this by reconstructing the newly added loops after the unroll
completes to avoid threading pass manager details through all the mess
of the unrolling infrastructure.
I've enabled some extra assertions in the LPM to try and catch issues
here and enabled a bunch of unroller tests to try and make sure this is
sane.
Currently, I'm manually running loop-simplify when needed. That should
go away once it is folded into the LPM infrastructure.
Differential Revision: https://reviews.llvm.org/D28848
llvm-svn: 293011
Summary:
GVNHoist performs all the optimizations that MLSM does to loads, in a
more general way, and in a faster time bound (MLSM is N^3 in most
cases, N^4 in a few edge cases).
This disables the load portion.
Note that the way ld_hoist_st_sink.ll is written makes one think that
the loads should be moved to the while.preheader block, but
1. Neither MLSM nor GVNHoist do it (they both move them to identical places).
2. MLSM couldn't possibly do it anyway, as the while.preheader block
is not the head of the diamond, while.body is. (GVNHoist could do it
if it was legal).
3. At a glance, it's not legal anyway because the in-loop load
conflict with the in-loop store, so the loads must stay in-loop.
I am happy to update the test to use update_test_checks so that
checking is tighter, just was going to do it as a followup.
Note that i can find no particular benefit to the store portion on any
real testcase/benchmark i have (even size-wise). If we really still
want it, i am happy to commit to writing a targeted store sinker, just
taking the code from the MemorySSA port of MergedLoadStoreMotion
(which is N^2 worst case, and N most of the time).
We can do what it does in a much better time bound.
We also should be both hoisting and sinking stores, not just sinking
them, anyway, since whether we should hoist or sink to merge depends
basically on luck of the draw of where the blockers are placed.
Nonetheless, i have left it alone for now.
Reviewers: chandlerc, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29079
llvm-svn: 292971
a lazy-asserting PoisoningVH.
AssertVH is fundamentally incompatible with cache-invalidation of
analysis results. The invaliadtion happens after the AssertingVH has
already fired. Instead, use a PoisoningVH that will assert if the
dangling handle is ever used rather than merely be assigned or
destroyed.
This patch also removes all of the (numerous) doomed attempts to work
around this fundamental incompatibility. It is a pretty significant
simplification IMO.
The most interesting change is in the Inliner where we still do some
clearing because we don't want to rely on the coarse grained
invalidation strategy of the containing pass manager. However, I prefer
the approach that contains this logic to the cleanup phase of the
Inliner, and I think we could enhance the CGSCC analysis management
layer to make this even better in the future if desired.
The rest is straight cleanup.
I've also added a test for one of the harder cases to work around: when
a *module analysis* contains many AssertingVHes pointing at functions.
Differential Revision: https://reviews.llvm.org/D29006
llvm-svn: 292928
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).
Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.
The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)
There are additional changes required in clang.
Reviewers: rsmith
Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28476
llvm-svn: 292848
invalidation of deleted functions in GlobalDCE.
This was always testing a bug really triggered in GlobalDCE. Right now
we have analyses with asserting value handles into IR. As long as those
remain, when *deleting* an IR unit, we cannot wait for the normal
invalidation scheme to kick in even though it was designed to work
correctly in the face of these kinds of deletions. Instead, the pass
needs to directly handle invalidating the analysis results pointing at
that IR unit.
I've tought the Inliner about this and this patch teaches GlobalDCE.
This will handle the asserting VH case in the existing test as well as
other issues of the same fundamental variety. I've moved the test into
the GlobalDCE directory and added a comment explaining what is going on.
Note that we cannot simply require LVI here because LVI is too lazy.
llvm-svn: 292773
become unavailable.
The AssumptionCache is now immutable but it still needs to respond to
DomTree invalidation if it ended up caching one.
This lets us remove one of the explicit invalidates of LVI but the
other one continues to avoid hitting a latent bug.
llvm-svn: 292769
Don't call `isTriviallyDeadInstructions()` once we discover that
an instruction is dead. Instead, set DFS number zero (as suggested
by Danny) and forget about it (this also speeds up things as we
won't try to reprocess that block).
Differential Revision: https://reviews.llvm.org/D28930
llvm-svn: 292676
Summary:
This rewrites store expression/leader handling. We no longer use the
value operand as the leader, instead, we store it separately. We also
now store the stored value as part of the expression, and compare it
when comparing stores for equality. This enables us to get rid of a
bunch of our previous hacks and machinations, as the existing
machinery takes care of everything *except* updating the stored value
on classes. The only time we have to update it is if the storecount
goes to 0, and when we do, we destroy it.
Since we no longer use the value operand as the leader, during elimination, we have to use the value operand. Doing this also fixes a bunch of store forwarding cases we were missing.
Any value operand we use is guaranteed to either be updated by previous eliminations, or minimized by future ones.
(IE the fact that we don't use the most dominating value operand when it's not a constant does not affect anything).
Sadly, this change also exposes that we didn't pay attention to the
output of the pr31594.ll test, as it also very clearly exposes the
same store leader bug we are fixing here.
(I added pr31682.ll anyway, but maybe we think that's too large to be useful)
On the plus side, propagate-ir-flags.ll now passes due to the
corrected store forwarding.
This change was 3 stage'd on darwin and linux, with the full test-suite.
Reviewers:
davide
Subscribers:
llvm-commits
llvm-svn: 292648
Like several other loop passes (the vectorizer, etc) this pass doesn't
really fit the model of a loop pass. The critical distinction is that it
isn't intended to be pipelined together with other loop passes. I plan
to add some documentation to the loop pass manager to make this more
clear on that side.
LoopSink is also different because it doesn't really need a lot of the
infrastructure of our loop passes. For example, if there aren't loop
invariant instructions causing a preheader to exist, there is no need to
form a preheader. It also doesn't need LCSSA because this pass is
only involved in sinking invariant instructions from a preheader into
the loop, not reasoning about live-outs.
This allows some nice simplifications to the pass in the new PM where we
can directly walk the loops once without restructuring them.
Differential Revision: https://reviews.llvm.org/D28921
llvm-svn: 292589
Part of the assert has been left active for further debugging.
The other part has been turned into a stat for tracking for the
moment.
llvm-svn: 292583
This can prove that:
extern int f;
int g() {
int x = 0;
for (int i = 0; i < 365; ++i) {
x /= f;
}
return x;
}
always returns zero. Thanks to Sanjoy for confirming this
transformation actually made sense (bugs are mine).
llvm-svn: 292531
Summary:
In case of non-alloca pointers, we check for whether it is a pointer
from malloc-like calls and it is not captured. In such case, we can
promote the pointer, as the caller will have no way to access this pointer
even if there is unwinding in middle of the loop.
Reviewers: hfinkel, sanjoy, reames, eli.friedman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28834
llvm-svn: 292510
Summary: Partial unrolling should have separate threshold with full unrolling.
Reviewers: efriedma, mzolotukhin
Reviewed By: efriedma, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28831
llvm-svn: 292293
unique exit block if available rather than rolling it ourselves.
This is a little disappointing because that helper doesn't do anything
clever to short-circuit the (surprisingly expensive) computation of all
exit blocks. What's worse is that the way we compute this is hopelessly,
hilariously inefficient. We're literally computing the same information
two different ways and multiple times each way:
- hasDedicatedExits computes the exit block set and then looks at the
predecessors of each
- getExitingBlocks computes the set of loop blocks which have exiting
successors
- getUniqueExitBlock(s) computes the set of non-loop blocks reached from
loop blocks (sound familiar?)
Anyways, at some point we should clean all of this up in the LoopInfo
API, but for now just simplifying the user I'm about to touch.
llvm-svn: 292282
I hope that for any code, it is changed only with good reason and only
when the author knows what they are doing...
There is of course good reason to comment here about the subtlety of the
process, and I've left that comment in tact.
llvm-svn: 292275
instead of members.
No state was being provided by the object so this seems strictly
simpler.
I've also tried to improve the name and comments for the functions to
more thoroughly document what they are doing.
llvm-svn: 292274
that we know has exactly one element when all we are going to do is get
that one element out of it.
Instead, pass around that one element.
There are more simplifications to come in this code...
llvm-svn: 292273
a function's CFG when that CFG is unchanged.
This allows transformation passes to simply claim they preserve the CFG
and analysis passes to check for the CFG being preserved to remove the
fanout of all analyses being listed in all passes.
I've gone through and removed or cleaned up as many of the comments
reminding us to do this as I could.
Differential Revision: https://reviews.llvm.org/D28627
llvm-svn: 292054
Summary:
This is a testcase where phi node cycling happens, and because we do
not order the leaders by domination or anything similar, the leader
keeps changing.
Using std::set for the members is too expensive, and we actually don't
need them sorted all the time, only at leader changes.
We could keep both a set and a vector, and keep them mostly sorted and
resort as necessary, or use a set and a fibheap, but all of this seems
premature.
After running some statistics, we are able to avoid the vast majority
of sorting by keeping a "next leader" field. Most congruence classes only have
leader changes once or twice during GVN.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28594
llvm-svn: 291968
It was always zero. When we move a store from `initial` to its
own congruency class, we end up with a negative store count, which
is obviously wrong.
Also, while here, change StoreCount to be signed so that the assertions
actually fire.
Ack'ed by Daniel Berlin.
llvm-svn: 291725
classes, and updating checking to allow for equivalence through
reachability.
(Sadly, the checking here is not perfect, and can't be made perfect,
so we'll have to disable it after we are satisfied with correctness.
Right now it is just "very unlikely" to happen.)
llvm-svn: 291698
the latter to the Transforms library.
While the loop PM uses an analysis to form the IR units, the current
plan is to have the PM itself establish and enforce both loop simplified
form and LCSSA. This would be a layering violation in the analysis
library.
Fundamentally, the idea behind the loop PM is to *transform* loops in
addition to running passes over them, so it really seemed like the most
natural place to sink this was into the transforms library.
We can't just move *everything* because we also have loop analyses that
rely on a subset of the invariants. So this patch splits the the loop
infrastructure into the analysis management that has to be part of the
analysis library, and the transform-aware pass manager.
This also required splitting the loop analyses' printer passes out to
the transforms library, which makes sense to me as running these will
transform the code into LCSSA in theory.
I haven't split the unittest though because testing one component
without the other seems nearly intractable.
Differential Revision: https://reviews.llvm.org/D28452
llvm-svn: 291662
arguments much like the CGSCC pass manager.
This is a major redesign following the pattern establish for the CGSCC layer to
support updates to the set of loops during the traversal of the loop nest and
to support invalidation of analyses.
An additional significant burden in the loop PM is that so many passes require
access to a large number of function analyses. Manually ensuring these are
cached, available, and preserved has been a long-standing burden in LLVM even
with the help of the automatic scheduling in the old pass manager. And it made
the new pass manager extremely unweildy. With this design, we can package the
common analyses up while in a function pass and make them immediately available
to all the loop passes. While in some cases this is unnecessary, I think the
simplicity afforded is worth it.
This does not (yet) address loop simplified form or LCSSA form, but those are
the next things on my radar and I have a clear plan for them.
While the patch is very large, most of it is either mechanically updating loop
passes to the new API or the new testing for the loop PM. The code for it is
reasonably compact.
I have not yet updated all of the loop passes to correctly leverage the update
mechanisms demonstrated in the unittests. I'll do that in follow-up patches
along with improved FileCheck tests for those passes that ensure things work in
more realistic scenarios. In many cases, there isn't much we can do with these
until the loop simplified form and LCSSA form are in place.
Differential Revision: https://reviews.llvm.org/D28292
llvm-svn: 291651
These are interesting again because the user may not be aware that this
is a common reason preventing LICM.
A const is removed from an instruction pointer declaration in order to
pass it to ORE.
Differential Revision: https://reviews.llvm.org/D27940
llvm-svn: 291649
In some cases StructurizeCfg updates root node, but dominator info
remains unchanges, it causes crash when expensive checks are enabled.
To cope with this problem a new method was added to DominatorTreeBase
that allows adding new root nodes, it is called in StructurizeCfg to
put dominator tree in sync.
This change fixes PR27488.
Differential Revision: https://reviews.llvm.org/D28114
llvm-svn: 291530
Summary: LLVM's non-standard notion of phi nodes means we can't both try to substitute for undef in phi nodes *and* use phi nodes as leaders all the time. This changes NewGVN to use the same semantics as SimplifyPHINode to decide which phi nodes are equivalent.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28312
llvm-svn: 291308
This is fixing a bug where Loop Vectorization is widening a load but
with a lower alignment. Hoisting the load without propagating the alignment
will allow inst-combine to later deduce a higher alignment that what the pointer
actually is.
Differential Revision: https://reviews.llvm.org/D28408
llvm-svn: 291281
order to avoid jumpy line tables. Calls are left alone because they may be inlined.
Differential Revision: https://reviews.llvm.org/D28390
llvm-svn: 291258
Promotion is always legal when a store within the loop is guaranteed to execute.
However, this is not a necessary condition - for promotion to be memory model
semantics-preserving, it is enough to have a store that dominates every exit
block. This is because if the store dominates every exit block, the fact the
exit block was executed implies the original store was executed as well.
Differential Revision: https://reviews.llvm.org/D28147
llvm-svn: 291171
Summary:
Preheader instruction's operands will always be invariant w.r.t. the loop which its the preheader
for.
Memory aliases are handled in canSinkOrHoistInst.
Reviewers: danielcdh, davidxl
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D28270
llvm-svn: 291132
performing partial redundancy elimination (PRE). Not doing so can cause jumpy line
tables and confusing (though correct) source attributions.
Differential Revision: https://reviews.llvm.org/D27857
llvm-svn: 291037
Apparently my suggestion of using ternary doesn't really work
as clang complains about incompatible types on LHS and RHS. Some
GCC versions happen to accept the code but clang behaviour is
correct here.
llvm-svn: 290822
Summary:
This avoids the very fragile code for null expressions. We could also use a denseset that tracks which things have null expressions instead, but that seems pretty fragile and premature optimization.
This resolves a number of infinite loop cases, test reductions coming.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28193
llvm-svn: 290816
Summary: Previously, we tried to fix up the equivalences during symbolic evaluation. This does not work. Now, we change the equivalences during congruence finding, where it belongs. We also initialize the equivalence table to give a maximal answer.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28192
llvm-svn: 290815
CVP doesn't care about the order of blocks visited, but by using a pre-order traversal over the graph we can a) not visit unreachable blocks and b) optimize as we go so that analysis of later blocks produce slightly more precise results.
I noticed this via inspection and don't have a concrete example which points to the issue.
llvm-svn: 290760
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.
Differential Revision: https://reviews.llvm.org/D28170
llvm-svn: 290738
Summary:
The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.
The patch also simplified the code by reducing one parameter in UP.
The patch only affects code-gen of two speccpu2006 benchmark:
445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26989
llvm-svn: 290737
"Changed" doesn't actually change within the loop, so there's
no reason to keep track of it - we always return false during
analysis and true after the transformation is made.
llvm-svn: 290735
This moves the exit block and insertion point computation to be eager,
instead of after seeing the first scalar we can promote.
The cost is relatively small (the computation happens anyway, see discussion
on D28147), and the code is easier to follow, and can bail out earlier
if there's a catchswitch present.
llvm-svn: 290729
We would check whether we have a prehader *or* dedicated exit blocks,
and go into the promotion loop. Then, for each alias set we'd check
if we have a preheader *and* dedicated exit blocks, and bail if not.
Instead, bail immediately if we don't have both.
llvm-svn: 290728
We want to recompute LCSSA only when we actually promoted a value.
This means we only need to look at changes made by promotion when
deciding whether to recompute it or not, not at regular sinking/hoisting.
(This was what the code was documented as doing, just not what it did)
Hopefully NFC.
llvm-svn: 290726
Summary:
The optimal iteration order for this problem is RPO order. We want to
process as many preds of a backedge as we can before we process the
backedge.
At the same time, as we add predicate handling, we want to be able to
touch instructions that are dominated by a given block by
ranges (because a change in value numbering a predicate possibly
affects all users we dominate that are using that predicate).
If we don't do it this way, we can't do value inference over
backedges (the paper covers this in depth).
The newgvn branch currently overshoots the last part, and guarantees
that it will touch *at least* the right set of instructions, but it
does touch more. This is because the bitvector instruction ranges are
currently generated in RPO order (so we take the max and the min of
the ranges of dominated blocks, which means there are some in the
middle we didn't have to touch that we did).
We can do better by sorting the dominator tree, and then just using
dominator tree order.
As a preliminary, the dominator tree has some RPO guarantees, but not
enough. It guarantees that for a given node, your idom must come
before you in the RPO ordering. It guarantees no relative RPO ordering
for siblings. We add siblings in whatever order they appear in the module.
So that is what we fix.
We sort the children array of the domtree into RPO order, and then use
the dominator tree for ordering, instead of RPO, since the dominator
tree is now a valid RPO ordering.
Note: This would help any other pass that iterates a forward problem
in dominator tree order. Most of them are single pass. It will still
maximize whatever result they compute. We could also build the
dominator tree in this order, but our incremental updates would still
put it out of sort order, and recomputing the sort order is almost as
hard as general incremental updates of the domtree.
Also note that the sorting does not affect any tests, etc. Nothing
depends on domtree order, including the verifier, the equals
functions for domtree nodes, etc.
How much could this matter, you ask?
Here are the current numbers.
This is generated by running NewGVN over all files in LLVM.
Note that once we propagate equalities, the differences go up by an
order of magnitude or two (IE instead of 29, the max ends up in the
thousands, since the worst case we add a factor of N, where N is the
number of branch predicates). So while it doesn't look that stark for
the default ordering, it gets *much much* worse. There are also
programs in the wild where the difference is already pretty stark
(2 iterations vs hundreds).
RPO ordering:
759040 Number of iterations is 1
112908 Number of iterations is 2
Default dominator tree ordering:
755081 Number of iterations is 1
116234 Number of iterations is 2
603 Number of iterations is 3
27 Number of iterations is 4
2 Number of iterations is 5
1 Number of iterations is 7
Dominator tree sorted:
759040 Number of iterations is 1
112908 Number of iterations is 2
<yay!>
Really bad ordering (sort domtree siblings in postorder. not quite the
worst possible, but yeah):
754008 Number of iterations is 1
21 Number of iterations is 10
8 Number of iterations is 11
6 Number of iterations is 12
5 Number of iterations is 13
2 Number of iterations is 14
2 Number of iterations is 15
3 Number of iterations is 16
1 Number of iterations is 17
2 Number of iterations is 18
96642 Number of iterations is 2
1 Number of iterations is 20
2 Number of iterations is 21
1 Number of iterations is 22
1 Number of iterations is 29
17266 Number of iterations is 3
2598 Number of iterations is 4
798 Number of iterations is 5
273 Number of iterations is 6
186 Number of iterations is 7
80 Number of iterations is 8
42 Number of iterations is 9
Reviewers: chandlerc, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28129
llvm-svn: 290699
emplace_back is not faster if it is equivalent to push_back. In this cases emplaced value had the
same type that the one stored in container. It is ugly and it might be even slower (see
Scott Meyers presentation about emplacement).
llvm-svn: 290685
Mostly use a bit more idiomatic C++ where we can,
so we can combine some things later.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28111
llvm-svn: 290550
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this not ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's not ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor, see the review for a complete discussion.
Differential Revision: https://reviews.llvm.org/D25848
llvm-svn: 290427
The code have been developed by Daniel Berlin over the years, and
the new implementation goal is that of addressing shortcomings of
the current GVN infrastructure, i.e. long compile time for large
testcases, lack of phi predication, no load/store value numbering
etc...
The current code just implements the "core" GVN algorithm, although
other pieces (load coercion, phi handling, predicate system) are
already implemented in a branch out of tree. Once the core is stable,
we'll start adding pieces on top of the base framework.
The test currently living in test/Transform/NewGVN are a copy
of the ones in GVN, with proper `XFAIL` (missing features in NewGVN).
A flag will be added in a future commit to enable NewGVN, so that
interested parties can exercise this code easily.
Differential Revision: https://reviews.llvm.org/D26224
llvm-svn: 290346
from the old pass manager in the new one.
I'm not trying to support (initially) the numerous options that are
currently available to customize the pass pipeline. If we end up really
wanting them, we can add them later, but I suspect many are no longer
interesting. The simplicity of omitting them will help a lot as we sort
out what the pipeline should look like in the new PM.
I've also documented to the best of my ability *why* each pass or group
of passes is used so that reading the pipeline is more helpful. In many
cases I think we have some questionable choices of ordering and I've
left FIXME comments in place so we know what to come back and revisit
going forward. But for now, I've left it as similar to the current
pipeline as I could.
Lastly, I've had to comment out several places where passes are not
ported to the new pass manager or where the loop pass infrastructure is
not yet ready. I did at least fix a few bugs in the loop pass
infrastructure uncovered by running the full pipeline, but I didn't want
to go too far in this patch -- I'll come back and re-enable these as the
infrastructure comes online. But I'd like to keep the comments in place
because I don't want to lose track of which passes need to be enabled
and where they go.
One thing that seemed like a significant API improvement was to require
that we don't build pipelines for O0. It seems to have no real benefit.
I've also switched back to returning pass managers by value as at this
API layer it feels much more natural to me for composition. But if
others disagree, I'm happy to go back to an output parameter.
I'm not 100% happy with the testing strategy currently, but it seems at
least OK. I may come back and try to refactor or otherwise improve this
in subsequent patches but I wanted to at least get a good starting point
in place.
Differential Revision: https://reviews.llvm.org/D28042
llvm-svn: 290325
In r267672, where the loop distribution pragma was introduced, I tried
it hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when ran via the optimization pipeline).
As MichaelZ has discovered this has the unintended consequence of
breaking a very common developer work-flow to reproduce compilations
using opt: First you print the pass pipeline of clang
with -debug-pass=Arguments and then invoking opt with the returned
arguments.
clang -debug-pass will include -loop-distribute but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma.
While through opt (default=on) we will try to distribute all loops.
This changes opt's default to off as well to match clang. The tests are
modified to explicitly enable the transformation.
llvm-svn: 290235
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).
Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.
At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.
Differential Revision: https://reviews.llvm.org/D27259
llvm-svn: 289755
Summary:
This patch will add loop metadata on the pre and post loops generated by IRCE.
Currently, we have metadata for disabling optimizations such as vectorization,
unrolling, loop distribution and LICM versioning (and confirmed that these
optimizations check for the metadata before proceeding with the transformation).
The pre and post loops generated by IRCE need not go through loop opts (since
these are slow paths).
Added two test cases as well.
Reviewers: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26806
llvm-svn: 289588
Summary:
This is last in of a series of patches to evolve ADCE.cpp to support
removing of unnecessary control flow.
This patch adds the code to update the control and data flow graphs
to remove the dead control flow.
Also update unit tests to test the capability to remove dead,
may-be-infinite loop which is enabled by the switch
-adce-remove-loops.
Previous patches:
D23824 [ADCE] Add handling of PHI nodes when removing control flow
D23559 [ADCE] Add control dependence computation
D23225 [ADCE] Modify data structures to support removing control flow
D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)
Reviewers: dberlin, majnemer, nadav, mehdi_amini
Subscribers: llvm-commits, david2050, freik, twoh
Differential Revision: https://reviews.llvm.org/D24918
llvm-svn: 289548
The motivating example is:
extern int patatino;
int goo() {
int x = 0;
for (int i = 0; i < 1000000; ++i) {
x *= patatino;
}
return x;
}
Currently SCCP will not realize that this function returns always zero,
therefore will try to unroll and vectorize the loop at -O3 producing an
awful lot of (useless) code. With this change, it will just produce:
0000000000000000 <g>:
xor %eax,%eax
retq
llvm-svn: 289175
The fix committed in r288851 doesn't cover all the cases.
In particular, if we have an instruction with side effects
which has a no non-dbg use not depending on the bits, we still
perform RAUW destroying the dbg.value's first argument.
Prevent metadata from being replaced here to avoid the issue.
Differential Revision: https://reviews.llvm.org/D27534
llvm-svn: 288987
In the case of a fully redundant load LI dominated by an equivalent load V, GVN
should always preserve the original debug location of V. Otherwise, we risk to
introduce an incorrect stepping.
If V has debug info, then clearly it should not be modified. If V has a null
debugloc, then it is still potentially incorrect to propagate LI's debugloc
because LI may not post-dominate V.
Differential Revision: https://reviews.llvm.org/D27468
llvm-svn: 288903
BDCE has two phases:
1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so,
replaces all its uses with the constant zero.
2. Then, it asks SimplifyDemandedBits again if the instruction is really dead
(no side effects etc..) and if so, eliminates it.
Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use:
%call = tail call i32 (...) @g() #4, !dbg !15
tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17
->
%call = tail call i32 (...) @g() #4, !dbg !15
tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17
but not eliminating the call because it may have arbitrary side effects.
In other words, we lose some debug informations.
This patch fixes the problem making sure that BDCE does nothing with the instruction if
it has side effects and no non-dbg uses.
Differential Revision: https://reviews.llvm.org/D27471
llvm-svn: 288851
There are two cases handled here:
1) a branch on undef
2) a switch with an undef condition.
Both cases are currently handled by ResolvedUndefsIn. If we have
a branch on undef, we force its value to false (which is trivially
foldable). If we have a switch on undef, we force to the first
constant (which is also foldable).
llvm-svn: 288725
so we can stop using DW_OP_bit_piece with the wrong semantics.
The entire back story can be found here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html
The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's
offset field to mean the offset into the source variable rather than
the offset into the location at the top the DWARF expression stack. In
order to be able to fix this in a subsequent patch, this patch
introduces a dedicated DW_OP_LLVM_fragment operation with the
semantics that we used to apply to DW_OP_bit_piece, which is what we
actually need while inside of LLVM. This patch is complete with a
bitcode upgrade for expressions using the old format. It does not yet
fix the DWARF backend to use DW_OP_bit_piece correctly.
Implementation note: We discussed several options for implementing
this, including reserving a dedicated field in DIExpression for the
fragment size and offset, but using an custom operator at the end of
the expression works just fine and is more efficient because we then
only pay for it when we need it.
Differential Revision: https://reviews.llvm.org/D27361
rdar://problem/29335809
llvm-svn: 288683
Now that PointerType is no longer a SequentialType, all SequentialTypes
have an associated number of elements, so we can move that information to
the base class, allowing for a number of simplifications.
Differential Revision: https://reviews.llvm.org/D27122
llvm-svn: 288464
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html
This is for a couple of reasons:
- Values of type PointerType are unlike the other SequentialTypes (arrays
and vectors) in that they do not hold values of the element type. By moving
PointerType we can unify certain aspects of how the other SequentialTypes
are handled.
- PointerType will have no place in the SequentialType hierarchy once
pointee types are removed, so this is a necessary step towards removing
pointee types.
Differential Revision: https://reviews.llvm.org/D26595
llvm-svn: 288462
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458
This just extracts out the transfer rules for constant ranges into a single shared point. As it happens, neither bit of code actually overlaps in terms of the handled operators, but with this change that could easily be tweaked in the future.
I also want to have this separated out to make experimenting with a eager value info implementation and possibly a ValueTracking-like fixed depth recursion peephole version. There's no reason all four of these can't share a common implementation which reduces the chances of bugs.
Differential Revision: https://reviews.llvm.org/D27294
llvm-svn: 288413
[recommitting after the fix in r288307]
This requires some changes to the opt-diag API. Hal and I have
discussed this at the Dev Meeting and came up with a streaming delimiter
(setExtraArgs) to solve this.
Arguments after this delimiter are only included in the optimization
records and not in the remarks printed in the compiler output. (Note,
how in the test the content of the YAML file changes but the remarks on
the compiler output don't.)
This implements the green GVN message with a bug fix at line
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446
The fix is that now we properly include the constant value in the
message: "load of type i32 eliminated in favor of 7"
Differential Revision: https://reviews.llvm.org/D26489
llvm-svn: 288380
If LoopInfo is available during GVN, BasicAA will use it. However
MergeBlockIntoPredecessor does not update LI as it merges blocks.
This didn't use to cause problems because LI was freed before
GVN/BasicAA. Now with OptimizationRemarkEmitter, the lifetime of LI is
extended so LI needs to be kept up-to-date during GVN.
Differential Revision: https://reviews.llvm.org/D27288
llvm-svn: 288307
Summary:
Fix a case when first register in a search has maximum
RegUses.getUsedByIndices(Reg).count()
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D26877
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 288278
This implements PGO-driven loop peeling.
The basic idea is that when the average dynamic trip-count of a loop is known,
based on PGO, to be low, we can expect a performance win by peeling off the
first several iterations of that loop.
Unlike unrolling based on a known trip count, or a trip count multiple, this
doesn't save us the conditional check and branch on each iteration. However,
it does allow us to simplify the straight-line code we get (constant-folding,
etc.). This is important given that we know that we will usually only hit this
code, and not the actual loop.
This is currently disabled by default.
Differential Revision: https://reviews.llvm.org/D25963
llvm-svn: 288274
Michel Dänzer reported that r288051, "[StructurizeCFG] Use range-based
for loops", introduced a bug into rebuildSSA, wherein we were iterating
over an instruction's use list while modifying it, without taking care
to do this correctly.
llvm-svn: 288200
The flag was introduced because the optimization controlled by the flag initially caused regressions. All the regressions were fixed some time ago and the flag has been false for quite a while.
llvm-svn: 288154
Enable scalar hoisting at -Oz as it is safe to hoist scalars to a place
where they are partially needed.
Differential Revision: https://reviews.llvm.org/D27111
llvm-svn: 288141
Preserving lifetime markers isn't as important as allowing promotion,
so just drop the lifetime markers if necessary.
This also fixes an assertion failure where other parts of SROA assumed
that lifetime markers never block promotion.
Fixes https://llvm.org/bugs/show_bug.cgi?id=29139.
Differential Revision: https://reviews.llvm.org/D24854
llvm-svn: 288074
Summary:
As far as I can tell, doing our own computations in
NearestCommonDominator is a false optimization -- DomTree will build up
what appears to be exactly this data when it decides it's worthwhile.
Moreover, by building the cache ourselves, we cannot take advantage of
the cache that the domtree might have available.
In addition, I am not convinced of the correctness of the original code.
In particular, setting ResultIndex = 1 on the first addBlock instead of
setting it to 0 is quite fishy. Similarly, it's not clear to me that
setting IndexMap[Node] = 0 for every node as we walk up the tree finding
a common parent is correct. But rather than ponder over these
questions, I'd rather just make the code do the obviously-correct thing.
This patch also changes the NearestCommonDominator API a bit, improving
the names and getting rid of the boolean parameter in addBlock -- see
http://jlebar.com/2011/12/16/Boolean_parameters_to_API_functions_considered_harmful..html
Reviewers: arsenm
Subscribers: aemerson, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D26998
llvm-svn: 288050
Summary:
The iterative algorithm for Loop Unswitching may render some of the branches unreachable in the unswitched loops.
Given the exponential nature of the algorithm, this is quite an overhead.
This patch fixes this problem by selectively unswitching only those branches within a loop that are reachable from the loop header.
Reviewers: Michael Zolothukin, Anna Thomas, Weiming Zhao.
Subscribers: llvm-commits.
Differential Revision: http://reviews.llvm.org/D26299
llvm-svn: 287925
Summary:
No need to copy the RPOT vector before using it. Switch from std::map
to SmallDenseMap. Get rid of an unused variable (TempVisited). Get rid
of a typedef, RNVector, which is now used only once.
Differential Revision: https://reviews.llvm.org/D26997
llvm-svn: 287721
Summary:
"addRequired" and "addPreserved" look very similar when squished up next
to each other -- without the newline this code looked to me like it was
addRequired'ing DominatorTreeWrapperPass twice.
Reviewers: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D26996
llvm-svn: 287720
Summary: Lets us get rid of one member variable too.
Reviewers: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D26992
llvm-svn: 287716
We visit and/or, we try to derive a lattice value for the
instruction even if one of the operands is overdefined.
If the non-overdefined value is still 'unknown' just return and wait
for ResolvedUndefsIn to "plug in" the correct value. This simplifies
the logic a bit. While I'm here add tests for missing cases.
llvm-svn: 287709
Allow using an instruction other than a mul or phi as the base for
root-finding. For example, the included testcase includes a loop
which requires using a getelementptr as the base for root-finding.
Differential Revision: https://reviews.llvm.org/D26529
llvm-svn: 287588
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.
llvm-svn: 287206
Summary:
For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold.
For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling.
Reviewers: davidxl, mzolotukhin
Subscribers: sanjoy, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D26527
llvm-svn: 287186
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.
Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.
Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.
But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.
Differential Revision: https://reviews.llvm.org/D26429
llvm-svn: 286999
When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence
return non-null but different WideAddRec, if getWideRecurrence is called
before getExtendedOperandRecurrence, we won't bother to call
getExtendedOperandRecurrence again. But As we know it is possible that after
SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned
by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null
WideAddRec, we know for sure that it is legal to do widening for current instruction.
So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which
will increase the chance of successful widening.
Differential Revision: https://reviews.llvm.org/D26059
llvm-svn: 286987
Summary:
Unfolding selects was previously done with the help of a vector
of pointers that was then sorted to be able to remove duplicates.
As this sorting depends on the memory addresses, it was
non-deterministic. A SetVector is used now so that duplicates are
removed without the need of sorting first.
Reviewers: mgrang, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26450
llvm-svn: 286807
All existing callers were manually extracting information out of an existing
GEP instruction and passing it to getGEPExpr(). Simplify the interface by
changing it to take a GEPOperator instead.
llvm-svn: 286751
No testcase included because I can't figure out how to reduce it.
(It's easy to write a testcase where rotation clones an assume,
but that doesn't actually seem to trigger the crash in opt on
its own; maybe an issue with the laziness?)
Differential Revision: https://reviews.llvm.org/D26434
llvm-svn: 286410
Summary:
Unrolled Loop Size calculations moved to a function.
Constant representing number of optimized instructions
when "back edge" becomes "fall through" replaced with
variable.
Some comments added.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D21719
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 286389
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop invariants. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop invariants.
Differential Revision: https://reviews.llvm.org/D26185
llvm-svn: 286347
Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.
Reviewers: davidxl, hfinkel
Subscribers: mehdi_amini, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26155
llvm-svn: 286325
Summary:
These are good candidates for jump threading. This enables later opts
(such as InstCombine) to combine instructions from the selects with
instructions out of the selects. SimplifyCFG will fold the select
again if unfolding wasn't worth it.
Patch by James Molloy and Pablo Barrio.
Reviewers: rengolin, haicheng, sebpop
Subscribers: jojo, jmolloy, llvm-commits
Differential Revision: https://reviews.llvm.org/D26391
llvm-svn: 286236
Summary:
In some specific scenarios with well understood operand bundle types
(like `"deopt"`) it may be possible to go ahead and convert recursion to
iteration, but TailRecursionElimination does not have that logic today
so avoid doing the right thing for now.
I need some input on whether `"funclet"` operand bundles should also
block tail recursion elimination. If not, I'll allow TRE across calls
with `"funclet"` operand bundles and add a test case.
Reviewers: rnk, majnemer, nlewycky, ahatanak
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26270
llvm-svn: 286147
Argument evaluation order is one of the edge cases where Clang differs
from GCC, yielding different IR depending on which compiler LLVM was
built with. Make the order deterministic and tune the test to actually
verify the order instead of trying to hide it.
llvm-svn: 286126
Summary:
SmallSetVector uses DenseSet, but that means we need to reserve some
values for the empty and tombstone keys.
It seems to me we should have a general way to let us store full-range
ints inside of DenseSets, and furthermore that we probably shouldn't
silently let you add ints into DenseSets without explicitly promising
that they're in range. But that's a battle for another day; for now,
just fix this code, since we currently do something Very Bad when
compiling ffmpeg.
Fixes PR30914.
Reviewers: jeremyhu
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D26323
llvm-svn: 286038
This condition is trivially always true prior to the change. The comment
at the call site makes it clear that we expect *all* of these to be '=',
'S', or 'I' so fix the code.
We have a bug I will update to track the fact that Clang doesn't warn on
this: http://llvm.org/PR13101
llvm-svn: 285930
Summary:
It was detected that the reassociate pass could enter an inifite
loop when analysing dead code. Simply skipping to analyse basic
blocks that are dead avoids such problems (and as a side effect
we avoid spending time on optimising dead code).
The solution is using the same Reverse Post Order ordering of the
basic blocks when doing the optimisations, as when building the
precalculated rank map. A nice side-effect of this solution is
that we now know that we only try to do optimisations for blocks
with ranked instructions.
Fixes https://llvm.org/bugs/show_bug.cgi?id=30818
Reviewers: llvm-commits, davide, eli.friedman, mehdi_amini
Subscribers: dberlin
Differential Revision: https://reviews.llvm.org/D26154
llvm-svn: 285793
Fixes PR 30784. Discussed with Justin, who pointed out that
in the new PassManager infrastructure we can have more fine-grained
control on which analyses we want to preserve, but this is the
best we can do with the current infrastructure.
llvm-svn: 285380
Summary: LICM may hoist instructions to preheader speculatively. Before code generation, we need to sink down the hoisted instructions inside to loop if it's beneficial. This pass is a reverse of LICM: looking at instructions in preheader and sinks the instruction to basic blocks inside the loop body if basic block frequency is smaller than the preheader frequency.
Reviewers: hfinkel, davidxl, chandlerc
Subscribers: anna, modocache, mgorny, beanz, reames, dberlin, chandlerc, mcrosier, junbuml, sanjoy, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D22778
llvm-svn: 285308
When the loop exit condition is canonicalized as a != compaison, reuse the
debug location of the original (non canonical) comparison.
Before this patch, the debug location of the new icmp was obtained from the
loop latch terminator. This patch fixes the issue by correctly setting the
IRBuilder's "current debug location" to the location of the original compare.
Differential Revision: https://reviews.llvm.org/D25953
llvm-svn: 285185
When indvars widened an induction variable, the debug location for the loop
increment computation was incorrectly set equal to the debug loc of the loop
latch terminator.
This patch fixes the issue by propagating the correct location from the
original loop increment instruction to the new widened increment.
Differential Revision: https://reviews.llvm.org/D25872
llvm-svn: 285083
Now that MemorySSA keeps track of whether MemoryUses are optimized, use
getClobberingMemoryAccess() to check MemoryUse memory dependencies since
it should no longer be so expensive.
This is a follow-up change to https://reviews.llvm.org/D25881
llvm-svn: 285080
Summary:
When using MemorySSA, re-optimize MemoryPhis when removing a store since
this may create MemoryPhis with all identical arguments.
Also, when using MemorySSA to check if two MemoryUses are reading from
the same version of the heap, use the defining access instead of calling
getClobberingAccess, since the latter can currently result in many more
AA calls. Once the MemorySSA use optimization tracking changes are
done, we can remove this limitation, which should result in more loads
being CSE'd.
Reviewers: dberlin
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25881
llvm-svn: 284984
Summary:
These are good candidates for jump threading. This enables later opts
(such as InstCombine) to combine instructions from the selects with
instructions out of the selects. SimplifyCFG will fold the select
again if unfolding wasn't worth it.
Patch by James Molloy and Pablo Barrio.
Reviewers: reames, bkramer, mcrosier, gberry, haicheng, jmolloy, sebpop
Subscribers: jojo, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D25477
llvm-svn: 284971
When we have a loop with a known upper bound on the number of iterations, and
furthermore know that either the number of iterations will be either exactly
that upper bound or zero, then we can fully unroll up to that upper bound
keeping only the first loop test to check for the zero iteration case.
Most of the work here is in plumbing this 'max-or-zero' information from the
part of scalar evolution where it's detected through to loop unrolling. I've
also gone for the safe default of 'false' everywhere but howManyLessThans which
could probably be improved.
Differential Revision: https://reviews.llvm.org/D25682
llvm-svn: 284818
There's no agreement about this patch. I personally find the
PRE machinery of the current GVN hard enough to reason about
that I'm not sure I'll try to land this again, instead of working
on the rewrite).
llvm-svn: 284796
This change is motivated by the case when IndVarSimplify doesn't widen a comparison of IV increment because it can't prove IV increment being non-negative. We end up with a redundant trunc of the widened increment on this example.
for.body:
%i = phi i32 [ %start, %for.body.lr.ph ], [ %i.inc, %for.inc ]
%within_limits = icmp ult i32 %i, 64
br i1 %within_limits, label %continue, label %for.end
continue:
%i.i64 = zext i32 %i to i64
%arrayidx = getelementptr inbounds i32, i32* %base, i64 %i.i64
%val = load i32, i32* %arrayidx, align 4
br label %for.inc
for.inc:
%i.inc = add nsw nuw i32 %i, 1
%cmp = icmp slt i32 %i.inc, %limit
br i1 %cmp, label %for.body, label %for.end
There is a range check inside of the loop which guarantees the IV to be non-negative. NSW on the increment guarantees that the increment is also non-negative. Teach IndVarSimplify to use the range check to prove non-negativity of loop increments.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D25738
llvm-svn: 284629
In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
llvm-svn: 284311
- Removed unused class members.
- Made class internal data private.
- Made class scoped data function scoped where it's possible.
- Replace naked new/delete with unique_ptr.
- Made resources guaranteed to be freed.
Differential Revision: https://reviews.llvm.org/D25464
llvm-svn: 284290
This is with an extra change to avoid calling MemoryLocation::get() on a call instruction.
Differential Revision: https://reviews.llvm.org/D25542
llvm-svn: 284098
This CL didn't actually address the test case in PR30499, and clang
still crashes.
Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC"
Reverts r283965 and r283967.
llvm-svn: 284093
Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049.
The original summary:
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
llvm-svn: 284053
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
An arithmetic shift can be safely changed to a logical shift if the first
operand is known positive. This allows ComputeKnownBits (and similar analysis)
to determine the sign bit of the shifted value in some cases. In turn, this
allows InstCombine to canonicalize a signed comparison (a > 0) into an equality
check (a != 0).
PR30577
Differential Revision: https://reviews.llvm.org/D25119
llvm-svn: 284013
This implements the cleanup that Danny asked to commit separately from the
previous fix to GVN-hoist in https://reviews.llvm.org/D25476#inline-219818
Tested with ninja check on x86_64-linux.
llvm-svn: 283967