This reverts commit r296488.
As noted by David Blaikie on llvm-commits, I overlooked the case of a
debug function being inlined into a nodebug function being inlined
into a debug function.
llvm-svn: 297163
Summary:
We should check if loop size allows us to peel at least one iteration
before we do so.
Patch by Max Kazantsev!
Reviewers: sanjoy, mkuper, efriedma
Reviewed By: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30632
llvm-svn: 297122
LoopInfo::getLoopFor returns nullptr if a BB is not in a loop and only
then can the loop be updated to contain the newly created BBs. Add the
missing nullptr check to SplitBlockAndInsertIfThen.
Within LLVM, the only user of this function that also passes a LoopInfo
to be updated is InnerLoopVectorizer::predicateInstructions().
As the method's name implies, the BB operataten on will always be within
a loop, but out-of-tree users may also use it differently (here: Polly).
All other uses of LoopInfo::getLoopFor in the file properly check its
return value for nullptr.
llvm-svn: 297016
Summary:
If a loop contains a Phi node which has an invariant input from back
edge, it is profitable to peel such loops (rather than unroll them) to
use the advantage that this Phi is always invariant starting from 2nd
iteration. After the 1st iteration is peeled, other optimizations can
potentially simplify calculations with this invariant.
Patch by Max Kazantsev!
Reviewers: sanjoy, apilipenko, igor-laevsky, anna, mkuper, reames
Reviewed By: mkuper
Subscribers: mkuper, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D30161
llvm-svn: 296898
Summary:
In current implementation the loop peeling happens after trip-count based partial unrolling and may
sometimes not happen at all due to it (for example, if trip count is known, but UP.Partial = false). This
is generally bad, the more than there are some situations where peeling is profitable even if the partial
unrolling is disabled.
This patch is a NFC which reorders peeling and partial unrolling application and prepares the code for
implementation of the said optimizations.
Patch by Max Kazantsev!
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Reviewed By: mkuper
Subscribers: mkuper, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30243
llvm-svn: 296897
ValueTracking is used for more thorough analysis of operands. Based on the
analysis, either run-time checks can be simplified (e.g. check only one operand
instead of two) or the transformation can be avoided. For example, it is quite
often the case that a divisor is promoted from a shorter type and run-time
checks for it are redundant.
With additional compile-time analysis of values, two special cases naturally
arise and are addressed by the patch:
1) Both operands are known to be short enough. Then, the long division can be
simply replaced with a short one without CFG modification.
2) If a division is unsigned and the dividend is known to be short then the
long division is not needed at all. Because if the divisor is too big for
short division then the quotient is obviously zero (and the remainder is
equal to the dividend). Actually, the division is not needed when
(divisor > dividend).
Differential Revision: https://reviews.llvm.org/D29897
llvm-svn: 296832
The most important goal of the patch is to break large insertFastDiv function
into separate pieces, so that later a different fast insertion logic can be
implemented using some of these pieces.
Differential Revision: https://reviews.llvm.org/D29896
llvm-svn: 296828
The LLVM backend cannot produce any debug info for an llvm::Function
without a DISubprogram attachment. When inlining a debug-info-carrying
function into a nodebug function, there is therefore no reason to keep
any debug info intrinsic calls or debug locations on the instructions.
This fixes a problem discovered in PR32042.
rdar://problem/30679307
llvm-svn: 296488
This was suggested in D27855: have the inliner add assumptions, so we don't
lose nonnull info provided by argument attributes.
This still doesn't solve PR28430 (dyn_cast), but this gets us closer.
https://reviews.llvm.org/D29999
llvm-svn: 296366
Summary:
Depends on D29606 and D29682
Makes us pass GVN's edge.ll (we also will pass a few other testcases
they just need cleaning up).
Thoughts on the Predicate* hiearchy of classes especially welcome :)
(it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something).
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29747
llvm-svn: 295889
Add updater to passes that now need it.
Move around code in MemorySSA to expose needed functions.
Summary: Mostly cleanup
Reviewers: george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30221
llvm-svn: 295887
Summary:
This lets one add aliasing stores to the updater.
(i'm next going to move the creation/etc functions to the updater)
Reviewers: george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30154
llvm-svn: 295677
Summary:
JumpThreading for guards feature has been reverted at https://reviews.llvm.org/rL295200
due to the following problem: the feature used the following algorithm for detection of
diamond patters:
1. Find a block with 2 predecessors;
2. Check that these blocks have a common single parent;
3. Check that the parent's terminator is a branch instruction.
The problem is that these checks are insufficient. They may pass for a non-diamond
construction in case if those two predecessors are actually the same block. This may
happen if parent's terminator is a br (either conditional or unconditional) to a block
that ends with "switch" instruction with exactly two branches going to one block.
This patch re-enables the JumpThreading for guards and fixes this issue by adding the
check that those found predecessors are actually different blocks. This guarantees that
parent's terminator is a conditional branch with exactly 2 different successors, which
is now ensured by assertions. It also adds two more tests for this situation (with parent's
terminator being a conditional and an unconditional branch).
Patch by Max Kazantsev!
Reviewers: anna, sanjoy, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30036
llvm-svn: 295410
Multiple blocks in the callee can be mapped to a single cloned block
since we prune the callee as we clone it. The existing code
iterates over the value map and clones the block frequency (and
eventually scales the frequencies of the cloned blocks). Value map's
iteration is not deterministic and so the cloned block might get the
frequency of any of the original blocks. The fix is to set the max of
the original frequencies to the cloned block. The first block in the
sequence must have this max frequency and, in the call context,
subsequent blocks must have its frequency.
Differential Revision: https://reviews.llvm.org/D29696
llvm-svn: 295115
Summary:
When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of
```
1 typedef struct _node_t {
2 struct _node_t *next;
3 } node_t;
4
5 extern node_t *root;
6
7 int foo() {
8 node_t *node, *tmp;
9 int ret = 0;
10
11 node = tmp = root->next;
12 while (node != root) {
13 while (node) {
14 tmp = node;
15 node = node->next;
16 ret++;
17 }
18 }
19
20 return ret;
21 }
```
, below is the basicblock corresponding to line 12 after Reassociate expressions pass:
```
while.cond: ; preds = %while.cond2, %entry
%node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
%ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
%cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
```
As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is
```
!21 = !DILocation(line: 9, column: 7, scope: !6)
```
, which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction.
Reviewers: dblaikie, samsonov, eugenis
Reviewed By: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29867
llvm-svn: 295106
Summary:
This adds support for placing predicateinfo such that it affects critical edges.
This fixes the issues mentioned by Nuno on the mailing list.
Depends on D29519
Reviewers: davide, nlopes
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29606
llvm-svn: 294921
Summary:
This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html
When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations.
The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default.
Reviewers: probinson, aprantl, davidxl, hfinkel, echristo
Reviewed By: hfinkel
Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26420
llvm-svn: 294782
Summary:
This patch allows JumpThreading also thread through guards.
Virtually, guard(cond) is equivalent to the following construction:
if (cond) { do something } else {deoptimize}
Yet it is not explicitly converted into IFs before lowering.
This patch enables early threading through guards in simple cases.
Currently it covers the following situation:
if (cond1) {
// code A
} else {
// code B
}
// code C
guard(cond2)
// code D
If there is implication cond1 => cond2 or !cond1 => cond2, we can transform
this construction into the following:
if (cond1) {
// code A
// code C
} else {
// code B
// code C
guard(cond2)
}
// code D
Thus, removing the guard from one of execution branches.
Patch by Max Kazantsev!
Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29620
llvm-svn: 294617
Summary:
This patch adds a utility to build extended SSA (see "ABCD: eliminating
array bounds checks on demand"), and an intrinsic to support it. This
is then used to get functionality equivalent to propagateEquality in
GVN, in NewGVN (without having to replace instructions as we go). It
would work similarly in SCCP or other passes. This has been talked
about a few times, so i built a real implementation and tried to
productionize it.
Copies are inserted for operands used in assumes and conditional
branches that are based on comparisons (see below for more)
Every use affected by the predicate is renamed to the appropriate
intrinsic result.
E.g.
%cmp = icmp eq i32 %x, 50
br i1 %cmp, label %true, label %false
true:
ret i32 %x
false:
ret i32 1
will become
%cmp = icmp eq i32, %x, 50
br i1 %cmp, label %true, label %false
true:
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 }
%x.0 = call @llvm.ssa_copy.i32(i32 %x)
ret i32 %x.0
false:
ret i23 1
(you can use -print-predicateinfo to get an annotated-with-predicateinfo dump)
This enables us to easily determine what operations are affected by a
given predicate, and how operations affected by a chain of
predicates.
Reviewers: davide, sanjoy
Subscribers: mgorny, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D29519
Update for review comments
Fix a bug Nuno noticed where we are giving information about and/or on edges where the info is not useful and easy to use wrong
Update for review comments
llvm-svn: 294351
The importer was previously using ModuleLinker in a sort of "IRMover mode". Use
IRMover directly instead in order to remove a level of indirection.
I will remove all importing support from ModuleLinker in a separate
change.
Differential Revision: https://reviews.llvm.org/D29468
llvm-svn: 294014
Summary:
I have a similar patch up for review already (D29173). If you prefer I
can squash them both together.
Also I think there more potential for code sharing between
LoopUnroll.cpp and LoopUnrollRuntime.cpp. Do you think patches for
that would be worthwhile?
Reviewers: mkuper, mzolotukhin
Reviewed By: mkuper, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29311
llvm-svn: 293758
Summary:
rL293124 added the necessary infrastructure to properly add the cloned
top level loop to LoopInfo, which means we do not have to do it manually
in CloneLoopBlocks.
@mkuper sorry for not pointing this out during my review of D29156, I just
realized that today.
Reviewers: mzolotukhin, chandlerc, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits, mkuper
Differential Revision: https://reviews.llvm.org/D29173
llvm-svn: 293615
Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. Not sure if just dropping debug locations would make more sense for this case, but as the branch instruction will have at least different discriminator with the hoisted store instruction, I think there will be no difference in practice.
Reviewers: aprantl, andreadb, danielcdh
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29062
llvm-svn: 293372
Summary: Extend the MemorySSAUpdater API to allow movement to arbitrary places
Reviewers: davide, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29239
llvm-svn: 293363