Commit Graph

17383 Commits

Author SHA1 Message Date
Chandler Carruth 20e588e1af [PM/Inliner] Make the new PM's inliner process call edges across an
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.

This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.

Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.

Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;

void g(const char *);

template <bool K, int N> void f(bool *B, bool *E) {
  if (K)
    g(str<N>);
  if (B == E)
    return;
  if (*B)
    f<true, N + 1>(B + 1, E);
  else
    f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }

extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```

When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.

What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.

The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.

We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.

I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.

The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
   really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
   we consider for inlining.

llvm-svn: 297374
2017-03-09 11:35:40 +00:00
Adam Nemet 8c83386f89 [SLP] Mark values in Dot that need to be extracted
llvm-svn: 297361
2017-03-09 05:48:03 +00:00
Peter Collingbourne 0152c8156b WholeProgramDevirt: Implement importing for uniform ret val opt.
Differential Revision: https://reviews.llvm.org/D29854

llvm-svn: 297350
2017-03-09 01:11:15 +00:00
Peter Collingbourne 6d284fab20 WholeProgramDevirt: Implement importing for single-impl devirtualization.
Differential Revision: https://reviews.llvm.org/D29844

llvm-svn: 297333
2017-03-09 00:21:25 +00:00
Teresa Johnson d820447212 Perform symbol binding for .symver versioned symbols
Summary:
In a .symver assembler directive like:
.symver name, name2@@nodename
"name2@@nodename" should get the same symbol binding as "name".

While the ELF object writer is updating the symbol binding for .symver
aliases before emitting the object file, not doing so when the module
inline assembly is handled by the RecordStreamer is causing the wrong
behavior in *LTO mode.

E.g. when "name" is global, "name2@@nodename" must also be marked as
global. Otherwise, the symbol is skipped when iterating over the LTO
InputFile symbols (InputFile::Symbol::shouldSkip). So, for example,
when performing any *LTO via the gold-plugin, the versioned symbol
definition is not recorded by the plugin and passed back to the
linker. If the object was in an archive, and there were no other symbols
needed from that object, the object would not be included in the final
link and references to the versioned symbol are undefined.

The llvm-lto2 tests added will give an error about an unused symbol
resolution without the fix.

Reviewers: rafael, pcc

Reviewed By: pcc

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D30485

llvm-svn: 297332
2017-03-09 00:19:49 +00:00
Evgeniy Stepanov 8537d9994d Don't merge global constants with non-dbg metadata.
!type metadata can not be dropped. An alternative to this is adding
!type metadata from the replaced globals to the replacement, but that
may weaken type tests and make them slower at the same time.

The merged global gets !dbg metadata from replaced globals, and can
end up with multiple debug locations.

llvm-svn: 297327
2017-03-09 00:03:37 +00:00
George Burgess IV ecb95f58a2 [MemCpyOpt] clang-format + trim the legacy pass. NFC.
None of the declarations below `// Helper functions` seem to have
definitions anymore.

llvm-svn: 297309
2017-03-08 21:28:19 +00:00
Adam Nemet 95da05c3f5 [SLP] Visualize SLP trees with -view-slp-tree
Analyzing larger trees is extremely difficult with the current debug output so
this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data
structure.  We can now display the SLP trees with Graphviz as in
https://reviews.llvm.org/F3132765.

I decorated the graph where a value needs to be gathered for one reason or
another.  These are the red nodes.

There are other improvement I am planning to make as I work through my case
here.  For example, I would also like to mark nodes that need to be extracted.

Differential Revision: https://reviews.llvm.org/D30731

llvm-svn: 297303
2017-03-08 18:47:50 +00:00
Matthew Simpson 3388de1349 [LV] Select legal insert point when fixing first-order recurrences
Because IRBuilder performs constant-folding, it's not guaranteed that an
instruction in the original loop map to an instruction in the vector loop. It
could map to a constant vector instead. The handling of first-order recurrences
was incorrectly making this assumption when setting the IRBuilder's insert
point.

llvm-svn: 297302
2017-03-08 18:18:20 +00:00
Jun Bum Lim ac170872b2 [JumpThread] Use AA in SimplifyPartiallyRedundantLoad()
Summary: Use AA when scanning to find an available load value.

Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin

Reviewed By: rengolin, dberlin

Subscribers: aemerson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30352

llvm-svn: 297284
2017-03-08 15:22:30 +00:00
Sanjay Patel 62906af379 [InstCombine] avoid crashing on shuffle shrinkage when input type is not same as result type
llvm-svn: 297280
2017-03-08 15:02:23 +00:00
Sam Parker 0f4db38c20 [LoopRotate] Propagate dbg.value intrinsics
Recommitting patch which was previously reverted in r297159. These
changes should address the casting issues.

The original patch enables dbg.value intrinsics to be attached to
newly inserted PHI nodes.

Differential Review: https://reviews.llvm.org/D30701

llvm-svn: 297269
2017-03-08 09:56:22 +00:00
Davide Italiano b6ddd7a437 [SCCP] Merge markOverdefined and markAnythingOverdefined.
There's no need to have two separate APIs.

llvm-svn: 297253
2017-03-08 01:26:37 +00:00
Sanjay Patel fe9705149b [InstCombine] shrink truncated insertelement into undef vector
This is the 2nd part of solving:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html

D30123 moves the trunc ahead of the shuffle, and this moves the trunc ahead of the insertelement. 
We're limiting this transform to undef rather than any constant to avoid backend problems.

Differential Revision: https://reviews.llvm.org/D30137

llvm-svn: 297242
2017-03-07 23:27:14 +00:00
Evgeniy Stepanov 7a5cfa9a11 Fix one-after-the-end type metadata handling in globalsplit.
Itanium ABI may have an address point one byte after the end of a
vtable. When such vtable global is split, the !type metadata needs to
follow the right vtable.

Differential Revision: https://reviews.llvm.org/D30716

llvm-svn: 297236
2017-03-07 22:18:48 +00:00
Sanjay Patel 53fa17a014 [InstCombine] shrink truncated splat shuffle (2nd try)
This was committed at r297155 and reverted at r297166 because of an
over-reaching clang test. That should be fixed with r297189.

This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html

This keeps with our general approach: changing arbitrary shuffles is off-limts,
but changing splat is ok. The transform is very similar to the existing
shrinkBitwiseLogic() canonicalization.

Differential Revision: https://reviews.llvm.org/D30123

llvm-svn: 297232
2017-03-07 21:45:16 +00:00
Gor Nishanov c52006ab09 [coroutines] Add handling for unwind coro.ends
Summary:
The purpose of coro.end intrinsic is to allow frontends to mark the cleanup and
other code that is only relevant during the initial invocation of the coroutine
and should not be present in resume and destroy parts.

In landing pads coro.end is replaced with an appropriate instruction to unwind to
caller. The handling of coro.end differs depending on whether the target is
using landingpad or WinEH exception model.

For landingpad based exception model, it is expected that frontend uses the
`coro.end`_ intrinsic as follows:

```
    ehcleanup:
      %InResumePart = call i1 @llvm.coro.end(i8* null, i1 true)
      br i1 %InResumePart, label %eh.resume, label %cleanup.cont

    cleanup.cont:
      ; rest of the cleanup

    eh.resume:
      %exn = load i8*, i8** %exn.slot, align 8
      %sel = load i32, i32* %ehselector.slot, align 4
      %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
      %lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
      resume { i8*, i32 } %lpad.val29

```
The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions,
thus leading to immediate unwind to the caller, whereas in start function it
is replaced with ``False``, thus allowing to proceed to the rest of the cleanup
code that is only needed during initial invocation of the coroutine.

For Windows Exception handling model, a frontend should attach a funclet bundle
referring to an enclosing cleanuppad as follows:

```
    ehcleanup:
      %tok = cleanuppad within none []
      %unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ]
      cleanupret from %tok unwind label %RestOfTheCleanup
```

The `CoroSplit` pass, if the funclet bundle is present, will insert
``cleanupret from %tok unwind to caller`` before
the `coro.end`_ intrinsic and will remove the rest of the block.

Reviewers: majnemer

Reviewed By: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D25543

llvm-svn: 297223
2017-03-07 21:00:54 +00:00
Xin Tong ac2b5767af [JumpThread] Simplify CmpInst-as-Condition branch-folding a bit.
Summary: Simplify CmpInst-as-Condition branch-folding a bit.

Reviewers: sanjoy, efriedma

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30429

llvm-svn: 297186
2017-03-07 18:59:09 +00:00
Matthew Simpson c86b2134c7 [LV] Consider users that are memory accesses in uniforms expansion step
When expanding the set of uniform instructions beyond the seed instructions
(e.g., consecutive pointers), we mark a new instruction uniform if all its
loop-varying users are uniform. We should also allow users that are consecutive
or interleaved memory accesses. This fixes cases where we have an instruction
that is used as the pointer operand of a consecutive access but also used by a
non-memory instruction that later becomes uniform as part of the expansion.

llvm-svn: 297179
2017-03-07 18:47:30 +00:00
Sanjay Patel 6d30606168 revert r297155 because there's a clang test that depends on InstCombine:
tools/clang/test/CodeGen/zvector.c

llvm-svn: 297166
2017-03-07 17:41:45 +00:00
Adrian Prantl d4056501fb Revert "Strip debug info when inlining into a nodebug function."
This reverts commit r296488.

As noted by David Blaikie on llvm-commits, I overlooked the case of a
debug function being inlined into a nodebug function being inlined
into a debug function.

llvm-svn: 297163
2017-03-07 17:28:57 +00:00
Nico Weber 3b2f0094d7 Revert r297132, it caused PR32171
llvm-svn: 297159
2017-03-07 17:23:52 +00:00
Sanjay Patel defdb7bed5 [InstCombine] shrink truncated splat shuffle
This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html

This keeps with our general approach: changing arbitrary shuffles is off-limts, 
but changing splat is ok. The transform is very similar to the existing 
shrinkBitwiseLogic() canonicalization.

Differential Revision: https://reviews.llvm.org/D30123

llvm-svn: 297155
2017-03-07 16:10:36 +00:00
Sam Parker 6ec5fdbc94 [LoopRotate] Update dbg.value intrinsics
Propagate debug info through the newly inserted PHI nodes.

Differential Revision: https://reviews.llvm.org/D30190

llvm-svn: 297132
2017-03-07 09:34:25 +00:00
Sanjoy Das 30c3538e2e [LoopUnrolling] Fix loop size check for peeling
Summary:
We should check if loop size allows us to peel at least one iteration
before we do so.

Patch by Max Kazantsev!

Reviewers: sanjoy, mkuper, efriedma

Reviewed By: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30632

llvm-svn: 297122
2017-03-07 06:03:15 +00:00
Michael Kuperstein 768d013a03 [SLP] Revert r296863 due to miscompiles.
Details and reproducer are on the email thread for r296863.

llvm-svn: 297103
2017-03-06 23:54:51 +00:00
Sanjay Patel c3b4735b6f [InstCombine] use dyn_cast instead of isa+cast; NFCI
llvm-svn: 297092
2017-03-06 23:25:28 +00:00
Hans Wennborg 254f5fa5f2 Disable gvn-hoist (PR32153)
llvm-svn: 297075
2017-03-06 21:10:40 +00:00
Daniel Berlin 343576a6f4 NewGVN: Remove DebugUnknownExprs, just mark the instructions as unused
llvm-svn: 297047
2017-03-06 18:42:39 +00:00
Daniel Berlin 856fa14e06 NewGVN: Only call isInstructionTrivially dead once per instruction.
llvm-svn: 297046
2017-03-06 18:42:27 +00:00
Dehao Chen c632a393b7 Remove the sample pgo annotation heuristic that uses call count to annotate basic block count.
Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks.

Reviewers: davidxl, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D30658

llvm-svn: 297038
2017-03-06 17:49:59 +00:00
Michael Kruse 811de8a619 [BasicBlockUtils] Check for nullptr before updating LoopInfo.
LoopInfo::getLoopFor returns nullptr if a BB is not in a loop and only
then can the loop be updated to contain the newly created BBs. Add the
missing nullptr check to SplitBlockAndInsertIfThen.

Within LLVM, the only user of this function that also passes a LoopInfo
to be updated is InnerLoopVectorizer::predicateInstructions().
As the method's name implies, the BB operataten on will always be within
a loop, but out-of-tree users may also use it differently (here: Polly).

All other uses of LoopInfo::getLoopFor in the file properly check its
return value for nullptr.

llvm-svn: 297016
2017-03-06 15:33:05 +00:00
Craig Topper b9dbd4d596 [SimplifyCFG] Use APInt::operator| instead of APInt::Or. NFC
I'm looking to improve operator| to support rvalue references and may remove APInt::Or.

llvm-svn: 296982
2017-03-05 01:08:19 +00:00
Evgeny Stupachenko d6aa0d02c2 Set option enabling LSR alternative way to resolve complex solution to false.
Differential Revision: http://reviews.llvm.org/D29862

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 296959
2017-03-04 03:14:05 +00:00
Peter Collingbourne f0bb90b126 Fix build.
llvm-svn: 296949
2017-03-04 01:38:05 +00:00
Peter Collingbourne 77a8d563a3 WholeProgramDevirt: Implement exporting for uniform ret val opt.
Differential Revision: https://reviews.llvm.org/D29846

llvm-svn: 296948
2017-03-04 01:34:53 +00:00
Peter Collingbourne 2325bb34c1 WholeProgramDevirt: Implement exporting for single-impl devirtualization.
Differential Revision: https://reviews.llvm.org/D29811

llvm-svn: 296945
2017-03-04 01:31:01 +00:00
Peter Collingbourne b406baaeef WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
Any unsuccessful llvm.type.checked.load devirtualizations will be translated
into uses of llvm.type.test, so we need to add the resulting llvm.type.test
intrinsics to the function summaries so that the LowerTypeTests pass will
export them.

Differential Revision: https://reviews.llvm.org/D29808

llvm-svn: 296939
2017-03-04 01:23:30 +00:00
Daniel Berlin 0350a87ce3 NewGVN: Be consistent in what order we compare operands for swapping.
NFC.

llvm-svn: 296935
2017-03-04 00:44:43 +00:00
Sanjoy Das 0a4ec554c1 Fix a compiler warning
llvm-svn: 296903
2017-03-03 18:53:09 +00:00
Sanjoy Das 664c925a57 [LoopUnrolling] Peel loops with invariant backedge Phi input
Summary:
If a loop contains a Phi node which has an invariant input from back
edge, it is profitable to peel such loops (rather than unroll them) to
use the advantage that this Phi is always invariant starting from 2nd
iteration. After the 1st iteration is peeled, other optimizations can
potentially simplify calculations with this invariant.

Patch by Max Kazantsev!

Reviewers: sanjoy, apilipenko, igor-laevsky, anna, mkuper, reames

Reviewed By: mkuper

Subscribers: mkuper, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30161

llvm-svn: 296898
2017-03-03 18:19:15 +00:00
Sanjoy Das eed71b9e1c [LoopUnrolling] Re-prioritize Peeling and Partial unrolling
Summary:
In current implementation the loop peeling happens after trip-count based partial unrolling and may
sometimes not happen at all due to it (for example, if trip count is known, but UP.Partial = false). This
is generally bad, the more than there are some situations where peeling is profitable even if the partial
unrolling is disabled.

This patch is a NFC which reorders peeling and partial unrolling application and prepares the code for
implementation of the said optimizations.

Patch by Max Kazantsev!

Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper

Reviewed By: mkuper

Subscribers: mkuper, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D30243

llvm-svn: 296897
2017-03-03 18:19:10 +00:00
Simon Pilgrim e938a152c5 Use APInt::getLowBitsSet instead of APInt::getBitsSet for lower bit mask creation
llvm-svn: 296882
2017-03-03 16:56:33 +00:00
Benjamin Kramer 9528f8c2fb Revert "Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline.""
This reverts commit r296759. Miscompiles bash.

llvm-svn: 296872
2017-03-03 14:27:53 +00:00
Mohammad Shahid bdac9f30c0 [SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available
for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR.
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree
instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for
external use of vectorizable scalars based on shuffle mask.

Reviewers: mkuper

Differential Revision: https://reviews.llvm.org/D30159

Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d
llvm-svn: 296863
2017-03-03 10:02:47 +00:00
Evgeniy Stepanov d0285f21d0 [msan] Handle x86_sse_stmxcsr and x86_sse_ldmxcsr.
llvm-svn: 296848
2017-03-03 01:12:43 +00:00
Evgeniy Stepanov c00e45eada [msan] Remove stale comments.
ClStoreCleanOrigin flag was removed back in 2014.

llvm-svn: 296844
2017-03-03 00:25:56 +00:00
Peter Collingbourne 3baa72af7d ThinLTOBitcodeWriter: Do not follow operand edges of type GlobalValue when looking for virtual functions.
Such edges may otherwise result in infinite recursion if a pointer to a vtable
is reachable from the vtable itself. This can happen in practice if a TU
defines the ABI types used to implement RTTI, and is itself compiled with RTTI.

Fixes PR32121.

llvm-svn: 296839
2017-03-02 23:10:17 +00:00
Daniel Berlin dcb004fdf1 Move defClobbersUseOrDef to being a protected member of a class since we don't want anyone else using it
llvm-svn: 296838
2017-03-02 23:06:46 +00:00
Nikolai Bozhenov 4a04fb9e90 [BypassSlowDivision] Use ValueTracking to simplify run-time checks
ValueTracking is used for more thorough analysis of operands. Based on the
analysis, either run-time checks can be simplified (e.g. check only one operand
instead of two) or the transformation can be avoided. For example, it is quite
often the case that a divisor is promoted from a shorter type and run-time
checks for it are redundant.

With additional compile-time analysis of values, two special cases naturally
arise and are addressed by the patch:

 1) Both operands are known to be short enough. Then, the long division can be
    simply replaced with a short one without CFG modification.

 2) If a division is unsigned and the dividend is known to be short then the
    long division is not needed at all. Because if the divisor is too big for
    short division then the quotient is obviously zero (and the remainder is
    equal to the dividend). Actually, the division is not needed when
    (divisor > dividend).

Differential Revision: https://reviews.llvm.org/D29897

llvm-svn: 296832
2017-03-02 22:12:15 +00:00