Commit Graph

323 Commits

Author SHA1 Message Date
Michael Kruse 7244852557 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288

llvm-svn: 348944
2018-12-12 17:32:52 +00:00
Max Kazantsev b9e65cbddf Introduce llvm.experimental.widenable_condition intrinsic
This patch introduces a new instinsic `@llvm.experimental.widenable_condition`
that allows explicit representation for guards. It is an alternative to using
`@llvm.experimental.guard` intrinsic that does not contain implicit control flow.

We keep finding places where `@llvm.experimental.guard` is not supported or
treated too conservatively, and there are 2 reasons to that:

- `@llvm.experimental.guard` has memory write side effect to model implicit control flow,
  and this sometimes confuses passes and analyzes that work with memory;
- Not all passes and analysis are aware of the semantics of guards. These passes treat them
  as regular throwing call and have no idea that the condition of guard may be used to prove
  something. One well-known place which had caused us troubles in the past is explicit loop
  iteration count calculation in SCEV. Another example is new loop unswitching which is not
  aware of guards. Whenever a new pass appears, we potentially have this problem there.

Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.

This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard`
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:


    %widenable_condition = call i1 @llvm.experimental.widenable.condition()
    %new_condition = and i1 %cond, %widenable_condition
    br i1 %new_condition, label %guarded, label %deopt

  guarded:
    ; Guarded instructions

  deopt:
    call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]

The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an
`undef`, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to `br` instuction, and it still can be
widened by replacing the result of `@llvm.experimental.widenable.condition()`
with `and` with any arbitrary boolean value (as long as the branch that is taken when
it is `false` has a deopt and has no side-effects).

For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".

This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.

The naming discussion is still ungoing. Merging this to unblock further items. We can
later change the name of this intrinsic.

Reviewed By: reames, fedor.sergeev, sanjoy
Differential Revision: https://reviews.llvm.org/D51207

llvm-svn: 348593
2018-12-07 14:39:46 +00:00
Markus Lavin 4dc4ebd606 [PM] Port LoadStoreVectorizer to the new pass manager.
Differential Revision: https://reviews.llvm.org/D54848

llvm-svn: 348570
2018-12-07 08:23:37 +00:00
Vitaly Buka 4493fe1c1b [stack-safety] Empty local passes for Stack Safety Local Analysis
Reviewers: eugenis, vlad.tsyrklevich

Subscribers: mgorny, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D54502

llvm-svn: 347602
2018-11-26 21:57:47 +00:00
Mikael Holmen b6f76002d9 [PM] Port Scalarizer to the new pass manager.
Patch by: markus (Markus Lavin)

Reviewers: chandlerc, fedor.sergeev

Reviewed By: fedor.sergeev

Subscribers: llvm-commits, Ka-Ka, bjope

Differential Revision: https://reviews.llvm.org/D54695

llvm-svn: 347392
2018-11-21 14:00:17 +00:00
Xin Tong 642c8d3575 [LTO] Load sample profile in LTO link step.
Summary:
Load sample profile in LTO link step.
ThinLTO calls populateModulePassManager to load the profile

Reviewers: tejohnson, davidxl, danielcdh

Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D54564

llvm-svn: 346971
2018-11-15 18:06:42 +00:00
Philip Pfaffe 2d4effb25c Add an OptimizerLast EP
Summary:
It turns out that we need an OptimizerLast PassBuilder extension point
after all. I missed the relevance of this EP the first time. By legacy PM magic,
function passes added at this EP get added to the last _Function_ PM, which is a
feature we lost when dropping this EP for the new PM.

A key difference between this and the legacy PassManager's OptimizerLast
callback is that this extension point is not triggered at O0. Extensions
to the O0 pipeline should append their passes to the end of the overall
pipeline.

Differential Revision: https://reviews.llvm.org/D54374

llvm-svn: 346645
2018-11-12 11:17:07 +00:00
Fedor Sergeev 412ed34744 [LoopUnroll] allow customization for new-pass-manager version of LoopUnroll
Unlike its legacy counterpart new pass manager's LoopUnrollPass does
not provide any means to select which flavors of unroll to run
(runtime, peeling, partial), relying on global defaults.

In some cases having ability to run a restricted LoopUnroll that
does more than LoopFullUnroll is needed.

Introduced LoopUnrollOptions to select optional unroll behaviors.
Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing.

Reviewers: chandlerc, tejohnson
Differential Revision: https://reviews.llvm.org/D53440

llvm-svn: 345723
2018-10-31 14:33:14 +00:00
Leonard Chan eebecb3214 Revert "[PassManager/Sanitizer] Enable usage of ported AddressSanitizer passes with -fsanitize=address"
This reverts commit 8d6af840396f2da2e4ed6aab669214ae25443204 and commit
b78d19c287b6e4a9abc9fb0545de9a3106d38d3d which causes slower build times
by initializing the AddressSanitizer on every function run.

The corresponding revisions are https://reviews.llvm.org/D52814 and
https://reviews.llvm.org/D52739.

llvm-svn: 345433
2018-10-26 22:51:51 +00:00
Teresa Johnson d725335bd1 [hot-cold-split] Only perform splitting in ThinLTO backend post-link
Summary:
Fix the new PM to only perform hot cold splitting once during ThinLTO,
by skipping it in the pre-link phase.

This was already fixed in the old PM by the move of the hot cold split
pass later (after the early return when PrepareForThinLTO) by r344869.

Reviewers: vsk, sebpop, hiraditya

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D53611

llvm-svn: 345096
2018-10-23 22:57:40 +00:00
Aditya Kumar d9e2e383a9 Schedule Hot Cold Splitting pass after most optimization passes
Summary:
In the new+old pass manager, hot cold splitting was schedule too early.
Thanks to Vedant for pointing this out.

Reviewers: sebpop, vsk

Reviewed By: sebpop, vsk

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D53437

llvm-svn: 344869
2018-10-21 18:11:56 +00:00
Fedor Sergeev bd6b2138b9 [NewPM] teach -passes= to emit meaningful error messages
All the PassBuilder::parse interfaces now return descriptive StringError
instead of a plain bool. It allows to make -passes/aa-pipeline parsing
errors context-specific and thus less confusing.

TODO: ideally we should also make suggestions for misspelled pass names,
but that requires some extensions to PassBuilder.

Reviewed By: philip.pfaffe, chandlerc
Differential Revision: https://reviews.llvm.org/D53246

llvm-svn: 344685
2018-10-17 10:36:23 +00:00
Fedor Sergeev a01be0f217 Revert "[NewPM] teach -passes= to emit meaningful error messages"
This reverts r344519 due to failures in pipeline-parsing test.

llvm-svn: 344524
2018-10-15 15:36:08 +00:00
Fedor Sergeev 4155a77e98 [NewPM] teach -passes= to emit meaningful error messages
Summary:
All the PassBuilder::parse interfaces now return descriptive StringError
instead of a plain bool. It allows to make -passes/aa-pipeline parsing
errors context-specific and thus less confusing.

TODO: ideally we should also make suggestions for misspelled pass names,
but that requires some extensions to PassBuilder.

Reviewed By: philip.pfaffe, chandlerc
Differential Revision: https://reviews.llvm.org/D53246

llvm-svn: 344519
2018-10-15 15:00:18 +00:00
Leonard Chan 64e21b5cfd [PassManager/Sanitizer] Port of AddresSanitizer pass from legacy to new PassManager
This patch ports the legacy pass manager to the new one to take advantage of
the benefits of the new PM. This involved moving a lot of the declarations for
`AddressSantizer` to a header so that it can be publicly used via
PassRegistry.def which I believe contains all the passes managed by the new PM.

This patch essentially decouples the instrumentation from the legacy PM such
hat it can be used by both legacy and new PM infrastructure.

Differential Revision: https://reviews.llvm.org/D52739

llvm-svn: 344274
2018-10-11 18:31:51 +00:00
Richard Smith 6c67662816 Add a flag to remap manglings when reading profile data information.
This can be used to preserve profiling information across codebase
changes that have widespread impact on mangled names, but across which
most profiling data should still be usable. For example, when switching
from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI,
or even from a 32-bit to a 64-bit build.

The user can provide a remapping file specifying parts of mangled names
that should be treated as equivalent (eg, std::__1 should be treated as
equivalent to std::__cxx11), and profile data will be treated as
applying to a particular function if its name is equivalent to the name
of a function in the profile data under the provided equivalences. See
the documentation change for a description of how this is configured.

Remapping is supported for both sample-based profiling and instruction
profiling. We do not support remapping indirect branch target
information, but all other profile data should be remapped
appropriately.

Support is only added for the new pass manager. If someone wants to also
add support for this for the old pass manager, doing so should be
straightforward.

This is the LLVM side of Clang r344199.

Reviewers: davidxl, tejohnson, dlj, erik.pilkington

Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51249

llvm-svn: 344200
2018-10-10 23:13:47 +00:00
Aditya Kumar 9e20ade72a Add support for new pass manager
Modified the testcases to use both pass managers
Use single commandline flag for both pass managers.

Differential Revision: https://reviews.llvm.org/D52708
Reviewers: sebpop, tejohnson, brzycki, SirishP
Reviewed By: tejohnson, brzycki

llvm-svn: 343662
2018-10-03 05:55:20 +00:00
Eric Christopher dcf1d97c5c Temporarily revert "[GVNHoist] Re-enable GVNHoist by default"
This reverts commit r342387 as it's showing significant performance
regressions in a number of benchmarks. Followed up with the
committer and original thread with an example and will get performance
numbers before recommitting.

llvm-svn: 343522
2018-10-01 18:57:08 +00:00
Alexandros Lamprineas 8a1c374b2e [GVNHoist] Re-enable GVNHoist by default
Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912
has been fixed by rL342055.

Precommit testing performed:
* Overnight runs of csmith comparing the output between programs
  compiled with gvn-hoist enabled/disabled.
* Bootstrap builds of clang with UbSan/ASan configurations.

llvm-svn: 342387
2018-09-17 12:24:55 +00:00
Alexandros Lamprineas fe0512d575 Revert "[GVNHoist] Re-enable GVNHoist by default"
This reverts rL341954.

The builder `sanitizer-x86_64-linux-bootstrap-ubsan` has been
failing with timeouts at stage2 clang/ubsan:

[3065/3073] Linking CXX executable bin/lld
command timed out: 1200 seconds without output running python
../sanitizer_buildbot/sanitizers/buildbot_selector.py,
attempting to kill

llvm-svn: 342001
2018-09-11 22:10:57 +00:00
Alexandros Lamprineas db18e972d7 [GVNHoist] Re-enable GVNHoist by default
Rebase rL340922 since https://bugs.llvm.org/show_bug.cgi?id=38807
has been fixed by rL341947.

llvm-svn: 341954
2018-09-11 15:55:45 +00:00
Hiroshi Yamauchi 9775a620b0 [PGO] Control Height Reduction
Summary:
Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.

if (hot_cond1) { // Likely true.
  do_stg_hot1();
}
if (hot_cond2) { // Likely true.
  do_stg_hot2();
}

->

if (hot_cond1 && hot_cond2) { // Hot path.
  do_stg_hot1();
  do_stg_hot2();
} else { // Cold path.
  if (hot_cond1) {
    do_stg_hot1();
  }
  if (hot_cond2) {
    do_stg_hot2();
  }
}

This speeds up some internal benchmarks up to ~30%.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D50591

llvm-svn: 341386
2018-09-04 17:19:13 +00:00
Alexandros Lamprineas f6db5bcd38 Revert r340922 "[GVNHoist] Re-enable GVNHoist by default"
Another sanitizer buildbot failed this time at bootstrap when
compiling SemaTemplateInstantiate.cpp with this assertion:
`dominates(MD, U) && "Memory Def does not dominate it's uses"'.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/15047

llvm-svn: 340925
2018-08-29 13:00:55 +00:00
Alexandros Lamprineas c03b9b8854 [GVNHoist] Re-enable GVNHoist by default
Rebase rL338240 since the excessive memory usage observed when using
GVNHoist with UBSan has been fixed by rL340818.

Differential Revision: https://reviews.llvm.org/D49858

llvm-svn: 340922
2018-08-29 11:58:34 +00:00
Vlad Tsyrklevich 1c7160e85f Revert "[GVNHoist] Re-enable GVNHoist by default"
This reverts commit r338240 because it was causing OOMs on the UBSan
buildbot when building clang/lib/Sema/SemaChecking.cpp

llvm-svn: 338297
2018-07-30 20:07:33 +00:00
Alexandros Lamprineas de3ca964c1 [GVNHoist] Re-enable GVNHoist by default
My initial motivation for this came from https://reviews.llvm.org/D48122,
where it was pointed out that my change didn't fit well in SimplifyCFG and
therefore using GVNHoist was a better way to go. GVNHoist has been disabled
for a while as there was a list of bugs related to it.

I have fixed the following bugs:

https://bugs.llvm.org/show_bug.cgi?id=37808 -> https://reviews.llvm.org/D48372 (rL337149)
https://bugs.llvm.org/show_bug.cgi?id=36787 -> https://reviews.llvm.org/D49555 (rL337674)
https://bugs.llvm.org/show_bug.cgi?id=37445 -> https://reviews.llvm.org/D49425 (rL337680)

The next two bugs no longer occur, and it's unclear which commit fixed them:

https://bugs.llvm.org/show_bug.cgi?id=36635
https://bugs.llvm.org/show_bug.cgi?id=37791

I investigated this one and proved to be unrelated to GVNHoist, but a genuine bug in NewGvn:

https://bugs.llvm.org/show_bug.cgi?id=37660

To convince myself GVNHoist is in a good state I made a successful bootstrap build of LLVM.
Merging this change now in order to make it to the LLVM 7.0.0 branch.

Differential Revision: https://reviews.llvm.org/D49858

llvm-svn: 338240
2018-07-30 10:50:18 +00:00
Teresa Johnson 28023dbed7 [ThinLTO] Enable ThinLTO WholeProgramDevirt and LowerTypeTests in new PM
Summary:
Enable these passes for CFI and WPD in ThinLTO and LTO with the new pass
manager. Add a couple of tests for both PMs based on the clang tests
tools/clang/test/CodeGen/thinlto-distributed-cfi*.ll, but just test
through llvm-lto2 and not with distributed ThinLTO.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49429

llvm-svn: 337461
2018-07-19 14:51:32 +00:00
Michael J. Spencer 7bb2767fba Recommit r335794 "Add support for generating a call graph profile from Branch Frequency Info." with fix for removed functions.
llvm-svn: 337140
2018-07-16 00:28:24 +00:00
David Green 963401d2be [UnrollAndJam] New Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder Loop

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 336062
2018-07-01 12:47:30 +00:00
Chandler Carruth 7c557f804d [instsimplify] Move the instsimplify pass to use more obvious file names
and diretory.

Also cleans up all the associated naming to be consistent and removes
the public access to the pass ID which was unused in LLVM.

Also runs clang-format over parts that changed, which generally cleans
up a bunch of formatting.

This is in preparation for doing some internal cleanups to the pass.

Differential Revision: https://reviews.llvm.org/D47352

llvm-svn: 336028
2018-06-29 23:36:03 +00:00
John Brawn bdbbd8381f Add a PhiValuesAnalysis pass to calculate the underlying values of phis
This pass is being added in order to make the information available to BasicAA,
which can't do caching of this information itself, but possibly this information
may be useful for other passes.

Incorporates code based on Daniel Berlin's implementation of Tarjan's algorithm.

Differential Revision: https://reviews.llvm.org/D47893

llvm-svn: 335857
2018-06-28 14:13:06 +00:00
Benjamin Kramer 269eb21e1c Revert "Add support for generating a call graph profile from Branch Frequency Info."
This reverts commits r335794 and r335797. Breaks ThinLTO+FDO selfhost.

llvm-svn: 335851
2018-06-28 13:15:03 +00:00
Michael J. Spencer 5bf1ead377 Add support for generating a call graph profile from Branch Frequency Info.
=== Generating the CG Profile ===

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions.  For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
```
!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
```

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335794
2018-06-27 23:58:08 +00:00
Chandler Carruth 71fd27043e [PM/LoopUnswitch] When using the new SimpleLoopUnswitch pass, schedule
loop-cleanup passes at the beginning of the loop pass pipeline, and
re-enqueue loops after even trivial unswitching.

This will allow us to much more consistently avoid simplifying code
while doing trivial unswitching. I've also added a test case that
specifically shows effective iteration using this technique.

I've unconditionally updated the new PM as that is always using the
SimpleLoopUnswitch pass, and I've made the pipeline changes for the old
PM conditional on using this new unswitch pass. I added a bunch of
comments to the loop pass pipeline in the old PM to make it more clear
what is going on when reviewing.

Hopefully this will unblock doing *partial* unswitching instead of just
full unswitching.

Differential Revision: https://reviews.llvm.org/D47408

llvm-svn: 333493
2018-05-30 02:46:45 +00:00
David Green aee7ad0cde Revert 333358 as it's failing on some builders.
I'm guessing the tests reply on the ARM backend being built.

llvm-svn: 333359
2018-05-27 12:54:33 +00:00
David Green 3034281b43 [UnrollAndJam] Add a new Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

for i..
  ForeBlocks(i)
  for j..
    SubLoopBlocks(i, j)
  AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

for i... i+=2
  ForeBlocks(i)
  ForeBlocks(i+1)
  for j..
    SubLoopBlocks(i, j)
    SubLoopBlocks(i+1, j)
  AftBlocks(i)
  AftBlocks(i+1)
Remainder

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 333358
2018-05-27 12:11:21 +00:00
Chandler Carruth e6c30fdda7 Restore the LoopInstSimplify pass, reverting r327329 that removed it.
The plan had always been to move towards using this rather than so much
in-pass simplification within the loop pipeline, but we never got around
to it.... until only a couple months after it was removed due to disuse.
=/

This commit is just a pure revert of the removal. I will add tests and
do some basic cleanup in follow-up commits. Then I'll wire it into the
loop pass pipeline.

Differential Revision: https://reviews.llvm.org/D47353

llvm-svn: 333250
2018-05-25 01:32:36 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
David Blaikie 4fe1fe1418 Fix Layering, move instrumentation transform headers into Instrumentation subdirectory
llvm-svn: 328379
2018-03-23 22:11:06 +00:00
David Blaikie 301627f875 Move SampleProfile.h into IPO along with the rest of the IPO pass headers
llvm-svn: 328262
2018-03-22 22:42:44 +00:00
Fedor Sergeev 194a407bda [New PM][IRCE] port of Inductive Range Check Elimination pass to the new pass manager
There are two nontrivial details here:
* Loop structure update interface is quite different with new pass manager,
  so the code to add new loops was factored out

* BranchProbabilityInfo is not a loop analysis, so it can not be just getResult'ed from
  within the loop pass. It cant even be queried through getCachedResult as LoopCanonicalization
  sequence (e.g. LoopSimplify) might invalidate BPI results.

  Complete solution for BPI will likely take some time to discuss and figure out,
  so for now this was partially solved by making BPI optional in IRCE
  (skipping a couple of profitability checks if it is absent).

Most of the IRCE tests got their corresponding new-pass-manager variant enabled.
Only two of them depend on BPI, both marked with TODO, to be turned on when BPI
starts being available for loop passes.

Reviewers: chandlerc, mkazantsev, sanjoy, asbirlea
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D43795

llvm-svn: 327619
2018-03-15 11:01:19 +00:00
Vedant Kumar 3a408538f0 Remove the LoopInstSimplify pass (-loop-instsimplify)
LoopInstSimplify is unused and untested. Reading through the commit
history the pass also seems to have a high maintenance burden.

It would be best to retire the pass for now. It should be easy to
recover if we need something similar in the future.

Differential Revision: https://reviews.llvm.org/D44053

llvm-svn: 327329
2018-03-12 20:49:42 +00:00
Amjad Aboud f1f57a3137 Another try to commit 323321 (aggressive instruction combine).
llvm-svn: 323416
2018-01-25 12:06:32 +00:00
Amjad Aboud d53504e379 Reverted 323321.
llvm-svn: 323326
2018-01-24 14:48:49 +00:00
Amjad Aboud e4453233d7 [InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive-instcombine).
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.

For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.

It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.

Differential Revision: https://reviews.llvm.org/D38313

llvm-svn: 323321
2018-01-24 12:42:42 +00:00
Malcolm Parsons 21e545d08d Fix typos of occurred and occurrence
llvm-svn: 323318
2018-01-24 10:33:39 +00:00
David Blaikie 0c64f5a2eb NewPM: Add an extension point for the start of the pipeline.
This applies to most pipelines except the LTO and ThinLTO backend
actions - it is for use at the beginning of the overall pipeline.

This extension point will be used to add the GCOV pass when enabled in
Clang.

llvm-svn: 323166
2018-01-23 01:25:20 +00:00
Easwaran Raman bdf20261d8 Add a pass to generate synthetic function entry counts.
Summary:
This pass synthesizes function entry counts by traversing the callgraph
and using the relative block frequencies of the callsites. The intended
use of these counts is in inlining to determine hot/cold callsites in
the absence of profile information.

The pass is split into two files with the code that propagates the
counts in a callgraph in a Utils file. I plan to add support for
propagation in the thinlto link phase and the propagation code will be
shared and hence this split. I did not add support to the old PM since
hot callsite determination in inlining is not possible in old PM
(although we could use hot callee heuristic with synthetic counts in the
old PM it is not worth the effort tuning it)

Reviewers: davidxl, silvas

Subscribers: mgorny, mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D41604

llvm-svn: 322110
2018-01-09 19:39:35 +00:00
Fedor Sergeev 02e7f0247b [PM] pass -debug-pass-manager flag into FunctionToLoopPassAdaptor's canonicalization PM
Summary:
New pass manager driver passes DebugPM (-debug-pass-manager) flag into
individual PassManager constructors in order to enable debug logging.
FunctionToLoopPassAdaptor has its own internal LoopCanonicalizationPM
which never gets its debug logging enabled and that means canonicalization
passes like LoopSimplify are never present in -debug-pass-manager output.

Extending FunctionToLoopPassAdaptor's constructor and
createFunctionToLoopPassAdaptor wrapper with an optional
boolean DebugLogging argument.

Passing debug-logging flags there as appropriate.

Reviewers: chandlerc, davide

Reviewed By: davide

Subscribers: mehdi_amini, eraman, llvm-commits, JDevlieghere

Differential Revision: https://reviews.llvm.org/D41586

llvm-svn: 321548
2017-12-29 08:16:06 +00:00
Fedor Sergeev 4b86d79048 [PM] port Rewrite Statepoints For GC to the new pass manager.
Summary:
The port is nearly straightforward.
The only complication is related to the analyses handling,
since one of the analyses used in this module pass is domtree,
which is a function analysis. That requires asking for the results
of each function and disallows a single interface for run-on-module
pass action.

Decided to copy-paste the main body of this pass.
Most of its code is requesting analyses anyway, so not that much
of a copy-paste.

The rest of the code movement is to transform all the implementation
helper functions like stripNonValidData into non-member statics.

Extended all the related LLVM tests with new-pass-manager use.
No failures.

Reviewers: sanjoy, anna, reames

Reviewed By: anna

Subscribers: skatkov, llvm-commits

Differential Revision: https://reviews.llvm.org/D41162

llvm-svn: 320796
2017-12-15 09:32:11 +00:00
Sanjay Patel 0ab0c1a201 [SimplifyCFG] don't sink common insts too soon (PR34603)
This should solve:
https://bugs.llvm.org/show_bug.cgi?id=34603
...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run.
It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the
sinking transform later in the optimization pipeline.

Differential Revision: https://reviews.llvm.org/D38566

llvm-svn: 320749
2017-12-14 22:05:20 +00:00
Michael Zolotukhin d8920b1c44 Remove redundant includes from various places.
llvm-svn: 320629
2017-12-13 21:31:03 +00:00
Chandler Carruth c34f789e38 Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable.
The core idea is to (re-)introduce some redundancies where their cost is
hidden by the cost of materializing immediates for constant operands of
PHI nodes. When the cost of the redundancies is covered by this,
avoiding materializing the immediate has numerous benefits:
1) Less register pressure
2) Potential for further folding / combining
3) Potential for more efficient instructions due to immediate operand

As a motivating example, consider the remarkably different cost on x86
of a SHL instruction with an immediate operand versus a register
operand.

This pattern turns up surprisingly frequently, but is somewhat rarely
obvious as a significant performance problem.

The pass is entirely target independent, but it does rely on the target
cost model in TTI to decide when to speculate things around the PHI
node. I've included x86-focused tests, but any target that sets up its
immediate cost model should benefit from this pass.

There is probably more that can be done in this space, but the pass
as-is is enough to get some important performance on our internal
benchmarks, and should be generally performance neutral, but help with
more extensive benchmarking is always welcome.

One awkward part is that this pass has to be scheduled after
*everything* that can eliminate these kinds of redundancies. This
includes SimplifyCFG, GVN, etc. I'm open to suggestions about better
places to put this. We could in theory make it part of the codegen pass
pipeline, but there doesn't really seem to be a good reason for that --
it isn't "lowering" in any sense and only relies on pretty standard cost
model based TTI queries, so it seems to fit well with the "optimization"
pipeline model. Still, further thoughts on the pipeline position are
welcome.

I've also only implemented this in the new pass manager. If folks are
very interested, I can try to add it to the old PM as well, but I didn't
really see much point (my use case is already switched over to the new
PM).

I've tested this pretty heavily without issue. A wide range of
benchmarks internally show no change outside the noise, and I don't see
any significant changes in SPEC either. However, the size class
computation in tcmalloc is substantially improved by this, which turns
into a 2% to 4% win on the hottest path through tcmalloc for us, so
there are definitely important cases where this is going to make
a substantial difference.

Differential revision: https://reviews.llvm.org/D37467

llvm-svn: 319164
2017-11-28 11:32:31 +00:00
Sanjay Patel 3e29890a7f [(new) Pass Manager] instantiate SimplifyCFG with the same options as the old PM
This is a recommit of r316869 which was speculatively reverted with r317444 and 
subsequently shown to not be the cause of PR35210. That crash should be fixed
after r318237.

Original commit message:

The old PM sets the options of what used to be known as "latesimplifycfg" on the
instantiation after the vectorizers have run, so that's what we'redoing here.

FWIW, there's a later SimplifyCFGPass instantiation in both PMs where we do not
set the "late" options. I'm not sure if that's intentional or not.

Differential Revision: https://reviews.llvm.org/D39407

llvm-svn: 318299
2017-11-15 16:33:11 +00:00
Hans Wennborg e1ecd61b98 Rename CountingFunctionInserter and use for both mcount and cygprofile calls, before and after inlining
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.

This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)

LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.

Differential Revision: https://reviews.llvm.org/D39287

llvm-svn: 318195
2017-11-14 21:09:45 +00:00
Chandler Carruth 00a301d568 [PM] Port BoundsChecking to the new PM.
Registers it and everything, updates all the references, etc.

Next patch will add support to Clang's `-fexperimental-new-pass-manager`
path to actually enable BoundsChecking correctly.

Differential Revision: https://reviews.llvm.org/D39084

llvm-svn: 318128
2017-11-14 01:30:04 +00:00
David L. Jones 82b22e0327 [PassManager, SimplifyCFG] Revert r316908 and r316869.
These cause Clang to crash with a segfault. See PR35210 for details.

llvm-svn: 317444
2017-11-06 00:32:01 +00:00
Jun Bum Lim 0c99007db1 Recommit r317351 : Add CallSiteSplitting pass
This recommit r317351 after fixing a buildbot failure.

Original commit message:

    Summary:
    This change add a pass which tries to split a call-site to pass
    more constrained arguments if its argument is predicated in the control flow
    so that we can expose better context to the later passes (e.g, inliner, jump
    threading, or IPA-CP based function cloning, etc.).
    As of now we support two cases :

    1) If a call site is dominated by an OR condition and if any of its arguments
    are predicated on this OR condition, try to split the condition with more
    constrained arguments. For example, in the code below, we try to split the
    call site since we can predicate the argument (ptr) based on the OR condition.

    Split from :
          if (!ptr || c)
            callee(ptr);
    to :
          if (!ptr)
            callee(null ptr)  // set the known constant value
          else if (c)
            callee(nonnull ptr)  // set non-null attribute in the argument

    2) We can also split a call-site based on constant incoming values of a PHI
    For example,
    from :
          BB0:
           %c = icmp eq i32 %i1, %i2
           br i1 %c, label %BB2, label %BB1
          BB1:
           br label %BB2
          BB2:
           %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
           call void @bar(i32 %p)
    to
          BB0:
           %c = icmp eq i32 %i1, %i2
           br i1 %c, label %BB2-split0, label %BB1
          BB1:
           br label %BB2-split1
          BB2-split0:
           call void @bar(i32 0)
           br label %BB2
          BB2-split1:
           call void @bar(i32 1)
           br label %BB2
          BB2:
           %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]

llvm-svn: 317362
2017-11-03 20:41:16 +00:00
Jun Bum Lim 0eb1c2d63a Revert "Add CallSiteSplitting pass"
Revert due to Buildbot failure.

This reverts commit r317351.

llvm-svn: 317353
2017-11-03 19:17:11 +00:00
Jun Bum Lim 2a58933519 Add CallSiteSplitting pass
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :

1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.

Split from :
      if (!ptr || c)
        callee(ptr);
to :
      if (!ptr)
        callee(null ptr)  // set the known constant value
      else if (c)
        callee(nonnull ptr)  // set non-null attribute in the argument

2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
      BB0:
       %c = icmp eq i32 %i1, %i2
       br i1 %c, label %BB2, label %BB1
      BB1:
       br label %BB2
      BB2:
       %p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
       call void @bar(i32 %p)
to
      BB0:
       %c = icmp eq i32 %i1, %i2
       br i1 %c, label %BB2-split0, label %BB1
      BB1:
       br label %BB2-split1
      BB2-split0:
       call void @bar(i32 0)
       br label %BB2
      BB2-split1:
       call void @bar(i32 1)
       br label %BB2
      BB2:
       %p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]

Reviewers: davidxl, huntergr, chandlerc, mcrosier, eraman, davide

Reviewed By: davidxl

Subscribers: sdesmalen, ashutosh.nema, fhahn, mssimpso, aemerson, mgorny, mehdi_amini, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D39137

llvm-svn: 317351
2017-11-03 19:01:57 +00:00
Sanjay Patel adf38911d8 [(new) Pass Manager] instantiate SimplifyCFG with the same options as the old PM
The old PM sets the options of what used to be known as "latesimplifycfg" on the 
instantiation after the vectorizers have run, so that's what we'redoing here.

FWIW, there's a later SimplifyCFGPass instantiation in both PMs where we do not 
set the "late" options. I'm not sure if that's intentional or not.

Differential Revision: https://reviews.llvm.org/D39407

llvm-svn: 316869
2017-10-29 20:49:31 +00:00
Matthew Simpson cb58558c2f Add CalledValuePropagation pass
This patch adds a new pass for attaching !callees metadata to indirect call
sites. The pass propagates values to call sites by performing an IPSCCP-like
analysis using the generic sparse propagation solver. For indirect call sites
having a small set of possible callees, the attached metadata indicates what
those callees are. The metadata can be used to facilitate optimizations like
intersecting the function attributes of the possible callees, refining the call
graph, performing indirect call promotion, etc.

Differential Revision: https://reviews.llvm.org/D37355

llvm-svn: 316576
2017-10-25 13:40:08 +00:00
Rong Xu e1f4245f8d [PM] Add pgo-memop-opt pass to the new pass manager
This pass adds pgo-memop-opt pass to the new pass manager.
It is in the old pass manager but somehow left out in the new pass manager.

Differential Revision: http://reviews.llvm.org/D39145

llvm-svn: 316384
2017-10-23 22:21:29 +00:00
Adam Nemet 0965da2055 Rename OptimizationDiagnosticInfo.* to OptimizationRemarkEmitter.*
Sync it up with the name of the class actually defined here.  This has been
bothering me for a while...

llvm-svn: 315249
2017-10-09 23:19:02 +00:00
Davide Italiano e070721308 [NewPassManager] Run global dead code elimination after the inliner.
This is the same exact change we did for the current pass manager
in rL314997, but the new pass manager pipeline already happened
to run GlobalOpt after the inliner, so we just insert a run of
GDCE here.

llvm-svn: 315003
2017-10-05 18:36:01 +00:00
Dehao Chen d26dae0d34 Separate the logic when handling indirect calls in SamplePGO ThinLTO compile phase and other phases.
Summary: In SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracted that logic and makes it clearer before profile annotation. In the mean time, we need to make function importing process both inlined callsites as well as not promoted indirect callsites.

Reviewers: tejohnson

Reviewed By: tejohnson

Subscribers: sanjoy, mehdi_amini, llvm-commits, inglorion

Differential Revision: https://reviews.llvm.org/D38094

llvm-svn: 314619
2017-10-01 05:24:51 +00:00
Sanjay Patel 6fd4391ddd [DivRempairs] add a pass to optimize div/rem pairs (PR31028)
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented 
as an independent pass, so there's no stretching of scope and feature creep for an existing pass. 
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost 
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028

The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow 
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.

Decomposing remainder may allow removing some code from the backend (PPC and possibly others).

Differential Revision: https://reviews.llvm.org/D37121 

llvm-svn: 312862
2017-09-09 13:38:18 +00:00
Chandler Carruth 19913b22c0 [PM] Switch the CGSCC debug messages to use the standard LLVM debug
printing techniques with a DEBUG_TYPE controlling them.

It was a mistake to start re-purposing the pass manager `DebugLogging`
variable for generic debug printing -- those logs are intended to be
very minimal and primarily used for testing. More detailed and
comprehensive logging doesn't make sense there (it would only make for
brittle tests).

Moreover, we kept forgetting to propagate the `DebugLogging` variable to
various places making it also ineffective and/or unavailable. Switching
to `DEBUG_TYPE` makes this a non-issue.

llvm-svn: 310695
2017-08-11 05:47:13 +00:00
Dehao Chen 2f4e2e2758 Revert part of r310296 to make it really NFC for instrumentation PGO.
Summary: Part of r310296 will disable PGOIndirectCallPromotion in ThinLTO backend if PGOOpt is None. However, as PGOOpt is not passed down to ThinLTO backend for instrumentation based PGO, that change would actually disable ICP entirely in ThinLTO backend, making it behave differently in instrumentation PGO mode. This change reverts that change, and only disable ICP there when it is SamplePGO.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: sanjoy, mehdi_amini, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D36566

llvm-svn: 310550
2017-08-10 05:10:32 +00:00
Dehao Chen 08f8831e57 Move the SampleProfileLoader right after EarlyFPM.
Summary: SampleProfileLoader pass do need to happen after some early cleanup passes so that inlining can happen correctly inside the SampleProfileLoader pass.

Reviewers: chandlerc, davidxl, tejohnson

Reviewed By: chandlerc, tejohnson

Subscribers: sanjoy, mehdi_amini, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D36333

llvm-svn: 310296
2017-08-07 20:23:20 +00:00
Teresa Johnson ecd901314d [PM] Split LoopUnrollPass and make partial unroller a function pass
Summary:
This is largely NFC*, in preparation for utilizing ProfileSummaryInfo
and BranchFrequencyInfo analyses. In this patch I am only doing the
splitting for the New PM, but I can do the same for the legacy PM as
a follow-on if this looks good.

*Not NFC since for partial unrolling we lose the updates done to the
loop traversal (adding new sibling and child loops) - according to
Chandler this is not very useful for partial unrolling, but it also
means that the debugging flag -unroll-revisit-child-loops no longer
works for partial unrolling.

Reviewers: chandlerc

Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D36157

llvm-svn: 309886
2017-08-02 20:35:29 +00:00
Dehao Chen 89d3226019 Update the new PM pipeline to make ICP aware if it is SamplePGO build.
Summary: In ThinLTO backend compile, OPTOptions are not set so that the ICP in ThinLTO backend does not know if it is a SamplePGO build, in which profile count needs to be annotated directly on call instructions. This patch cleaned up the PGOOptions handling logic and passes down PGOOptions to ThinLTO backend.

Reviewers: chandlerc, tejohnson, davidxl

Reviewed By: chandlerc

Subscribers: sanjoy, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D36052

llvm-svn: 309780
2017-08-02 01:28:31 +00:00
Dehao Chen 95f003003d Refactor the build{Module|Function}SimplificationPipeline to expose optimization phase.
Summary: This is in preparation of https://reviews.llvm.org/D36052

Reviewers: chandlerc, davidxl, tejohnson

Reviewed By: chandlerc

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D36053

llvm-svn: 309500
2017-07-30 04:55:39 +00:00
Dehao Chen ce0842ce9c Refine the PGOOpt and SamplePGOSupport handling.
Summary:
Now that SamplePGOSupport is part of PGOOpt, there are several places that need tweaking:
1. AddDiscriminator pass should *not* be invoked at ThinLTOBackend (as it's already invoked in the PreLink phase)
2. addPGOInstrPasses should only be invoked when either ProfileGenFile or ProfileUseFile is non-empty.
3. SampleProfileLoaderPass should only be invoked when SampleProfileFile is non-empty.
4. PGOIndirectCallPromotion should only be invoked in ProfileUse phase, or in ThinLTOBackend of SamplePGO.

Reviewers: chandlerc, tejohnson, davidxl

Reviewed By: chandlerc

Subscribers: sanjoy, mehdi_amini, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D36040

llvm-svn: 309478
2017-07-29 04:10:24 +00:00
Dehao Chen 641f387cd0 Update the assertion to meet with the changes in r309121. (NFC)
llvm-svn: 309125
2017-07-26 15:47:00 +00:00
Dehao Chen e90d0153ca Make new PM honor -fdebug-info-for-profiling
Summary: The new PM needs to invoke add-discriminator pass when building with -fdebug-info-for-profiling.

Reviewers: chandlerc, davidxl

Reviewed By: chandlerc

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D35744

llvm-svn: 309121
2017-07-26 15:01:20 +00:00
Philip Pfaffe 730f2f9bb6 [PM] Enable registration of out-of-tree passes with PassBuilder
Summary:
This patch adds a callback registration API to the PassBuilder,
enabling registering out-of-tree passes with it.

Through the Callback API, callers may register callbacks with the
various stages at which passes are added into pass managers, including
parsing of a pass pipeline as well as at extension points within the
default -O pipelines.

Registering utilities like `require<>` and `invalidate<>` needs to be
handled manually by the caller, but a helper is provided.

Additionally, adding passes at pipeline extension points is exposed
through the opt tool. This patch adds a `-passes-ep-X` commandline
option for every extension point X, which opt parses into pipelines
inserted into that extension point.

Reviewers: chandlerc

Reviewed By: chandlerc

Subscribers: lksbhm, grosser, davide, mehdi_amini, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D33464

llvm-svn: 307532
2017-07-10 10:57:55 +00:00
Dehao Chen 3a9861420c Add sample PGO support to ThinLTO new pass manager.
Summary:
For SamplePGO + ThinLTO, because profile annotation is done twice at both PrepareForThinLTO pipeline and backend compiler, the following changes are needed at the PrepareForThinLTO phase to ensure the IR is not changed dramatically. Otherwise the profile annotation will be inaccurate in the backend compiler.

* disable hot-caller heuristic
* disable loop unrolling
* disable indirect call promotion

This will unblock the new PM testing for sample PGO (tools/clang/test/CodeGen/pgo-sample-thinlto-summary.c), which will be covered in another cfe patch.

Reviewers: chandlerc, tejohnson, davidxl

Reviewed By: tejohnson

Subscribers: sanjoy, mehdi_amini, Prazek, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D34895

llvm-svn: 307437
2017-07-07 20:53:10 +00:00
Dehao Chen 2f31d0d86e Hook the sample PGO machinery in the new PM
Summary: This patch hooks up SampleProfileLoaderPass with the new PM.

Reviewers: chandlerc, davidxl, davide, tejohnson

Reviewed By: chandlerc, tejohnson

Subscribers: tejohnson, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D34720

llvm-svn: 306763
2017-06-29 23:33:05 +00:00
Tim Shen 664706916b [ThinkLTO] Invoke build(Thin)?LTOPreLinkDefaultPipeline.
Previously it doesn't actually invoke the designated new PM builder
functions.

This patch moves NameAnonGlobalPass out from PassBuilder, as Chandler
points out that PassBuilder is used for non-O0 builds, and for
optimizations only.

Differential Revision: https://reviews.llvm.org/D34728

llvm-svn: 306756
2017-06-29 23:08:38 +00:00
Easwaran Raman 8249fac52d Create inliner params based on size and opt levels.
Differential revision: https://reviews.llvm.org/D34309

llvm-svn: 306542
2017-06-28 13:33:49 +00:00
Geoff Berry 2573a19fe6 [EarlyCSE][MemorySSA] Enable MemorySSA in function-simplification pass of EarlyCSE.
llvm-svn: 306477
2017-06-27 22:25:02 +00:00
Xinliang David Li b67530e9b9 [PGO] Implementate profile counter regiser promotion
Differential Revision: http://reviews.llvm.org/D34085

llvm-svn: 306231
2017-06-25 00:26:43 +00:00
Eric Christopher 5a7c2f1700 Remove the LoadCombine pass. It was never enabled and is unsupported.
Based on discussions with the author on mailing lists.

llvm-svn: 306067
2017-06-22 22:58:12 +00:00
Geoff Berry 3cca1da20c [EarlyCSE] Add option to use MemorySSA for function simplification run of EarlyCSE (off by default).
Summary:
Use MemorySSA for memory dependency checking in the EarlyCSE pass at the
start of the function simplification portion of the pipeline.  We rely
on the fact that GVNHoist runs just after this pass of EarlyCSE to
amortize the MemorySSA construction cost since GVNHoist uses MemorySSA
and EarlyCSE preserves it.

This is turned off by default.  A follow-up change will turn it on to
allow for easier reversion in case it breaks something.

llvm-svn: 305146
2017-06-10 15:20:03 +00:00
Davide Italiano be1b6a963e [PM] Add GVNSink to the pipeline.
With this, the two pipelines should be in sync again (modulo
LoopUnswitch, but Chandler is actively working on that).

Differential Revision:  https://reviews.llvm.org/D33810

llvm-svn: 304671
2017-06-03 23:18:29 +00:00
Davide Italiano c368831580 Move GVNHoist to the right position in the new pass manager pipeline.
GVNHoist was moved as part of simplification passes for the current
pass manager (but not for the new), so they're out-of-sync.

Differential Revision:  https://reviews.llvm.org/D33806

llvm-svn: 304490
2017-06-01 23:08:14 +00:00
Chandler Carruth 8b3be4e59d [PM/ThinLTO] Port the ThinLTO pipeline (both components) to the new PM.
Based on the original patch by Davide, but I've adjusted the API exposed
to just be different entry points rather than exposing more state
parameters. I've factored all the common logic out so that we don't have
any duplicate pipelines, we just stitch them together in different ways.
I think this makes the build easier to reason about and understand.

This adds a direct method for getting the module simplification pipeline
as well as a method to get the optimization pipeline. While not my
express goal, this seems nice and gives a good place comment about the
restrictions that are imposed on them.

I did make some minor changes to the way the pipelines are structured
here, but hopefully not ones that are significant or controversial:

1) I sunk the PGO indirect call promotion to only be run when we have
   PGO enabled (or as part of the special ThinLTO pipeline).

2) I made the extra GlobalOpt run in ThinLTO just happen all the time
   and at a slightly more powerful place (before we remove available
   externaly functions). This seems like general goodness and not a big
   compile time sink, so it didn't make sense to *only* use it in
   ThinLTO. Fewer differences in the pipeline makes everything simpler
   IMO.

3) I hoisted the ThinLTO stop point pre-link above the the RPO function
   attr inference. The RPO inference won't infer anything terribly
   meaningful pre-link (recursiveness?) so it didn't make a lot of
   sense. But if the placement of RPO inference starts to matter, we
   should move it to the canonicalization phase anyways which seems like
   a better place for it (and there is a FIXME to this effect!). But
   that seemed a bridge too far for this patch.

If we ever need to parameterize these pipelines more heavily, we can
always sink the logic to helper functions with parameters to keep those
parameters out of the public API. But the changes above seemed minor
that we could possible get away without the parameters entirely.

I added support for parsing 'thinlto' and 'thinlto-pre-link' names in
pass pipelines to make it easy to test these routines and play with them
in larger pipelines. I also added a really basic manifest of passes test
that will show exactly how the pipelines behave and work as well as
making updates to them clear.

Lastly, this factoring does introduce a nesting layer of module pass
managers in the default pipeline. I don't think this is a big deal and
the flexibility of decoupling the pipelines seems easily worth it.

Differential Revision: https://reviews.llvm.org/D33540

llvm-svn: 304407
2017-06-01 11:39:39 +00:00
Chandler Carruth 86248d5632 [PM] Enable the new simple loop unswitch pass in the new pass manager
(where it is the only realistic option).

This passes the LLVM test suite for me, but I'm clearly still hammering
on this.

llvm-svn: 303952
2017-05-26 01:24:11 +00:00
Chandler Carruth f4d62c480c [PM] Teach the PGO instrumentation pasess to run GlobalDCE before
instrumenting code.

This is important in the new pass manager. The old pass manager's
inliner has a small DCE routine embedded within it. The new pass manager
relies on the actual GlobalDCE pass for this.

Without this patch, instrumentation profiling with the new PM results in
massive code bloat in the object files because the instrumentation
itself ends up preventing DCE from working to remove the code.

We should probably change the instrumentation (and/or DCE) so that we
can eliminate dead code even if instrumented, but we shouldn't even
spend the time generating instrumentation for that code so this still
seems like a good patch.

Differential Revision: https://reviews.llvm.org/D33535

llvm-svn: 303845
2017-05-25 07:15:09 +00:00
Davide Italiano 8e7d11ab2b [NewPM] Fix an innocent but silly typo. Reported by Craig Topper.
llvm-svn: 303587
2017-05-22 23:47:11 +00:00
Davide Italiano 8a09b8eba9 [NewPM] Add a temporary cl::opt() to test NewGVN.
llvm-svn: 303586
2017-05-22 23:41:40 +00:00
Xinliang David Li 126157c3b4 [PartialInlining] Add internal options to enable partial inlining in pass pipeline (off by default)
1. Legacy: -mllvm -enable-partial-inlining
2. New:  -mllvm -enable-npm-partial-inlining -fexperimental-new-pass-manager

Differential Revision: http://reviews.llvm.org/D33382

llvm-svn: 303567
2017-05-22 16:41:57 +00:00
Easwaran Raman 5e6f9bd4f8 [PM] Add ProfileSummaryAnalysis as a required pass in the new pipeline.
Differential revision: https://reviews.llvm.org/D32768

llvm-svn: 302170
2017-05-04 16:58:45 +00:00
Chandler Carruth 1353f9a48b [PM/LoopUnswitch] Introduce a new, simpler loop unswitch pass.
Currently, this pass only focuses on *trivial* loop unswitching. At that
reduced problem it remains significantly better than the current loop
unswitch:
- Old pass is worse than cubic complexity. New pass is (I think) linear.
- New pass is much simpler in its design by focusing on full unswitching. (See
  below for details on this).
- New pass doesn't carry state for thresholds between pass iterations.
- New pass doesn't carry state for correctness (both miscompile and
  infloop) between pass iterations.
- New pass produces substantially better code after unswitching.
- New pass can handle more trivial unswitch cases.
- New pass doesn't recompute the dominator tree for the entire function
  and instead incrementally updates it.

I've ported all of the trivial unswitching test cases from the old pass
to the new one to make sure that major functionality isn't lost in the
process. For several of the test cases I've worked to improve the
precision and rigor of the CHECKs, but for many I've just updated them
to handle the new IR produced.

My initial motivation was the fact that the old pass carried state in
very unreliable ways between pass iterations, and these mechansims were
incompatible with the new pass manager. However, I discovered many more
improvements to make along the way.

This pass makes two very significant assumptions that enable most of these
improvements:

1) Focus on *full* unswitching -- that is, completely removing whatever
   control flow construct is being unswitched from the loop. In the case
   of trivial unswitching, this means removing the trivial (exiting)
   edge. In non-trivial unswitching, this means removing the branch or
   switch itself. This is in opposition to *partial* unswitching where
   some part of the unswitched control flow remains in the loop. Partial
   unswitching only really applies to switches and to folded branches.
   These are very similar to full unrolling and partial unrolling. The
   full form is an effective canonicalization, the partial form needs
   a complex cost model, cannot be iterated, isn't canonicalizing, and
   should be a separate pass that runs very late (much like unrolling).

2) Leverage LLVM's Loop machinery to the fullest. The original unswitch
   dates from a time when a great deal of LLVM's loop infrastructure was
   missing, ineffective, and/or unreliable. As a consequence, a lot of
   complexity was added which we no longer need.

With these two overarching principles, I think we can build a fast and
effective unswitcher that fits in well in the new PM and in the
canonicalization pipeline. Some of the remaining functionality around
partial unswitching may not be relevant today (not many test cases or
benchmarks I can find) but if they are I'd like to add support for them
as a separate layer that runs very late in the pipeline.

Purely to make reviewing and introducing this code more manageable, I've
split this into first a trivial-unswitch-only pass and in the next patch
I'll add support for full non-trivial unswitching against a *fixed*
threshold, exactly like full unrolling. I even plan to re-use the
unrolling thresholds, as these are incredibly similar cost tradeoffs:
we're cloning a loop body in order to end up with simplified control
flow. We should only do that when the total growth is reasonably small.

One of the biggest changes with this pass compared to the previous one
is that previously, each individual trivial exiting edge from a switch
was unswitched separately as a branch. Now, we unswitch the entire
switch at once, with cases going to the various destinations. This lets
us unswitch multiple exiting edges in a single operation and also avoids
numerous extremely bad behaviors, where we would introduce 1000s of
branches to test for thousands of possible values, all of which would
take the exact same exit path bypassing the loop. Now we will use
a switch with 1000s of cases that can be efficiently lowered into
a jumptable. This avoids relying on somehow forming a switch out of the
branches or getting horrible code if that fails for any reason.

Another significant change is that this pass actively updates the CFG
based on unswitching. For trivial unswitching, this is actually very
easy because of the definition of loop simplified form. Doing this makes
the code coming out of loop unswitch dramatically more friendly. We
still should run loop-simplifycfg (at the least) after this to clean up,
but it will have to do a lot less work.

Finally, this pass makes much fewer attempts to simplify instructions
based on the unswitch. Something like loop-instsimplify, instcombine, or
GVN can be used to do increasingly powerful simplifications based on the
now dominating predicate. The old simplifications are things that
something like loop-instsimplify should get today or a very, very basic
loop-instcombine could get. Keeping that logic separate is a big
simplifying technique.

Most of the code in this pass that isn't in the old one has to do with
achieving specific goals:
- Updating the dominator tree as we go
- Unswitching all cases in a switch in a single step.

I think it is still shorter than just the trivial unswitching code in
the old pass despite having this functionality.

Differential Revision: https://reviews.llvm.org/D32409

llvm-svn: 301576
2017-04-27 18:45:20 +00:00
Chandler Carruth c246a4c973 Disable GVN Hoist due to still more bugs being found in it. There is
also a discussion about exactly what we should do prior to re-enabling
it.

The current bug is http://llvm.org/PR32821 and the discussion about this
is in the review thread for r300200.

llvm-svn: 301505
2017-04-27 00:28:03 +00:00
Filipe Cabecinhas 92dc348773 Simplify the CFG after loop pass cleanup.
Summary:
Otherwise we might end up with some empty basic blocks or
single-entry-single-exit basic blocks.

This fixes PR32085

Reviewers: chandlerc, danielcdh

Subscribers: mehdi_amini, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D30468

llvm-svn: 301395
2017-04-26 12:02:41 +00:00
Daniel Berlin 554dcd8c89 MemorySSA: Move to Analysis, from Transforms/Utils. It's used as
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.

llvm-svn: 299980
2017-04-11 20:06:36 +00:00
Dehao Chen cc75d2441d Add call branch annotation for ICP promoted direct call in SamplePGO mode.
Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic.

Reviewers: davidxl, eraman, xur

Reviewed By: davidxl, xur

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D30282

llvm-svn: 296028
2017-02-23 22:15:18 +00:00
Dehao Chen 7d230325ef Increases full-unroll threshold.
Summary:
The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold

This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge):

Performance:

403	0.11%
433	0.51%
445	0.48%
447	3.50%
453	1.49%
464	0.75%

Code size:

403	0.56%
433	0.96%
445	2.16%
447	2.96%
453	0.94%
464	8.02%

The compiler time overhead is similar with code size.

Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc

Reviewed By: hfinkel, chandlerc

Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D28368

llvm-svn: 295538
2017-02-18 03:46:51 +00:00