Commit Graph

Diego Caballero d09530144a [VPlan][LV] Introduce condition bit in VPBlockBase
This patch introduces a VPValue in VPBlockBase to represent the condition
bit that is used as successor selector when a block has multiple successors.
This information wasn't necessary until now, when we are about to introduce
outer loop vectorization support in VPlan code gen.
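A minimal sketch of the idea, using hypothetical member and accessor names rather than the actual VPlan declarations:

```
// Hypothetical sketch only -- not the actual VPBlockBase definition.
// A block keeps an optional VPValue that selects among its successors
// during VPlan code generation.
class VPValue;

class VPBlockBaseSketch {
  // Null when the block has a single successor; otherwise the condition
  // used to pick which successor to branch to.
  VPValue *CondBit = nullptr;

public:
  VPValue *getCondBit() { return CondBit; }
  void setCondBit(VPValue *CV) { CondBit = CV; }
};
```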

Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48814

llvm-svn: 336554
2018-07-09 15:57:09 +00:00
Xin Tong b467233d8b [CVP] Handle calls with void return value. No need to create CVPLattice state for it.
Summary:
Tests: 10
Metric: compile_time

Program                                         unpatch-result  patch-result diff

Bullet/bullet                                  32.39           30.54        -5.7%
SPASS/SPASS                                    18.14           17.25        -4.9%
mafft/pairlocalalign                           12.10           11.64        -3.8%
ClamAV/clamscan                                19.21           19.63         2.2%
7zip/7zip-benchmark                            49.55           48.85        -1.4%
kimwitu++/kc                                   15.68           15.87         1.2%
lencod/lencod                                  21.13           21.34         1.0%
consumer-typeset/consumer-typeset              13.65           13.62        -0.2%
tramp3d-v4/tramp3d-v4                          29.88           29.92         0.1%
sqlite3/sqlite3                                18.48           18.46        -0.1%
       unpatch-result  patch-result       diff
count  10.000000       10.000000     10.000000
mean   23.022000       22.712400    -0.011671
std    11.362831       11.094183     0.027338
min    12.104000       11.640000    -0.057298
25%    16.299000       16.214000    -0.032282
50%    18.844000       19.048000    -0.001350
75%    27.689000       27.774000     0.007752
max    49.552000       48.852000     0.021861

I also tested only this pass by concatenating all the code from the
llvm/lib/Analysis/ folder and running clang -g followed by opt. I get close to a 20% speedup
for the pass. I expect a majority of the gain comes from skipping the dbg intrinsics.
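A hedged sketch of the skip itself, with illustrative names (the actual pass logic differs):

```
// Illustrative only: a call that returns void produces no SSA value,
// so there is nothing for the lattice to track.
#include "llvm/IR/Instructions.h"
using namespace llvm;

static bool needsLatticeState(const CallInst &CI) {
  return !CI.getType()->isVoidTy();
}
```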

Before patch (opt -time-passes -called-value-propagation):
============
===-------------------------------------------------------------------------===
 ... Pass execution timing report ...
===-------------------------------------------------------------------------===
 Total Execution Time: 3.8303 seconds (3.8279 wall clock)

 ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
 2.0768 ( 57.3%) 0.0990 ( 48.0%) 2.1757 ( 56.8%) 2.1757 ( 56.8%) Bitcode Writer
 0.8444 ( 23.3%) 0.0600 ( 29.1%) 0.9044 ( 23.6%) 0.9044 ( 23.6%) Called Value Propagation
 0.7031 ( 19.4%) 0.0472 ( 22.9%) 0.7502 ( 19.6%) 0.7478 ( 19.5%) Module Verifier
 3.6242 (100.0%) 0.2062 (100.0%) 3.8303 (100.0%) 3.8279 (100.0%) Total

After patch (opt -time-passes -called-value-propagation):
============
===-------------------------------------------------------------------------===
 ... Pass execution timing report ...
===-------------------------------------------------------------------------===
 Total Execution Time: 3.6605 seconds (3.6579 wall clock)

 ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
 2.0716 ( 59.7%) 0.0990 ( 52.5%) 2.1705 ( 59.3%) 2.1706 ( 59.3%) Bitcode Writer
 0.7144 ( 20.6%) 0.0300 ( 15.9%) 0.7444 ( 20.3%) 0.7444 ( 20.4%) Called Value Propagation
 0.6859 ( 19.8%) 0.0596 ( 31.6%) 0.7455 ( 20.4%) 0.7429 ( 20.3%) Module Verifier
 3.4719 (100.0%) 0.1886 (100.0%) 3.6605 (100.0%) 3.6579 (100.0%) Total

Reviewers: davide, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49078

llvm-svn: 336551
2018-07-09 14:53:37 +00:00
Sanjay Patel 5bd36644c8 [InstCombine] fix shuffle-of-binops transform to avoid poison/undef
As noted in D48987, there are many different ways for this transform to go wrong. 
In particular, the poison potential for shifts means we have to be more careful with those ops. 
I added tests to make that behavior visible for all of the different cases that I could find.

This is a partial fix. To make this review easier, I did not make changes for the single binop 
pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations 
noted with TODO comments. I'll follow-up once we're confident that things are correct here.

The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely.

Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows 
for some improvements to div/rem patterns, so there are wins along with the missed opportunities 
and fixes.

Differential Revision: https://reviews.llvm.org/D49047

llvm-svn: 336546
2018-07-09 13:21:46 +00:00
Chandler Carruth ed2965438e [PM/Unswitch] Fix a nasty bug in the new PM's unswitch introduced in
r335553 with the non-trivial unswitching of switches.

The code correctly updated most aspects of the CFG and analyses, but
missed some crucial aspects:
1) When multiple cases have the same successor, we unswitch that successor
   a single time and replace the switch with a direct branch. The CFG
   here is correct, but the target of this direct branch may have had
   a PHI node with multiple entries in it.
2) When we still have to clone a successor of the switch into an
   unswitched copy of the loop, we'll delete potentially multiple edges
   entering this successor, not just one.
3) We also have to delete multiple edges entering the successors in the
   original loop when they have to be retained.
4) When the "retained successor" *also* occurs as a case successor, we
   just assert failed everywhere. This doesn't happen very easily
   because it's always valid to simply drop the case -- the retained
   successor for switches is always the default successor. However, it
   is likely possible through some contrivance of different loop passes,
   unrolling, and simplifying for this to occur in practice and
   certainly there is nothing "invalid" about the IR so this pass needs
   to handle it.
5) In the case of #4, we also will replace these multiple edges with
   a direct branch much like in #1 and need to collapse the entries in
   any PHI nodes to a single entry.

All of this stems from the delightful fact that the same successor can
show up in multiple parts of the switch terminator, and each of these
is considered a distinct edge for the purpose of PHI nodes (and
iterating the successors and predecessors) but not for unswitching
itself, the dominator tree, or many other things. For the record,
I intensely dislike this "feature" of the IR in large part because of
the complexity it causes in passes like this. We already have a ton of
logic building sets and handling duplicates, and we just had to add
a bunch more.

I've added a complex test case that covers all five of the above failure
modes. I've also added a variation on it where #4 and #5 occur in loop
exit, adding fun where we have an LCSSA PHI node with "multiple entries"
despite having dedicated exits. There were no additional issues found by
this, but it seems a useful corner case to cover with testing.

One thing that working on all of this code has made painfully clear for
me as well is how amazingly inefficient our PHI node representation is
(in terms of the in-memory data structures and the APIs used to update
them). This code has truly marvelous complexity bounds because every
time we remove an entry from a PHI node we do a linear scan to find it
and then a linear update to the data structure to remove it. We could in
theory batch all of the PHI node updates into a single linear walk of
the operands making this much more efficient, but the APIs fight hard
against this and the fact that we have to handle duplicates in the
peculiar manner we do (removing all but one in some cases) makes even
implementing that very tedious and annoying. Anyways, none of this is
new here or specific to loop unswitching. All code in LLVM that updates
PHI node operands suffers from these problems.
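For illustration, a small helper sketch (not the code in this pass) showing why duplicate incoming edges make the linear-scan PHI APIs painful:

```
// Illustrative only: when a predecessor appears several times in a PHI
// (e.g. a switch listing the same successor under multiple case values),
// removing the edge means removing *every* matching entry, and each
// removeIncomingValue call is a linear scan over the PHI's operands.
#include "llvm/IR/Instructions.h"
using namespace llvm;

static void removeAllIncomingFrom(PHINode &PN, BasicBlock *Pred) {
  while (PN.getBasicBlockIndex(Pred) >= 0)
    PN.removeIncomingValue(Pred, /*DeletePHIIfEmpty=*/false);
}
```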

llvm-svn: 336536
2018-07-09 10:30:48 +00:00
Chijun Sima 9e1e0c7b2a [PGOMemOPSize] Preserve the DominatorTree
Summary:
PGOMemOPSize only modifies the CFG in a couple of places; thus we can preserve the DominatorTree with little effort.
When optimizing SQLite with -O3, this patch decreases the number of nodes traversed by DFS by 3.8% and the number of times DominatorTreeBase::recalculate() is called by 5.7%.
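A hedged pass skeleton showing the general shape of preserving the DominatorTree in the new pass manager (illustrative names, not the PGOMemOPSize code):

```
#include "llvm/IR/Dominators.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

// Illustrative skeleton: if the pass keeps the DominatorTree up to date
// while making its few CFG edits, it can report the tree as preserved so
// later passes do not recalculate it.
struct ExamplePass : PassInfoMixin<ExamplePass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    bool Changed = false;
    // ... transform, updating the DominatorTree alongside any CFG edit ...
    if (!Changed)
      return PreservedAnalyses::all();
    PreservedAnalyses PA;
    PA.preserve<DominatorTreeAnalysis>();
    return PA;
  }
};
```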

Reviewers: kuhar, davide, dmgreen

Reviewed By: dmgreen

Subscribers: mzolotukhin, vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D48914

llvm-svn: 336522
2018-07-09 08:07:21 +00:00
Craig Topper 2835278ee0 [LoopIdiomRecognize] Support for converting loops that use LSHR to CTLZ.
In the 'detectCTLZIdiom' function, support has been added for loops that use the LSHR instruction instead of ASHR.

This supports creating ctlz from the following code.

int lzcnt(int x) {
  int count = 0;
  // For x > 0 this counts the number of significant bits in x,
  // i.e. 32 - ctlz(x), which the pass can rewrite using ctlz.
  while (x > 0) {
    count++;
    x = x >> 1;
  }
  return count;
}

Patch by Olga Moldovanova

Differential Revision: https://reviews.llvm.org/D48354

llvm-svn: 336509
2018-07-08 01:45:47 +00:00
Chandler Carruth d8b0c8ce1b [PM/LoopUnswitch] Fix PR37889, producing the correct loop nest structure
after trivial unswitching.

This PR illustrates that a fundamental analysis update was not performed
with the new loop unswitch. This update is also somewhat fundamental to
the core idea of the new loop unswitch -- we actually *update* the CFG
based on the unswitching. In order to do that, we need to update the
loop nest in addition to the domtree.

For some reason, when writing trivial unswitching, I thought that the
loop nest structure cannot be changed by the transformation. But the PR
helps illustrate that it clearly can. I've expanded this to a number of
different test cases that try to cover the different cases of this. When
we unswitch, we move an exit edge of a loop out of the loop. If this
exit edge changes which loop reached by an exit is the innermost loop,
it changes the parent of the loop. Essentially, this transformation may
hoist the inner loop up the nest. I've added the simple logic to handle
this reliably in the trivial unswitching case. This just requires
updating LoopInfo and rebuilding LCSSA on the impacted loops. In the
trivial case, we don't even need to handle dedicated exits because we're
only hoisting the one loop and we just split its preheader.

I've also ported all of these tests to non-trivial unswitching and
verified that the logic already there correctly handles the loop nest
updates necessary.

Differential Revision: https://reviews.llvm.org/D48851

llvm-svn: 336477
2018-07-07 01:12:56 +00:00
Vedant Kumar b3091da3af Use Type::isIntOrPtrTy where possible, NFC
It's a bit neater to write T.isIntOrPtrTy() over `T.isIntegerTy() ||
T.isPointerTy()`.

I used Python's re.sub with this regex to update users:

  r'([\w.\->()]+)isIntegerTy\(\)\s*\|\|\s*\1isPointerTy\(\)'

llvm-svn: 336462
2018-07-06 20:17:42 +00:00
Vedant Kumar 6379a62250 [Local] replaceAllDbgUsesWith: Update debug values before RAUW
The replaceAllDbgUsesWith utility helps passes preserve debug info when
replacing one value with another.

This improves upon the existing insertReplacementDbgValues API by:

- Updating debug intrinsics in-place, while preventing use-before-def of
  the replacement value.
- Falling back to salvageDebugInfo when a replacement can't be made.
- Moving the responsibility for rewriting llvm.dbg.* DIExpressions into
  common utility code.

Along with the API change, this teaches replaceAllDbgUsesWith how to
create DIExpressions for three basic integer and pointer conversions:

- The no-op conversion. Applies when the values have the same width, or
  have bit-for-bit compatible pointer representations.
- Truncation. Applies when the new value is wider than the old one.
- Zero/sign extension. Applies when the new value is narrower than the
  old one.

Testing:

- check-llvm, check-clang, a stage2 `-g -O3` build of clang,
  regression/unit testing.
- This resolves a number of mis-sized dbg.value diagnostics from
  Debugify.

Differential Revision: https://reviews.llvm.org/D48676

llvm-svn: 336451
2018-07-06 17:32:39 +00:00
Benjamin Kramer 3687ac52a9 [LoopSink] Make the enforcement of determinism deterministic.
LoopBlockNumber is a DenseMap<BasicBlock*, int>; comparing the result of
find() will compare a pair<BasicBlock*, int>. That of course depends
on pointer ordering, which varies from run to run. Reverse iteration
doesn't find this because we're copying to a vector first.
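A hedged sketch of the deterministic comparison (illustrative, not the exact LoopSink code): sort by the mapped block number rather than by the dereferenced iterator.

```
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/BasicBlock.h"
#include <algorithm>
#include <vector>
using namespace llvm;

// Illustrative only: comparing find(A)->second keeps the order a pure
// function of the block numbering, while comparing *find(A) would also
// compare the BasicBlock pointers and vary from run to run.
static void sortByBlockNumber(std::vector<BasicBlock *> &Blocks,
                              const DenseMap<BasicBlock *, int> &BlockNumber) {
  std::stable_sort(Blocks.begin(), Blocks.end(),
                   [&](BasicBlock *A, BasicBlock *B) {
                     return BlockNumber.find(A)->second <
                            BlockNumber.find(B)->second;
                   });
}
```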

This bug has been there since 2016 but only recently showed up on clang
selfhost with FDO and ThinLTO, which is also why I didn't manage to get
a reasonable test case for this. Add an assert that would've caught
this.

llvm-svn: 336439
2018-07-06 14:20:58 +00:00
Max Kazantsev 20da7e467a Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done"
llvm-svn: 336410
2018-07-06 04:04:13 +00:00
Michael Zolotukhin a5f2c52a1e Revert r332168: "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading.""
There were a couple of issues reported (PR38047, PR37929) - I'll reland
the patch when I figure out and fix the root cause.

llvm-svn: 336393
2018-07-05 22:10:31 +00:00
Matt Arsenault 24ce89b717 Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN
Better NaN handling for AMDGCN fmed3.

All operands are checked for NaN now. The checks
were moved before the canonicalization to provide
a better mapping from fclamp. Changed the behaviour
of fmed3(x,y,NaN) to return max(x,y) instead of
min(x,y) in light of this. Updated tests as a result
and added some new cases to cover the fix.

Patch by Alan Baker

llvm-svn: 336375
2018-07-05 17:05:36 +00:00
Simon Pilgrim dafd828c97 [SLPVectorizer] Begin abstracting InstructionsState alternate matching away from opcodes. NFCI.
This is an early step towards matching Instructions by attributes other than the opcode. This will be necessary for cast/call alternates which share the same opcode but have different types/intrinsicIDs etc. - which we could vectorize as long as we split them using the alternate mechanism.

Differential Revision: https://reviews.llvm.org/D48945

llvm-svn: 336344
2018-07-05 12:30:44 +00:00
Craig Topper 350c5f1881 [X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement.
llvm-svn: 336315
2018-07-05 06:52:55 +00:00
Sanjay Patel 9c2e7ceb1a [InstCombine] allow narrowing of min/max/abs
We have bailout hacks based on min/max in various places in instcombine 
that shouldn't be necessary. The affected test was added for:
D48930 
...which is a consequence of the improvement in:
D48584 (https://reviews.llvm.org/rL336172)

I'm assuming the visitTrunc bailout in this patch was added specifically 
to avoid a change from SimplifyDemandedBits, so I'm just moving that 
below the EvaluateInDifferentType optimization. A narrow min/max is still
a min/max.

llvm-svn: 336293
2018-07-04 17:44:04 +00:00
Simon Pilgrim ae1c4dcc6e Fix some irregular whitespace/indentation. NFCI.
llvm-svn: 336291
2018-07-04 17:24:05 +00:00
Anastasis Grammenos 204726b345 [DebugInfo][LoopVectorize] Preserve DL in generated phi instruction
When creating `phi` instructions to resume at the scalar part of the loop,
copy the DebugLoc from the original phi over to the new one.
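A minimal sketch of the copy itself, assuming the new and original phi nodes are in hand (names are illustrative):

```
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Illustrative only: carry the source location of the original phi over to
// the phi created for the scalar resume block.
static void copyLocation(PHINode *NewPhi, const PHINode *OrigPhi) {
  NewPhi->setDebugLoc(OrigPhi->getDebugLoc());
}
```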

Differential Revision: https://reviews.llvm.org/D48769

llvm-svn: 336256
2018-07-04 10:16:55 +00:00
Anastasis Grammenos 509d79789f [DebugInfo][InstCombine] Preserve DI after combining zext
When zext is EvaluatedInDifferentType, InstCombine
drops the dbg.value intrinsic. This patch tries to
preserve said DI, by inserting the zext's old DI in the
resulting instruction. (Only for integer type for now)

Differential Revision: https://reviews.llvm.org/D48331

llvm-svn: 336254
2018-07-04 09:55:46 +00:00
Sanjay Patel 3074b9e53f [InstCombine] fold shuffle-with-binop and common value
This is the last significant change suggested in PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806#c5
...though there are several follow-ups noted in the code comments 
in this patch to complete this transform.

It's possible that a binop feeding a select-shuffle has been eliminated 
by earlier transforms (or the code was just written like this in the 1st 
place), so we'll fail to match the patterns that have 2 binops from: 
D48401, 
D48678, 
D48662, 
D48485.

In that case, we can try to materialize identity constants for the remaining
binop to fill in the "ghost" lanes of the vector (where we just want to pass 
through the original values of the source operand).

I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups. 
For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor).
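For reference, a hedged sketch of what those identities look like for the five handled opcodes (not the actual ConstantExpr::getBinOpIdentity implementation):

```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Illustrative only: the identity constant placed into the "ghost" lanes
// so the remaining binop passes the source operand through unchanged.
static Constant *getIdentitySketch(unsigned Opcode, Type *Ty) {
  switch (Opcode) {
  case Instruction::Add:
  case Instruction::Or:
  case Instruction::Xor:
    return Constant::getNullValue(Ty);    // x op 0 == x
  case Instruction::Mul:
    return ConstantInt::get(Ty, 1);       // x * 1 == x
  case Instruction::And:
    return Constant::getAllOnesValue(Ty); // x & ~0 == x
  default:
    return nullptr;                       // not handled here
  }
}
```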

Differential Revision: https://reviews.llvm.org/D48830

llvm-svn: 336196
2018-07-03 13:44:22 +00:00
Bjorn Pettersson 8dd6cf711f [DebugInfo] Corrections for salvageDebugInfo
Summary:
When salvaging a dbg.declare/dbg.addr we should not add
DW_OP_stack_value to the DIExpression
(see test/Transforms/InstCombine/salvage-dbg-declare.ll).

Consider this example
  %vla = alloca i32, i64 2
  call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression())

Instcombine will turn it into
  %vla1 = alloca [2 x i32]
  %vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla1, i64 0, i64 0
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression())

If the GEP can be eliminated, then the dbg.declare will be salvaged
and we should get
  %vla1 = alloca [2 x i32]
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression())

The problem was that salvageDebugInfo did not recognize dbg.declare
as being indirect (%vla1 points to the value, it does not hold the
value), so we incorrectly got
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value))

I also made sure that llvm::salvageDebugInfo and
DIExpression::prependOpcodes do not add DW_OP_stack_value to
the DIExpression in case no new operands are added to the
DIExpression. That way we avoid unnecessarily turning a
register location expression into an implicit location expression
in some situations (see test11 in test/Transforms/LICM/sinking.ll).

Reviewers: aprantl, vsk

Reviewed By: aprantl, vsk

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48837

llvm-svn: 336191
2018-07-03 11:29:00 +00:00
Chandler Carruth 3897ded691 [PM/LoopUnswitch] Fix PR37651 by correctly invalidating SCEV when
unswitching loops.

Original patch trying to address this was sent in D47624, but that
didn't quite handle things correctly. There are two key principles used
to select whether and how to invalidate SCEV-cached information about
loops:

1) We must invalidate any info SCEV has cached before unswitching as we
   may change (or destroy) the loop structure by the act of unswitching,
   and make it hard to recover everything we want to invalidate within
   SCEV.

2) We need to invalidate all of the loops whose CFGs are mutated by the
   unswitching. Notably, this isn't the *entire* loop nest, this is
   every loop contained by the outermost loop reached by an exit block
   relevant to the unswitch.

And we need to do this even when doing trivial unswitching.
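A hedged sketch of the invalidation step itself (the real pass has more bookkeeping around which loops are affected):

```
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;

// Illustrative only: before mutating the CFG, drop SCEV's cached facts for
// the outermost loop whose CFG the unswitch can touch; forgetLoop also
// walks the loops nested inside it.
static void forgetAffectedLoops(ScalarEvolution &SE, Loop *OutermostAffected) {
  SE.forgetLoop(OutermostAffected);
}
```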

I've added more focused tests that directly check that SCEV starts off
with imprecise information and after unswitching (and simplifying
instructions) re-querying SCEV will produce precise information. These
tests also specifically work to check that an *outer* loop's information
becomes precise.

However, the testing here is still a bit imperfect. Crafting test cases
that reliably fail to be analyzed by SCEV before unswitching and succeed
afterward proved ... very, very hard. It took me several hours and
careful work to build these, and I'm not optimistic about necessarily
coming up with more to cover more elaborate possibilities. Fortunately,
the code pattern we are testing here in the pass is really
straightforward and reliable.

Thanks to Max Kazantsev for the initial work on this as well as the
review, and to Hal Finkel for helping me talk through approaches to test
this stuff even if it didn't come to much.

Differential Revision: https://reviews.llvm.org/D47624

llvm-svn: 336183
2018-07-03 09:13:27 +00:00
Max Kazantsev 3097b76e8c [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done
This patch changes order of transform in InstCombineCompares to avoid
performing transforms based on ranges which produce complex bit arithmetics
before more simple things (like folding with constants) are done. See PR37636
for the motivating example.

Differential Revision: https://reviews.llvm.org/D48584
Reviewed By: spatel, lebedev.ri

llvm-svn: 336172
2018-07-03 06:23:57 +00:00
Alina Sbirlea 0e15501fa7 Replace "Replacable" with "Replaceable". [NFC]
llvm-svn: 336133
2018-07-02 18:53:40 +00:00
Farhana Aleen 3b416db19b [SLP] Recognize min/max pattern using instructions producing same values.
Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.

         %1 = extractelement <2 x i32> %a, i32 0
         %2 = extractelement <2 x i32> %a, i32 1
         %cond = icmp sgt i32 %1, %2
         %3 = extractelement <2 x i32> %a, i32 0
         %4 = extractelement <2 x i32> %a, i32 1
         %select = select i1 %cond, i32 %3, i32 %4

Author: FarhanaAleen

Reviewed By: ABataev, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D47608

llvm-svn: 336130
2018-07-02 17:55:31 +00:00
Sanjay Patel b999d74132 [InstCombine] reverse canonicalization of add --> or to allow more shuffle folding
This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)

Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving 
to where other passes could use that functionality.
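A hedged sketch of the value-tracking test behind that reversal (the helper's exact signature in ValueTracking has varied across LLVM versions):

```
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Illustrative only: an 'or' whose operands can share no set bits behaves
// exactly like an 'add', so the binop pair can be treated as matching.
static bool orActsLikeAdd(const BinaryOperator &Or, const DataLayout &DL) {
  return Or.getOpcode() == Instruction::Or &&
         haveNoCommonBitsSet(Or.getOperand(0), Or.getOperand(1), DL);
}
```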

Differential Revision: https://reviews.llvm.org/D48662

llvm-svn: 336128
2018-07-02 17:42:29 +00:00
Simon Pilgrim d5fb50e3bf [SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector getEntryCost
This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure.

llvm-svn: 336102
2018-07-02 13:41:29 +00:00
Florian Hahn 4ebba909a2 Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.
This version contains a fix to add values whose state in ParamState changed to
the worklist, even if their state in ValueState did not change. To avoid adding
the same value multiple times, mergeInValue returns true if it added the value
to the worklist. The value is added to the worklist depending on its state in
ValueState.

Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 336098
2018-07-02 12:44:04 +00:00
Simon Pilgrim 265793d52a [SLPVectorizer] Fix alternate opcode + shuffle cost function to correct handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.

This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...

llvm-svn: 336095
2018-07-02 11:28:01 +00:00
Simon Pilgrim 409bd5f487 [SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for getEntryCost/vectorizeTree. NFCI.
Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators.

llvm-svn: 336092
2018-07-02 10:54:19 +00:00
Simon Pilgrim 3dafb553d9 [SLPVectorizer] Call InstructionsState.isOpcodeOrAlt with Instruction instead of an opcode. NFCI.
llvm-svn: 336069
2018-07-01 20:22:46 +00:00
Simon Pilgrim ef9c97c343 [SLPVectorizer] Replace sameOpcodeOrAlt with InstructionsState.isOpcodeOrAlt helper. NFCI.
This is a basic step towards matching more general instructions types than just opcodes.

llvm-svn: 336068
2018-07-01 20:07:30 +00:00
Simon Pilgrim 77d2067677 [SLPVectorizer] Use InstructionsState Op/Alt opcodes directly. NFCI.
llvm-svn: 336063
2018-07-01 13:41:58 +00:00
David Green 963401d2be [UnrollAndJam] New Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder Loop

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 336062
2018-07-01 12:47:30 +00:00
Eugene Leviant 6e4134459b [Evaluator] Improve evaluation of call instruction
Recommit of r335324 after buildbot failure fix

llvm-svn: 336059
2018-07-01 11:02:07 +00:00
Chandler Carruth 7c557f804d [instsimplify] Move the instsimplify pass to use more obvious file names
and directory.

Also cleans up all the associated naming to be consistent and removes
the public access to the pass ID which was unused in LLVM.

Also runs clang-format over parts that changed, which generally cleans
up a bunch of formatting.

This is in preparation for doing some internal cleanups to the pass.

Differential Revision: https://reviews.llvm.org/D47352

llvm-svn: 336028
2018-06-29 23:36:03 +00:00
Alex Shlyapnikov 788764ca12 [HWASan] Do not retag allocas before return from the function.
Summary:
Retagging allocas before returning from the function might help
detect use-after-return bugs, but it does not work at all in real
life, where instrumented and non-instrumented code is intermixed.
Consider the following code:

F_non_instrumented() {
  T x;
  F1_instrumented(&x);
  ...
}

{
  F_instrumented();
  F_non_instrumented();
}

- F_instrumented call leaves the stack below the current sp tagged
  randomly for UAR detection
- F_non_instrumented allocates its own vars on that tagged stack,
  not generating any tags; that is, the address of x has tag 0, but the
  shadow memory still contains tags left behind by F_instrumented on the
  previous step
- F1_instrumented verifies &x before using it and traps on tag mismatch,
  0 vs whatever tag was set by F_instrumented

Reviewers: eugenis

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D48664

llvm-svn: 336011
2018-06-29 20:20:17 +00:00
Sean Fertile cd0d7634f6 Revert "Extend CFGPrinter and CallPrinter with Heat Colors"
This reverts r335996 which broke graph printing in Polly.

llvm-svn: 336000
2018-06-29 17:48:58 +00:00
Sean Fertile 3b0535b424 Extend CFGPrinter and CallPrinter with Heat Colors
Extends the CFGPrinter and CallPrinter with heat colors based on heuristics or
profiling information. The colors are enabled by default and can be toggled
on/off for CFGPrinter by using the option -cfg-heat-colors for both
-dot-cfg[-only] and -view-cfg[-only].  Similarly, the colors can be toggled
on/off for CallPrinter by using the option -callgraph-heat-colors for both
-dot-callgraph and -view-callgraph.

Patch by Rodrigo Caetano Rocha!

Differential Revision: https://reviews.llvm.org/D40425

llvm-svn: 335996
2018-06-29 17:13:58 +00:00
Sanjay Patel da66753e01 [InstCombine] enhance shuffle-of-binops to allow different variable ops (PR37806)
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have 2 different variable values, then we shuffle (select) those lanes, 
shuffle (select) the constants, and then perform the binop. This eliminates a binop.

The new shuffle uses the same shuffle mask as the existing shuffle, so there's no 
danger of creating a difficult shuffle.

All of the earlier constraints still apply, but we also check for extra uses to 
avoid creating more instructions than we'll remove.

Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.

Differential Revision: https://reviews.llvm.org/D48678

llvm-svn: 335974
2018-06-29 13:44:06 +00:00
Sanjay Patel d512853aa3 [InstCombine] fix opcode check in shuffle fold
There's no way to expose this difference currently, 
but we should use the updated variable because the
original opcodes can go stale if we transform into
something new.

llvm-svn: 335920
2018-06-28 20:52:43 +00:00
Teresa Johnson e87868b7e9 [ThinLTO] Port InlinerFunctionImportStats handling to new PM
Summary:
The InlinerFunctionImportStats will collect and dump stats regarding how
many function inlined into the module were imported by ThinLTO.

Reviewers: wmi, dexonsmith

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D48729

llvm-svn: 335914
2018-06-28 20:07:47 +00:00
Anastasis Grammenos 425df22ee3 [SROA] Preserve DebugLoc when rewriting alloca partitions
When rewriting an alloca partition, copy the DL from the
old alloca over to the new one.

Differential Revision: https://reviews.llvm.org/D48640

llvm-svn: 335904
2018-06-28 18:58:30 +00:00
Sanjay Patel 57bda365bf [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806

We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other 
direction because that's generally better of course). This allows us to remove the shuffle 
as we do in the regular opcodes-are-the-same cases.

This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv

Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already 
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There 
are planned enhancements for opcode transforms such as or -> add.

Note that there's a different fold needed if we've already managed to simplify away a binop 
as seen in the test based on PR37806, but we manage to get that one case here because this 
fold is positioned above the demanded elements fold currently.

Differential Revision: https://reviews.llvm.org/D48485

llvm-svn: 335888
2018-06-28 17:48:04 +00:00
Benjamin Kramer 269eb21e1c Revert "Add support for generating a call graph profile from Branch Frequency Info."
This reverts commits r335794 and r335797. Breaks ThinLTO+FDO selfhost.

llvm-svn: 335851
2018-06-28 13:15:03 +00:00
Jesper Antonsson 514b6b5796 Comment change to verify commit rights. NFC.
Summary: Just a silly one-character correction.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48709

llvm-svn: 335832
2018-06-28 10:55:04 +00:00
Florian Hahn 388af14f85 [SCCP] Mark CFG as preserved.
SCCP does not change the CFG, so we can mark it as preserved.

Reviewers: dberlin, efriedma, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D47149

llvm-svn: 335820
2018-06-28 09:53:38 +00:00
Max Kazantsev f5ba37182e [IndVarSimplify] Ignore unreachable users of truncs
If a trunc has a user in a block which is not reachable from entry,
we can safely perform trunc elimination as if this user didn't exist.
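A minimal sketch of that reachability test, assuming a DominatorTree is available (illustrative, not the exact IndVarSimplify code):

```
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Illustrative only: a user in a block that cannot be reached from the
// function entry can be ignored when deciding whether the trunc is dead.
static bool userIsRelevant(const Instruction &User, const DominatorTree &DT) {
  return DT.isReachableFromEntry(User.getParent());
}
```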

llvm-svn: 335816
2018-06-28 08:20:03 +00:00
Michael J. Spencer 98f5475f44 [CGProfile] Fix unused variable warning.
llvm-svn: 335797
2018-06-28 00:12:04 +00:00
Michael J. Spencer 5bf1ead377 Add support for generating a call graph profile from Branch Frequency Info.
=== Generating the CG Profile ===

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions.  For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:
```
!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}
```

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335794
2018-06-27 23:58:08 +00:00
Teresa Johnson 7e7b13d016 [ThinLTO] Print names in function import debug messages when available
Summary:
Rather than just print the GUID, when it is available in the index,
print the global name as well in the function import thin link debug
messages. Names will be available when the combined index is being
built by the same process, e.g. a linker or "llvm-lto2 run".

Reviewers: davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits

Differential Revision: https://reviews.llvm.org/D48612

llvm-svn: 335760
2018-06-27 18:03:39 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking built in. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Vedant Kumar f6c0b41fb7 [InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms()
This prevents InstCombine from creating mis-sized dbg.values when
replacing a sequence of casts with a simpler cast. For example, in:

  (fptrunc (floor (fpext X))) -> (floorf X)

We no longer emit dbg.value(X) (with a 32-bit float operand) to describe
(fpext X) (which is a 64-bit float).

This was diagnosed by the debugify check added in r335682.

llvm-svn: 335696
2018-06-27 00:47:53 +00:00
Evgeniy Stepanov 289a7d4c7d Revert "[asan] Instrument comdat globals on COFF targets"
Causes false positive ODR violation reports on __llvm_profile_raw_version.

llvm-svn: 335681
2018-06-26 22:43:48 +00:00
Michael Zolotukhin d3b8bdef01 [JumpThreading] Don't try to rewrite a use if it's already valid.
Summary:
When recording uses that we need to rewrite after cloning a loop, we need to
check whether the use is not dominated by the original def. The initial
assumption was that the cloned basic block will introduce a new path and
thus the original def will only dominate the use if they are in the same
BB, but as the reproducer from PR37745 shows, that's not always the case.

This fixes PR37745.

Reviewers: haicheng, Ka-Ka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48111

llvm-svn: 335675
2018-06-26 22:19:48 +00:00
Vedant Kumar 78ff0f1b83 Use a variable to appease a no-asserts bot, NFC
Failure URL:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/22836

llvm-svn: 335648
2018-06-26 18:55:26 +00:00
Matt Arsenault 2c1a570aab LoopUnroll: Allow analyzing intrinsic call costs
I'm not sure why the code here is skipping calls since
TTI does try to do something for general calls, but it
at least should allow intrinsics.

Skip intrinsics that should not be omitted as calls, which
is by far the most common case on AMDGPU.

llvm-svn: 335645
2018-06-26 18:51:17 +00:00
Vedant Kumar c85ca4cdab [Local] Add a convenient insertReplacementDbgValues overload, NFC
Add an overload for the common case where the replacement dbg.values
have the same DIExpressions as the originals.

llvm-svn: 335643
2018-06-26 18:44:53 +00:00
Vedant Kumar de46f65bbd [Local] Sink salvageDI's early exit into helper functions, NFC
salvageDebugInfo() performs a check that allows it to exit early without
doing a DenseMap lookup. It's a bit neater and marginally more useful to
sink this early exit into the findDbg{Addr,Users,Values} helpers.

llvm-svn: 335642
2018-06-26 18:44:52 +00:00
Sanjay Patel 9adea01c9f [InstCombine] simplify code for urem fold; NFCI
llvm-svn: 335623
2018-06-26 16:39:29 +00:00
Sanjay Patel 3575f0c0b3 [InstCombine] fold urem with sext bool divisor
Similar to other patches in this series:
https://reviews.llvm.org/rL335512
https://reviews.llvm.org/rL335527
https://reviews.llvm.org/rL335597
https://reviews.llvm.org/rL335616

...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform.
I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold.

Note that in this case, the backend might want to convert the select into math:
Name: sext urem
%e = sext i1 %x to i32
%r = urem i32 %y, %e
=>
%c = icmp eq i32 %y, -1
%z = zext i1 %c to i32
%r = add i32 %z, %y

llvm-svn: 335622
2018-06-26 16:30:00 +00:00
Simon Pilgrim bbfc18b5b5 [SLPVectorizer] Recognise non uniform power of 2 constants
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.

As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.

Differential Revision: https://reviews.llvm.org/D48214

llvm-svn: 335621
2018-06-26 16:20:16 +00:00
Sanjay Patel 7c45debaea [InstCombine] fold udiv with sext bool divisor
Note: I didn't add a hasOneUse() check because the existing,
related fold doesn't have that check. I suspect that the
improved analysis and codegen make these some of the rare
canonicalization cases where we allow an increase in
instructions.

llvm-svn: 335597
2018-06-26 12:41:15 +00:00
Florian Hahn 4a69b0bb36 [IPSCCP] Change dead blocks to unreachable after visiting all executable blocks.
changeToUnreachable may remove PHI nodes from executable blocks we found values
for, and we would then fail to replace them. By changing dead blocks to unreachable
after we replaced constants in all executable blocks, we ensure such PHI nodes are
replaced by their known values beforehand.

Fixes PR37780.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48421

llvm-svn: 335588
2018-06-26 10:15:02 +00:00
Bjorn Pettersson 550517bcab Improve ConvertDebugDeclareToDebugValue
Summary:
This is a follow-up to r334830 and r335031.

In the valueCoversEntireFragment check we now also handle
the situation when there is a variable length array (VLA)
involved, and the length of the array has been reduced to
a constant.

The ConvertDebugDeclareToDebugValue functions that are related
to PHI nodes and load instructions now avoid inserting dbg.value
intrinsics when the value does not, for certain, cover the
variable/fragment that should be described.
In r334830 we assumed that the value always covered the entire
var/fragment and we had assertions in the code to show that
assumption. However, those asserts failed when compiling code
with VLAs, so we removed the asserts in r335031. Now when we
know that the valueCoversEntireFragment check can fail also for
PHI/Load instructions, we avoid inserting the faulty dbg.value
intrinsic in such situations. Compared to the Store instruction
scenario we simply drop the dbg.value here (as the variable does
not change its value due to PHI/Load, so an earlier dbg.value
describing the variable should still be valid).

Reviewers: aprantl, vsk, efriedma

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48547

llvm-svn: 335580
2018-06-26 06:17:00 +00:00
Gil Rapaport da2e2caa6c [InstCombine] (A + 1) + (B ^ -1) --> A - B
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B):
since B ^ -1 is (-1 - B), the sum (A + 1) + (B ^ -1) reduces to A + 1 - 1 - B = A - B.
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.

Differential Revision: https://reviews.llvm.org/D48535

llvm-svn: 335579
2018-06-26 05:31:18 +00:00
Chandler Carruth 1652996fd6 [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial
unswitching of switches.

This works much like trivial unswitching of switches in that it reliably
moves the switch out of the loop. Here we potentially clone the entire
loop into each successor of the switch and re-point the cases at these
clones.

Due to the complexity of actually doing nontrivial unswitching, this
patch doesn't create a dedicated routine for handling switches -- it
would duplicate far too much code. Instead, it generalizes the existing
routine to handle both branches and switches as it largely reduces to
looping in a few places instead of doing something once. This actually
improves the results in some cases with branches due to being much more
careful about how dead regions of code are managed. With branches,
because exactly one clone is created and there are exactly two edges
considered, somewhat sloppy handling of the dead regions of code was
sufficient in most cases. But with switches, there are much more
complicated patterns of dead code and so I've had to move to a more
robust model generally. We still do as much pruning of the dead code
early as possible because that allows us to avoid even cloning the code.

This also surfaced another problem with nontrivial unswitching before
which is that we weren't as precise in reconstructing loops as we could
have been. This seems to have been mostly harmless, but resulted in
pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
have to get this *right*, and everything benefits from it.

While the testing may seem a bit light here because we only have two
real cases with actual switches, they do a surprisingly good job of
exercising numerous edge cases. Also, because we share the logic with
branches, most of the changes in this patch are reasonably well covered
by existing tests.

The new unswitch now has all of the same fundamental power as the old
one with the exception of the single unsound case of *partial* switch
unswitching -- that really is just loop specialization and not
unswitching at all. It doesn't fit into the canonicalization model in
any way. We can add a loop specialization pass that runs late based on
profile data if important test cases ever come up here.

Differential Revision: https://reviews.llvm.org/D47683

llvm-svn: 335553
2018-06-25 23:32:54 +00:00
Sanjay Patel 38a86d3136 [InstCombine] cleanup udiv folds; NFCI
This removes a "UDivFoldAction" in favor of a simple constant
matcher. In theory, the existing code could do more matching,
but I don't see any evidence or need for it. I've left a TODO
about using ValueTracking in case we see any regressions.

llvm-svn: 335545
2018-06-25 22:50:26 +00:00
Benjamin Kramer 1649774816 [Instrumentation] Remove unused include
It's also a layering violation.

llvm-svn: 335528
2018-06-25 21:43:09 +00:00
Sanjay Patel 6a96d90acd [InstCombine] fold sdiv with sext bool divisor
llvm-svn: 335527
2018-06-25 21:39:41 +00:00
Craig Topper 27847868b7 [LoopIdiomRecognize] Fix a couple places where it appears we were unintentionally making copies of DebugLoc.
llvm-svn: 335521
2018-06-25 20:45:45 +00:00
Alexander Richardson 85e200e934 Add Triple::isMIPS()/isMIPS32()/isMIPS64(). NFC
There are quite a few if statements that enumerate all these cases. It gets
even worse in our fork of LLVM where we also have a Triple::cheri (which
is mips64 + CHERI instructions) and we had to update all if statements that
check for Triple::mips64 to also handle Triple::cheri. This patch helps to
reduce our diff to upstream and should also make some checks more readable.
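For illustration, a hedged example of the kind of check this simplifies (the Triple header location has moved between LLVM versions):

```
#include "llvm/ADT/Triple.h"
using namespace llvm;

// Illustrative only: one helper call instead of enumerating
// mips/mipsel/mips64/mips64el at every call site.
static bool targetsAnyMips(const Triple &T) {
  return T.isMIPS();
}
```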

Reviewed By: atanasyan

Differential Revision: https://reviews.llvm.org/D48548

llvm-svn: 335493
2018-06-25 16:49:20 +00:00
Wei Mi e555127435 [SampleFDO] Add an option to turn on/off warning about samples unused.
If a function has samples to use, but cannot use them because of missing debug
information, currently a warning will be issued to inform about the missed
opportunity.

This warning assumes the binary generating the profile and the binary using
the profile are similar enough. It is not always the case. Sometimes even
if the binaries are not quite similar, we may still get some benefit by
using sampleFDO. In those cases, we may still want to apply sampleFDO but
not want to see a lot of such warnings pop up.

The patch adds an option for the warning.

Differential Revision: https://reviews.llvm.org/D48510

llvm-svn: 335484
2018-06-25 15:40:31 +00:00
Simon Pilgrim 79e474bf46 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning (again). NFCI.
llvm-svn: 335457
2018-06-25 11:46:24 +00:00
Simon Pilgrim 3a0e13f347 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning. NFCI.
llvm-svn: 335454
2018-06-25 11:38:27 +00:00
Stanislav Mekhanoshin d8c9374797 Fix invariant fdiv hoisting in LICM
FDiv is replaced with multiplication by a reciprocal, and the invariant
reciprocal is hoisted out of the loop, while the multiplication remains
in the loop even when it is invariant.

Switch the check from requiring all operands to be invariant to requiring
only an invariant denominator to fix the issue.
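A hedged sketch of the corrected invariance test (illustrative names, not the LICM source):

```
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Illustrative only: hoisting the reciprocal of an fdiv requires only the
// denominator to be loop invariant; the numerator is free to vary.
static bool canHoistReciprocal(const BinaryOperator &FDiv, const Loop &L) {
  return FDiv.getOpcode() == Instruction::FDiv &&
         L.isLoopInvariant(FDiv.getOperand(1));
}
```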

Differential Revision: https://reviews.llvm.org/D48447

llvm-svn: 335411
2018-06-23 04:01:28 +00:00
Eli Friedman 203eaaf5ba [LoopReroll] Rewrite induction variable rewriting.
This gets rid of a bunch of weird special cases; instead, just use SCEV
rewriting for everything.  In addition to being simpler, this fixes a
bug where we would use the wrong stride in certain edge cases.

The one bit I'm not quite sure about is the trip count handling,
specifically the FIXME about overflow.  In general, I think we need to
widen the exit condition, but that's probably not profitable if the new
type isn't legal, so we probably need a check somewhere.  That said, I
don't think I'm making the existing problem any worse.

As a followup to this, a bunch of IV-related code in root-finding could
be cleaned up; with SCEV-based rewriting, there isn't any reason to
assume a loop will have exactly one or two PHI nodes.

Differential Revision: https://reviews.llvm.org/D45191

llvm-svn: 335400
2018-06-22 22:58:55 +00:00
Tobias Edler von Koch 7609cb83e6 Re-land "[LTO] Enable module summary emission by default for regular LTO"
Since we are now producing a summary also for regular LTO builds, we
need to run the NameAnonGlobals pass in those cases as well (the
summary cannot handle anonymous globals).

See https://reviews.llvm.org/D34156 for details on the original change.

This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3.

llvm-svn: 335385
2018-06-22 20:23:21 +00:00
Alina Sbirlea bee50036d3 [LoopUnswitch]Fix comparison for DomTree updates.
Summary:
In LoopUnswitch when replacing a branch Parent -> Succ with a conditional
branch Parent -> True & Parent->False, the DomTree updates should insert an edge for
each of True/False if True/False are different than Succ, and delete Parent->Succ edge
if both are different. The comparison with Succ appears to be incorrect;
it's comparing with Parent instead.
There is no test failing either before or after this change, but it seems to me this is
the right way to do the update.

Reviewers: chandlerc, kuhar

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D48457

llvm-svn: 335369
2018-06-22 17:14:35 +00:00
Simon Pilgrim 9d3ef8ee2b [SLPVectorizer] Support alternate opcodes in tryToVectorizeList
Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.

NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.

Differential Revision: https://reviews.llvm.org/D48488

llvm-svn: 335364
2018-06-22 16:37:34 +00:00
Simon Pilgrim 213cb1b82d [SLPVectorizer] reorderAltShuffleOperands should just take InstructionsState. NFCI.
All calls were extracting the InstructionsState Opcode/AltOpcode values so we might as well pass it directly

llvm-svn: 335359
2018-06-22 16:10:26 +00:00
Simon Pilgrim 1e564504bb [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.

Differential Revision: https://reviews.llvm.org/D48477

llvm-svn: 335349
2018-06-22 14:04:06 +00:00
Sanjay Patel a52963b404 [InstCombine] rearrange shuffle-of-binops logic; NFC
The commutative matcher makes things more complicated
here, and I'm planning an enhancement where this 
form is more readable.

llvm-svn: 335343
2018-06-22 12:46:16 +00:00
Eugene Leviant 6d711ca168 Revert r335324 due to a builtbot failure
llvm-svn: 335327
2018-06-22 08:57:01 +00:00
Eugene Leviant ea19c9473c [Evaluator] Improve evaluation of call instruction
Differential revision: https://reviews.llvm.org/D46584

llvm-svn: 335324
2018-06-22 08:29:36 +00:00
Chandler Carruth aa5f4d2e23 Revert r335306 (and r335314) - the Call Graph Profile pass.
This is the first pass in the main pipeline to use the legacy PM's
ability to run function analyses "on demand". Unfortunately, it turns
out there are bugs in that somewhat-hacky approach. At the very least,
it leaks memory and doesn't support -debug-pass=Structure. Unclear if
there are larger issues or not, but this should get the sanitizer bots
back to green by fixing the memory leaks.

llvm-svn: 335320
2018-06-22 05:33:57 +00:00
Sanjay Patel 4784e1506e [InstCombine] fix shuffle-of-binops bug
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in 
the other, so we have to check for that possibility and
bail out.

llvm-svn: 335312
2018-06-21 23:56:59 +00:00
Michael J. Spencer fc93dd8e18 [Instrumentation] Add Call Graph Profile pass
This patch adds support for generating a call graph profile from Branch Frequency Info.

The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight.

After scanning all the functions, it generates an appending module flag containing the data. The format looks like:

!llvm.module.flags = !{!0}

!0 = !{i32 5, !"CG Profile", !1}
!1 = !{!2, !3, !4} ; List of edges
!2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32
!3 = !{void (i1)* @freq, void ()* @a, i64 11}
!4 = !{void (i1)* @freq, void ()* @b, i64 20}

Differential Revision: https://reviews.llvm.org/D48105

llvm-svn: 335306
2018-06-21 23:31:10 +00:00
Matthew Voss 30648ab233 [GVN] Avoid casting a vector of size less than 8 bits to i8
Summary:
A reprise of D25849.

This crash was found through fuzzing some time ago and was documented in PR28879.

No check for load size has been added due to the following tests:
  - Transforms/GVN/invariant.group.ll
  - Transforms/GVN/pr10820.ll

These tests expect load sizes that are not a multiple of eight.

Thanks to @davide for the original patch.

Reviewers: nlopes, davide, RKSimon, reames, efriedma

Reviewed By: efriedma

Subscribers: davide, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D48330

llvm-svn: 335294
2018-06-21 21:43:20 +00:00
Sanjay Patel a76b70069d [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have a common variable operand used in a pair of binops with vector constants 
that are vector selected together, then we can constant shuffle the constant vectors 
to eliminate the shuffle instruction.

This has some tricky parts that are hopefully addressed in the tests and their 
respective comments:

  1. If the shuffle mask contains an undef element, then that lane of the result is 
     undef:
     http://llvm.org/docs/LangRef.html#shufflevector-instruction

     Therefore, we can replace the constant in that lane with an undef value except 
     for div/rem. With div/rem, an undef in the divisor would cause the whole op to 
     be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.

  2. Intersect the wrapping and FMF of the original binops for the new binop. There 
     should be no extra poison or fast-math potential in the new binop that wasn't 
     possible in the original code.

  3. Disregard other uses. Given that we're eliminating uses (shortening the 
     dependency chain), I think that's always the right IR canonicalization. But 
     I purposely chose the udiv test to demonstrate the scenario where both 
     intermediate values have other uses because that seems likely worse for 
     codegen with an expensive math op. This seems like a very rare possibility to 
     me, so I don't think it requires a backend patch first.

Differential Revision: https://reviews.llvm.org/D48401

llvm-svn: 335283
2018-06-21 20:15:09 +00:00
Francis Visoiu Mistrih ac599b6951 Revert r335206 "Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions."
This reverts commit r335206.

As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meanwhile, revert this to fix some bots.

llvm-svn: 335272
2018-06-21 19:18:36 +00:00
Sanjay Patel 3244537a3c [InstCombine] use constant pattern matchers with icmp+sext
The previous code worked with vectors, but it failed when the
vector constants contained undef elements. 
The matchers handle those cases.

llvm-svn: 335262
2018-06-21 17:51:44 +00:00
Sanjay Patel 7b0fc75f73 [InstCombine] simplify binops before trying other folds
This is outwardly NFC from what I can tell, but it should be more efficient 
to simplify first (despite the name, SimplifyAssociativeOrCommutative does
not actually simplify as InstSimplify does - it creates/morphs instructions).

This should make it easier to refactor duplicated code that runs for all binops.

llvm-svn: 335258
2018-06-21 17:06:36 +00:00
Sanjay Patel 3e5c051a06 [InstCombine] make div/rem vector constant utility function; NFCI
This was originally in D48401 and will be used there.

llvm-svn: 335242
2018-06-21 14:59:35 +00:00
Nicolai Haehnle db6911a6f9 AMDGPU: Remove old-style image intrinsics
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.

Mesa uses dimension-aware image intrinsics since
commit a9a7993441.

Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a

Reviewers: arsenm, rampitec, mareko, tpr, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48167

llvm-svn: 335231
2018-06-21 13:37:45 +00:00
Nicolai Haehnle b29ee70122 InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.

Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix

There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:

- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16

The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).

Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.

Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad

Reviewers: arsenm, rampitec, majnemer

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48165

llvm-svn: 335230
2018-06-21 13:37:31 +00:00
Florian Hahn d36aa1f763 Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
r335150 should resolve the issues with the clang-with-thin-lto-ubuntu
and clang-with-lto-ubuntu builders.

Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

llvm-svn: 335206
2018-06-21 07:15:08 +00:00
Chandler Carruth d1dab0c3c0 [PM/LoopUnswitch] Add partial non-trivial unswitching for invariant
conditions feeding a chain of `and`s or `or`s for a branch.

Much like with full non-trivial unswitching, we rely on the pass manager
to handle iterating until all of the profitable unswitches have been
done. This is to allow other more profitable unswitches to fire on any
of the cloned, simpler versions of the loop if viable.

Threading the partial unswitching through the non-trivial unswitching
logic motivated some minor refactorings. If those are too disruptive to
make it reasonable to review this patch, I can separate them out, but
it'll be somewhat time-consuming so I wanted to send it for initial
review as-is. Feel free to tell me whether it warrants pulling apart.

I've tried to re-use (and factor out) logic from the partial trivial
unswitching, but not as much could be shared as I had hoped. Still, this
wasn't as bad as I naively expected.

Some basic testing is added, but I probably need more. Suggestions for
things you'd like to see tested more than welcome. One thing I'd like to
do is add some testing that when we schedule this with loop-instsimplify
it effectively cleans up the cruft created.

Last but not least, this uncovered a bug that has been in loop cloning
the entire time for non-trivial unswitching. Specifically, we didn't
correctly add the outer-most cloned loop to the list of cloned loops.
This meant that LCSSA wouldn't be updated for it hypothetically, and
more significantly that we would never visit it in the loop pass
manager. I noticed this while checking loop-instsimplify by hand. I'll
try to separate this bugfix out into its own patch with a more focused
test. But it is just one line, so shouldn't significantly confuse the
review here.

After this patch, the only missing "feature" in this unswitch I'm aware
of is non-trivial unswitching of switches. I'll try implementing *full*
non-trivial unswitching of switches (which is at least a sound thing to
implement), but *partial* non-trivial unswitching of switches is
something I don't see any sound and principled way to implement. I also
have no interesting test cases for the latter, so I'm not really
worried. The rest of the things that need to be ported are bug-fixes and
more narrow / targeted support for specific issues.

Differential Revision: https://reviews.llvm.org/D47522

llvm-svn: 335203
2018-06-21 06:14:03 +00:00
Michael Zolotukhin 336d75cc73 ProvenanceAnalysis: Store WeakTrackingVH instead of Value* in UnderlyingValue Cache.
Summary:
Since the value stored in the cache might be deleted or replaced with
something else, we need to use tracking value handles instead of plain
Value pointers. It was discovered in one of our internal builds, and
unfortunately there is no small reproducer for the issue.

The cache was introduced in rL327328.

Reviewers: ahatanak, pete

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48407

llvm-svn: 335201
2018-06-21 05:14:00 +00:00
Alina Sbirlea dfd14adeb0 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added a temporary array to avoid iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added an assert to make it obvious that BB == SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owners as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

llvm-svn: 335183
2018-06-20 22:01:04 +00:00
Simon Pilgrim 3d1c8c97b8 [SLPVectorizer] Provide InstructionsState down the BoUpSLP vectorization call tree
As described in D48359, this patch pushes InstructionsState down the BoUpSLP call hierarchy instead of the corresponding raw OpValue. This makes it easier to track the alternate opcode etc. and avoids us having to call getAltOpcode which makes it difficult to support more than one alternate opcode.

Differential Revision: https://reviews.llvm.org/D48382

llvm-svn: 335170
2018-06-20 20:54:52 +00:00
Sanjay Patel 3597588493 [IR] add/use isIntDivRem convenience function
There are more existing potential users of this,
but I've limited this patch to the first couple
that I found to minimize typo risk.

llvm-svn: 335157
2018-06-20 19:02:17 +00:00
Chandler Carruth 4da3331d3d [PM/LoopUnswitch] Support partial trivial unswitching.
The idea of partial unswitching is to take a *part* of a branch's
condition that is loop invariant and just unswitch that part. This
primarily makes sense with i1 conditions of branches as opposed to
switches. When dealing with i1 conditions, we can easily extract loop
invariant inputs to a branch and unswitch them to test them entirely
outside the loop.

As part of this, we now create much more significant cruft in the loop
body, so this relies on adding cleanup passes to the loop pipeline and
revisiting unswitched loops to do that cleanup before continuing to
process them.

This already appears to be more powerful at unswitching than the old
loop unswitch pass, and so I'd appreciate pretty careful review in case
I'm just missing some correctness checks. The `LIV-loop-condition` test
case is not unswitched by the old unswitch pass, but is with this pass.

Thanks to Sanjoy and Fedor for the review!

Differential Revision: https://reviews.llvm.org/D46706

llvm-svn: 335156
2018-06-20 18:57:07 +00:00
Vedant Kumar 4e93f3dcf8 [Local] Generalize insertReplacementDbgValues, NFC
This utility should operate on Values, not Instructions. While I'm here,
I've also made it possible to skip emitting replacement dbg.values for
certain debug users (by having RewriteExpr return nullptr).

llvm-svn: 335152
2018-06-20 18:40:14 +00:00
Florian Hahn 5ac2629823 [PredicateInfo] Order instructions in different BBs by DFSNumIn.
Using OrderedInstructions::dominates as a comparator for instructions in
BBs without a dominance relation can cause a non-deterministic order
between such instructions. That in turn can cause us to materialize
copies in a non-deterministic order. While this does not affect
correctness, it causes some minor non-determinism in the final generated
code, because values have slightly different labels.

Without this patch, running -print-predicateinfo on a reasonably large
module produces slightly different output on each run.

This patch uses the dominator tree's DFSNumIn to order instructions from
different BBs, which should enforce a deterministic ordering and
guarantee that dominated instructions come after the instructions that
dominate them.

Reviewers: dberlin, efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48230

llvm-svn: 335150
2018-06-20 17:42:01 +00:00
Vedant Kumar 6fa24b0b7f [Local] Add a utility to insert replacement dbg.values, NFC
The purpose of this utility is to make it easier for optimizations to
insert replacement dbg.values for instructions they are deleting. This
is useful in situations where salvageDebugInfo is inapplicable, say,
because the new dbg.value cannot refer to an operand of the dying value.

The utility is called insertReplacementDbgValues.

It assumes that the instruction 'From' is going to be deleted, and
inserts replacement dbg.values for each debug user of 'From'. The
newly-inserted dbg.values refer to 'To' instead of 'From'. Each
replacement dbg.value has the same location and variable as the debug
user it replaces, has a DIExpression determined by the result of
'RewriteExpr' applied to an old debug user of 'From', and is placed
before 'InsertBefore'.

This should simplify future patches, like D48331.

llvm-svn: 335144
2018-06-20 16:50:25 +00:00
Simon Pilgrim 292651a5b7 [SLPVectorizer] Move isOneOf after InstructionsState type. NFCI.
A future patch will have isOneOf use InstructionsState.

llvm-svn: 335142
2018-06-20 16:11:00 +00:00
Simon Pilgrim 0c9f8dcde7 [SLPVectorizer] Use InstructionsState to record AltOpcode
This is part of a move towards generalizing the alternate opcode mechanism and not just supporting (F)Add/(F)Sub counterparts.

The patch embeds the AltOpcode in the InstructionsState instead of calling getAltOpcode so often.

I'm hoping to eventually remove all uses of getAltOpcode and handle alternate opcode selection entirely within getSameOpcode, that will require us to use InstructionsState throughout the BoUpSLP call hierarchy (similar to some of the changes in D28907), which I will begin in future patches.

Differential Revision: https://reviews.llvm.org/D48359

llvm-svn: 335134
2018-06-20 15:13:40 +00:00
Simon Pilgrim 2e2f20a949 [SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.

This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.

The AArch64 regression will be fixed by D48172.

Differential Revision: https://reviews.llvm.org/D48174

llvm-svn: 335130
2018-06-20 14:26:28 +00:00
Sanjay Patel 825a4faa8d [InstCombine] ignore debuginfo when removing redundant assumes (PR37726)
This is similar to:
rL335083

Fixes:
https://bugs.llvm.org/show_bug.cgi?id=37726

llvm-svn: 335121
2018-06-20 13:22:26 +00:00
Simon Pilgrim b7ac037797 [SLPVectorizer] Split Tree/Reduction cost calls to simplify debugging. NFCI.
llvm-svn: 335110
2018-06-20 09:39:01 +00:00
Roman Lebedev 42a1ff11fb [NFC][SCEV] Add tests related to bit masking (PR37793)
Summary:
Related to https://bugs.llvm.org/show_bug.cgi?id=37793, https://reviews.llvm.org/D46760#1127287

We'd like to do this canonicalization https://rise4fun.com/Alive/Gmc
But it is currently restricted by rL155136 / rL155362, which says:
```
    // This is a constant shift of a constant shift. Be careful about hiding
    // shl instructions behind bit masks. They are used to represent multiplies
    // by a constant, and it is important that simple arithmetic expressions
    // are still recognizable by scalar evolution.
    //
    // The transforms applied to shl are very similar to the transforms applied
    // to mul by constant. We can be more aggressive about optimizing right
    // shifts.
    //
    // Combinations of right and left shifts will still be optimized in
    // DAGCombine where scalar evolution no longer applies.
```

I think these tests show that for *constants*, SCEV has no issues with that canonicalization.

Reviewers: mkazantsev, spatel, efriedma, sanjoy

Reviewed By: mkazantsev

Subscribers: sanjoy, javed.absar, llvm-commits, stoklund, bixia

Differential Revision: https://reviews.llvm.org/D48229

llvm-svn: 335101
2018-06-20 07:54:11 +00:00
Vedant Kumar f01827f2d1 [IR] Introduce helpers to skip debug instructions (NFC)
This patch introduces two helpers to make it easier to ignore debug
intrinsics:

- Instruction::getNextNonDebugInstruction()

This is just like Instruction::getNextNode(), except that it skips debug
info.

- skipDebugInfo(BasicBlock::iterator)

A free function which advances a BasicBlock iterator past any debug
info. This is a no-op when the iterator already points to a non-debug
instruction.

Part of: llvm.org/PR37728
Related to: https://reviews.llvm.org/D47874

Differential Revision: https://reviews.llvm.org/D48305

llvm-svn: 335083
2018-06-19 23:42:17 +00:00
Simon Pilgrim 0461393660 [SLPVectorizer] Remove default OperandValueKind arguments from getArithmeticInstrCost calls (NFC)
The getArithmeticInstrCost calls for shuffle vector entry costs specify TargetTransformInfo::OperandValueKind arguments, but are just using the method's default values. This seems to be a copy + paste issue and doesn't affect the costs in any way. The TargetTransformInfo::OperandValueProperties default arguments are already not being used.

Noticed while working on D47985.

Differential Revision: https://reviews.llvm.org/D48008

llvm-svn: 335045
2018-06-19 13:40:00 +00:00
Mikhail Dvoretckii 8393f90717 [InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil
This patch replaces calls to X86-specific intrinsics with floor-ceil semantics
with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This
doesn't affect the resulting machine code, as those intrinsics are lowered to
the same instructions, but exposes these specific rounding cases to generic
optimizations.

Differential Revision: https://reviews.llvm.org/D48067

llvm-svn: 335039
2018-06-19 10:49:12 +00:00
David Green e6a9c24878 [LoopSimplifyCFG] Invalidate SCEV in LoopSimplifyCFG
LoopSimplifyCFG, being a loop pass, needs to preserve scalar
evolution. This invalidates SE for the loops altered during
block merging.

Differential Revision: https://reviews.llvm.org/D48258

llvm-svn: 335036
2018-06-19 09:43:36 +00:00
Simon Pilgrim c966f7213e [SLPVectorizer] Pull out AltOpcode determination from reorderAltShuffleOperands.
Minor step towards making the alternate opcode system work with a wider range of opcode pairs.

llvm-svn: 335032
2018-06-19 09:16:06 +00:00
Bjorn Pettersson 2015a39955 Remove valueCoversEntireFragment asserts in ConvertDebugDeclareToDebugValue
This is a fixup for r334830 causing problems in the polly-aosp buildbot.

Focus in r334830 was to fix a problem seen with
ConvertDebugDeclareToDebugValue involving store instructions.
It also added some asserts to find out of similar problems
existed for the ConvertDebugDeclareToDebugValue functions
involving load and phi instructions. One of those asserts seems
to blow in the polly-aosp buildbot, so I'll revert the asserts
while debugging.

llvm-svn: 335031
2018-06-19 08:41:34 +00:00
Florian Hahn d8fcf0de31 [LoopInterchange] Move PHI handling to adjustLoopBranches.
This patch moves the logic to handle reduction PHI nodes to the end of
adjustLoopBranches. Reduction PHI nodes in the outer loop header can be
moved to the inner loop header and reduction PHI nodes from the inner loop
header can be moved to the outer loop header. In the latter situation,
we have to deal with one kind of PHI node:

    PHI nodes that are part of inner loop-only reductions.

We can replace the PHI node with the value coming from outside
the inner loop.

Reviewers: mcrosier, efriedma, karthikthecool

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D46198

llvm-svn: 335027
2018-06-19 08:03:24 +00:00
Max Kazantsev 37da4333a8 [SimplifyIndVars] Eliminate redundant truncs
This patch adds logic to deal with the following constructions:

  %iv = phi i64 ...
  %trunc = trunc i64 %iv to i32
  %cmp = icmp <pred> i32 %trunc, %invariant

Replacing it with
  %iv = phi i64 ...
  %cmp = icmp <pred> i64 %iv, sext/zext(%invariant)

We do this only in case it is legal. Specifically, if `%iv` has signed comparison users, it is
required that `sext(trunc(%iv)) == %iv`, and if it has unsigned comparison
uses then we require `zext(trunc(%iv)) == %iv`. The current implementation
bails if `%trunc` has other uses than `icmp`, but in theory we can handle more
cases here (e.g. if the user of trunc is bitcast).

Differential Revision: https://reviews.llvm.org/D47928
Reviewed By: reames

llvm-svn: 335020
2018-06-19 04:48:34 +00:00
Xin Tong 54b4227f32 Revert "Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor"
This reverts commit f976cf4cca0794267f28b54e468007fd476d37d9.

I am reverting this because it causes breakage on a few bots and it's going
to take me some time to look at this.

llvm-svn: 334993
2018-06-18 23:20:08 +00:00
Xin Tong bfd8cfcb8d Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor
Summary:
Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor

This is a missing small optimization in MergeBlockIntoPredecessor.

This helps with one simplifycfg test which expects this case to be handled.

Reviewers: davide, spatel, brzycki, asbirlea

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48284

llvm-svn: 334992
2018-06-18 22:59:13 +00:00
Florian Hahn 3385caaafd [VPlan] Add VPInstruction to VPRecipe transformation.
This patch introduces a VPInstructionToVPRecipe transformation, which
allows us to generate code for a VPInstruction based VPlan re-using the
existing infrastructure.

Reviewers: dcaballe, hsaito, mssimpso, hfinkel, rengolin, mkuper, javed.absar, sguggill

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D46827

llvm-svn: 334969
2018-06-18 18:28:49 +00:00
Simon Pilgrim 5b962b2fc3 [SLPVectorizer] Tidyup isShuffle helper
Ensure we keep track of the input vectors in all cases instead of just for SK_Select.

Ideally we'd reuse the shuffle mask pattern matching in TargetTransformInfo::getInstructionThroughput here to easily add support for all TargetTransformInfo::ShuffleKind without mass code duplication. I've added a TODO for now, but D48236 should help us here.

Differential Revision: https://reviews.llvm.org/D48023

llvm-svn: 334958
2018-06-18 16:25:01 +00:00
Florian Hahn 63cbcf98a5 [VPlanRecipeBase] Add eraseFromParent().
Reviewers: dcaballe, hsaito, mkuper, hfinkel

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D48081

llvm-svn: 334951
2018-06-18 15:18:48 +00:00
Florian Hahn 3bcff3662c [VPlan] Fix sanitizer problem with insertBefore.
llvm-svn: 334943
2018-06-18 13:51:28 +00:00
Simon Pilgrim 99a5832016 [SLPVectorizer] Avoid calling const VL.size() repeatedly in for-loop. NFCI.
llvm-svn: 334934
2018-06-18 11:35:36 +00:00
Florian Hahn 7591e4e94a [VPlanRecipeBase] Add insertBefore helper.
Reviewers: dcaballe, mkuper, hfinkel, hsaito, mssimpso

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D48080

llvm-svn: 334933
2018-06-18 11:34:17 +00:00
Michael Zolotukhin 158a7c3323 CorrelatedValuePropagation: Preserve DT.
Summary:
We only modify CFG in a couple of places, and we can preserve DT there
with a little effort.

Reviewers: davide, vsk

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48059

llvm-svn: 334895
2018-06-16 18:57:31 +00:00
Matt Morehouse 0ea9a90b3d [SanitizerCoverage] Add associated metadata to pc-tables.
Summary:
Using associated metadata rather than llvm.used allows linkers to
perform dead stripping with -fsanitize-coverage=pc-table.  Unfortunately
in my local tests, LLD was the only linker that made use of this metadata.

Partially addresses https://bugs.llvm.org/show_bug.cgi?id=34636 and fixes
https://github.com/google/sanitizers/issues/971.

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: Dor1s, hiraditya, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D48203

llvm-svn: 334858
2018-06-15 20:12:58 +00:00
Tomasz Krupa bcaab53d47 [X86] Lowering sqrt intrinsics to native IR
Summary: Complementary patch to lowering sqrt intrinsics in Clang.

Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, mike.dvoretsky, llvm-commits

Differential Revision: https://reviews.llvm.org/D41599

llvm-svn: 334849
2018-06-15 18:05:24 +00:00
Joseph Tremoulet 6f406d4f02 [InstCombine] Avoid iteration/mutation conflict
Summary:
When iterating users of a multiply in processUMulZExtIdiom, the
call to setOperand in the truncation case may replace the use
being visited; make sure the iterator has been advanced before
doing that replacement.

Reviewers: majnemer, davide

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48192

llvm-svn: 334844
2018-06-15 16:52:40 +00:00
Diego Caballero 68795245cf [LV] Prevent LV to run cost model twice for VF=2
This is a minor fix for LV cost model, where the cost for VF=2 was
computed twice when the vectorization of the loop was forced without
specifying a VF.

Reviewers: xusx595, hsaito, fhahn, mkuper

Reviewed By: hsaito, xusx595

Differential Revision: https://reviews.llvm.org/D48048

llvm-svn: 334840
2018-06-15 16:21:35 +00:00
Bjorn Pettersson 428caf988b Re-apply "[DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue"
This is r334704 (which was reverted in r334732) with a fix for
types like x86_fp80. We need to use getTypeAllocSizeInBits and
not getTypeStoreSizeInBits to avoid dropping debug info for
such types.

Original commit msg:
> Summary:
> Do not convert a DbgDeclare to DbgValue if the store
> instruction only refer to a fragment of the variable
> described by the DbgDeclare.
>
> Problem was seen when for example having an alloca for an
> array or struct, and there were stores to individual elements.
> In the past we inserted a DbgValue intrinsics for each store,
> just as if the store wrote the whole variable.
>
> When handling store instructions we insert a DbgValue that
> indicates that the variable is "undefined", as we do not know
> which part of the variable that is updated by the store.
>
> When ConvertDebugDeclareToDebugValue is used with a load/phi
> instruction we assert that the referenced value is large enough
> to cover the whole variable. Afaict this should be true for all
> scenarios where those methods are used on trunk. If the assert
> blows in the future I guess we could simply skip to insert a
> dbg.value instruction.
>
> In the future I think we should examine which part of the variable
> that is accessed, and add a DbgValue instrinsic with an appropriate
> DW_OP_LLVM_fragment expression.
>
> Reviewers: dblaikie, aprantl, rnk
>
> Reviewed By: aprantl
>
> Subscribers: JDevlieghere, llvm-commits
>
> Tags: #debug-info
>
> Differential Revision: https://reviews.llvm.org/D48024

llvm-svn: 334830
2018-06-15 13:48:55 +00:00
Roman Lebedev 84c11aed10 [InstCombine] Recommit: Fold (x << y) >> y -> x & (-1 >> y)
Summary:
We already do it for splat constants, but not for plain (non-constant) values.
Also, undef cases are mostly non-functional.

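For reference, a minimal before/after sketch of the fold itself (hypothetical example):

```
define i32 @shl_then_lshr(i32 %x, i32 %y) {
  %t = shl i32 %x, %y
  %r = lshr i32 %t, %y
  ret i32 %r
}
; -->
;   %mask = lshr i32 -1, %y
;   %r    = and i32 %x, %mask
```
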
The original commit was reverted because
it broke tests for the amdgpu backend, which I didn't check.
Now, the backend was updated to recognize these new
patterns, so we are good.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX

Reviewers: spatel, craig.topper, mareko, bogner, rampitec, nhaehnle, arsenm

Reviewed By: spatel, rampitec, nhaehnle

Subscribers: wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D47980

llvm-svn: 334818
2018-06-15 09:56:52 +00:00
Bjorn Pettersson 972fd1c9e7 Revert rL334704: "[DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue"
This reverts commit r334704.

Buildbots detected an assertion in "test tsan in debug compiler-rt build".

llvm-svn: 334732
2018-06-14 16:08:22 +00:00
Simon Pilgrim dee9c67f24 [EarlyCSE] Fix MSVC build. NFCI.
MSVC doesn't let you assign different lambdas through a ternary operator.

llvm-svn: 334715
2018-06-14 14:22:03 +00:00
Max Kazantsev ff6d1c9188 [EarlyCSE] Propagate conditions of AND and OR instructions
This patch teaches EarlyCSE to figure out that if `and i1 %x, %y` is true then both
`%x` and `%y` are true in the taken branch, and if `or i1 %x, %y` is false then both
`%x` and `%y` are false in the non-taken branch. Fix for PR37635.

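A hedged illustration of the idea (hypothetical example, not from the patch's tests):

```
define i32 @and_cond(i1 %x, i1 %y) {
entry:
  %c = and i1 %x, %y
  br i1 %c, label %taken, label %exit

taken:
  ; EarlyCSE can now treat %x and %y as true in this block, so a use of %x
  ; such as the select below can simplify to 1.
  %s = select i1 %x, i32 1, i32 2
  ret i32 %s

exit:
  ret i32 0
}
```
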
Differential Revision: https://reviews.llvm.org/D47574
Reviewed By: reames

llvm-svn: 334707
2018-06-14 13:02:13 +00:00
Bjorn Pettersson e406b29c22 [DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue
Summary:
Do not convert a DbgDeclare to DbgValue if the store
instruction only refers to a fragment of the variable
described by the DbgDeclare.

The problem was seen when, for example, there was an alloca for an
array or struct, and there were stores to individual elements.
In the past we inserted a DbgValue intrinsic for each store,
just as if the store wrote the whole variable.

When handling store instructions we insert a DbgValue that
indicates that the variable is "undefined", as we do not know
which part of the variable that is updated by the store.

When ConvertDebugDeclareToDebugValue is used with a load/phi
instruction we assert that the referenced value is large enough
to cover the whole variable. Afaict this should be true for all
scenarios where those methods are used on trunk. If the assert
blows in the future I guess we could simply skip inserting a
dbg.value instruction.

In the future I think we should examine which part of the variable
that is accessed, and add a DbgValue intrinsic with an appropriate
DW_OP_LLVM_fragment expression.

Reviewers: dblaikie, aprantl, rnk

Reviewed By: aprantl

Subscribers: JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D48024

llvm-svn: 334704
2018-06-14 11:23:42 +00:00
Simon Pilgrim b234ff136e [SLPVectorizer] Remove RawInstructionsData/getMainOpcode and merge into getSameOpcode
This is part of the work to cleanup use of 'alternate' ops so we can use the more general SK_Select shuffle type.

Only getSameOpcode calls getMainOpcode and much of the logic is repeated in both functions. This will require some reworking of D28907 but that patch has hit trouble and is unlikely to be completed anytime soon.

Differential Revision: https://reviews.llvm.org/D48120

llvm-svn: 334701
2018-06-14 10:25:19 +00:00
Hiroshi Inoue f209649dfc [NFC] fix trivial typos in comments
llvm-svn: 334687
2018-06-14 05:41:49 +00:00
Reid Kleckner 12395b7795 [WinASan] Don't instrument globals in sections containing '$'
Such globals are very likely to be part of a sorted section array, such
as the .CRT sections used for dynamic initialization. The ATL uses its own
sorted sections called ATL$__a, ATL$__m, and ATL$__z. Instead of special
casing them, just look for the dollar sign, which is what invokes linker
section sorting for COFF.

Avoids issues with ASan and the ATL uncovered after we started
instrumenting comdat globals on COFF.

llvm-svn: 334653
2018-06-13 20:47:21 +00:00
Simon Pilgrim 2c9d2adff5 [SLPVectorizer] getSameOpcode - remove useless cast [NFC]
There's no need to cast the base Value to an Instruction

llvm-svn: 334588
2018-06-13 10:49:24 +00:00
Simon Pilgrim 1224260f83 [SLPVectorizer] getSameOpcode - remove unusued alternate code [NFC]
We early-out for the case where we don't use alternate opcodes, so no need to check for it later.

llvm-svn: 334587
2018-06-13 10:14:27 +00:00
Max Kazantsev 0ed79620c6 [SimplifyIndVars] Ignore dead users
IndVarSimplify sometimes makes transforms based on users that are trivially dead. In particular,
if DCE wasn't run before it, there may be a dead `sext/zext` in the loop that will trigger widening
transforms; however, it makes no sense to do so.

This patch teaches IndVarSimplify to ignore the most trivial cases of that.

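A rough sketch of the situation (hypothetical IR fragment, for illustration only):

```
loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  %dead = sext i32 %iv to i64   ; trivially dead: no uses
  %iv.next = add nsw i32 %iv, 1
  ...
```

Previously the dead `sext` alone could trigger the widening transforms; with this patch such dead users are ignored.
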
Differential Revision: https://reviews.llvm.org/D47974
Reviewed By: sanjoy

llvm-svn: 334567
2018-06-13 02:25:32 +00:00
Simon Pilgrim e39fa6cbbb [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive, as most SIMD hardware will implement it using a general blend/bit-select instruction, so this patch replaces it with SK_Select, permitting elements from either source as long as they are inline:

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

This initial patch just updates the name and the cost model's shuffle mask analysis; later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.

Differential Revision: https://reviews.llvm.org/D47985

llvm-svn: 334513
2018-06-12 16:12:29 +00:00
Florian Hahn a1cc848399 Use SmallPtrSet explicitly for SmallSets with pointer types (NFC).
Currently SmallSet<PointerTy> inherits from SmallPtrSet<PointerTy>. This
patch replaces such types with SmallPtrSet, because IMO it is slightly
clearer and allows us to get rid of unnecessarily including SmallSet.h

Reviewers: dblaikie, craig.topper

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D47836

llvm-svn: 334492
2018-06-12 11:16:56 +00:00
Wei Mi a0c0857e7a [SampleFDO] Add a new compact binary format for sample profile.
The name table occupies a big chunk of the size of the current binary-format sample profile.
In order to reduce its size, the patch changes the sample writer/reader to
save/restore the MD5 hash of names in the name table. The sample annotation phase will
also use the MD5 hash of a name to query samples accordingly.

Experiments show the compact binary format can generally reduce the size of a sample profile by
2/3 compared with the binary format.

Differential Revision: https://reviews.llvm.org/D47955

llvm-svn: 334447
2018-06-11 22:40:43 +00:00
Roman Lebedev ebb3252f00 Revert rL334371 / D47980: "[InstCombine] Fold (x << y) >> y -> x & (-1 >> y)"
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll broke,
and I did not notice because I did not build that backend.

llvm-svn: 334373
2018-06-10 20:32:03 +00:00
Roman Lebedev eb795a0661 [InstCombine] Fold (x >> y) << y -> x & (-1 << y)
Summary:
We already do it for matching splat constants, but not for plain (non-constant) values.

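A minimal before/after sketch of the fold (hypothetical example):

```
  %t = lshr i32 %x, %y
  %r = shl i32 %t, %y
; -->
;   %mask = shl i32 -1, %y
;   %r    = and i32 %x, %mask
```
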
Further improvements for non-matching splat constants, as noted in
https://reviews.llvm.org/D46760#1123713, will be needed,
but I'd prefer to do that as a follow-up.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX
https://rise4fun.com/Alive/0HF

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47981

llvm-svn: 334372
2018-06-10 20:10:13 +00:00
Roman Lebedev 4cdc59ecf2 [InstCombine] Fold (x << y) >> y -> x & (-1 >> y)
Summary:
We already do it for splat constants, but not for plain (non-constant) values.
Also, undef cases are mostly non-functional.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47980

llvm-svn: 334371
2018-06-10 20:10:06 +00:00
Craig Topper 98a79934af [X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead.
llvm-svn: 334358
2018-06-10 06:01:36 +00:00
Craig Topper 61998289f9 Use SmallPtrSet instead of SmallSet in places where we iterate over the set.
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.

These places were found by hiding the begin/end methods in the SmallSet forwarding.

llvm-svn: 334343
2018-06-09 05:04:20 +00:00
Davide Italiano 189c2cf114 [InstCombine] Skip dbg.value(s) when looking at stack{save,restore}.
Fixes PR37713.

llvm-svn: 334317
2018-06-08 20:42:36 +00:00
Reid Kleckner 0bab222084 [asan] Instrument comdat globals on COFF targets
Summary:
If we can use comdats, then we can make it so that the global metadata
is thrown away if the prevailing definition of the global was
uninstrumented. I have only tested this on COFF targets, but in theory,
there is no reason that we cannot also do this for ELF.

This will allow us to re-enable string merging with ASan on Windows,
reducing the binary size cost of ASan on Windows.

Reviewers: eugenis, vitalybuka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47841

llvm-svn: 334313
2018-06-08 18:33:16 +00:00
Florian Hahn 45e5d5b4be [VPlan] Move recipe construction to VPRecipeBuilder.
This patch moves the recipe-creation functions out of
LoopVectorizationPlanner, which should do the high-level
orchestration of the transformations.

Reviewers: dcaballe, rengolin, hsaito, Ayal

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D47595

llvm-svn: 334305
2018-06-08 17:30:45 +00:00
Daniil Fukalov 37433dc2e1 reapply r334209 with fixes for harfbuzz in Chromium
r334209 description:
[LSR] Check yet more intrinsic pointer operands

the patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334300
2018-06-08 16:22:52 +00:00
Florian Hahn b3c6f07dde [VPlan] Move recipe based VPlan generation to separate function.
This first step separates VPInstruction-based and VPRecipe-based
VPlan creation, which should make it easier to migrate to VPInstruction
based code-gen step by step.

Reviewers: Ayal, rengolin, dcaballe, hsaito, mkuper, mzolotukhin

Reviewed By: dcaballe

Subscribers: bollu, tschuett, rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D47477

llvm-svn: 334284
2018-06-08 12:53:51 +00:00
Roman Shirokiy 9ba0aa2da0 [LV] Fix PR36983. For a given recurrence, fix all phis in exit block
There could be more than one PHI in the exit block using the same loop recurrence.
Don't assume there is only one; fix each user.

Differential Revision: https://reviews.llvm.org/D47788

llvm-svn: 334271
2018-06-08 08:21:20 +00:00
Reid Kleckner a3609f75b2 Revert r334209 "[LSR] Check yet more intrinsic pointer operands"
This causes cast failures when compiling harfbuzz in Chromium.
Reproducer on the way.

llvm-svn: 334254
2018-06-08 00:43:27 +00:00
Daniil Fukalov 12c0663a25 [LSR] Check yet more intrinsic pointer operands
the patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334209
2018-06-07 17:30:58 +00:00
Florian Hahn 0d6b01761c [Mem2Reg] Avoid replacing load with itself in promoteSingleBlockAlloca.
We do the same thing in rewriteSingleStoreAlloca.

Fixes PR37632.

Reviewers: chandlerc, davide, efriedma

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D47825

llvm-svn: 334187
2018-06-07 11:09:05 +00:00
Max Kazantsev b4b2ccea6d [NFC] Use variable instead of accessing pair many times
llvm-svn: 334173
2018-06-07 08:47:19 +00:00
Michael Zolotukhin 31800864dc SpeculativeExecution Pass: Set PreserveCFG to avoid unnecessary analyses invalidation.
The pass doesn't touch the CFG in any way; it only moves instructions between
blocks.

llvm-svn: 334150
2018-06-07 00:19:29 +00:00
Teresa Johnson 4ffc3e7834 [ThinLTO] Rename index IsAnalysis flag to HaveGVs (NFC)
With the upcoming patch to add summary parsing support, IsAnalysis would
be true in contexts where we are not performing module summary analysis.
Rename to the more specific and appropriate HaveGVs, which is essentially
what this flag is indicating.

llvm-svn: 334140
2018-06-06 22:22:01 +00:00
Sanjay Patel 3cd1aa88f9 [InstCombine] fold another shifty abs pattern to cmp+sel (PR36036)
The bug report:
https://bugs.llvm.org/show_bug.cgi?id=36036

...requests a DAG change for this, but an IR canonicalization
probably handles most cases. If we still want to match this
pattern in the backend, there's a proposal for that too:
D47831

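For context, one classic shift-based abs idiom and its cmp+sel canonical form (a hypothetical illustration; the exact pattern handled by this patch is in the tests):

```
define i32 @shifty_abs(i32 %x) {
  %sign = ashr i32 %x, 31
  %add  = add i32 %x, %sign
  %abs  = xor i32 %add, %sign
  ret i32 %abs
}
; canonical form:
;   %cmp = icmp slt i32 %x, 0
;   %neg = sub i32 0, %x
;   %abs = select i1 %cmp, i32 %neg, i32 %x
```
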
Alive proofs including nsw/nuw cases that were first noted in:
D46988

https://rise4fun.com/Alive/Kmp

This patch is largely copied from the existing code that was
initially added with:
D40984
...but I didn't see much gain from trying to share code.

llvm-svn: 334137
2018-06-06 21:58:12 +00:00
Roman Lebedev cbf8446359 [InstCombine] PR37603: low bit mask canonicalization
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37603 | PR37603 ]].

https://godbolt.org/g/VCMNpS
https://rise4fun.com/Alive/idM

When doing bit manipulations, it is quite common to calculate some bit mask,
and apply it to some value via `and`.

The typical C code looks like:
```
int mask_signed_add(int nbits) {
    return (1 << nbits) - 1;
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_add(int)(i32) local_unnamed_addr #0 {
  %2 = shl i32 1, %0
  %3 = add nsw i32 %2, -1
  ret i32 %3
}
```

But there is a second, less readable variant:
```
int mask_signed_xor(int nbits) {
    return ~(-(1 << nbits));
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_xor(int)(i32) local_unnamed_addr #0 {
  %2 = shl i32 -1, %0
  %3 = xor i32 %2, -1
  ret i32 %3
}
```

Since we created such a mask, it is quite likely that we will use it in `and` next.
And then we may get rid of the `not` op by folding it into `andn`.

But now that I have actually looked:
https://godbolt.org/g/VTUDmU
_some_ backend changes will be needed too.
We clearly lose `bzhi` recognition.

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47428

llvm-svn: 334127
2018-06-06 19:38:27 +00:00
Tim Northover 9b80060d7b InstCombine: ignore debug instructions during fence combine
We should never get different CodeGen based on whether the code is being
compiled in debug mode so we must skip over @llvm.dbg.value (and similar)
calls.

Should fix at least the worst part of PR37690.

llvm-svn: 334090
2018-06-06 12:46:02 +00:00
John Brawn e4ff0bd401 [InstCombine] Correct the cmp operand type used when canonicalizing abs/nabs
When adjusting a cmp in order to canonicalize an abs/nabs select pattern, we need
to use the type of the existing operand when creating a new operand, not the
type of a select operand, as the two may be different.

This fixes PR37686.

llvm-svn: 334019
2018-06-05 14:10:55 +00:00
Sanjay Patel dcb8d304c3 [InstCombine] refine UB-handling in shuffle-binop transform
As noted in rL333782, we can be both better for optimization and
safer with this transform:
BinOp (shuffle V1, Mask), C --> shuffle (BinOp V1, NewC), Mask

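For illustration, a hypothetical instance of the transform named above; the constant is shuffled to compensate for the mask:

```
  %s = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
  %r = add <4 x i32> %s, <i32 1, i32 2, i32 3, i32 4>
; -->
;   %t = add <4 x i32> %v, <i32 2, i32 1, i32 4, i32 3>
;   %r = shufflevector <4 x i32> %t, <4 x i32> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
```
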
The only potentially unsafe-to-speculate binops are integer div/rem.
All other binops are always safe (although I don't see a way to
assert that in code here).

For opcodes like shifts that can produce poison, it can't matter
here because we know the lanes with undef are dropped by the
subsequent shuffle.

Differential Revision: https://reviews.llvm.org/D47686

llvm-svn: 333962
2018-06-04 22:26:45 +00:00
David Blaikie 31b98d2e99 Move Analysis/Utils/Local.h back to Transforms
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)

llvm-svn: 333954
2018-06-04 21:23:21 +00:00
Dmitry Mikulin 4539487650 In thin and full LTO + CFI, direct function calls may go through jump table
entries to reach the target. Since these calls don't require type checks,
we can short-circuit them to their real targets, except in cases when they
can be pre-empted.

Differential Revision: https://reviews.llvm.org/D46326

llvm-svn: 333937
2018-06-04 18:18:12 +00:00
Serguei Katkov d894fb4288 [InstCombine] Fix div handling
When we optimize a select based on the fact that div by 0 is undef,
we should not traverse instructions which are not guaranteed to
transfer execution to the next instruction. The guard intrinsic is an example.

Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47576

llvm-svn: 333864
2018-06-04 02:52:36 +00:00
Sanjay Patel 3bd957b7ae [InstCombine] improve sub with bool folds
There's a patchwork of existing transforms trying to handle
these cases, but as seen in the changed test, we weren't
catching them all.

llvm-svn: 333845
2018-06-03 16:35:26 +00:00
Sanjay Patel bbc6d60677 [InstCombine] call simplify before trying vector folds
As noted in the review thread for rL333782, we could have
made a bug harder to hit if we were simplifying instructions
before trying other folds. 

The shuffle transform in question isn't ever a simplification;
it's just a canonicalization. So I've renamed that to make that 
clearer.

This is NFCI at this point, but I've regenerated the test file 
to show the cosmetic value naming difference of using 
instcombine's RAUW vs. the builder.

Possible follow-ups:
1. Move reassociation folds after simplifies too.
2. Refactor common code; we shouldn't have so much repetition.

llvm-svn: 333820
2018-06-02 16:27:44 +00:00
Chandler Carruth 9281503e8f [PM/LoopUnswitch] Fix how the cloned loops are handled when updating analyses.
Summary:
I noticed this issue because we didn't put the primary cloned loop into
the `NonChildClonedLoops` vector and so never iterated on it. Once
I fixed that, it made it clear why I had to do a really complicated and
unnecesasry dance when updating the loops to remain in canonical form --
I was unwittingly working around the fact that the primary cloned loop
wasn't in the expected list of cloned loops. Doh!

Now that we include it in this vector, we don't need to return it and we
can consolidate the update logic as we correctly have a single place
where it can be handled.

I've just added a test for the iteration order aspect as every time
I changed the update logic partially or incorrectly here, an existing
test failed and caught it so that seems well covered (which is also
evidenced by the extensive working around of this missing update).

Reviewers: asbirlea, sanjoy

Subscribers: mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47647

llvm-svn: 333811
2018-06-02 01:29:01 +00:00
Sanjay Patel 66f7e19f6a [InstCombine] fix vector shuffle transform to replace undef elements (PR37648)
This bug:
https://bugs.llvm.org/show_bug.cgi?id=37648
...was created with the enhancement to this transform with rL332479.

The urem test shows the disaster potential: any undef divisor lane makes
the whole op undef.

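To illustrate the hazard (hypothetical example): if the transform leaves undef lanes in a divisor constant, the damage is not confined to those lanes:

```
  ; an undef divisor lane is trouble for the whole op
  %r = urem <4 x i32> %x, <i32 3, i32 3, i32 undef, i32 undef>
```
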
The test diffs show that vector demanded elements already turns some, but
not all, of the potentially unused binop operands back into undef.

llvm-svn: 333782
2018-06-01 19:23:18 +00:00
Vlad Tsyrklevich 6867ab7c90 [ThinLTOBitcodeWriter] Emit summaries for regular LTO modules
Summary:
Emit summaries for bitcode modules that are only destined for the
regular LTO portion of the build so they can participate in
summary-based dead stripping.

This change reduces the size of a nacl_helper build with cfi-icall
enabled by 7%, removing the majority of the overhead due to enabling
cfi-icall. The cfi-icall size increase was caused by compiling in lots
of unused code and cfi-icall generating jumptable references to unused
symbols that could no longer be removed by -Wl,-gc-sections. Increasing
the visibility of summary-based dead stripping prevented jumptable
entries being created for unused symbols from the regular LTO portion
of the build.

Reviewers: pcc

Reviewed By: pcc

Subscribers: dschuff, mehdi_amini, inglorion, eraman, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47594

llvm-svn: 333768
2018-06-01 15:20:47 +00:00
Florian Hahn 8a17f1f43e Revert r333740: IPSCCP] Use PredicateInfo to propagate facts from cmp.
This is breaking the clang-with-thin-lto-ubuntu bot.

llvm-svn: 333745
2018-06-01 12:58:43 +00:00
Florian Hahn f4df554f32 Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

Differential Revision: https://reviews.llvm.org/D45330

llvm-svn: 333740
2018-06-01 10:48:54 +00:00
Craig Topper 9a6c0bdcbd [LoopIdiomRecognize] Only convert loops to ctlz if we can prove that the input is non-negative.
Summary:
Loop idiom recognize tries to convert loops like

```
int foo(int x) {
  int cnt = 0;
  while (x) {
    x >>= 1;
    ++cnt;
  }
  return cnt;
}
```

into calls to ctlz, but if x is initially negative this loop should be infinite.

It happens that the cases that motivated this change have an absolute value of x before the loop. So this patch restricts the transform to cases where we know x is positive. Note: We are relying on the absolute value of INT_MIN to be undefined so we can assume that the result is always positive.

Fixes PR37479

Reviewers: spatel, hfinkel, efriedma, javed.absar

Reviewed By: efriedma

Subscribers: dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D47348

llvm-svn: 333702
2018-05-31 22:16:55 +00:00
Sanjay Patel 26368cd5d9 [InstCombine] narrow select to match condition operands' size
This is the planned enhancement to D47163 / rL333611.
We want to match cmp/select sizes because that will be recognized
as min/max more easily and lead to better codegen (especially for
vector types).

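A hypothetical before/after sketch of the kind of pattern targeted (assuming an smax-style idiom; not taken from the patch's tests):

```
  %cmp = icmp sgt i32 %a, %b
  %ea  = sext i32 %a to i64
  %eb  = sext i32 %b to i64
  %sel = select i1 %cmp, i64 %ea, i64 %eb
; --> select at the cmp's width, with a single extend afterwards:
;   %m   = select i1 %cmp, i32 %a, i32 %b
;   %sel = sext i32 %m to i64
```
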
As mentioned in D47163, this improves some of the tests that would
also be folded by D46380, so we may want to adjust that patch to
match the new patterns where the extend op occurs after the select.

llvm-svn: 333689
2018-05-31 19:55:27 +00:00
Craig Topper c9a4c6208b [JumpThreading] Fix some strange formatting of code inside LLVM_DEBUG. NFC
I don't know if clang-format got confused here or what.

llvm-svn: 333675
2018-05-31 18:08:11 +00:00
David Bolvansky 5430b73755 [SimplifyLibcalls] [NFC] Cleanup, improvements
Summary:
* Use "find('%')" instead of loop to find '%' char (we already uses find('%') in optimizePrintFString..)
* Convert getParent() chains to getModule()/getFunction()

Reviewers: lebedev.ri, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47397

llvm-svn: 333668
2018-05-31 16:39:27 +00:00
Benjamin Kramer 0deb9a9a1f Extend the GlobalObject metadata interface
- Make eraseMetadata return whether it changed something
- Wire getMetadata for a single MDNode efficiently into the attachment
map
- Add hasMetadata, which is less weird than checking getMetadata ==
nullptr on a multimap.

Use it to simplify code.

llvm-svn: 333649
2018-05-31 13:29:58 +00:00
Alexandros Lamprineas 61f0ba1fcc [InstCombine, ARM] Convert vld1 to llvm load
Convert a vector load intrinsic into an llvm load instruction.
This is beneficial when the underlying object being addressed
comes from a constant, since we get constant-folding for free.

Differential Revision: https://reviews.llvm.org/D46273

llvm-svn: 333643
2018-05-31 12:19:18 +00:00
Max Kazantsev 0bad5be430 [NFC] Factor out a method for further extension
llvm-svn: 333633
2018-05-31 08:08:34 +00:00
Roman Lebedev c0ecd06428 Revert rL333106 / D46814: [InstCombine] Fold unfolded masked merge pattern with variable mask!
In post-commit review, Eric Christopher notes that many
new MSan warnings are being observed with this patch.

The probable reason is: if 'y' is undef here, we could
evaluate it twice and get different results.
We can't increase the number of uses of a value.

llvm-svn: 333631
2018-05-31 06:00:36 +00:00
Sanjay Patel e5bc441791 [InstCombine] don't change the size of a select if it would mismatch its condition operands' sizes
Don't always:
cast (select (cmp x, y), z, C) --> select (cmp x, y), (cast z), C'

This is something that came up as far back as D26556, and I lost track of it. 
I suspect that this transform is part of the underlying problem that is 
inspiring some of the recent proposals that seek to match larger patterns 
that include a cast op. Even if that's not true, this transform causes
problems for codegen (particularly with vector types).

A transform to actively match the cmp and select operand sizes should
follow. This patch just removes the harmful canonicalization in the other
direction.

Differential Revision: https://reviews.llvm.org/D47163

llvm-svn: 333611
2018-05-31 00:16:58 +00:00
Sanjay Patel ceb595b04e [InstCombine] don't negate constant expression with fsub (PR37605)
X + (-C) would be transformed back into X - C, causing an infinite loop:
https://bugs.llvm.org/show_bug.cgi?id=37605

llvm-svn: 333610
2018-05-30 23:55:12 +00:00
Vlad Tsyrklevich 178fdb1a3b [LowerTypeTests] Discard extern_weak linkage for definitions
Summary:
Fix PR37625. It's possible for an extern_weak declaration to be emitted
to the merged module when a definition exists in the ThinLTO portion of
the build; discard the linkage on the declaration in that case.
(otherwise we copy the linkage to the alias to the jumptable and fail)

Reviewers: pcc

Reviewed By: pcc

Subscribers: mehdi_amini, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47494

llvm-svn: 333604
2018-05-30 22:39:52 +00:00
George Burgess IV 485762ccba [NewGVN] Fix set comparison; reflow comment
Looks like we intended to compare this->Members with Other->Members
here, but ended up comparing this->Members with this->Members. Oops. :)

Since CongruenceClass::Members is a SmallPtrSet anyway, we can probably
skip building std::sets if we're willing to write a bit more code.

This appears to be no functional change (for sufficiently lax values of
"no"): this equality check was only being called inside of an assert.
So, worst case, we'll catch more bugs in the form of assertion failures.

Thanks to d0k for noting this!

llvm-svn: 333601
2018-05-30 22:24:08 +00:00
Benjamin Kramer c8bd5449e0 [CalledValuePropagation] Just use a sorted vector instead of a set.
The set properties are never used, so a vector is enough. No
functionality change intended.

While there add some std::moves to SparseSolver.

llvm-svn: 333582
2018-05-30 19:31:11 +00:00
Alexandros Lamprineas 52457d33b2 [InstCombine, ARM, AArch64] Convert table lookup to shuffle vector
Turning a table lookup intrinsic into a shuffle vector instruction
can be beneficial. If the mask used for the lookup is the constant
vector {7,6,5,4,3,2,1,0}, then the back-end generates byte reverse
instructions instead.

Differential Revision: https://reviews.llvm.org/D46133

llvm-svn: 333550
2018-05-30 14:38:50 +00:00
Chandler Carruth 71fd27043e [PM/LoopUnswitch] When using the new SimpleLoopUnswitch pass, schedule
loop-cleanup passes at the beginning of the loop pass pipeline, and
re-enqueue loops after even trivial unswitching.

This will allow us to much more consistently avoid simplifying code
while doing trivial unswitching. I've also added a test case that
specifically shows effective iteration using this technique.

I've unconditionally updated the new PM as that is always using the
SimpleLoopUnswitch pass, and I've made the pipeline changes for the old
PM conditional on using this new unswitch pass. I added a bunch of
comments to the loop pass pipeline in the old PM to make it more clear
what is going on when reviewing.

Hopefully this will unblock doing *partial* unswitching instead of just
full unswitching.

Differential Revision: https://reviews.llvm.org/D47408

llvm-svn: 333493
2018-05-30 02:46:45 +00:00
Diego Caballero b94b21d441 [VPlan] Replace LLVM_ATTRIBUTE_USED with ifndef NDEBUG
Minor replacement. LLVM_ATTRIBUTE_USED was introduced to silence
a warning but using #ifndef NDEBUG makes more sense in this case.

Reviewers: dblaikie, fhahn, hsaito

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D47498

llvm-svn: 333476
2018-05-29 23:10:44 +00:00
Chandler Carruth 4cbcbb0761 [LoopInstSimplify] Re-implement the core logic of loop-instsimplify to
be both simpler and substantially more efficient.

Rather than use a hand-rolled iteration technique that isn't quite the
same as RPO, use the pre-built RPO loop body traversal utility.

Once visiting the loop body in RPO, we can assert that we visit defs
before uses reliably. When this is the case, the only need to iterate is
when simplifying a def that is used by a PHI node along a back-edge.
With this patch, the first pass over the loop body is just a complete
simplification of every instruction across the loop body. When we
encounter a use of a simplified instruction that stems from a PHI node
in the loop body that has already been visited (due to some cyclic CFG,
potentially the loop itself, or a nested loop, or unstructured control
flow), we recall that specific PHI node for the second iteration.
Nothing else needs to be preserved from iteration to iteration.

On the second and later iterations, only instructions known to have
simplified inputs are considered, each time starting from a set of PHIs
that had simplified inputs along the backedges.

Dead instructions are collected along the way, but deleted in a batch at
the end of each iteration making the iterations themselves substantially
simpler. This uses a new batch API for recursively deleting dead
instructions.

This also changes the routine to visit subloops. Because simplification
is fundamentally transitive, we may need to visit the entire loop body,
including subloops, to handle knock-on simplification.

I've added a basic test file that helps demonstrate that all of these
changes work. It includes both straight-forward loops with
simplifications as well as interesting PHI-structures, CFG-structures,
and a nested loop case.

Differential Revision: https://reviews.llvm.org/D47407

llvm-svn: 333461
2018-05-29 20:15:38 +00:00
Fangrui Song afa95ee03d [LLVM-C] [OCaml] Remove LLVMAddBBVectorizePass
Summary: It was fully replaced back in 2014, and the implementation was removed 11 months ago by r306797.

Reviewers: hfinkel, chandlerc, whitequark, deadalnix

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47436

llvm-svn: 333378
2018-05-28 16:58:10 +00:00
David Green aee7ad0cde Revert 333358 as it's failing on some builders.
I'm guessing the tests rely on the ARM backend being built.

llvm-svn: 333359
2018-05-27 12:54:33 +00:00
David Green 3034281b43 [UnrollAndJam] Add a new Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

for i..
  ForeBlocks(i)
  for j..
    SubLoopBlocks(i, j)
  AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

for i... i+=2
  ForeBlocks(i)
  ForeBlocks(i+1)
  for j..
    SubLoopBlocks(i, j)
    SubLoopBlocks(i+1, j)
  AftBlocks(i)
  AftBlocks(i+1)
Remainder

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 333358
2018-05-27 12:11:21 +00:00