Commit Graph

11553 Commits

Author SHA1 Message Date
Craig Topper 350c5f1881 [X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement.
llvm-svn: 336315
2018-07-05 06:52:55 +00:00
Sanjay Patel 9c2e7ceb1a [InstCombine] allow narrowing of min/max/abs
We have bailout hacks based on min/max in various places in instcombine 
that shouldn't be necessary. The affected test was added for:
D48930 
...which is a consequence of the improvement in:
D48584 (https://reviews.llvm.org/rL336172)

I'm assuming the visitTrunc bailout in this patch was added specifically 
to avoid a change from SimplifyDemandedBits, so I'm just moving that 
below the EvaluateInDifferentType optimization. A narrow min/max is still
a min/max.
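
A hypothetical sketch (not from the patch) of the kind of pattern involved,
assuming the narrow type is i8: the truncated smax is still recognizable as
a max in the narrow type.

  %c = icmp sgt i32 %x, %y
  %m = select i1 %c, i32 %x, i32 %y   ; smax(%x, %y)
  %t = trunc i32 %m to i8             ; narrowing keeps the min/max form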

llvm-svn: 336293
2018-07-04 17:44:04 +00:00
Sanjay Patel ad32d22e13 [InstCombine] add value names to test; NFC
That makes it easier to mix and match lines into other tests.

llvm-svn: 336289
2018-07-04 16:56:35 +00:00
Gabor Buella da4a966e1c NFC - Various typo fixes in tests
llvm-svn: 336268
2018-07-04 13:28:39 +00:00
Max Kazantsev bee9ca6568 [NFC] Add test that shows that InstCombine can do better
llvm-svn: 336258
2018-07-04 10:32:02 +00:00
Anastasis Grammenos 204726b345 [DebugInfo][LoopVectorize] Preserve DL in generated phi instruction
When creating `phi` instructions to resume at the scalar part of the loop,
copy the DebugLoc from the original phi over to the new one.

Differential Revision: https://reviews.llvm.org/D48769

llvm-svn: 336256
2018-07-04 10:16:55 +00:00
Anastasis Grammenos 509d79789f [DebugInfo][InstCombine] Preserve DI after combining zext
When zext is EvaluatedInDifferentType, InstCombine
drops the dbg.value intrinsic. This patch tries to
preserve said DI by inserting the zext's old DI in the
resulting instruction. (Only for integer types for now.)

Differential Revision: https://reviews.llvm.org/D48331

llvm-svn: 336254
2018-07-04 09:55:46 +00:00
Sanjay Patel 181aa26eb8 [InstCombine] add tests for shuffle+binop with constant op1; NFC
This adds coverage for a planned enhancement for ConstantExpr::getBinOpIdentity() noted in D48830.

llvm-svn: 336220
2018-07-03 18:43:46 +00:00
Sanjay Patel 8307bc407b [Constants] add identity constants for fadd/fmul
As the test diffs show, the current users of getBinOpIdentity()
are InstCombine and Reassociate. SLP vectorizer is a candidate
for using this functionality too (D28907).

The InstCombine shuffle improvements are part of the planned
enhancements noted in D48830.

InstCombine actually has several other uses of getBinOpIdentity() 
via SimplifyUsingDistributiveLaws(), but we don't call that for 
any FP ops. Fixing that might be another part of removing the
custom reassociation in InstCombine that is only done for fadd+fmul.
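
For reference, the FP identity constants involved are -0.0 for fadd and 1.0
for fmul; a minimal IR illustration with hypothetical values:

  %a = fadd float %x, -0.0   ; identity: leaves %x unchanged
  %m = fmul float %x, 1.0    ; identity: leaves %x unchanged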

llvm-svn: 336215
2018-07-03 17:12:59 +00:00
Sanjay Patel 2c38b7fd8b [Reassociate] add tests for binop with identity constant; NFC
llvm-svn: 336214
2018-07-03 16:44:18 +00:00
Sanjay Patel 5b4a003088 [Reassociate] regenerate checks; NFC
llvm-svn: 336211
2018-07-03 16:01:41 +00:00
Sanjay Patel 5a6ba018d7 [Reassociate] add test for missing FP constant analysis; NFC
llvm-svn: 336208
2018-07-03 15:56:04 +00:00
Sanjay Patel 3074b9e53f [InstCombine] fold shuffle-with-binop and common value
This is the last significant change suggested in PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806#c5
...though there are several follow-ups noted in the code comments 
in this patch to complete this transform.

It's possible that a binop feeding a select-shuffle has been eliminated 
by earlier transforms (or the code was just written like this in the 1st 
place), so we'll fail to match the patterns that have 2 binops from: 
D48401, 
D48678, 
D48662, 
D48485.

In that case, we can try to materialize identity constants for the remaining
binop to fill in the "ghost" lanes of the vector (where we just want to pass 
through the original values of the source operand).
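
A rough sketch of the idea with hypothetical values: the lanes selected
directly from %x are the "ghost" lanes, and filling those lanes of the
constant with the add identity (0) lets the shuffle be folded away.

  %b = add <4 x i32> %x, <i32 1, i32 2, i32 3, i32 4>
  %s = shufflevector <4 x i32> %b, <4 x i32> %x,
                     <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  ; -->
  %s = add <4 x i32> %x, <i32 1, i32 0, i32 3, i32 0>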

I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups. 
For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor).

Differential Revision: https://reviews.llvm.org/D48830

llvm-svn: 336196
2018-07-03 13:44:22 +00:00
Bjorn Pettersson 8dd6cf711f [DebugInfo] Corrections for salvageDebugInfo
Summary:
When salvaging a dbg.declare/dbg.addr we should not add
DW_OP_stack_value to the DIExpression
(see test/Transforms/InstCombine/salvage-dbg-declare.ll).

Consider this example
  %vla = alloca i32, i64 2
  call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression())

Instcombine will turn it into
  %vla1 = alloca [2 x i32]
  %vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla1, i64 0, i64 0
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression())

If the GEP can be eliminated, then the dbg.declare will be salvaged
and we should get
  %vla1 = alloca [2 x i32]
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression())

The problem was that salvageDebugInfo did not recognize dbg.declare
as being indirect (%vla1 points to the value, it does not hold the
value), so we incorrectly got
  call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value))

I also made sure that llvm::salvageDebugInfo and
DIExpression::prependOpcodes do not add DW_OP_stack_value to
the DIExpression in case no new operands are added to the
DIExpression. That way we avoid unnecessarily turning a
register location expression into an implicit location expression
in some situations (see test11 in test/Transforms/LICM/sinking.ll).

Reviewers: aprantl, vsk

Reviewed By: aprantl, vsk

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D48837

llvm-svn: 336191
2018-07-03 11:29:00 +00:00
Chandler Carruth 3897ded691 [PM/LoopUnswitch] Fix PR37651 by correctly invalidating SCEV when
unswitching loops.

Original patch trying to address this was sent in D47624, but that
didn't quite handle things correctly. There are two key principles used
to select whether and how to invalidate SCEV-cached information about
loops:

1) We must invalidate any info SCEV has cached before unswitching as we
   may change (or destroy) the loop structure by the act of unswitching,
   and make it hard to recover everything we want to invalidate within
   SCEV.

2) We need to invalidate all of the loops whose CFGs are mutated by the
   unswitching. Notably, this isn't the *entire* loop nest; this is
   every loop contained by the outermost loop reached by an exit block
   relevant to the unswitch.

And we need to do this even when doing trivial unswitching.

I've added more focused tests that directly check that SCEV starts off
with imprecise information and after unswitching (and simplifying
instructions) re-querying SCEV will produce precise information. These
tests also specifically work to check that an *outer* loop's information
becomes precise.

However, the testing here is still a bit imperfect. Crafting test cases
that reliably fail to be analyzed by SCEV before unswitching and succeed
afterward proved ... very, very hard. It took me several hours and
careful work to build these, and I'm not optimistic about necessarily
coming up with more to cover more elaborate possibilities. Fortunately,
the code pattern we are testing here in the pass is really
straightforward and reliable.

Thanks to Max Kazantsev for the initial work on this as well as the
review, and to Hal Finkel for helping me talk through approaches to test
this stuff even if it didn't come to much.

Differential Revision: https://reviews.llvm.org/D47624

llvm-svn: 336183
2018-07-03 09:13:27 +00:00
Max Kazantsev 3097b76e8c [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done
This patch changes order of transform in InstCombineCompares to avoid
performing transforms based on ranges which produce complex bit arithmetic
before simpler things (like folding with constants) are done. See PR37636
for the motivating example.

Differential Revision: https://reviews.llvm.org/D48584
Reviewed By: spatel, lebedev.ri

llvm-svn: 336172
2018-07-03 06:23:57 +00:00
Tim Shen c7cef4bcc4 [SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428).
Summary:
Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change
SCEV is able to prove that the loop doesn't self-wrap (due to zext i16
to i64), disabling the entire loop versioning pass. Removed the zext and
just used i64.

Reviewers: sanjoy

Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D48409

llvm-svn: 336140
2018-07-02 20:01:54 +00:00
Farhana Aleen 3b416db19b [SLP] Recognize min/max pattern using instructions producing same values.
Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.

         %1 = extractelement <2 x i32> %a, i32 0
         %2 = extractelement <2 x i32> %a, i32 1
         %cond = icmp sgt i32 %1, %2
         %3 = extractelement <2 x i32> %a, i32 0
         %4 = extractelement <2 x i32> %a, i32 1
         %select = select i1 %cond, i32 %3, i32 %4

Author: FarhanaAleen

Reviewed By: ABataev, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D47608

llvm-svn: 336130
2018-07-02 17:55:31 +00:00
Sanjay Patel b999d74132 [InstCombine] reverse canonicalization of add --> or to allow more shuffle folding
This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)

Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving 
to where other passes could use that functionality.
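
A hypothetical illustration of the reversal: when value tracking proves %x
and the constant share no set bits, the 'or' behaves exactly like an 'add'
and can reuse the existing add-based shuffle folds.

  %x = and <4 x i32> %v, <i32 -4, i32 -4, i32 -4, i32 -4>  ; low bits known 0
  %o = or  <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
  ; --> (no common bits set)
  %o = add <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>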

Differential Revision: https://reviews.llvm.org/D48662

llvm-svn: 336128
2018-07-02 17:42:29 +00:00
Simon Pilgrim 35f196c179 [SLPVectorizer][X86] Begin adding alternate tests for call operators
Alternate opcode handling only supports binary operators; these tests demonstrate a missed opportunity to vectorize ceil/floor calls

llvm-svn: 336125
2018-07-02 17:23:45 +00:00
Sanjay Patel 284ba0c18f [ValueTracking] allow undef elements when matching vector abs
llvm-svn: 336111
2018-07-02 14:43:40 +00:00
Sanjay Patel 951f617e16 [InstCombine] adjust shuffle tests with IR flags; NFC
Due to current limitations in constant analysis, we need flags
on add or mul to show propagation for the potential transform
suggested in these tests (no other binops currently report 
identity constants).

llvm-svn: 336101
2018-07-02 13:40:54 +00:00
Florian Hahn 4ebba909a2 Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.
This version contains a fix to add values for which the state in ParamState changes
to the worklist even if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.

Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 336098
2018-07-02 12:44:04 +00:00
Sanjay Patel d980084597 [InstCombine] add tests for shuffle-binop; NFC
This is another pattern mentioned in PR37806.

llvm-svn: 336096
2018-07-02 12:30:46 +00:00
Simon Pilgrim 265793d52a [SLPVectorizer] Fix alternate opcode + shuffle cost function to correctly handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.

This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...

llvm-svn: 336095
2018-07-02 11:28:01 +00:00
Max Kazantsev 66da390506 [NFC] Test that shows unprofitability of instcombine with bit ranges
llvm-svn: 336078
2018-07-02 06:55:00 +00:00
Piotr Padlewski 5b3db45e8f Implement strip.invariant.group
Summary:
This patch introduces a new intrinsic,
strip.invariant.group, which was described in the
RFC: Devirtualization v2

Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar

Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47103

Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
2018-07-02 04:49:30 +00:00
Sanjay Patel 279a1a39ad [InstCombine] add abs tests with undef elts; NFC
llvm-svn: 336065
2018-07-01 17:14:37 +00:00
Sanjay Patel a9fdb9fd37 [PatternMatch] allow undef elements in vectors with m_Neg
This is similar to the m_Not change from D44076.

llvm-svn: 336064
2018-07-01 13:42:57 +00:00
David Green 963401d2be [UnrollAndJam] New Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder Loop

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 336062
2018-07-01 12:47:30 +00:00
Simon Pilgrim 84f77ecba9 [SLPVectorizer][X86] Add some alternate tests for cast operators
Alternate opcode handling only supports binary operators; these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations

llvm-svn: 336060
2018-07-01 11:29:46 +00:00
Eugene Leviant 6e4134459b [Evaluator] Improve evaluation of call instruction
Recommit of r335324 after buildbot failure fix

llvm-svn: 336059
2018-07-01 11:02:07 +00:00
Sanjay Patel 16a42ca274 [InstCombine] add tests for negate vector with undef elts; NFC
llvm-svn: 336050
2018-06-30 14:11:46 +00:00
Sanjay Patel 4491c0dd45 [InstCombine] add more tests for shuffle-binop folds; NFC
The mul+shl tests add coverage for the fold enabled with D48678.
The and+or tests are not handled yet; that's D48662.

llvm-svn: 335984
2018-06-29 15:28:11 +00:00
Sanjay Patel da66753e01 [InstCombine] enhance shuffle-of-binops to allow different variable ops (PR37806)
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have 2 different variable values, then we shuffle (select) those lanes, 
shuffle (select) the constants, and then perform the binop. This eliminates a binop.

The new shuffle uses the same shuffle mask as the existing shuffle, so there's no 
danger of creating a difficult shuffle.

All of the earlier constraints still apply, but we also check for extra uses to 
avoid creating more instructions than we'll remove.

Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.

Differential Revision: https://reviews.llvm.org/D48678

llvm-svn: 335974
2018-06-29 13:44:06 +00:00
Roman Lebedev 8d081b78e4 SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction
Summary:
An alternative to D48597.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 | PR37936 ]].

The problem is as follows:
1. `indvars` marks `%dec` as `NUW`.
2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908)
3. `loop-reduce` tries to do some further modification, but crashes
    with a type assertion in a cast, because `%dec` is no longer an `Instruction`.

If the RUN line is split into two, i.e. you first run `-indvars -loop-instsimplify`,
store that into a file, and then run `-loop-reduce`, there is no crash.

So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV.
But in this case we can just not crash if it's not an `Instruction`.
This is just a local fix, unlike D48597, so there may very well be other problems.

Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi

Reviewed By: mkazantsev

Subscribers: evstupac, javed.absar, spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D48599

llvm-svn: 335950
2018-06-29 07:44:20 +00:00
Sanjay Patel 019d3cd3b4 [InstCombine] adjust shuffle tests; NFC
Use xor for the extra uses test because div/rem have other problems.

llvm-svn: 335924
2018-06-28 21:14:02 +00:00
Teresa Johnson e87868b7e9 [ThinLTO] Port InlinerFunctionImportStats handling to new PM
Summary:
The InlinerFunctionImportStats will collect and dump stats regarding how
many function inlined into the module were imported by ThinLTO.

Reviewers: wmi, dexonsmith

Subscribers: mehdi_amini, inglorion, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D48729

llvm-svn: 335914
2018-06-28 20:07:47 +00:00
Anastasis Grammenos 425df22ee3 [SROA] Preserve DebugLoc when rewriting alloca partitions
When rewriting an alloca partition, copy the DL from the
old alloca over to the new one.

Differential Revision: https://reviews.llvm.org/D48640

llvm-svn: 335904
2018-06-28 18:58:30 +00:00
Sanjay Patel 57bda365bf [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806

We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other 
direction because that's generally better of course). This allows us to remove the shuffle 
as we do in the regular opcodes-are-the-same cases.

This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv

Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already 
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There 
are planned enhancements for opcode transforms such as or -> add.

Note that there's a different fold needed if we've already managed to simplify away a binop 
as seen in the test based on PR37806, but we manage to get that one case here because this 
fold is positioned above the demanded elements fold currently.

Differential Revision: https://reviews.llvm.org/D48485

llvm-svn: 335888
2018-06-28 17:48:04 +00:00
Florian Hahn 388af14f85 [SCCP] Mark CFG as preserved.
SCCP does not change the CFG, so we can mark it as preserved.

Reviewers: dberlin, efriedma, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D47149

llvm-svn: 335820
2018-06-28 09:53:38 +00:00
Max Kazantsev f5ba37182e [IndVarSimplify] Ignore unreachable users of truncs
If a trunc has a user in a block which is not reachable from entry,
we can safely perform trunc elimination as if this user didn't exist.

llvm-svn: 335816
2018-06-28 08:20:03 +00:00
Sanjay Patel 1ef49be8b6 [InstCombine] add tests for vector-select-of-binops with 2 variables; NFC
llvm-svn: 335778
2018-06-27 20:23:47 +00:00
Sanjay Patel 7e45aebe55 [InstCombine] add more tests for shuffle with different binops; NFC
llvm-svn: 335756
2018-06-27 17:21:57 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking built in. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and the default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extend the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectors. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjusts the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Vedant Kumar f6c0b41fb7 [InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms()
This prevents InstCombine from creating mis-sized dbg.values when
replacing a sequence of casts with a simpler cast. For example, in:

  (fptrunc (floor (fpext X))) -> (floorf X)

We no longer emit dbg.value(X) (with a 32-bit float operand) to describe
(fpext X) (which is a 64-bit float).

This was diagnosed by the debugify check added in r335682.

llvm-svn: 335696
2018-06-27 00:47:53 +00:00
Vedant Kumar d13536e9f3 [Debugify] Handle failure to get fragment size when checking dbg.values
It's not possible to get the fragment size of some dbg.values. Teach the
mis-sized dbg.value diagnostic to detect this scenario and bail out.

Tested with:
$ find test/Transforms -print -exec opt -debugify-each -instcombine {} \;

llvm-svn: 335695
2018-06-27 00:47:52 +00:00
Michael Zolotukhin d3b8bdef01 [JumpThreading] Don't try to rewrite a use if it's already valid.
Summary:
When recording uses that we need to rewrite after cloning a loop, we need to
check if the use is not dominated by the original def. The initial
assumption was that the cloned basic block will introduce a new path and
thus the original def will only dominate the use if they are in the same
BB, but as the reproducer from PR37745 shows it's not always the case.

This fixes PR37745.

Reviewers: haicheng, Ka-Ka

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D48111

llvm-svn: 335675
2018-06-26 22:19:48 +00:00
Matt Arsenault 7e991d30c0 ConstantFold: Don't fold global address vs. null for addrspace != 0
Not sure why this logic seems to be repeated in 2 different places,
one called by the other.

On AMDGPU addrspace(3) globals start allocating at 0, so these
checks will be incorrect (not that real code actually tries
to compare these addresses)

llvm-svn: 335649
2018-06-26 18:55:43 +00:00
Matt Arsenault 2c1a570aab LoopUnroll: Allow analyzing intrinsic call costs
I'm not sure why the code here is skipping calls since
TTI does try to do something for general calls, but it
at least should allow intrinsics.

Skip intrinsics that should not be omitted as calls, which
is by far the most common case on AMDGPU.

llvm-svn: 335645
2018-06-26 18:51:17 +00:00
Sanjay Patel ad0bfb844d [InstSimplify] fold shifts by sext bool
https://rise4fun.com/Alive/c3Y
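
A sketch of the reasoning with a hypothetical i32 example (see the Alive
link above for the formal proof): the sext of an i1 is either 0 or all-ones,
and a shift by an all-ones amount is out of range (poison), so the only
well-defined case shifts by 0.

  %amt = sext i1 %x to i32
  %r = shl i32 %y, %amt
  =>
  %r = %y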

llvm-svn: 335633
2018-06-26 17:31:38 +00:00
Sanjay Patel 3d1e4d6fa6 [InstSimplify] add tests for shifts by sext bool; NFC
llvm-svn: 335631
2018-06-26 17:15:07 +00:00
Sanjay Patel 3575f0c0b3 [InstCombine] fold urem with sext bool divisor
Similar to other patches in this series:
https://reviews.llvm.org/rL335512
https://reviews.llvm.org/rL335527
https://reviews.llvm.org/rL335597
https://reviews.llvm.org/rL335616

...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform.
I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold.

Note that in this case, the backend might want to convert the select into math:
Name: sext urem
%e = sext i1 %x to i32
%r = urem i32 %y, %e
=>
%c = icmp eq i32 %y, -1
%z = zext i1 %c to i32
%r = add i32 %z, %y

llvm-svn: 335622
2018-06-26 16:30:00 +00:00
Simon Pilgrim bbfc18b5b5 [SLPVectorizer] Recognise non uniform power of 2 constants
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.

As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.

Differential Revision: https://reviews.llvm.org/D48214

llvm-svn: 335621
2018-06-26 16:20:16 +00:00
Sanjay Patel 0f44759b0d [InstCombine] add tests for urem with sext bool divisor; NFC
llvm-svn: 335619
2018-06-26 16:01:24 +00:00
Sanjay Patel 2b7e31095d [InstSimplify] fold srem with sext bool divisor
llvm-svn: 335616
2018-06-26 15:32:54 +00:00
Sanjay Patel 0e0dbebeed [InstSimplify] add tests for srem with sext bool divisor; NFC
llvm-svn: 335609
2018-06-26 14:47:31 +00:00
Sanjay Patel 7c45debaea [InstCombine] fold udiv with sext bool divisor
Note: I didn't add a hasOneUse() check because the existing,
related fold doesn't have that check. I suspect that the
improved analysis and codegen make these some of the rare
canonicalization cases where we allow an increase in
instructions.
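
A sketch of the reasoning with a hypothetical i32 example (the exact
committed form may differ): the divisor sext i1 %x is either 0, which is
immediate UB for udiv, or all-ones, so in any defined execution the result
only depends on whether %y is all-ones.

  %d = sext i1 %x to i32
  %r = udiv i32 %y, %d
  ; -->
  %c = icmp eq i32 %y, -1
  %r = zext i1 %c to i32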

llvm-svn: 335597
2018-06-26 12:41:15 +00:00
Florian Hahn 4a69b0bb36 [IPSCCP] Change dead blocks to unreachable after visiting all executable blocks.
changeToUnreachable may remove PHI nodes from executable blocks we found values
for and we would fail to replace them. By changing dead blocks to unreachable after
we replaced constants in all executable blocks, we ensure such PHI nodes are replaced
by their known value first.

Fixes PR37780.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48421

llvm-svn: 335588
2018-06-26 10:15:02 +00:00
Bjorn Pettersson 550517bcab Improve ConvertDebugDeclareToDebugValue
Summary:
This is a follow-up to r334830 and r335031.

In the valueCoversEntireFragment check we now also handle
the situation when there is a variable length array (VLA)
involved, and the length of the array has been reduced to
a constant.

The ConvertDebugDeclareToDebugValue functions that are related
to PHI nodes and load instructions now avoid inserting dbg.value
intrinsics when the value does not, for certain, cover the
variable/fragment that should be described.
In r334830 we assumed that the value always covered the entire
var/fragment and we had assertions in the code to show that
assumption. However, those asserts failed when compiling code
with VLAs, so we removed the asserts in r335031. Now when we
know that the valueCoversEntireFragment check can fail also for
PHI/Load instructions, we avoid inserting the faulty dbg.value
intrinsic in such situations. Compared to the Store instruction
scenario we simply drop the dbg.value here (as the variable does
not change its value due to PHI/Load, so an earlier dbg.value
describing the variable should still be valid).

Reviewers: aprantl, vsk, efriedma

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48547

llvm-svn: 335580
2018-06-26 06:17:00 +00:00
Gil Rapaport da2e2caa6c [InstCombine] (A + 1) + (B ^ -1) --> A - B
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B).
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.
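
A minimal IR illustration with hypothetical values: (B ^ -1) is the
canonical form of (-1 - B), so adding it to (A + 1) leaves A - B.

  %na = add i32 %a, 1
  %nb = xor i32 %b, -1
  %r  = add i32 %na, %nb
  ; -->
  %r  = sub i32 %a, %b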

Differential Revision: https://reviews.llvm.org/D48535

llvm-svn: 335579
2018-06-26 05:31:18 +00:00
Chandler Carruth 1652996fd6 [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial
unswitching of switches.

This works much like trivial unswitching of switches in that it reliably
moves the switch out of the loop. Here we potentially clone the entire
loop into each successor of the switch and re-point the cases at these
clones.

Due to the complexity of actually doing nontrivial unswitching, this
patch doesn't create a dedicated routine for handling switches -- it
would duplicate far too much code. Instead, it generalizes the existing
routine to handle both branches and switches as it largely reduces to
looping in a few places instead of doing something once. This actually
improves the results in some cases with branches due to being much more
careful about how dead regions of code are managed. With branches,
because exactly one clone is created and there are exactly two edges
considered, somewhat sloppy handling of the dead regions of code was
sufficient in most cases. But with switches, there are much more
complicated patterns of dead code and so I've had to move to a more
robust model generally. We still do as much pruning of the dead code
early as possible because that allows us to avoid even cloning the code.

This also surfaced another problem with nontrivial unswitching before
which is that we weren't as precise in reconstructing loops as we could
have been. This seems to have been mostly harmless, but resulted in
pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we
have to get this *right*, and everything benefits from it.

While the testing may seem a bit light here because we only have two
real cases with actual switches, they do a surprisingly good job of
exercising numerous edge cases. Also, because we share the logic with
branches, most of the changes in this patch are reasonably well covered
by existing tests.

The new unswitch now has all of the same fundamental power as the old
one with the exception of the single unsound case of *partial* switch
unswitching -- that really is just loop specialization and not
unswitching at all. It doesn't fit into the canonicalization model in
any way. We can add a loop specialization pass that runs late based on
profile data if important test cases ever come up here.

Differential Revision: https://reviews.llvm.org/D47683

llvm-svn: 335553
2018-06-25 23:32:54 +00:00
Sanjay Patel 0c90400bf2 [InstCombine] add/move tests for udiv; NFC
llvm-svn: 335544
2018-06-25 22:27:36 +00:00
Sanjay Patel 6a96d90acd [InstCombine] fold sdiv with sext bool divisor
llvm-svn: 335527
2018-06-25 21:39:41 +00:00
Sanjay Patel 46f9b8c333 [InstCombine] add tests for sdiv with sext bool divisor; NFC
llvm-svn: 335526
2018-06-25 21:36:09 +00:00
Florian Hahn b10b141a79 Revert r335513: [SCEVExp] Advance found insertion point
llvm-svn: 335522
2018-06-25 20:55:26 +00:00
Florian Hahn 0b3ed5742a Force vector width for scev-expander-debug.ll test
llvm-svn: 335520
2018-06-25 20:40:50 +00:00
Florian Hahn 5947c17fd4 [SCEVExp] Advance found insertion point until we find a non-dbg instruction.
This avoids creating unnecessary casts if the IP used to be a dbg info
intrinsic. Fixes PR37727.

Reviewers: vsk, aprantl, sanjoy, efriedma

Reviewed By: vsk, efriedma

Differential Revision: https://reviews.llvm.org/D47874

llvm-svn: 335513
2018-06-25 19:17:29 +00:00
Sanjay Patel 1e911fa746 [InstSimplify] fold div/rem of zexted bool
I was looking at an unrelated fold and noticed that
we don't have this simplification (because the other
fold would break existing tests).

Name: zext udiv
  %z = zext i1 %x to i32
  %r = udiv i32 %y, %z
=>
  %r = %y

Name: zext urem
  %z = zext i1 %x to i32
  %r = urem i32 %y, %z
=>
  %r = 0

Name: zext sdiv
  %z = zext i1 %x to i32
  %r = sdiv i32 %y, %z
=>
  %r = %y

Name: zext srem
  %z = zext i1 %x to i32
  %r = srem i32 %y, %z
=>
  %r = 0

https://rise4fun.com/Alive/LZ9

llvm-svn: 335512
2018-06-25 18:51:21 +00:00
Sanjay Patel a46bcbec58 [InstSimplify] add tests for div/rem with bool divisor; NFC
llvm-svn: 335509
2018-06-25 18:27:14 +00:00
Sanjay Patel 2e8babb4fa [InstCombine] add tests for add-of-sext-bool; NFC
We canonicalize to select with a zext-add and either zext-sub or sext-sub,
so this shows a pattern that's not conforming to the general trend.

llvm-svn: 335506
2018-06-25 17:52:10 +00:00
Stanislav Mekhanoshin d8c9374797 Fix invariant fdiv hoisting in LICM
FDiv is replaced with multiplication by the reciprocal, and the invariant
reciprocal is hoisted out of the loop, while the multiplication remains in
the loop even if it is invariant.

Switch checks for all invariant operands and only invariant
denominator to fix the issue.

Differential Revision: https://reviews.llvm.org/D48447

llvm-svn: 335411
2018-06-23 04:01:28 +00:00
Eli Friedman 203eaaf5ba [LoopReroll] Rewrite induction variable rewriting.
This gets rid of a bunch of weird special cases; instead, just use SCEV
rewriting for everything.  In addition to being simpler, this fixes a
bug where we would use the wrong stride in certain edge cases.

The one bit I'm not quite sure about is the trip count handling,
specifically the FIXME about overflow.  In general, I think we need to
widen the exit condition, but that's probably not profitable if the new
type isn't legal, so we probably need a check somewhere.  That said, I
don't think I'm making the existing problem any worse.

As a followup to this, a bunch of IV-related code in root-finding could
be cleaned up; with SCEV-based rewriting, there isn't any reason to
assume a loop will have exactly one or two PHI nodes.

Differential Revision: https://reviews.llvm.org/D45191

llvm-svn: 335400
2018-06-22 22:58:55 +00:00
Simon Pilgrim 9d3ef8ee2b [SLPVectorizer] Support alternate opcodes in tryToVectorizeList
Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.

NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.

Differential Revision: https://reviews.llvm.org/D48488

llvm-svn: 335364
2018-06-22 16:37:34 +00:00
Simon Pilgrim 1e564504bb [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.

Differential Revision: https://reviews.llvm.org/D48477

llvm-svn: 335349
2018-06-22 14:04:06 +00:00
Simon Pilgrim 229a781214 [SLPVectorizer][X86] Add alternate opcode tests for simple build vector cases
llvm-svn: 335348
2018-06-22 13:53:58 +00:00
Sanjay Patel c2b37f73f5 [InstCombine] add shuffle+binops test from PR37806; NFC
This one shows another pattern that we'll need to match
in some cases, but the current ordering of folds allows
us to match this as 2 binops before simplification takes
place.

llvm-svn: 335347
2018-06-22 13:44:42 +00:00
Sanjay Patel cfd9da038c [InstCombine] add tests for shuffle-with-different-binops; NFC
llvm-svn: 335345
2018-06-22 13:19:25 +00:00
Simon Pilgrim 9c8f9374b5 [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

llvm-svn: 335329
2018-06-22 09:45:31 +00:00
Eugene Leviant 6d711ca168 Revert r335324 due to a builtbot failure
llvm-svn: 335327
2018-06-22 08:57:01 +00:00
Eugene Leviant ea19c9473c [Evaluator] Improve evaluation of call instruction
Differential revision: https://reviews.llvm.org/D46584

llvm-svn: 335324
2018-06-22 08:29:36 +00:00
Chandler Carruth fe70b29cf7 [LegacyPM] Fix PR37888 by teaching the legacy loop pass manager how to
clear out deleted loops from the current queue beyond just the current
loop.

This is important because SimpleLoopUnswitch will now enqueue the same
loop to be re-processed. When it does this with the legacy PM, we don't
have a way of canceling the rest of the pipeline and so we can end up
deleting the loop before we reprocess it. =/

This change also makes it easy to support deleting other loops in the
queue to process, although I don't have any use cases for that.

Differential Revision: https://reviews.llvm.org/D48470

llvm-svn: 335317
2018-06-22 02:43:41 +00:00
Sanjay Patel 4784e1506e [InstCombine] fix shuffle-of-binops bug
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in 
the other, so we have to check for that possibility and
bail out.

llvm-svn: 335312
2018-06-21 23:56:59 +00:00
Sanjay Patel 23408c25f3 [InstCombine] add test for shuffle-of-binops; NFC
This shows a miscompile that was missed in rL335283.

llvm-svn: 335311
2018-06-21 23:53:01 +00:00
Matthew Voss 30648ab233 [GVN] Avoid casting a vector of size less than 8 bits to i8
Summary:
A reprise of D25849.

This crash was found through fuzzing some time ago and was documented in PR28879.

No check for load size has been added due to the following tests:
  - Transforms/GVN/invariant.group.ll
  - Transforms/GVN/pr10820.ll

These tests expect load sizes that are not a multiple of eight.

Thanks to @davide for the original patch.

Reviewers: nlopes, davide, RKSimon, reames, efriedma

Reviewed By: efriedma

Subscribers: davide, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D48330

llvm-svn: 335294
2018-06-21 21:43:20 +00:00
Sanjay Patel a76b70069d [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806

If we have a common variable operand used in a pair of binops with vector constants 
that are vector selected together, then we can constant shuffle the constant vectors 
to eliminate the shuffle instruction.
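
A rough sketch with hypothetical values: both adds share %x, and the
select-shuffle picks whole lanes from one side or the other, so the two
constant vectors can be shuffled instead and one binop removed.

  %b1 = add <4 x i32> %x, <i32 1, i32 2, i32 3, i32 4>
  %b2 = add <4 x i32> %x, <i32 5, i32 6, i32 7, i32 8>
  %s  = shufflevector <4 x i32> %b1, <4 x i32> %b2,
                      <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  ; -->
  %s  = add <4 x i32> %x, <i32 1, i32 6, i32 3, i32 8>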

This has some tricky parts that are hopefully addressed in the tests and their 
respective comments:

  1. If the shuffle mask contains an undef element, then that lane of the result is 
     undef:
     http://llvm.org/docs/LangRef.html#shufflevector-instruction

     Therefore, we can replace the constant in that lane with an undef value except 
     for div/rem. With div/rem, an undef in the divisor would cause the whole op to 
     be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.

  2. Intersect the wrapping and FMF of the original binops for the new binop. There 
     should be no extra poison or fast-math potential in the new binop that wasn't 
     possible in the original code.

  3. Disregard other uses. Given that we're eliminating uses (shortening the 
     dependency chain), I think that's always the right IR canonicalization. But 
     I purposely chose the udiv test to demonstrate the scenario where both 
     intermediate values have other uses because that seems likely worse for 
     codegen with an expensive math op. This seems like a very rare possibility to 
     me, so I don't think it requires a backend patch first.

Differential Revision: https://reviews.llvm.org/D48401

llvm-svn: 335283
2018-06-21 20:15:09 +00:00
Francis Visoiu Mistrih ac599b6951 Revert r335206 "Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions."
This reverts commit r335206.

As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meanwhile, revert this to fix some bots.

llvm-svn: 335272
2018-06-21 19:18:36 +00:00
Sanjay Patel 3382dc644e [InstCombine] add tests for shuffled cmps; NFC
llvm-svn: 335266
2018-06-21 18:07:38 +00:00
Sanjay Patel 3244537a3c [InstCombine] use constant pattern matchers with icmp+sext
The previous code worked with vectors, but it failed when the
vector constants contained undef elements. 
The matchers handle those cases.

llvm-svn: 335262
2018-06-21 17:51:44 +00:00
Sanjay Patel 5522e968ad [InstCombine] add vector icmp tests with undefs; NFC
llvm-svn: 335261
2018-06-21 17:37:14 +00:00
Sanjay Patel 447e8ece4d [LoopVectorize] regenerate full checks; NFC
llvm-svn: 335257
2018-06-21 16:54:32 +00:00
Nicolai Haehnle b29ee70122 InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.

Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix

There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:

- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16

The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).

Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.

Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad

Reviewers: arsenm, rampitec, majnemer

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48165

llvm-svn: 335230
2018-06-21 13:37:31 +00:00
David Green d143c65de3 [DA] Enable -da-delinearize by default
This enables da-delinearize in Dependence Analysis for delinearizing array
accesses into multiple dimensions. This can help to increase the power of
Dependence analysis on multi-dimensional arrays and prevent having to fall
back to the slower and less accurate MIV tests. It adds static checks on the
bounds of the arrays to ensure that one dimension doesn't overflow into
another, and brings our code in line with our tests.

Differential Revision: https://reviews.llvm.org/D45872

llvm-svn: 335217
2018-06-21 11:53:16 +00:00
Simon Pilgrim 2a9cde026c [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)
These costs were overly cautious for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. 

llvm-svn: 335216
2018-06-21 11:37:13 +00:00
Simon Pilgrim d08fbf6486 [SLPVectorizer][X86] Add horizontal add/sub tests
Shows PR37882 perf regression

llvm-svn: 335215
2018-06-21 11:16:10 +00:00
Florian Hahn d36aa1f763 Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
r335150 should resolve the issues with the clang-with-thin-lto-ubuntu
and clang-with-lto-ubuntu builders.

Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
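
A hypothetical illustration of the kind of fact propagation described above
(names and values are made up):

  define i32 @f(i32 %x) {
  entry:
    %c = icmp eq i32 %x, 42
    br i1 %c, label %then, label %else
  then:                        ; %x is known to be 42 on this edge
    %y = add i32 %x, 1         ; can be folded to 43
    ret i32 %y
  else:
    ret i32 0
  }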

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

llvm-svn: 335206
2018-06-21 07:15:08 +00:00
Chandler Carruth d1dab0c3c0 [PM/LoopUnswitch] Add partial non-trivial unswitching for invariant
conditions feeding a chain of `and`s or `or`s for a branch.

Much like with full non-trivial unswitching, we rely on the pass manager
to handle iterating until all of the profitable unswitches have been
done. This is to allow other more profitable unswitches to fire on any
of the cloned, simpler versions of the loop if viable.

Threading the partial unswitching through the non-trivial unswitching
logic motivated some minor refactorings. If those are too disruptive to
make it reasonable to review this patch, I can separate them out, but
it'll be somewhat timeconsuming so I wanted to send it for initial
review as-is. Feel free to tell me whether it warrants pulling apart.

I've tried to re-use (and factor out) logic from the partial trivial
unswitching, but not as much could be shared as I had hoped. Still, this
wasn't as bad as I naively expected.

Some basic testing is added, but I probably need more. Suggestions for
things you'd like to see tested more than welcome. One thing I'd like to
do is add some testing that when we schedule this with loop-instsimplify
it effectively cleans up the cruft created.

Last but not least, this uncovered a bug that has been in loop cloning
the entire time for non-trivial unswitching. Specifically, we didn't
correctly add the outer-most cloned loop to the list of cloned loops.
This meant that LCSSA wouldn't be updated for it hypothetically, and
more significantly that we would never visit it in the loop pass
manager. I noticed this while checking loop-instsimplify by hand. I'll
try to separate this bugfix out into its own patch with a more focused
test. But it is just one line, so shouldn't significantly confuse the
review here.

After this patch, the only missing "feature" in this unswitch I'm aware
of is non-trivial unswitching of switches. I'll try implementing *full*
non-trivial unswitching of switches (which is at least a sound thing to
implement), but *partial* non-trivial unswitching of switches is
something I don't see any sound and principled way to implement. I also
have no interesting test cases for the latter, so I'm not really
worried. The rest of the things that need to be ported are bug-fixes and
more narrow / targeted support for specific issues.

Differential Revision: https://reviews.llvm.org/D47522

llvm-svn: 335203
2018-06-21 06:14:03 +00:00
Alina Sbirlea dfd14adeb0 Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor

Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)

2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if BB doesn't have a single predecessor
Returns if BB's address was taken

After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if BB doesn't have a single predecessor
Returns if BB's address was taken

Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:

1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.

2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
  i. eliminateFallThrough is straightforward, but I added the use of a temporary array to avoid iterator invalidation.
  ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
  - Since Pred is not deleted (BB is), the entry block does not need updating.
  - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
  - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
  - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
  - adding some test owner as subscribers for the interesting tests modified:
    test/CodeGen/X86/avx-cmp.ll
    test/CodeGen/AMDGPU/nested-loop-conditions.ll
    test/CodeGen/AMDGPU/si-annotate-cf.ll
    test/CodeGen/X86/hoist-spill.ll
    test/CodeGen/X86/2006-11-17-IllegalMove.ll

3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.

Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar

Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D48202

llvm-svn: 335183
2018-06-20 22:01:04 +00:00
Sanjay Patel 88de778b9b [InstCombine] fix typo in test comment; NFC
llvm-svn: 335165
2018-06-20 20:16:45 +00:00
Chandler Carruth 4da3331d3d [PM/LoopUnswitch] Support partial trivial unswitching.
The idea of partial unswitching is to take a *part* of a branch's
condition that is loop invariant and just unswitch that part. This
primarily makes sense with i1 conditions of branches as opposed to
switches. When dealing with i1 conditions, we can easily extract loop
invariant inputs to a branch and unswitch them to test them entirely
outside the loop.
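
A hypothetical sketch of the shape being handled: %inv does not depend on
the loop, so the branch can be partially unswitched by testing %inv once
outside the loop and simplifying the condition in each resulting version.

  loop:
    %cond = and i1 %inv, %var   ; %inv loop-invariant, %var not
    br i1 %cond, label %body, label %exit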

As part of this, we now create much more significant cruft in the loop
body, so this relies on adding cleanup passes to the loop pipeline and
revisiting unswitched loops to do that cleanup before continuing to
process them.

This already appears to be more powerful at unswitching than the old
loop unswitch pass, and so I'd appreciate pretty careful review in case
I'm just missing some correctness checks. The `LIV-loop-condition` test
case is not unswitched by the old unswitch pass, but is with this pass.

Thanks to Sanjoy and Fedor for the review!

Differential Revision: https://reviews.llvm.org/D46706

llvm-svn: 335156
2018-06-20 18:57:07 +00:00
Sanjay Patel 74442061f2 [InstCombine] add vector select of binops tests (PR37806)
These represent the most basic requested transform - a matching
operand and 2 constant operands.

llvm-svn: 335151
2018-06-20 17:48:43 +00:00
Florian Hahn 5ac2629823 [PredicateInfo] Order instructions in different BBs by DFSNumIn.
Using OrderedInstructions::dominates as comparator for instructions in
BBs without dominance relation can cause a non-deterministic order
between such instructions. That in turn can cause us to materialize
copies in a non-deterministic order. While this does not affect
correctness, it causes some minor non-determinism in the final generated
code, because values have slightly different labels.

Without this patch, running -print-predicateinfo on a reasonably large
module produces slightly different output on each run.

This patch uses the dominator tree's DFSNumIn to order instructions from
different BBs, which should enforce a deterministic ordering and
guarantee that dominated instructions come after the instructions that
dominate them.

Reviewers: dberlin, efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D48230

llvm-svn: 335150
2018-06-20 17:42:01 +00:00
Simon Pilgrim 2e2f20a949 [SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.

This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.

The AArch64 regression will be fixed by D48172.

Differential Revision: https://reviews.llvm.org/D48174

llvm-svn: 335130
2018-06-20 14:26:28 +00:00
Sanjay Patel 0c57de4c21 [InstSimplify] Fix missed optimization in simplifyUnsignedRangeCheck()
When both operands are unsigned, the following optimizations are valid but missing:

   1. X > Y && X != 0 --> X > Y
   2. X > Y || X != 0 --> X != 0
   3. X <= Y || X != 0 --> true
   4. X <= Y || X == 0 --> X <= Y
   5. X > Y && X == 0 --> false

unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; }
should fold to x > y, but we currently do not do that.
Besides, unsigned foo(unsigned x, unsigned y) { return x < y && y != 0; }
has been folded to x < y, so there may be a bug.

Patch by: Li Jia He!

Differential Revision: https://reviews.llvm.org/D47922

llvm-svn: 335129
2018-06-20 14:22:49 +00:00
Sanjay Patel c1d2177f9d [InstSimplify] Add tests for missed optimizations in simplifyUnsignedRangeCheck (NFC)
These are the baseline tests for the functional change in D47922.

Patch by Li Jia He!

Differential Revision: https://reviews.llvm.org/D48000

llvm-svn: 335128
2018-06-20 14:03:13 +00:00
Sanjay Patel 825a4faa8d [InstCombine] ignore debuginfo when removing redundant assumes (PR37726)
This is similar to:
rL335083

Fixes::
https://bugs.llvm.org/show_bug.cgi?id=37726

llvm-svn: 335121
2018-06-20 13:22:26 +00:00
Mikhail Dvoretckii 8393f90717 [InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil
This patch replaces calls to X86-specific intrinsics with floor-ceil semantics
with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This
doesn't affect the resulting machine code, as those intrinsics are lowered to
the same instructions, but exposes these specific rounding cases to generic
optimizations.

Differential Revision: https://reviews.llvm.org/D48067

llvm-svn: 335039
2018-06-19 10:49:12 +00:00
David Green e6a9c24878 [LoopSimplifyCFG] Invalidate SCEV in LoopSimplifyCFG
LoopSimplifyCFG, being a loop pass, needs to preserve scalar
evolution. This invalidates SE for the loops altered during
block merging.

Differential Revision: https://reviews.llvm.org/D48258

llvm-svn: 335036
2018-06-19 09:43:36 +00:00
Max Kazantsev 37da4333a8 [SimplifyIndVars] Eliminate redundant truncs
This patch adds logic to deal with the following constructions:

  %iv = phi i64 ...
  %trunc = trunc i64 %iv to i32
  %cmp = icmp <pred> i32 %trunc, %invariant

Replacing it with
  %iv = phi i64 ...
  %cmp = icmp <pred> i64 %iv, sext/zext(%invariant)

In case it is legal. Specifically, if `%iv` has signed comparison users, it is
required that `sext(trunc(%iv)) == %iv`, and if it has unsigned comparison
uses then we require `zext(trunc(%iv)) == %iv`. The current implementation
bails if `%trunc` has other uses than `icmp`, but in theory we can handle more
cases here (e.g. if the user of trunc is bitcast).

Differential Revision: https://reviews.llvm.org/D47928
Reviewed By: reames

llvm-svn: 335020
2018-06-19 04:48:34 +00:00
Sanjoy Das 6e9b355cc9 Revert "[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags"
This reverts r334428.  It incorrectly marks some multiplications as nuw.  Tim
Shen is working on a proper fix.

Original commit message:

[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe.

Summary:
Previously we would add them for adds, but not multiplies.

llvm-svn: 335016
2018-06-19 04:09:44 +00:00
Xin Tong 54b4227f32 Revert "Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor"
This reverts commit f976cf4cca0794267f28b54e468007fd476d37d9.

I am reverting this because it breaks a few bots and it's going
to take me some time to investigate.

llvm-svn: 334993
2018-06-18 23:20:08 +00:00
Xin Tong bfd8cfcb8d Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor
Summary:
Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor

This is a small missing optimization in MergeBlockIntoPredecessor.

This helps with one simplifycfg test which expects this case to be handled.

Reviewers: davide, spatel, brzycki, asbirlea

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48284

llvm-svn: 334992
2018-06-18 22:59:13 +00:00
Diego Caballero 72aed5e5dc Move redundant-vf2-cost.ll test to X86 directory
redundant-vf2-cost.ll is X86 specific. Moved from
test/Transforms/LoopVectorize/redundant-vf2-cost.ll to
test/Transforms/LoopVectorize/X86/redundant-vf2-cost.ll

llvm-svn: 334854
2018-06-15 18:46:03 +00:00
Tomasz Krupa bcaab53d47 [X86] Lowering sqrt intrinsics to native IR
Summary: Complementary patch to lowering sqrt intrinsics in Clang.

Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, mike.dvoretsky, llvm-commits

Differential Revision: https://reviews.llvm.org/D41599

llvm-svn: 334849
2018-06-15 18:05:24 +00:00
Joseph Tremoulet 6f406d4f02 [InstCombine] Avoid iteration/mutation conflict
Summary:
When iterating users of a multiply in processUMulZExtIdiom, the
call to setOperand in the truncation case may replace the use
being visited; make sure the iterator has been advanced before
doing that replacement.

Reviewers: majnemer, davide

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48192

llvm-svn: 334844
2018-06-15 16:52:40 +00:00
Diego Caballero 68795245cf [LV] Prevent LV to run cost model twice for VF=2
This is a minor fix for LV cost model, where the cost for VF=2 was
computed twice when the vectorization of the loop was forced without
specifying a VF.

Reviewers: xusx595, hsaito, fhahn, mkuper

Reviewed By: hsaito, xusx595

Differential Revision: https://reviews.llvm.org/D48048

llvm-svn: 334840
2018-06-15 16:21:35 +00:00
Bjorn Pettersson 428caf988b Re-apply "[DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue"
This is r334704 (which was reverted in r334732) with a fix for
types like x86_fp80. We need to use getTypeAllocSizeInBits and
not getTypeStoreSizeInBits to avoid dropping debug info for
such types.

Original commit msg:
> Summary:
> Do not convert a DbgDeclare to DbgValue if the store
> instruction only refer to a fragment of the variable
> described by the DbgDeclare.
>
> Problem was seen when for example having an alloca for an
> array or struct, and there were stores to individual elements.
> In the past we inserted a DbgValue intrinsics for each store,
> just as if the store wrote the whole variable.
>
> When handling store instructions we insert a DbgValue that
> indicates that the variable is "undefined", as we do not know
> which part of the variable that is updated by the store.
>
> When ConvertDebugDeclareToDebugValue is used with a load/phi
> instruction we assert that the referenced value is large enough
> to cover the whole variable. Afaict this should be true for all
> scenarios where those methods are used on trunk. If the assert
> blows in the future I guess we could simply skip to insert a
> dbg.value instruction.
>
> In the future I think we should examine which part of the variable
> that is accessed, and add a DbgValue instrinsic with an appropriate
> DW_OP_LLVM_fragment expression.
>
> Reviewers: dblaikie, aprantl, rnk
>
> Reviewed By: aprantl
>
> Subscribers: JDevlieghere, llvm-commits
>
> Tags: #debug-info
>
> Differential Revision: https://reviews.llvm.org/D48024

llvm-svn: 334830
2018-06-15 13:48:55 +00:00
Simon Pilgrim 180497ea11 [SLP][X86] Add AVX2 run to POW2 SDIV Tests
Non-uniform pow2 tests only make sense on targets with fast (low-cost) non-uniform shifts

llvm-svn: 334821
2018-06-15 10:29:37 +00:00
Simon Pilgrim ca6215f8c8 [SLP][X86] Regenerate POW2 SDIV Tests
Added non-uniform pow2 test as well

llvm-svn: 334819
2018-06-15 10:07:03 +00:00
Roman Lebedev 84c11aed10 [InstCombine] Recommit: Fold (x << y) >> y -> x & (-1 >> y)
Summary:
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.
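
A minimal sketch of the variable-shift case this enables (illustrative, not the
committed test):

```
define i32 @shl_then_lshr(i32 %x, i32 %y) {
  %t = shl i32 %x, %y
  %r = lshr i32 %t, %y
  ret i32 %r
}

; Expected shape after the fold (no longer depends on %t):
;   %mask = lshr i32 -1, %y
;   %r    = and i32 %x, %mask
```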

The original commit was reverted because
it broke tests for the AMDGPU backend, which I didn't check.
Now the backend has been updated to recognize these new
patterns, so we are good.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX

Reviewers: spatel, craig.topper, mareko, bogner, rampitec, nhaehnle, arsenm

Reviewed By: spatel, rampitec, nhaehnle

Subscribers: wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D47980

llvm-svn: 334818
2018-06-15 09:56:52 +00:00
Mikhail Dvoretckii 0531ec654a NFC: Regenerating x86-sse41.ll test for InstCombine
Test regenerated to reduce noise in further patches.

llvm-svn: 334806
2018-06-15 07:59:29 +00:00
Eli Friedman 3f1ce093ea Make uitofp and sitofp defined on overflow.
IEEE 754 defines the expected result on overflow. As far as I know,
hardware implementations (of f16), and compiler-rt (__floatuntisf)
correctly return +-Inf on overflow. And I can't think of any useful
transform that would take advantage of overflow being undefined here.

Differential Revision: https://reviews.llvm.org/D47807

llvm-svn: 334777
2018-06-14 22:58:48 +00:00
Bjorn Pettersson 972fd1c9e7 Revert rL334704: "[DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue"
This reverts commit r334704.

Buildbots detected an assertion in "test tsan in debug compiler-rt build".

llvm-svn: 334732
2018-06-14 16:08:22 +00:00
Max Kazantsev ff6d1c9188 [EarlyCSE] Propagate conditions of AND and OR instructions
This patch teaches EarlyCSE to figure out that if `and i1 %x, %y` is true then both
`%x` and `%y` are true in the taken branch, and if `or i1 %x, %y` is false then both
`%x` and `%y` are false in the non-taken branch. Fix for PR37635.
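
A hedged sketch of the kind of propagation this enables (block and value names
are illustrative):

```
define i32 @and_cond(i1 %x, i1 %y) {
entry:
  %c = and i1 %x, %y
  br i1 %c, label %taken, label %untaken
taken:                          ; here both %x and %y are known to be true
  %z = zext i1 %x to i32        ; so this can be replaced with 1
  ret i32 %z
untaken:
  ret i32 0
}
```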

Differential Revision: https://reviews.llvm.org/D47574
Reviewed By: reames

llvm-svn: 334707
2018-06-14 13:02:13 +00:00
Bjorn Pettersson e406b29c22 [DebugInfo] Check size of variable in ConvertDebugDeclareToDebugValue
Summary:
Do not convert a DbgDeclare to DbgValue if the store
instruction only refers to a fragment of the variable
described by the DbgDeclare.

The problem was seen when, for example, there was an alloca for an
array or struct and there were stores to individual elements.
In the past we inserted a DbgValue intrinsic for each store,
just as if the store wrote the whole variable.

When handling store instructions we insert a DbgValue that
indicates that the variable is "undefined", as we do not know
which part of the variable is updated by the store.

When ConvertDebugDeclareToDebugValue is used with a load/phi
instruction we assert that the referenced value is large enough
to cover the whole variable. Afaict this should be true for all
scenarios where those methods are used on trunk. If the assert
fires in the future, I guess we could simply skip inserting a
dbg.value instruction.

In the future I think we should examine which part of the variable
is accessed, and add a DbgValue intrinsic with an appropriate
DW_OP_LLVM_fragment expression.

Reviewers: dblaikie, aprantl, rnk

Reviewed By: aprantl

Subscribers: JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D48024

llvm-svn: 334704
2018-06-14 11:23:42 +00:00
Max Kazantsev 0ed79620c6 [SimplifyIndVars] Ignore dead users
IndVarSimplify sometimes makes transforms based on users that are trivially dead. In particular,
if DCE wasn't run before it, there may be a dead `sext/zext` in the loop that will trigger widening
transforms, even though it makes no sense to do so.

This patch teaches IndVarSimplify to ignore the most trivial cases of that.

Differential Revision: https://reviews.llvm.org/D47974
Reviewed By: sanjoy

llvm-svn: 334567
2018-06-13 02:25:32 +00:00
Wei Mi a0c0857e7a [SampleFDO] Add a new compact binary format for sample profile.
Name table occupies a big chunk of size in current binary format sample profile.
In order to reduce its size, the patch changes the sample writer/reader to
save/restore MD5Hash of names in the name table. Sample annotation phase will
also use MD5Hash of name to query samples accordingly.

Experiment shows compact binary format can reduce the size of sample profile by
2/3 compared with binary format generally.

Differential Revision: https://reviews.llvm.org/D47955

llvm-svn: 334447
2018-06-11 22:40:43 +00:00
Farhana Aleen 078cd48a39 [SLP] Add testcases of min/max reduction pattern for AMDGPU.
Author: FarhanaAleen
llvm-svn: 334435
2018-06-11 20:29:31 +00:00
Justin Lebar aa4fec94d8 [SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe.
Summary:
Previously we would add them for adds, but not multiplies.

Reviewers: sanjoy

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D48038

llvm-svn: 334428
2018-06-11 18:57:42 +00:00
Roman Lebedev ebb3252f00 Revert rL334371 / D47980: "[InstCombine] Fold (x << y) >> y -> x & (-1 >> y)"
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll broke,
and I did not notice because I did not build that backend.

llvm-svn: 334373
2018-06-10 20:32:03 +00:00
Roman Lebedev eb795a0661 [InstCombine] Fold (x >> y) << y -> x & (-1 << y)
Summary:
We already do it for matching splat constants, but not just values.

Further improvements for non-matching splat constants, as noted in
https://reviews.llvm.org/D46760#1123713 will be needed,
but I'd prefer to do that as a follow-up.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX
https://rise4fun.com/Alive/0HF

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47981

llvm-svn: 334372
2018-06-10 20:10:13 +00:00
Roman Lebedev 4cdc59ecf2 [InstCombine] Fold (x << y) >> y -> x & (-1 >> y)
Summary:
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47980

llvm-svn: 334371
2018-06-10 20:10:06 +00:00
Roman Lebedev 1e9457ee76 [NFC][InstCombine] Revisit tests for D47980 / D47981 once more.
llvm-svn: 334370
2018-06-10 20:10:00 +00:00
Craig Topper 98a79934af [X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead.
llvm-svn: 334358
2018-06-10 06:01:36 +00:00
Roman Lebedev ff7652e7ab [NFC][InstCombine] More tests for (x >> y) << y -> x & (-1 << y) fold.
Followup for rL334347.
The fold is also valid for ashr.
https://rise4fun.com/Alive/0HF

https://bugs.llvm.org/show_bug.cgi?id=37603
https://reviews.llvm.org/D46760#1123713
https://rise4fun.com/Alive/cplX

llvm-svn: 334349
2018-06-09 14:01:46 +00:00
Roman Lebedev 8aada65f81 [NFC][InstCombine] Tests for (x >> y) << y -> x & (-1 << y) fold.
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://reviews.llvm.org/D46760#1123713
https://rise4fun.com/Alive/cplX

llvm-svn: 334347
2018-06-09 09:27:43 +00:00
Roman Lebedev 794e29f964 [NFC][InstCombine] Tests for (x << y) >> y -> x & (-1 >> y) fold.
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.

https://bugs.llvm.org/show_bug.cgi?id=37603
https://rise4fun.com/Alive/cplX

llvm-svn: 334346
2018-06-09 09:27:39 +00:00
Davide Italiano 189c2cf114 [InstCombine] Skip dbg.value(s) when looking at stack{save,restore}.
Fixes PR37713.

llvm-svn: 334317
2018-06-08 20:42:36 +00:00
Sanjay Patel afcf39e1f9 [InstCombine] add llvm.assume + debuginfo test (PR37726); NFC
llvm-svn: 334314
2018-06-08 18:47:33 +00:00
Daniil Fukalov 37433dc2e1 reapply r334209 with fixes for harfbuzz in Chromium
r334209 description:
[LSR] Check yet more intrinsic pointer operands

the patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334300
2018-06-08 16:22:52 +00:00
Roman Lebedev b060ce45ca [InstSimplify] add nuw %x, -1 -> -1 fold.
Summary:
`%ret = add nuw i8 %x, C`
From [[ https://llvm.org/docs/LangRef.html#add-instruction | langref ]]:
    nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”,
    respectively. If the nuw and/or nsw keywords are present,
    the result value of the add is a poison value if unsigned
    and/or signed overflow, respectively, occurs.

So if `C` is `-1`, `%x` can only be `0`, and the result is always `-1`.
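
As a concrete sketch (illustrative test, not from the patch):

```
define i8 @add_nuw_allones(i8 %x) {
  %r = add nuw i8 %x, -1   ; only %x == 0 avoids unsigned wrap
  ret i8 %r                ; expected to simplify to: ret i8 -1
}
```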

I'm not sure we want to use `KnownBits`/`LVI` here, because there is
exactly one possible value (all bits set, `-1`), so some other pass
should take care of replacing the known-all-ones with constant `-1`.

The `test/Transforms/InstCombine/set-lowbits-mask-canonicalize.ll` change *is* confusing.
What is happening is, before this (omitting `nuw` for simplicity):
1. First, InstCombine D47428/rL334127 folds `shl i32 1, %NBits` to `shl nuw i32 -1, %NBits`
2. Then, InstSimplify D47883/rL334222 folds `shl nuw i32 -1, %NBits` to `-1`,
3. `-1` is inverted to `0`.
But now:
1. *This* InstSimplify fold `%ret = add nuw i32 %setbit, -1` -> `-1` happens first,
   before InstCombine D47428/rL334127 fold could happen.
Thus we now end up with the opposite constant,
and it is all good: https://rise4fun.com/Alive/OA9

https://rise4fun.com/Alive/sldC
Was mentioned in D47428 review.
Follow-up for D47883.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47908

llvm-svn: 334298
2018-06-08 15:44:47 +00:00
Roman Shirokiy 9ba0aa2da0 [LV] Fix PR36983. For a given recurrence, fix all phis in exit block
There can be more than one PHI in the exit block using the same loop recurrence.
Don't assume there is only one; fix each user.

Differential Revision: https://reviews.llvm.org/D47788

llvm-svn: 334271
2018-06-08 08:21:20 +00:00
Reid Kleckner a3609f75b2 Revert r334209 "[LSR] Check yet more intrinsic pointer operands"
This causes cast failures when compiling harfbuzz in Chromium.
Reproducer on the way.

llvm-svn: 334254
2018-06-08 00:43:27 +00:00
Roman Lebedev 188a619e56 [NFC][InstSimplify] Add tests for add nuw %x, -1 -> -1 fold.
%ret = add nuw i8 %x, C
From langref:
	nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”,
	respectively. If the nuw and/or nsw keywords are present,
	the result value of the add is a poison value if unsigned
	and/or signed overflow, respectively, occurs.

So if C is -1, %x can only be 0, and the result is always -1.

https://rise4fun.com/Alive/sldC
Was mentioned in D47428 review.

llvm-svn: 334236
2018-06-07 21:19:50 +00:00
Roman Lebedev fdd90f2fc6 [NFC][InstSimplify] One more negative test for shl nuw C, %x -> C fold.
Follow-up for rL334200, rL334206.

llvm-svn: 334235
2018-06-07 21:19:45 +00:00
Roman Lebedev 2683802ba0 [InstSimplify] shl nuw C, %x -> C iff signbit is set on C.
Summary:
`%r = shl nuw i8 C, %x`

As per langref:
```
If the nuw keyword is present, then the shift produces
a poison value if it shifts out any non-zero bits.
```
Thus, if the sign bit is set on `C`, then `%x` can only be `0`,
which means that `%r` can only be `C`.
Or in other words, set sign bit means that the signed value
is negative, so the constant is `<= 0`.
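
A concrete sketch with a negative constant (illustrative, not the committed test):

```
define i8 @shl_nuw_negative_const(i8 %x) {
  %r = shl nuw i8 -128, %x   ; any non-zero shift amount would shift out the set sign bit
  ret i8 %r                  ; expected to simplify to: ret i8 -128
}
```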

https://rise4fun.com/Alive/WMk
https://rise4fun.com/Alive/udv

Was mentioned in D47428 review.

We already handle the `0` constant, https://godbolt.org/g/UZq1sJ, so this only handles negative constants.

Could use computeKnownBits() / LazyValueInfo,
but the cost-benefit analysis (https://reviews.llvm.org/D47891)
suggests it isn't worth it.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47883

llvm-svn: 334222
2018-06-07 20:03:45 +00:00
Sanjay Patel 4b1205b40f [TargetLibraryInfo] add mappings from LLVM sin/cos intrinsics to SVML calls
These weren't included in D19544 - probably just an oversight.
D40044 made it more likely that we'll have LLVM math intrinsics rather 
than libcalls, so this bug was more easily exposed.
As the tests/code show, we already have the complete mappings for pow/exp/log.

I don't have any experience with SVML, so I don't know if anything else is 
missing. It's also not clear to me that we should be doing this transform in 
IR rather than DAG/isel, but that's a separate issue.

Differential Revision: https://reviews.llvm.org/D47610

llvm-svn: 334211
2018-06-07 18:21:24 +00:00
Daniil Fukalov 12c0663a25 [LSR] Check yet more intrinsic pointer operands
the patch fixes another assertion in isLegalUse()

Differential Revision: https://reviews.llvm.org/D47794

llvm-svn: 334209
2018-06-07 17:30:58 +00:00
Roman Lebedev 86d376f516 [NFC][InstSimplify] Add more tests for shl nuw C, %x -> C fold.
Follow-up for rL334200.
For these, KnownBits will be needed.

llvm-svn: 334206
2018-06-07 16:18:26 +00:00
Roman Lebedev 847938925b [NFC][InstSimplify] Add tests for shl nuw C, %x -> C fold.
%r = shl nuw i8 C, %x

As per langref: If the nuw keyword is present, then the shift produces
                a poison value if it shifts out any non-zero bits.
Thus, if the sign bit is set on C, then %x can only be 0,
which means that %r can only be C.

https://rise4fun.com/Alive/WMk
Was mentioned in D47428 review.

llvm-svn: 334200
2018-06-07 14:18:38 +00:00
Florian Hahn 0d6b01761c [Mem2Reg] Avoid replacing load with itself in promoteSingleBlockAlloca.
We do the same thing in rewriteSingleStoreAlloca.

Fixes PR37632.

Reviewers: chandlerc, davide, efriedma

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D47825

llvm-svn: 334187
2018-06-07 11:09:05 +00:00
Sanjay Patel 3cd1aa88f9 [InstCombine] fold another shifty abs pattern to cmp+sel (PR36036)
The bug report:
https://bugs.llvm.org/show_bug.cgi?id=36036

...requests a DAG change for this, but an IR canonicalization
probably handles most cases. If we still want to match this
pattern in the backend, there's a proposal for that too:
D47831

Alive proofs including nsw/nuw cases that were first noted in:
D46988

https://rise4fun.com/Alive/Kmp
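
For reference, a sketch of one common shift-based abs idiom and the canonical
cmp+sel form; whether this exact variant is the pattern added here is an
assumption, since the message does not spell it out:

```
define i32 @shifty_abs(i32 %x) {
  %sign = ashr i32 %x, 31   ; 0 if %x >= 0, -1 if %x < 0
  %flip = xor i32 %x, %sign
  %abs  = sub i32 %flip, %sign
  ret i32 %abs
}

; Canonical cmp+sel form:
;   %cmp = icmp slt i32 %x, 0
;   %neg = sub i32 0, %x
;   %abs = select i1 %cmp, i32 %neg, i32 %x
```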

This patch is largely copied from the existing code that was
initially added with:
D40984
...but I didn't see much gain from trying to share code.

llvm-svn: 334137
2018-06-06 21:58:12 +00:00
Sanjay Patel 6fda6b1210 [InstCombine] add tests for another abs() pattern (PR36036); NFC
llvm-svn: 334133
2018-06-06 21:32:42 +00:00
Roman Lebedev cbf8446359 [InstCombine] PR37603: low bit mask canonicalization
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37603 | PR37603 ]].

https://godbolt.org/g/VCMNpS
https://rise4fun.com/Alive/idM

When doing bit manipulations, it is quite common to calculate some bit mask,
and apply it to some value via `and`.

The typical C code looks like:
```
int mask_signed_add(int nbits) {
    return (1 << nbits) - 1;
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_add(int)(i32) local_unnamed_addr #0 {
  %2 = shl i32 1, %0
  %3 = add nsw i32 %2, -1
  ret i32 %3
}
```

But there is a second, less readable variant:
```
int mask_signed_xor(int nbits) {
    return ~(-(1 << nbits));
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_xor(int)(i32) local_unnamed_addr #0 {
  %2 = shl i32 -1, %0
  %3 = xor i32 %2, -1
  ret i32 %3
}
```

Since we created such a mask, it is quite likely that we will use it in an `and` next.
And then we may get rid of the `not` op by folding it into `andn`.

But now that I have actually looked:
https://godbolt.org/g/VTUDmU
_some_ backend changes will be needed too.
We clearly lose `bzhi` recognition.

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47428

llvm-svn: 334127
2018-06-06 19:38:27 +00:00
Roman Lebedev 4771bc6c35 [InstCombine][NFC] PR37603: low bit mask canonicalization tests
Differential Revision: https://reviews.llvm.org/D47427

llvm-svn: 334126
2018-06-06 19:38:21 +00:00
Vedant Kumar 6d354ed72e [Debugify] Move debug value intrinsics closer to their operand defs
Before this patch, debugify would insert debug value intrinsics before the
terminating instruction in a block. This had the advantage of being simple,
but was a bit too simple/unrealistic.

This patch teaches debugify to insert debug values immediately after their
operand defs. This enables better testing of the compiler.

For example, with this patch, `opt -debugify-each` is able to identify a
vectorizer DI-invariance bug fixed in llvm.org/PR32761. In this bug, the
vectorizer produced different output with/without debug info present.

Reverting Davide's bugfix locally, I see:

$ ~/scripts/opt-check-dbg-invar.sh ./bin/opt \
  .../SLPVectorizer/AArch64/spillcost-di.ll -slp-vectorizer
Comparing: -slp-vectorizer .../SLPVectorizer/AArch64/spillcost-di.ll
  Baseline: /var/folders/j8/t4w0bp8j6x1g6fpghkcb4sjm0000gp/T/tmp.iYYeL1kf
  With DI : /var/folders/j8/t4w0bp8j6x1g6fpghkcb4sjm0000gp/T/tmp.sQtQSeet
9,11c9,11
<   %5 = getelementptr inbounds %0, %0* %2, i64 %0, i32 1
<   %6 = bitcast i64* %4 to <2 x i64>*
<   %7 = load <2 x i64>, <2 x i64>* %6, align 8, !tbaa !0
---
>   %5 = load i64, i64* %4, align 8, !tbaa !0
>   %6 = getelementptr inbounds %0, %0* %2, i64 %0, i32 1
>   %7 = load i64, i64* %6, align 8, !tbaa !5
12a13
>   store i64 %5, i64* %8, align 8, !tbaa !0
14,15c15
<   %10 = bitcast i64* %8 to <2 x i64>*
<   store <2 x i64> %7, <2 x i64>* %10, align 8, !tbaa !0
---
>   store i64 %7, i64* %9, align 8, !tbaa !5
:: Found a test case ^

Running this over the *.ll files in tree, I found four additional examples
which compile differently with/without DI present. I plan on filing bugs for
these.

llvm-svn: 334118
2018-06-06 19:05:42 +00:00
Sanjay Patel 0e8b90da0c [ConstProp] move tests for fp <--> int; NFC
These were added for D5603 / rL219542, and there's a proposal to 
change one side in D47807.

These are tests of constant propagation, so they shouldn't have
ever been tested/housed under InstCombine.

llvm-svn: 334107
2018-06-06 16:53:56 +00:00
David Green 25312b2b6c [GlobalMerge] Set the alignment on merged global structs
If no alignment is set, the ABI/preferred alignment of structs will be
used, which may be higher than required. This can lead to extra padding
and in the end an increase in data size.

Differential Revision: https://reviews.llvm.org/D47633

llvm-svn: 334099
2018-06-06 14:48:32 +00:00
Tim Northover 9b80060d7b InstCombine: ignore debug instructions during fence combine
We should never get different CodeGen based on whether the code is being
compiled in debug mode so we must skip over @llvm.dbg.value (and similar)
calls.

Should fix at least the worst part of PR37690.

llvm-svn: 334090
2018-06-06 12:46:02 +00:00
Guozhi Wei c4c6b548c5 [CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions
The CodeGenPrepare pass moves extension instructions close to load instructions in a different BB, so they can be combined later. But in the current implementation the extension instructions can't move through logical and shift instructions. This patch enables that, so we can eliminate more extension instructions.

Differential Revision: https://reviews.llvm.org/D45537

This is a re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem.

llvm-svn: 334049
2018-06-05 21:03:52 +00:00
John Brawn e4ff0bd401 [InstCombine] Correct the cmp operand type used when canonicalizing abs/nabs
When adjusting a cmp in order to canonicalize an abs/nabs select pattern we need
to use the type of the existing operand when creating a new operand not the
type of a select operand, as the two may be different.

This fixes PR37686.

llvm-svn: 334019
2018-06-05 14:10:55 +00:00
Sanjay Patel dcb8d304c3 [InstCombine] refine UB-handling in shuffle-binop transform
As noted in rL333782, we can be both better for optimization and
safer with this transform:
BinOp (shuffle V1, Mask), C --> shuffle (BinOp V1, NewC), Mask
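
A hedged sketch of the rewrite shape (constants chosen for illustration; the
patch's exact handling of undef lanes may differ):

```
define <2 x i32> @shuffle_then_add(<2 x i32> %v) {
  %s = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 0>
  %r = add <2 x i32> %s, <i32 10, i32 20>
  ret <2 x i32> %r
}

; Expected shape after the transform (the constant is shuffled instead):
;   %b = add <2 x i32> %v, <i32 20, i32 10>
;   %r = shufflevector <2 x i32> %b, <2 x i32> undef, <2 x i32> <i32 1, i32 0>
```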

The only potentially unsafe-to-speculate binops are integer div/rem.
All other binops are always safe (although I don't see a way to
assert that in code here).

For opcodes like shifts that can produce poison, it can't matter
here because we know the lanes with undef are dropped by the
subsequent shuffle.

Differential Revision: https://reviews.llvm.org/D47686

llvm-svn: 333962
2018-06-04 22:26:45 +00:00
Dmitry Mikulin 4539487650 In thin and full LTO + CFI, direct function calls may go through jump table
entries to reach the target. Since these calls don't require type checks,
we can short-circuit them to their real targets, except in cases when they
can be pre-empted.

Differential Revision: https://reviews.llvm.org/D46326

llvm-svn: 333937
2018-06-04 18:18:12 +00:00
John Brawn c5a6392be3 [ValueTracking] Match select abs pattern when there's an sext involved
When checking a select to see if it matches an abs, allow the true/false values
to be a sign-extension of the comparison value instead of requiring that they're
directly the comparison value, as all the comparison cares about is the sign of
the value.

This fixes a regression due to r333702, where we were no longer generating ctlz
due to isKnownNonNegative failing to match such a pattern.

Differential Revision: https://reviews.llvm.org/D47631

llvm-svn: 333927
2018-06-04 16:53:57 +00:00
Vedant Kumar 7dda22115e [Debugify] Add debug intrinsics before terminating musttail calls
After r333856, opt -debugify would just stop emitting debug value
intrinsics after encountering a musttail call. This wasn't sufficient to
avoid verifier failures.

Debug value intrinsics for all instructions preceding a musttail call
must also be emitted before the musttail call.

llvm-svn: 333866
2018-06-04 03:33:01 +00:00
Serguei Katkov d894fb4288 [InstCombine] Fix div handling
When we optimize a select based on the fact that division by 0 is undef,
we should not traverse instructions which are not guaranteed to
transfer execution to the next instruction. The guard intrinsic is an example.

Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47576

llvm-svn: 333864
2018-06-04 02:52:36 +00:00
Sanjay Patel 3bd957b7ae [InstCombine] improve sub with bool folds
There's a patchwork of existing transforms trying to handle
these cases, but as seen in the changed test, we weren't
catching them all.

llvm-svn: 333845
2018-06-03 16:35:26 +00:00
Sanjay Patel bbc6d60677 [InstCombine] call simplify before trying vector folds
As noted in the review thread for rL333782, we could have
made a bug harder to hit if we were simplifying instructions
before trying other folds. 

The shuffle transform in question isn't ever a simplification;
it's just a canonicalization. So I've renamed that to make that 
clearer.

This is NFCI at this point, but I've regenerated the test file 
to show the cosmetic value naming difference of using 
instcombine's RAUW vs. the builder.

Possible follow-ups:
1. Move reassociation folds after simplifies too.
2. Refactor common code; we shouldn't have so much repetition.

llvm-svn: 333820
2018-06-02 16:27:44 +00:00
Sanjay Patel b6333486f4 [InstCombine] add more tests for shuffle-binop; NFC
As noted in the review thread for rL333782, we're lacking coverage
for this transform, so add tests for each binop opcode with constant 
operand.

llvm-svn: 333818
2018-06-02 16:16:42 +00:00
Chandler Carruth 9281503e8f [PM/LoopUnswitch] Fix how the cloned loops are handled when updating analyses.
Summary:
I noticed this issue because we didn't put the primary cloned loop into
the `NonChildClonedLoops` vector and so never iterated on it. Once
I fixed that, it made it clear why I had to do a really complicated and
unnecessary dance when updating the loops to remain in canonical form --
I was unwittingly working around the fact that the primary cloned loop
wasn't in the expected list of cloned loops. Doh!

Now that we include it in this vector, we don't need to return it and we
can consolidate the update logic as we correctly have a single place
where it can be handled.

I've just added a test for the iteration order aspect as every time
I changed the update logic partially or incorrectly here, an existing
test failed and caught it so that seems well covered (which is also
evidenced by the extensive working around of this missing update).

Reviewers: asbirlea, sanjoy

Subscribers: mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D47647

llvm-svn: 333811
2018-06-02 01:29:01 +00:00
Karl-Johan Karlsson 6d52e5c3e4 [ConstantFold] Disallow folding vector geps into bitcasts
Summary:
Getelementptr returns a vector of pointers, instead of a single address,
when one or more of its arguments is a vector. In such case it is not
possible to simplify the expression by inserting a bitcast of operand(0)
into the destination type, as it will create a bitcast between different
sizes.

Reviewers: majnemer, mkuper, mssimpso, spatel

Reviewed By: spatel

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D46379

llvm-svn: 333783
2018-06-01 19:34:35 +00:00
Sanjay Patel 66f7e19f6a [InstCombine] fix vector shuffle transform to replace undef elements (PR37648)
This bug:
https://bugs.llvm.org/show_bug.cgi?id=37648
...was created with the enhancement to this transform with rL332479.

The urem test shows the disaster potential: any undef divisor lane makes
the whole op undef.

The test diffs show that vector demanded elements already turns some, but not all,
of the unused binop operands back into undef.

llvm-svn: 333782
2018-06-01 19:23:18 +00:00
Sanjay Patel 3883fcd0c0 [InstCombine] add tests for broken shuffle transform (PR37648)
llvm-svn: 333779
2018-06-01 18:52:38 +00:00
Vlad Tsyrklevich 6867ab7c90 [ThinLTOBitcodeWriter] Emit summaries for regular LTO modules
Summary:
Emit summaries for bitcode modules that are only destined for the
regular LTO portion of the build so they can participate in
summary-based dead stripping.

This change reduces the size of a nacl_helper build with cfi-icall
enabled by 7%, removing the majority of the overhead due to enabling
cfi-icall. The cfi-icall size increase was caused by compiling in lots
of unused code and cfi-icall generating jumptable references to unused
symbols that could no longer be removed by -Wl,-gc-sections. Increasing
the visibility of summary-based dead stripping prevented jumptable
entries being created for unused symbols from the regular LTO portion
of the build.

Reviewers: pcc

Reviewed By: pcc

Subscribers: dschuff, mehdi_amini, inglorion, eraman, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47594

llvm-svn: 333768
2018-06-01 15:20:47 +00:00
Sanjay Patel 284fe7a8ec [InstCombine] add baseline test for bug with div+select transform (D47576)
llvm-svn: 333756
2018-06-01 14:39:05 +00:00
Florian Hahn 8a17f1f43e Revert r333740: IPSCCP] Use PredicateInfo to propagate facts from cmp.
This is breaking the clang-with-thin-lto-ubuntu bot.

llvm-svn: 333745
2018-06-01 12:58:43 +00:00
Florian Hahn f4df554f32 Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

Differential Revision: https://reviews.llvm.org/D45330

llvm-svn: 333740
2018-06-01 10:48:54 +00:00
Sanjay Patel affe450db7 [LoopVectorize, x86] add tests to show missing SVML transforms; NFC
llvm-svn: 333707
2018-05-31 22:31:02 +00:00
Craig Topper 9a6c0bdcbd [LoopIdiomRecognize] Only convert loops to ctlz if we can prove that the input is non-negative.
Summary:
Loop idiom recognition tries to convert loops like

```
int foo(int x) {
  int cnt = 0;
  while (x) {
    x >>= 1;
    ++cnt;
  }
  return cnt;
}
```

into calls to ctlz, but if x is initially negative this loop should be infinite.

It happens that the cases that motivated this change have an absolute value of x before the loop. So this patch restricts the transform to cases where we know x is positive. Note: We are relying on the absolute value of INT_MIN to be undefined so we can assume that the result is always positive.

Fixes PR37479

Reviewers: spatel, hfinkel, efriedma, javed.absar

Reviewed By: efriedma

Subscribers: dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D47348

llvm-svn: 333702
2018-05-31 22:16:55 +00:00
Sanjay Patel 2ae1ab30e3 [LoopVectorize, x86] regenerate checks; NFC
I removed the 'fast' flag from the calls because that's not required.

llvm-svn: 333695
2018-05-31 21:30:36 +00:00
Sanjay Patel 26368cd5d9 [InstCombine] narrow select to match condition operands' size
This is the planned enhancement to D47163 / rL333611.
We want to match cmp/select sizes because that will be recognized
as min/max more easily and lead to better codegen (especially for
vector types).

As mentioned in D47163, this improves some of the tests that would
also be folded by D46380, so we may want to adjust that patch to
match the new patterns where the extend op occurs after the select.

llvm-svn: 333689
2018-05-31 19:55:27 +00:00
Sanjay Patel dfbe6b49f0 [InstCombine] regenerate checks; NFC
llvm-svn: 333682
2018-05-31 19:25:02 +00:00
Alexandros Lamprineas 61f0ba1fcc [InstCombine, ARM] Convert vld1 to llvm load
Convert a vector load intrinsic into an llvm load instruction.
This is beneficial when the underlying object being addressed
comes from a constant, since we get constant-folding for free.

Differential Revision: https://reviews.llvm.org/D46273

llvm-svn: 333643
2018-05-31 12:19:18 +00:00
Roman Lebedev c0ecd06428 Revert rL333106 / D46814: [InstCombine] Fold unfolded masked merge pattern with variable mask!
In post-commit review, Eric Christopher notes that many
new MSan warnings are being observed with this patch.

The probable reason is that if 'y' is undef here, we could
evaluate it twice and get different results.
We can't increase the number of uses of a value.

llvm-svn: 333631
2018-05-31 06:00:36 +00:00
Sanjay Patel e5bc441791 [InstCombine] don't change the size of a select if it would mismatch its condition operands' sizes
Don't always:
cast (select (cmp x, y), z, C) --> select (cmp x, y), (cast z), C'
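
A sketch of the pattern in question (illustrative constants); the removed
canonicalization would rewrite the first form into the commented second form,
widening the select away from its i32 compare:

```
define i64 @cast_of_select(i32 %x, i32 %y, i32 %z) {
  %cmp = icmp slt i32 %x, %y
  %sel = select i1 %cmp, i32 %z, i32 42
  %ext = zext i32 %sel to i64
  ret i64 %ext
}

; Previously canonicalized to:
;   %zext = zext i32 %z to i64
;   %ext  = select i1 %cmp, i64 %zext, i64 42
```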

This is something that came up as far back as D26556, and I lost track of it. 
I suspect that this transform is part of the underlying problem that is 
inspiring some of the recent proposals that seek to match larger patterns 
that include a cast op. Even if that's not true, this transform causes
problems for codegen (particularly with vector types).

A transform to actively match the size of cmp and select operand sizes should
follow. This patch just removes the harmful canonicalization in the other
direction.

Differential Revision: https://reviews.llvm.org/D47163

llvm-svn: 333611
2018-05-31 00:16:58 +00:00
Sanjay Patel ceb595b04e [InstCombine] don't negate constant expression with fsub (PR37605)
X + (-C) would be transformed back into X - C, so infinite loop:
https://bugs.llvm.org/show_bug.cgi?id=37605

llvm-svn: 333610
2018-05-30 23:55:12 +00:00
Vlad Tsyrklevich 178fdb1a3b [LowerTypeTests] Discard extern_weak linkage for definitions
Summary:
Fix PR37625. It's possible for an extern_weak declaration to be emitted
to the merged module when a definition exists in the ThinLTO portion of
the build; discard the linkage on the declaration in that case.
(otherwise we copy the linkage to the alias to the jumptable and fail)

Reviewers: pcc

Reviewed By: pcc

Subscribers: mehdi_amini, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D47494

llvm-svn: 333604
2018-05-30 22:39:52 +00:00
Karl-Johan Karlsson ebaaa2ddae [ValueTracking] Fix endless recursion in isKnownNonZero()
Summary:
The isKnownNonZero() function has checks that abort the recursion when
it reaches the specified max depth. However, one of the recursive calls
was placed before the max depth check was done, resulting in an endless
recursion that eventually triggered a segmentation fault.

Fixed the problem by moving the max depth check above the first
recursive call.

Reviewers: Prazek, nlopes, spatel, craig.topper, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, bjope, llvm-commits

Differential Revision: https://reviews.llvm.org/D47531

llvm-svn: 333557
2018-05-30 15:56:46 +00:00
Alexandros Lamprineas 52457d33b2 [InstCombine, ARM, AArch64] Convert table lookup to shuffle vector
Turning a table lookup intrinsic into a shuffle vector instruction
can be beneficial. If the mask used for the lookup is the constant
vector {7,6,5,4,3,2,1,0}, then the back-end generates byte reverse
instructions instead.

Differential Revision: https://reviews.llvm.org/D46133

llvm-svn: 333550
2018-05-30 14:38:50 +00:00
Chandler Carruth 71fd27043e [PM/LoopUnswitch] When using the new SimpleLoopUnswitch pass, schedule
loop-cleanup passes at the beginning of the loop pass pipeline, and
re-enqueue loops after even trivial unswitching.

This will allow us to much more consistently avoid simplifying code
while doing trivial unswitching. I've also added a test case that
specifically shows effective iteration using this technique.

I've unconditionally updated the new PM as that is always using the
SimpleLoopUnswitch pass, and I've made the pipeline changes for the old
PM conditional on using this new unswitch pass. I added a bunch of
comments to the loop pass pipeline in the old PM to make it more clear
what is going on when reviewing.

Hopefully this will unblock doing *partial* unswitching instead of just
full unswitching.

Differential Revision: https://reviews.llvm.org/D47408

llvm-svn: 333493
2018-05-30 02:46:45 +00:00
Chandler Carruth 4cbcbb0761 [LoopInstSimplify] Re-implement the core logic of loop-instsimplify to
be both simpler and substantially more efficient.

Rather than use a hand-rolled iteration technique that isn't quite the
same as RPO, use the pre-built RPO loop body traversal utility.

Once visiting the loop body in RPO, we can assert that we visit defs
before uses reliably. When this is the case, the only need to iterate is
when simplifying a def that is used by a PHI node along a back-edge.
With this patch, the first pass over the loop body is just a complete
simplification of every instruction across the loop body. When we
encounter a use of a simplified instruction that stems from a PHI node
in the loop body that has already been visited (due to some cyclic CFG,
potentially the loop itself, or a nested loop, or unstructured control
flow), we recall that specific PHI node for the second iteration.
Nothing else needs to be preserved from iteration to iteration.

On the second and later iterations, only instructions known to have
simplified inputs are considered, each time starting from a set of PHIs
that had simplified inputs along the backedges.

Dead instructions are collected along the way, but deleted in a batch at
the end of each iteration making the iterations themselves substantially
simpler. This uses a new batch API for recursively deleting dead
instructions.

This also changes the routine to visit subloops. Because simplification
is fundamentally transitive, we may need to visit the entire loop body,
including subloops, to handle knock-on simplification.

I've added a basic test file that helps demonstrate that all of these
changes work. It includes both straight-forward loops with
simplifications as well as interesting PHI-structures, CFG-structures,
and a nested loop case.

Differential Revision: https://reviews.llvm.org/D47407

llvm-svn: 333461
2018-05-29 20:15:38 +00:00
Farhana Aleen eacb1020aa [AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by default.
Summary: Bug reported here https://bugs.freedesktop.org/show_bug.cgi?id=105464 found
         to be resolved by some other fixes.

Author: FarhanaAleen
llvm-svn: 333380
2018-05-28 18:15:11 +00:00
David Green aee7ad0cde Revert 333358 as it's failing on some builders.
I'm guessing the tests rely on the ARM backend being built.

llvm-svn: 333359
2018-05-27 12:54:33 +00:00
David Green 3034281b43 [UnrollAndJam] Add a new Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

for i..
  ForeBlocks(i)
  for j..
    SubLoopBlocks(i, j)
  AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

for i... i+=2
  ForeBlocks(i)
  ForeBlocks(i+1)
  for j..
    SubLoopBlocks(i, j)
    SubLoopBlocks(i+1, j)
  AftBlocks(i)
  AftBlocks(i+1)
Remainder

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953

llvm-svn: 333358
2018-05-27 12:11:21 +00:00
Florian Hahn 718af2f817 Revert r333268: [IPSCCP] Use PredicateInfo to propagate facts from...
Reverting this to see if this is causing the failures of the
clang-with-thin-lto-ubuntu bot.

[IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.

This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

Differential Revision: https://reviews.llvm.org/D45330

llvm-svn: 333323
2018-05-25 23:32:02 +00:00
Guozhi Wei 03005d3f03 [CodeGenPrepare] Revert r331783
The patch r331783 caused a regression in one of our internal applications. Revert it for now; we will investigate further.

llvm-svn: 333305
2018-05-25 20:30:26 +00:00
Craig Topper 8f77dcadff Recommit r333226 "[ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive."
Libfuzzer tests have been fixed to prevent being optimized.

Original commit message:

If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other values it will produce a positive number. So we can assume the result is positive.

This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. Need to check alive to make sure there are no corner cases.

Differential Revision: https://reviews.llvm.org/D47041

llvm-svn: 333300
2018-05-25 19:18:09 +00:00
David Stenberg 05b6a53340 [MustExecute] Fix a debug invariant issue in isGuaranteedToExecute()
Summary:
Look past debug intrinsics when querying whether an instruction is the
first instruction in the header block. The commit includes a reproducer
for a case where LICM would not hoist an instruction, due to the presence
of the intrinsic.

A caveat with this commit is that the check will not work properly if
the instruction at hand is a debug intrinsic. I assume that no one
depends on isGuaranteedToExecute() to return true for debug intrinsics
for these cases (and that this might be an indication of another debug
invariant issue), so I thought that it was not worth adding that extra
bit of complexity.

Reviewers: reames, anna

Reviewed By: anna

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47197

llvm-svn: 333274
2018-05-25 13:02:59 +00:00
Florian Hahn b4a70b9f47 [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

Differential Revision: https://reviews.llvm.org/D45330

llvm-svn: 333268
2018-05-25 11:12:33 +00:00
Craig Topper 8174281b93 Revert r333226 "[ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive."
This breaks some libFuzzer tests. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/15589/steps/check-fuzzer/logs/stdio

Reverting to investigate

llvm-svn: 333253
2018-05-25 04:01:56 +00:00
Vedant Kumar 4872535eb9 [Debugify] Set a DI version module flag for llc compatibility
Setting the "Debug Info Version" module flag makes it possible to pipe
synthetic debug info into llc, which is useful for testing backends.

llvm-svn: 333237
2018-05-24 23:00:23 +00:00
Vedant Kumar b70e35686b [Debugify] Avoid printing unnecessary square braces, NFC
llvm-svn: 333236
2018-05-24 23:00:22 +00:00
Craig Topper 49f23fe349 [ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive.
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other values it will produce a positive number. So we can assume the result is positive.
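
A hedged sketch of what the improved known bits enable (illustrative; assumes
the final compare is folded from the now-known sign bit):

```
define i1 @abs_nsw_is_nonnegative(i32 %x) {
  %neg = sub nsw i32 0, %x
  %cmp = icmp slt i32 %x, 0
  %abs = select i1 %cmp, i32 %neg, i32 %x   ; abs with nsw: INT_MIN is excluded
  %nonneg = icmp sge i32 %abs, 0            ; known bits can now prove this is true
  ret i1 %nonneg
}
```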

This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. Need to check alive to make sure there are no corner cases.

Differential Revision: https://reviews.llvm.org/D47041

llvm-svn: 333226
2018-05-24 21:22:51 +00:00
Warren Ristow d3efa9429f [InstCombine] Enable more reassociations using FMF 'reassoc' + 'nsz'
Reassociation of math ops in some contexts (especially vector contexts)
has generally only been happening when the 'fast' FMF was set.  This
enables reassoication when only the finer grained controls 'reassoc' and
'nsz' are set.

Differential Revision: https://reviews.llvm.org/D47335

llvm-svn: 333221
2018-05-24 20:16:43 +00:00
Jun Bum Lim dfbe6fa832 [LICM] Preserve DT and LoopInfo specifically
Summary:
In LICM, the CFG can be changed in splitPredecessorsOfLoopExit(), which updates
only DT and LoopInfo. Therefore, we should preserve only DT and LoopInfo specifically,
instead of all analyses that depend on the CFG (setPreservesCFG()).

This change should fix PR37323.

Reviewers: uabelho, davide, dberlin, Ka-Ka

Reviewed By: dberlin

Subscribers: mzolotukhin, bjope, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46775

llvm-svn: 333198
2018-05-24 15:58:34 +00:00
Chad Rosier 274d72faad [InstCombine] Combine XOR and AES instructions on ARM/ARM64.
The ARM/ARM64 AESE and AESD instructions have a builtin XOR as the first step in
the instruction. Therefore, if the AES key is zero and the AES data was
previously XORed, it can be combined into a single instruction.

Differential Revision: https://reviews.llvm.org/D47239
Patch by Michael Brase!

llvm-svn: 333193
2018-05-24 15:26:42 +00:00
Karl-Johan Karlsson 478232d52f [NaryReassociate] Detect deleted instr with WeakVH
Summary:
If NaryReassociate succeeds it will, when replacing the old instruction
with the new instruction, also recursively delete trivially
dead instructions from the old instruction. However, if the input to the
NaryReassociate pass contains dead code, it is not safe to recursively
delete trivially dead instructions as it might lead to deleting the newly
created instruction.

This patch will fix the problem by using WeakVH to detect this
rare case, when the newly created instruction is dead, and it will then
restart the basic block iteration from the beginning.

This fixes pr37539

Reviewers: tra, meheff, grosser, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47139

llvm-svn: 333155
2018-05-24 06:09:02 +00:00
Changpeng Fang 5f9154618e StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly
Summary:
  StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order.
The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops.
To solve this problem, a loop-depth-based approach is used to make sure all blocks in the inner loop have been visited
before moving on to the outer loop.

However, we found a problem for a SubRegion which is a loop itself:

--> BB1 --> BB2 --> BB3 -->

In this case, BB2 is a SubRegion (loop), and thus its loop depth is different from that of BB1 and BB3. This causes
BB2 to be placed in the wrong order.

In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth
to guard the sorting.

Reviewers:
  arsenm, jlebar

Differential Revision:
  https://reviews.llvm.org/D46912

llvm-svn: 333111
2018-05-23 18:34:48 +00:00
Roman Lebedev 6b6c553bb8 [InstCombine] Fold unfolded masked merge pattern with variable mask!
Summary:
Finally fixes [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]].

Now that the backend is all done, we can finally fold it!

The canonical unfolded masked merge pattern is
```(x &  m) | (y & ~m)```
There is a second, equivalent variant:
```(x | ~m) & (y |  m)```
Only one of them (the or-of-and's i think) is canonical.
And if the mask is not a constant, we should fold it to:
```((x ^ y) & M) ^ y```

https://rise4fun.com/Alive/ndQw
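
A minimal sketch of the variable-mask fold (names illustrative):

```
define i32 @masked_merge(i32 %x, i32 %y, i32 %m) {
  %mx   = and i32 %x, %m
  %notm = xor i32 %m, -1
  %my   = and i32 %y, %notm
  %r    = or i32 %mx, %my
  ret i32 %r
}

; Expected folded form:
;   %d = xor i32 %x, %y
;   %t = and i32 %d, %m
;   %r = xor i32 %t, %y
```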

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: nicholas, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D46814

llvm-svn: 333106
2018-05-23 17:47:52 +00:00
Craig Topper 3b768e8602 [InstCombine] Negate ABS/NABS patterns by swapping the select operands to remove the negation
Differential Revision: https://reviews.llvm.org/D47236

llvm-svn: 333101
2018-05-23 17:29:03 +00:00
Max Kazantsev d99f3bacb4 [LoopUnswitch] Fix SCEV invalidation in unswitching
Loop unswitching makes substantial changes to a loop that can also affect cached
SCEV info in its outer loops, but it only bothers to invalidate the SCEV cache for the
innermost loop in case of full unswitching and does not invalidate anything at all in
case of trivial unswitching. As a result, we may end up with incorrect data in the cache.

Differential Revision: https://reviews.llvm.org/D46045
Reviewed By: mzolotukhin

llvm-svn: 333072
2018-05-23 10:09:53 +00:00
Piotr Padlewski d6f7346a4b Fix aliasing of launder.invariant.group
Summary:
The patch for capture tracking broke
the bootstrap of clang with -fstrict-vtable-pointers,
which resulted in a debugging nightmare. It was fixed in
https://reviews.llvm.org/D46900, but as it turned
out, there were other parts like the inliner (computation of
noalias metadata) that I found after bootstrapping with
assertions enabled.

Reviewers: hfinkel, rsmith, chandlerc, amharc, kuhar

Subscribers: JDevlieghere, eraman, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D47088

llvm-svn: 333070
2018-05-23 09:16:44 +00:00
David Bolvansky cd3eb99016 [InstCombine] [NFC] Added more tests for unlocked IO transformation
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47243

llvm-svn: 333057
2018-05-23 03:01:45 +00:00
Sanjay Patel 4b96935bd7 [InstCombine] use nsw negation for abs libcalls
Also, produce the canonical IR abs (s<0) to be more efficient. 

This is the libcall equivalent of the clang builtin change from:
rL333038

Pasting from that commit message:
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."

That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.

llvm-svn: 333042
2018-05-22 23:29:40 +00:00
Sanjay Patel 3ef8f858da [InstCombine] move misplaced test file and regenerate checks; NFC
llvm-svn: 333039
2018-05-22 23:15:56 +00:00
David Bolvansky 88e262bcdd Delete empty test file
Differential Revision: https://reviews.llvm.org/D47230

llvm-svn: 333031
2018-05-22 21:47:08 +00:00
David Bolvansky 1f343fa0e0 [InstCombine] Remove calloc transformations
Summary: The previous patch does not account for the value being changed between the calloc and the strlen. This needs to be removed from InstCombine and maybe moved to DSE later after some rework.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47218

llvm-svn: 333022
2018-05-22 20:27:36 +00:00
Sanjay Patel 9781679f0f [InstCombine] move/add tests for sub with bool op; NFC
llvm-svn: 333012
2018-05-22 18:50:06 +00:00
Florian Hahn a6e63f176c [NewGVN] Fix handling of assumes
This patch fixes two bugs:

* test1: Previously assume(a >= 5) concluded that a == 5. That's only
         valid for assume(a == 5)...
* test2: If operands were swapped, additional users were added to the
         wrong cmp operand. This resulted in an "unsettled iteration"
         assertion failure.

Patch by Nikita Popov

Differential Revision: https://reviews.llvm.org/D46974

llvm-svn: 333007
2018-05-22 17:38:22 +00:00
Sanjay Patel dd5fb8f03f [InstCombine] fix broken test
Looks like the last line got chopped off from rL332990.

llvm-svn: 332992
2018-05-22 16:14:16 +00:00
David Bolvansky 41f4b64ee1 [InstCombine] Calloc-ed strings optimizations
Summary:
Example cases:
strlen(calloc(...)) -> 0
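
A minimal sketch of the pattern (2018-era typed-pointer IR; declarations and
names are illustrative):

```
declare i8* @calloc(i64, i64)
declare i64 @strlen(i8*)

define i64 @strlen_of_calloc() {
  %p = call i8* @calloc(i64 1, i64 16)
  %n = call i64 @strlen(i8* %p)   ; calloc zero-fills, so the first byte is NUL
  ret i64 %n                      ; expected to fold to: ret i64 0
}
```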

Reviewers: efriedma, bkramer

Reviewed By: bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47059

llvm-svn: 332990
2018-05-22 15:41:23 +00:00
Karl-Johan Karlsson 11d68a619e [LowerSwitch] Fixed faulty PHI node update
Summary:
When LowerSwitch merges several cases into a new default block, it does not
update the PHI nodes accordingly. The code that updates the PHI nodes
for the default edge only updates the first entry and does not remove the
remaining ones to make the number of entries match the number of
predecessors.

This is easily fixed by replacing the code that update the PHI node with
the already existing utility function for updating PHI nodes.

Reviewers: hans, reames, arsenm

Reviewed By: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47055

llvm-svn: 332960
2018-05-22 08:46:48 +00:00
Bjorn Pettersson fecef6be9e [LoopVersioning] Don't modify the list that we iterate over in addPHINodes
Summary:
In LoopVersioning::addPHINodes we need to iterate over all
users of a value "Inst", and if the user is outside of the
VersionedLoop we should replace the use of "Inst" with
the value "PN" instead.

Replacing the use of "Inst" for a user of "Inst" also means
that Inst->users() is modified. So it is not safe to do the
replace while iterating over Inst->users() as we used to do.
This patch splits the task into two steps. First we iterate
over Inst->users() to find all users that should be updated.
Those users are saved into a local data structure on the stack.
And then, in the second step, we do the actual updates. This
time iterating over the local data structure.

Reviewers: mzolotukhin, anemet

Reviewed By: mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47134

llvm-svn: 332958
2018-05-22 08:33:02 +00:00
Stanislav Mekhanoshin 0e132dca53 [AMDGPU] Optimze old value of v_mov_b32_dpp
We can eliminate the old value if bound_ctrl = 1 and row_mask = bank_mask = 0xf.
This is an alternative implementation working on the intrinsic in InstCombine.
Original review for past-ISel optimization: D46570.

Differential Revision: https://reviews.llvm.org/D46596

llvm-svn: 332956
2018-05-22 08:04:33 +00:00
Matt Arsenault 1349a04ef5 AMDGPU: Make v2i16/v2f16 legal on VI
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.

Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.

llvm-svn: 332953
2018-05-22 06:32:10 +00:00
Sanjay Patel ec50effbd6 [InstCombine] regenerate checks; NFC
llvm-svn: 332894
2018-05-21 21:09:14 +00:00
Sanjay Patel 94b1f846b2 [InstCombine] add tests for cast-of-select; NFC
In all cases, we're pulling the cast above the select.
That's not a good canonicalization if we're creating 
a select that then mismatches the operand size of its
condition.

llvm-svn: 332883
2018-05-21 20:23:58 +00:00
Craig Topper f14e62c9a5 [EarlyCSE] Improve EarlyCSE of some absolute value cases.
Change matchSelectPattern to return X and -X for ABS/NABS in a well defined order. Adjust EarlyCSE to account for this. Ensure the SPF result is some kind of min/max and not abs/nabs in one place in InstCombine that made me nervous.

Previously we returned the two operands of the compare part of the abs pattern. The RHS is always going to be a 0, 1, or -1 constant. This isn't a very meaningful thing to return for anyone. There's also some freedom in the abs pattern as to what happens when the value is equal to 0. This freedom led to EarlyCSE failing to match when different constants were used in otherwise equivalent operations. By returning the input and its negation in a defined order we can ensure an exact match. This also makes sure both patterns use the exact same subtract instruction for the negation. I believe CSE should eventually make this happen and properly merge the nsw/nuw flags. But I'm not familiar with CSE and what order it does things in, so it seemed like it might be good to really enforce that they were the same.

Differential Revision: https://reviews.llvm.org/D47037

llvm-svn: 332865
2018-05-21 18:42:42 +00:00
Diego Caballero 168d04d544 [VPlan] Reland r332654 and silence unused func warning
r332654 was reverted due to an unused function warning in
release build. This commit includes the same code with the
warning silenced.

Differential Revision: https://reviews.llvm.org/D44338

llvm-svn: 332860
2018-05-21 18:14:23 +00:00
Alexey Bataev 7c9ad0db3d [InstCombine] Fix PR37526: MinMax patterns produce an infinite loop.
Summary:
This patch fixes PR37526 by simplifying the newly generated LoadInst
instructions. If the pointer address is a bitcast from the pointer to
the NewType, we can just remove this extra bitcast instead of creating
a new one. This fixes PR37526 and may speed up the whole compilation
process.

Reviewers: spatel, RKSimon, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47144

llvm-svn: 332855
2018-05-21 17:46:34 +00:00
Nico Weber e4a12cfa2f revert r332610, it breaks cfi, see D46326
llvm-svn: 332838
2018-05-21 11:44:39 +00:00
David Green 8ceab61c75 [CVP] Require DomTree for new Pass Manager
We were previously using a DT in CVP through SimplifyQuery, but not requiring it in
the new pass manager. Hence it would crash if DT was not already available. This now
gets DT directly and plumbs it through to where it is used (instead of using it
through SQ).

llvm-svn: 332836
2018-05-21 11:06:28 +00:00
Craig Topper e4c045b7df [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all intrinsics.

llvm-svn: 332824
2018-05-20 23:34:04 +00:00
Sanjay Patel a003c728a5 [InstCombine] choose 1 form of abs and nabs as canonical
We already do this for min/max (see the blob above the diff), 
so we should do the same for abs/nabs.
A sign-bit check (<s 0) is used as a predicate for other IR 
transforms and it's likely the best for codegen.

This might solve the motivating cases for D47037 and D47041, 
but I think those patches still make sense. We can't guarantee 
this canonicalization if the icmp has more than one use.
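
For reference, a minimal sketch of the canonical abs form with the sign-bit check (the i32 width and names are illustrative):

  define i32 @canonical_abs(i32 %x) {
    %isneg = icmp slt i32 %x, 0          ; sign-bit check (<s 0)
    %neg = sub i32 0, %x
    %abs = select i1 %isneg, i32 %neg, i32 %x
    ret i32 %abs
  }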

Differential Revision: https://reviews.llvm.org/D47076

llvm-svn: 332819
2018-05-20 14:23:23 +00:00
Max Kazantsev c0b268f90c [IRCE] Fix miscompile with range checks against negative values
In the patch rL329547, we lifted the over-restrictive limitation on collected range
checks, allowing us to work with range checks whose range end is not
provably non-negative. However, it appeared that the non-negativity of this value was
assumed in the utility function `ClampedSubtract`. In particular, its reasoning is based
on the fact that `0 <= SINT_MAX - X`, which is not true if `X` is negative.

The function `ClampedSubtract` is only called twice, once with `X = 0` (which is OK)
and the second time with `X = IRC.getEnd()`, where we may now see the problem if
the end is actually a negative value. In this case, we may sometimes miscompile.

This patch is a conservative fix for the miscompile problem. Rather than rejecting
non-provably non-negative `getEnd()` values, we check them for non-negativity at
runtime. For this, we use the function `smax(smin(X, 0), -1) + 1`, which is equal to `1` if `X`
is non-negative and equal to `0` if `X` is negative. If we multiply `Begin, End` of the safe
iteration space by this function calculated for `X = IRC.getEnd()`, we will get the original
`[Begin, End)` if `IRC.getEnd()` was non-negative (and, thus, `ClampedSubtract` worked
correctly) and the empty range `[0, 0)` if `IRC.getEnd()` was negative.
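
A rough IR sketch of that runtime check expanded with icmp/select (illustrative only; the pass emits its own form):

  define i32 @is_non_negative(i32 %end) {
    ; smax(smin(%end, 0), -1) + 1: yields 1 if %end is non-negative, 0 otherwise
    %lt0 = icmp slt i32 %end, 0
    %smin = select i1 %lt0, i32 %end, i32 0
    %gtm1 = icmp sgt i32 %smin, -1
    %smax = select i1 %gtm1, i32 %smin, i32 -1
    %r = add i32 %smax, 1
    ret i32 %r
  }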

So we in fact prohibit execution of the main loop if at least one of the range checks was
made against a negative value (and we figured that out at runtime). It is still better than
what we had before (non-negativity had to be proven at compile time) and prevents
the miscompile; however, it is sometimes too restrictive for unsigned range checks
against a negative value (which in fact could be eliminated).

Once we re-implement `ClampedSubtract` in a way that it handles negative `X` correctly,
this limitation can be lifted, too.

Differential Revision: https://reviews.llvm.org/D46860
Reviewed By: samparker

llvm-svn: 332809
2018-05-19 13:06:37 +00:00
Benjamin Kramer a76b64ff80 [MergeICmps] Don't crash when memcmp is not available
Fixes clang crashing with -fno-builtin, PR37527.

llvm-svn: 332808
2018-05-19 12:51:59 +00:00
Yaxun Liu ea988f1fd9 Fix evaluator for non-zero alloca addr space
The evaluator goes through BB and creates global vars as temporary values to evaluate
results of LLVM instructions. It creates undef for alloca, however it assumes alloca
in addr space 0. If the next instruction is addrspace cast to 0, then we get an invalid
cast instruction.

This patch let the temp global var have an address space matching alloca addr space,
so that the valuation can be done.

Differential Revision: https://reviews.llvm.org/D47081

llvm-svn: 332794
2018-05-19 02:58:16 +00:00
Piotr Padlewski ce358262eb Disallow non-empty metadata for invariant.group
Summary:
This feature is not needed, but it might be useful in the future
to use metadata to mark which functions should support it
(and strip it when they do not).

Reviewers: rsmith, sanjoy, amharc, kuhar

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45419

llvm-svn: 332787
2018-05-18 23:53:46 +00:00
Piotr Padlewski a26a08cb52 Constant fold launder of null and undef
Summary:
This might be useful because clang will add
some barriers for pointer comparisons.

Reviewers: majnemer, dberlin, hfinkel, nlewycky, davide, rsmith, amharc,
kuhar

Subscribers: davide, amharc, llvm-commits

Differential Revision: https://reviews.llvm.org/D32423

llvm-svn: 332786
2018-05-18 23:52:57 +00:00
Amara Emerson 08099c7edd Delete a test that was missed in the revert r332747.
r332747 originally reverted r332654 which added this test.

llvm-svn: 332755
2018-05-18 19:21:40 +00:00
Sanjay Patel 56e09c6928 [InstCombine] add tests for lack of abs/nabs canonicalization; NFC
llvm-svn: 332726
2018-05-18 15:26:38 +00:00
Sanjay Patel fa3e4601c6 [InstCombine] regenerate checks; NFC
There was a combination of auto-generated styles in use
here because the scripts have evolved.

llvm-svn: 332725
2018-05-18 15:22:19 +00:00
Alexander Ivchenko 5c54742da4 [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part)
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.

Comes with a clang patch (D46881).

Patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D46882

llvm-svn: 332705
2018-05-18 11:58:25 +00:00
David Stenberg 0af67e5b65 [SimplifyCFG] Fix a debug invariant bug in FoldBranchToCommonDest()
Summary:
Fix a case where FoldBranchToCommonDest() would bail out from doing CSE
when encountering a debug intrinsic. Handle that by skipping past the
debug intrinsics.

Also, as a minor refactoring, rename checkCSEInPredecessor() to
tryCSEWithPredecessor() to make it a bit more clear that the function
may remove instructions.

Reviewers: fhahn, craig.topper, dblaikie, xbolva00

Reviewed By: fhahn, xbolva00

Subscribers: vsk, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D46635

llvm-svn: 332698
2018-05-18 08:52:15 +00:00
Serguei Katkov 5095883fe9 [LICM] Extend the MustExecute scope
The CanProveNotTakenFirstIteration utility does not handle the case when
the condition of the branch is a constant. Add handling for that case.

Reviewers: reames, anna, mkazantsev
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46996

llvm-svn: 332695
2018-05-18 04:56:28 +00:00
Diego Caballero f58ad3129c [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops.
Patch #3 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

Expected to be NFC for the current inner loop vectorization path. It
introduces the basic algorithm to build the VPlan plain CFG (single-level
CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization
path using VPInstructions. It includes:
  - VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now).
  - VPlanVerifier: Main class with utilities to check the consistency of a H-CFG.
  - VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan.

Reviewers: rengolin, fhahn, mkuper, mssimpso, a.elovikov, hfinkel, aprantl.

Differential Revision: https://reviews.llvm.org/D44338

llvm-svn: 332654
2018-05-17 19:24:47 +00:00
Anastasis Grammenos d6c6678766 [Debugify] Print the output to stderr
Currently debugify prints its output to stdout;
with this patch all the generated output goes to stderr.

This change lets us use debugify without taking away
the ability to pipe the output to other llvm tools.

llvm-svn: 332642
2018-05-17 18:19:58 +00:00
Craig Topper bd332588bd [InstCombine] Propagate the nsw/nuw flags from the add in the 'shifty' abs pattern to the sub in the select version.
According to Alive this is valid. I'm hoping to use this to make an assumption that the sign bit is zero after this sequence. The only way it wouldn't be is if the input was INT_MIN, but by preserving the flags we can make doing this to INT_MIN UB.

The nuw flag is weird because it creates such a contradiction that the original number would have to be positive, meaning we could remove the select entirely, but we don't get that far.
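
For context, a rough sketch of the two equivalent abs forms involved; names are illustrative, and the point of the patch is that the sub in the select form inherits the add's flags:

  ; 'shifty' abs: (x + (x >>s 31)) ^ (x >>s 31)
  define i32 @abs_shifty(i32 %x) {
    %sign = ashr i32 %x, 31
    %add = add nsw i32 %x, %sign
    %abs = xor i32 %add, %sign
    ret i32 %abs
  }

  ; select form of abs, with the sub carrying the propagated nsw flag
  define i32 @abs_select(i32 %x) {
    %isneg = icmp slt i32 %x, 0
    %neg = sub nsw i32 0, %x
    %abs = select i1 %isneg, i32 %neg, i32 %x
    ret i32 %abs
  }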

Differential Revision: https://reviews.llvm.org/D46988

llvm-svn: 332623
2018-05-17 16:29:52 +00:00
Dmitry Mikulin 3c6b4e35bd In thin and full LTO + CFI, direct function calls may go through jump table
entries to reach the target. Since these calls don't require type checks,
we can short-circuit them to their real targets.

Differential Revision: https://reviews.llvm.org/D46326

llvm-svn: 332610
2018-05-17 14:29:07 +00:00
Mikael Holmen 2ca16899ec Require DominatorTree when requiring/preserving LoopInfo in the old pass manager
Summary:
Require DominatorTree when requiring/preserving LoopInfo in the old pass manager

BreakCriticalEdges tries to keep LoopInfo and DominatorTree updated if they
exist. However, since commit r321653 and r321805, to update LoopInfo we
must have a DominatorTree, or we will hit an assert.

To fix this we now make a couple of passes that only required/preserved
LoopInfo also require DominatorTree.

This solves PR37334.

Reviewers: eli.friedman, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D46829

llvm-svn: 332583
2018-05-17 09:05:40 +00:00
Martin Storsjo c10788728b [Analysis] Only use _unlocked stdio functions on linux
The existing comment said that the functions were available only
on GNU/Linux (and on certain Android versions), but only checked
T.isGNUEnvironment() which also is true on MinGW (for arch-windows-gnu
triplets), which doesn't have such functions.

Existing checks in the initialize function in TargetLibraryInfo.cpp
also use only T.isOSLinux() to check for glibc features.

This fixes use of stdio on MinGW.

Differential Revision: https://reviews.llvm.org/D47002

llvm-svn: 332581
2018-05-17 08:16:08 +00:00
Bjorn Pettersson 81a76a388a [SROA] Handle PHI with multiple duplicate predecessors
Summary:
The verifier accepts PHI nodes with multiple entries for the
same basic block, as long as the value is the same.

As seen in PR37203, SROA did not handle such PHI nodes properly
when speculating loads over the PHI, since it inserted multiple
loads in the predecessor block and changed the PHI into having
multiple entries for the same basic block, but with different
values.

This patch teaches SROA to reuse the same speculated load for
each PHI duplicate entry in such situations.

Resolves: https://bugs.llvm.org/show_bug.cgi?id=37203

Reviewers: uabelho, chandlerc, hfinkel, bkramer, efriedma

Reviewed By: efriedma

Subscribers: dberlin, efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D46426

llvm-svn: 332577
2018-05-17 07:21:41 +00:00
Hiroshi Inoue f5c0e6c285 [SROA] pr37267: fix assertion failure in integer widening
The current integer widening does not support rewriting partial split slices in rewriteIntegerStore (and rewriteIntegerLoad).
This patch adds explicit checks for this case in isIntegerWideningViableForSlice.
Before r322533, splitting was allowed only for the whole-alloca slice and hence the above case was implicitly rejected by another check `if (DL.getTypeStoreSize(ValueTy) > Size)` because the whole-alloca slice is larger than the partition.

Differential Revision: https://reviews.llvm.org/D46750

llvm-svn: 332575
2018-05-17 06:32:17 +00:00
Stanislav Mekhanoshin 595fdcf43b [AMDGPU] Move lsr test. NFC.
llvm-svn: 332562
2018-05-17 01:30:51 +00:00
Benjamin Kramer 8ac15bf4dc [InstCombine] Fix the signature of fgets_unlocked.
It returns a pointer, not an int. This miscompiles all code that uses
the return value of fgets.

llvm-svn: 332531
2018-05-16 21:45:39 +00:00
Sanjay Patel 2eb3512090 [InstCombine] allow more binop (shuffle X), C transforms
The canonicalization was restricted to shuffle masks with
a 1-to-1 mapping to the constant vector, but that disqualifies
the common splat pattern. This is part of solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463
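
A minimal sketch of the previously disqualified splat shape (width and constants are illustrative):

  define <4 x i32> @add_of_splat(<4 x i32> %x) {
    ; a splat mask is not a 1-to-1 mapping, but the add can still be done on %x
    ; first and the result splatted afterwards
    %splat = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> zeroinitializer
    %r = add <4 x i32> %splat, <i32 42, i32 42, i32 42, i32 42>
    ret <4 x i32> %r
  }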

llvm-svn: 332479
2018-05-16 15:15:22 +00:00
David Bolvansky ca22d427b9 [SimplifyLibcalls] Replace locked IO with unlocked IO
Summary: If the file stream arg is not captured and the source is fopen, we can replace IO calls with unlocked IO ("_unlocked" function variants) to gain better speed.

Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer, lebedev.ri, rja

Reviewed By: rja

Subscribers: rja, srhines, efriedma, lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D45736

llvm-svn: 332452
2018-05-16 11:39:52 +00:00
Shoaib Meenai 074728a2a9 [ObjCARC] Prevent code motion into a catchswitch
A catchswitch must be the only non-phi instruction in its basic block;
attempting to move a retain or release into a catchswitch basic block
will result in invalid IR. Explicitly mark a CFG hazard in this case to
prevent the code motion.

Differential Revision: https://reviews.llvm.org/D46482

llvm-svn: 332430
2018-05-16 04:52:18 +00:00
Evgeny Stupachenko bff9302c3d Fix LSR compile time hang.
Summary:
Limit number of reassociations in GenerateReassociationsImpl.

Reviewers: qcolombet, mkazantsev

Differential Revision: https://reviews.llvm.org/D46039

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 332426
2018-05-16 02:48:50 +00:00
Anastasis Grammenos 66f13e0ba2 [Debugify] Fix test failing after r332416
I missed a test that needed an update.

Failing bot: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/30071

llvm-svn: 332418
2018-05-16 00:11:52 +00:00
Sanjay Patel 919882638e [InstCombine] fix binop (shuffle X), C --> shuffle (binop X, C') to check uses
llvm-svn: 332407
2018-05-15 22:00:37 +00:00
Marek Olsak 3c5fd145c5 StructurizeCFG: fix inverting conditions
Author: Samuel Pitoiset

Without this patch, it appears to me that we are selecting
the wrong operand when inverting conditions. In the attached
test, it will select %tmp3 instead of %tmp4. To fix it, just
use 'A' everywhere.

This fixes a regression introduced by
"[PatternMatch] define m_Not using m_Xor and cst_pred_ty"

https://reviews.llvm.org/D46351

llvm-svn: 332403
2018-05-15 21:41:55 +00:00
Sanjay Patel cc0fbdb6e4 [InstCombine] add more tests for binop-shuffle; NFC
The splat pattern is part of PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463

llvm-svn: 332393
2018-05-15 20:34:09 +00:00
Sanjay Patel 3c35290c58 [InstCombine] fix binop-of-shuffles to check uses
llvm-svn: 332375
2018-05-15 17:14:23 +00:00
Sanjay Patel c5b4401cf7 [InstCombine] add multi-use shuffle tests and regenerate checks; NFC
llvm-svn: 332373
2018-05-15 16:47:47 +00:00
whitequark 8f0ab258bd [MergeFunctions] Fix merging of small weak functions
When two interposable functions are merged, we cannot replace
uses and have to emit calls to a common internal function. However,
writeThunk() will not actually emit a thunk if the function is too
small. This leaves us in a broken state where mergeTwoFunctions
already rewired the functions, but writeThunk doesn't do anything.

This patch changes the implementation so that:

 * writeThunk() does just that.
 * The direct replacement of calls is moved into mergeTwoFunctions()
   into the non-interposable case only.
 * isThunkProfitable() is extracted and will always be called for
   the non-interposable case, and in the interposable case
   only if uses are still left after replacement.

This issue has been introduced in https://reviews.llvm.org/D34806,
where the code for checking thunk profitability has been moved.

Differential Revision: https://reviews.llvm.org/D46804

Reviewed By: whitequark

llvm-svn: 332342
2018-05-15 11:31:07 +00:00
Vedant Kumar 595ba1d548 [Debugify] Add -debugify-each for testing each pass in a pipeline
This adds a -debugify-each mode to opt which, when enabled, wraps each
{Module,Function}Pass in a pipeline with logic to add, check, and strip
synthetic debug info for testing purposes.

This mode can be used to test complex pipelines for debug info bugs, or
to collect statistics about the number of debug values & locations lost
throughout various stages of a pipeline.

Patch by Son Tuan Vu!

Differential Revision: https://reviews.llvm.org/D46525

llvm-svn: 332312
2018-05-15 00:29:27 +00:00
Keno Fischer de577af8c0 [InstCombine] fix crash due to ignored addrspacecast
Summary:
Part of the InstCombine code for simplifying GEPs looks through
addrspacecasts. However, this was done by updating a variable
also used by the next transformation, for marking GEPs as
inbounds. This led to replacing a GEP with a similar instruction
in a different addrspace, which caused an assertion failure in RAUW.

This caused julia issue https://github.com/JuliaLang/julia/issues/27055

Patch by Jeff Bezanson <jeff@juliacomputing.com>

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D46722

llvm-svn: 332302
2018-05-14 22:05:01 +00:00
Sanjay Patel bf55e6dee1 [AggressiveInstCombine] avoid crashing on unsimplified code (PR37446)
This bug:
https://bugs.llvm.org/show_bug.cgi?id=37446
...raises another question: why do we run aggressive-instcombine before 
regular instcombine?

llvm-svn: 332243
2018-05-14 13:43:32 +00:00
Craig Topper 911025b1cd [X86] Extend instcombine folds for pclmuldq intrinsics to the 256 and 512 bit version.
llvm-svn: 332202
2018-05-13 21:56:32 +00:00
Craig Topper f170b85d40 [X86] Add missing test for the InstCombines of pclmulqdq.
Apparently this test was lost when r293151 was committed. It was present in the review, but not the commit.

llvm-svn: 332199
2018-05-13 18:26:06 +00:00
Sergey Dmitriev 69c9cd277d [CodeExtractor] Allow extracting blocks with exception handling
This is a CodeExtractor improvement which adds support for extracting blocks
which have exception handling constructs if that is legal to do. CodeExtractor
performs validation checks to ensure that extraction is legal when it finds
invoke instructions or EH pads (landingpad, catchswitch, or cleanuppad) in
blocks to be extracted.

I have also added an option to allow extraction of blocks with alloca
instructions, but no validation is done for allocas. The CodeExtractor caller has
to validate this itself before allowing alloca instructions to be extracted.
By default allocas are still not allowed in extraction blocks.

Differential Revision: https://reviews.llvm.org/D45904

llvm-svn: 332151
2018-05-11 22:49:49 +00:00
Craig Topper a17d627abb [X86] Remove and autoupgrade a bunch of FMA instrinsics that are no longer used by clang.
llvm-svn: 332146
2018-05-11 21:59:34 +00:00
Artem Belevich c2cd5d5ce0 [Split GEP] handle trunc() in separate-const-offset-from-gep pass.
Let separate-const-offset-from-gep pass handle trunc() when it calculates
constant offset relative to base. The pass itself may insert trunc()
instructions when it canonicalises array indices to pointer-size integers
and needs to handle trunc() in order to evaluate the offset.

Differential Revision: https://reviews.llvm.org/D46732

llvm-svn: 332142
2018-05-11 21:13:19 +00:00
Daniel Neilson f6651d4d94 [InstCombine] Handle atomic memset in the same way as regular memset
Summary:
This change adds handling of the atomic memset intrinsic to the
code path that simplifies the regular memset. In practice this means
that we will now also expand a small constant-length atomic memset
into a single unordered atomic store.

Reviewers: apilipenko, skatkov, mkazantsev, anna, reames

Reviewed By: reames

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D46660

llvm-svn: 332132
2018-05-11 20:04:50 +00:00
David Bolvansky cd93c4ef1a [InstCombine] snprintf optimizations
Reviewers: spatel, efriedma, majnemer, rja, bkramer

Reviewed By: rja, bkramer

Subscribers: mstorsjo, rja, llvm-commits

Differential Revision: https://reviews.llvm.org/D46285

llvm-svn: 332110
2018-05-11 17:50:49 +00:00
Martin Storsjo 0d7c37756b [Analysis] Validate the return type of s(n)printf like libcalls
If the sprintf function is static (as on mingw-w64, where many stdio
functions are static inline wrappers), earlier optimization passes
could optimize out the return value altogether, and make it void,
which could break optimizations of this libcall that touch the
return value.

This fixes the issue discussed in PR37408 for the sprintf function.

Differential Revision: https://reviews.llvm.org/D46752

llvm-svn: 332106
2018-05-11 16:53:56 +00:00
Davide Italiano 6e1f7bf316 [Reassociate] Prevent infinite loops when processing PHIs.
Phi nodes can reside in live blocks but one of their incoming
arguments can come from a dead block. Dead blocks and reassociate
don't play nice together. In fact, reassociate performs an RPO
as a first step to avoid processing dead blocks.

The reason why Reassociate might not fixpoint when examining
dead blocks is that the following:

  %xor0 = xor i16 %xor1, undef
  %xor1 = xor i16 %xor0, undef

is perfectly valid LLVM IR (if it appears in a dead block),
so the worklist algorithm keeps pushing the two instructions for
reexamination. Note that this is not Reassociate's fault, at least
not entirely. It's LLVM that has a weird definition of dominance.

Fixes PR37390.

llvm-svn: 332100
2018-05-11 15:45:36 +00:00
Daniel Neilson 8f30ec65b0 [InstCombine] Unify handling of atomic memtransfer with non-atomic memtransfer
Summary:
This change reworks the handling of atomic memcpy within the instcombine pass.
Previously, a constant length atomic memcpy would be lowered into loads & stores
as long as no more than 16 load/store pairs were created. This is quite different
from the lowering done for a non-atomic memcpy, which only ever lowers into a single
load/store pair of no more than 8 bytes. Larger constant-sized memcpy calls are
expanded to load/stores in later passes, such as SelectionDAG lowering.

In this change the behaviour for atomic memcpy is unified with non-atomic memcpy;
atomic memcpy is now treated in the same way as non-atomic memcpy has always been.
We leave it to later passes to lower longer-length atomic memcpy calls.

Due to the structure of the pass's handling of memtransfer intrinsics, this change
also gives us handling of atomic memmove that we did not previously have.

Reviewers: apilipenko, skatkov, mkazantsev, anna, reames

Reviewed By: reames

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D46658

llvm-svn: 332093
2018-05-11 14:30:02 +00:00
Brian Gesiak c651113439 [Coroutines] PR34897: Fix incorrect elisions
Summary:
https://bugs.llvm.org/show_bug.cgi?id=34897 demonstrates an incorrect
coroutine frame allocation elision in the coro-elide pass. The elision
is performed on the basis that the SSA variables from all llvm.coro.begin
are directly referenced in subsequent llvm.coro.destroy instructions.

However, this ignores the fact that the function may exit through paths
that do not run these destroy instructions. In the sample program from
PR34897, for example, the llvm.coro.destroy instruction is only
executed in exception handling code. When the coroutine function exits
normally, llvm.coro.destroy is not called. Eliding the allocation in
this case causes a subsequent reference to the coroutine handle from
outside of the function to access freed memory.

To fix the issue, when finding an llvm.coro.destroy for each llvm.coro.begin,
only consider llvm.coro.destroy that are executed along non-exceptional paths.

Test Plan:
1. Download the sample program from
   https://bugs.llvm.org/show_bug.cgi?id=34897, compile it with
   `clang++ -fcoroutines-ts -stdlib=libc++ -std=c++1z -O2`, and run it.
   It should print `"run1\ncheck1\nrun2\ncheck2"` and then exit
   successfully.
2. Compile https://godbolt.org/g/mCKfnr and confirm it is still
   optimized to a single instruction, 'return 1190'.
3. `check-llvm`

Reviewers: rsmith, GorNishanov, eric_niebler

Reviewed By: GorNishanov

Subscribers: andrewrk, lewissbaker, EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D43242

llvm-svn: 332077
2018-05-11 03:12:28 +00:00
Craig Topper 4b026e5ebd [InstCombine] Add tests for cases where we don't recognize type promoted rotate idioms.
These rotates take the form

(x << (n & mask)) | (x >> (-n & mask)) where mask is bitwidth - 1.

If x has been promoted to a wider type than its original bit width due to type promotion, we fail to narrow it and therefore don't recognize it as a rotate.
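
For reference, the base pattern from the formula above written as IR for i32 (the new tests use the variant where x and n have been promoted to a wider type):

  define i32 @rotl_i32(i32 %x, i32 %n) {
    ; (x << (n & 31)) | (x >> (-n & 31)), with mask = bitwidth - 1
    %amt = and i32 %n, 31
    %negn = sub i32 0, %n
    %namt = and i32 %negn, 31
    %shl = shl i32 %x, %amt
    %shr = lshr i32 %x, %namt
    %rot = or i32 %shl, %shr
    ret i32 %rot
  }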

llvm-svn: 332068
2018-05-11 00:46:09 +00:00
Wei Mi 0c2f6be662 [SampleFDO] Don't treat warm callsite with inline instance in the profile as cold
We found that the current sampleFDO had a performance issue when triaging a regression.
For a callsite with an inline instance in the profile, even if the hot callsite inliner
cannot inline it, it may still execute enough times that it should not be treated as
cold by the regular inliner later. However, currently if such a callsite is not inlined
by the hot callsite inliner, and the BB containing the callsite doesn't get
samples from other instructions inside of it, the callsite will have no profile
metadata annotated. In the regular inliner's cost analysis, if the callsite has no
profile annotated and its caller has profile information, it will be treated as
cold.

The fix changes the isCallsiteHot check and chooses to compare
CallsiteTotalSamples with hot cutoff value computed by ProfileSummaryInfo.

Differential Revision: https://reviews.llvm.org/D45377

llvm-svn: 332058
2018-05-10 23:02:27 +00:00
Martin Storsjo 86e6742c17 Revert "[InstCombine] snprintf optimizations"
This reverts commit SVN r331889, which could trigger failed
assertions for cases where the snprintf function is declared
with a vaguely differing signature (e.g. being defined as
static inline), see PR37408.

llvm-svn: 332043
2018-05-10 21:23:36 +00:00
Sanjay Patel c7bb14301a [InstCombine] add folds for minnum(-a, -b) --> -maxnum(a, b)
This is similar to what we do for integer min/max with 'not'
ops (rL321882).

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=37404
https://bugs.llvm.org/show_bug.cgi?id=37405
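
A minimal sketch of the input pattern for the new fold (the f32 type and names are illustrative):

  declare float @llvm.minnum.f32(float, float)

  ; minnum(-a, -b) --> -maxnum(a, b)
  define float @minnum_of_negs(float %a, float %b) {
    %nega = fsub float -0.000000e+00, %a
    %negb = fsub float -0.000000e+00, %b
    %min = call float @llvm.minnum.f32(float %nega, float %negb)
    ret float %min
  }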

llvm-svn: 332031
2018-05-10 20:03:13 +00:00
Sanjay Patel b5322e385e [InstCombine] add minnum/maxnum tests (PR37404, PR37405); NFC
llvm-svn: 332025
2018-05-10 19:21:08 +00:00
Haicheng Wu 0aae2bc260 [CGP] Split large data structres to sink more GEPs
Accessing the members of a large data structure needs a lot of GEPs which
usually have large offsets due to the size of the underlying data structure. If
the offsets are too large to fit into the r+i addressing mode, these GEPs cannot
be sunk to their users' blocks and many extra registers are then needed to carry
the values of these GEPs.

This patch tries to split a large data struct starting from %base like the
following.

Before:
BB0:
  %base     =

BB1:
  %gep0     = gep %base, off0
  %gep1     = gep %base, off1
  %gep2     = gep %base, off2

BB2:
  %load1    = load %gep0
  %load2    = load %gep1
  %load3    = load %gep2

After:
BB0:
  %base     =
  %new_base = gep %base, off0

BB1:
  %new_gep0 = %new_base
  %new_gep1 = gep %new_base, off1 - off0
  %new_gep2 = gep %new_base, off2 - off0

BB2:
  %load1    = load i32, i32* %new_gep0
  %load2    = load i32, i32* %new_gep1
  %load3    = load i32, i32* %new_gep2

In the above example, the struct is split into two parts. The first part still
starts from %base and the second part starts from %new_base. After the
splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to
BB2 and folded into their users.

The algorithm to split a data structure is simple and very similar to the work of
merging SExts. First, it collects GEPs that have large offsets when iterating over
the blocks. Second, it splits the underlying data structures and updates the
collected GEPs to use smaller offsets.

Differential Revision: https://reviews.llvm.org/D42759

llvm-svn: 332015
2018-05-10 18:27:36 +00:00
Sanjay Patel dc4bb3fd36 [InstCombine] regenerate full checks; NFC
llvm-svn: 331998
2018-05-10 17:05:38 +00:00
Daniel Neilson 71fa1b904a [DSE] Teach the pass about partial overwrite of atomic memory intrinsics
Summary:
This change teaches DSE that the atomic memory intrinsics can be overwritten
partially in the same way as the non-atomic forms. Specifically, that the
atomic memcpy & memset can be shortened at the end and that the atomic memset
can be shortened at the beginning, if they are partially overwritten
by later stores.

Reviewers: mkazantsev, skatkov, apilipenko, efriedma, rsmith, spatel, filcab, sanjoy

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45584

llvm-svn: 331991
2018-05-10 15:12:49 +00:00
whitequark 68403564df [PR37339] Fix assertion in FunctionComparator::cmpInlineAsm
Fixes bug https://bugs.llvm.org/show_bug.cgi?id=37339.

InlineAsm is only uniqued if the FunctionTypes are exactly the
same, while cmpTypes() for example considers all pointer types
in the default address space to be the same. For this reason
the end of cmpInlineAsm() can be reached.

This patch replaces the unreachable assertion with a check that
the function types are not identical.

Differential Revision: https://reviews.llvm.org/D46495

Reviewers: jfb
llvm-svn: 331990
2018-05-10 15:05:47 +00:00
Sanjay Patel 6f2cf73b37 [PhaseOrdering] remove stale comments; NFC
Forgot to update this with rL331937.

llvm-svn: 331939
2018-05-09 23:10:46 +00:00
Sanjay Patel ac3951a735 [AggressiveInstCombine] convert a chain of 'and-shift' bits into masked compare
This is a follow-up to D45986. As suggested there, we should match the "all-bits-set" 
pattern in addition to "any-bits-set".

This was a little more complicated than I thought it would be initially because the 
"and 1" instruction can be anywhere in the chain. Hopefully, the code comments make 
that logic understandable, but if you see a way to simplify or improve that, it's 
most appreciated.

This transforms patterns that emerge from bitfield tests as seen in PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098
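
An illustrative instance of the all-bits-set shape (bit positions chosen for the example; the 'and 1' can sit anywhere in the chain):

  define i1 @bits_3_and_5_both_set(i32 %x) {
    %s3 = lshr i32 %x, 3
    %s5 = lshr i32 %x, 5
    %and = and i32 %s3, %s5
    %bit = and i32 %and, 1
    %r = icmp ne i32 %bit, 0
    ret i1 %r
    ; reduces to a masked compare: (%x & 0x28) == 0x28
  }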

I think it would also help reduce the large test from:
D46336
D46595 
but we need something to reassociate that case to the forms we're expecting here first.

Differential Revision: https://reviews.llvm.org/D46649

llvm-svn: 331937
2018-05-09 23:08:15 +00:00
Philip Reames 79e917d117 [InstCombine] Widen guards with conditions between
The previous handling for guard widening in InstCombine was extremely restrictive. In particular, it didn't handle the common case where we had two guards separated by a single icmp. Handle this by scanning through a small fixed window of instructions to find the next guard if needed.

Differential Revision: https://reviews.llvm.org/D46203

llvm-svn: 331935
2018-05-09 22:56:32 +00:00
Benjamin Kramer 0d2fc1a501 [InstCombine] Teach SimplifyDemandedBits that udiv doesn't demand low dividend bits that are zero in the divisor
This is safe as long as the udiv is not exact. The pattern is not common in
C++ code, but comes up all the time in code generated by XLA's GPU backend.
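
An illustrative instance with constants chosen for the example: the dividend bits below the divisor's trailing zeros are not demanded, so setting them cannot change the result.

  define i32 @udiv_ignores_low_bits(i32 %x) {
    ; for a non-exact udiv by 4 the low two dividend bits do not matter,
    ; so the 'or 3' can be dropped: (x | 3) / 4 == x / 4
    %or = or i32 %x, 3
    %div = udiv i32 %or, 4
    ret i32 %div
  }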

Differential Revision: https://reviews.llvm.org/D46647

llvm-svn: 331933
2018-05-09 22:27:34 +00:00
Farhana Aleen e24f3ff8de [AMDGPU] Support horizontal vectorization of min/max.
Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D46604

llvm-svn: 331920
2018-05-09 21:18:34 +00:00
David Bolvansky 9b5e6e8288 [InstCombine] snprintf optimizations
Reviewers: spatel, efriedma, majnemer, rja, bkramer

Reviewed By: rja, bkramer

Subscribers: rja, llvm-commits

Differential Revision: https://reviews.llvm.org/D46285

llvm-svn: 331889
2018-05-09 16:09:31 +00:00
Karl-Johan Karlsson b27dd33c58 [LV] Add lit testcase for bitcast problem. NFC
llvm-svn: 331878
2018-05-09 13:34:57 +00:00
Benjamin Kramer ccb0fbe9a0 Revert "[InstCombine] snprintf optimizations"
This reverts commit r331849. It miscompiles
snprintf(buf, sizeof(buf), "%s", "any constant string); into
memcpy(buf, "%s", sizeof("any constant string"));

llvm-svn: 331866
2018-05-09 11:38:57 +00:00
Bjorn Pettersson 9f953cdd7c [MergedLoadStoreMotion] Fix a debug invariant bug in mergeStores
Summary:
MergedLoadStoreMotion::mergeStores is using some heuristics
to limit the amount of stores that it tries to sink (see
MagicCompileTimeControl in MergedLoadStoreMotion.cpp). The
heuristic involves counting the number of instructions in
one of the basic blocks that is part of the transformation.

We now ignore dbg intrinsics when counting instructions for
the MagicCompileTimeControl heuristic. This is to make sure that
the number of stores that are sunk doesn't depend on the amount
of debug information (whether -g is used or not).

Reviewers: Gerolf, davide, majnemer

Reviewed By: davide

Subscribers: dberlin, bjope, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D46600

llvm-svn: 331852
2018-05-09 06:52:12 +00:00
David Bolvansky 44a37f04b2 [InstCombine] snprintf optimizations
Reviewers: spatel, efriedma, majnemer, rja

Reviewed By: rja

Subscribers: rja, llvm-commits

Differential Revision: https://reviews.llvm.org/D46285

llvm-svn: 331849
2018-05-09 06:34:20 +00:00
Shiva Chen 2c864551df [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need to collect debug information for labels, i.e., the label
name, the function the label belongs to, the line number in the file, and the
address where the label is located. In order to keep this information in LLVM
IR and to allow the backend to generate debug information correctly,
we create a new kind of metadata for labels, DILabel. The format
of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep as much debug information as possible even when the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata being eliminated with the basic block.
The intrinsic will keep existing as long as we keep it from being optimized out.
The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument, which is the DILabel metadata. The
intrinsic immediately follows the label. The backend can get the
label metadata through the intrinsic's parameter.

We also create a DIBuilder API for labels to be used by the frontend.
The frontend can use createLabel() to allocate DILabel objects, and use
insertLabel() to insert the llvm.dbg.label intrinsic in LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

llvm-svn: 331841
2018-05-09 02:40:45 +00:00
Heejin Ahn bf7716952a Support a funclet operand bundle in LowerInvoke
Summary:
The current LowerInvoke pass cannot handle invoke instructions with a
funclet bundle operand. The order of operands for an invoke instruction
is {call arguments, callee, funclet operand (if any), normal dest,
unwind dest}. The current code assumes there is no funclet operand and
incorrectly includes a funclet operand into call arguments.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46242

llvm-svn: 331832
2018-05-09 00:53:50 +00:00
Davide Italiano 48283ba3a1 [SimplifyCFG] Fix a crash when folding PHIs.
We enter MergeBlockIntoPredecessor with a block looking like this:

for.inc.us-lcssa:                                 ; preds = %cond.end
  %k.1.lcssa.ph = phi i32 [ %conv15, %cond.end ]
  %t.3.lcssa.ph = phi i32 [ %k.1.lcssa.ph, %cond.end ]
  br label %for.inc, !dbg !66

[note the first arg of the PHI being a PHI].
FoldSingleEntryPHINodes gets rid of both PHIs (calling eraseFromParent).
But right before we call the function, we push into IncomingValues the
only argument of the PHIs, and shortly after we try to iterate over
something which has been invalidated before :(

The fix is to not try to remove PHIs which have an incoming value
coming from the same BB we're looking at.

Fixes PR37300 and rdar://problem/39910460

Differential Revision:  https://reviews.llvm.org/D46568

llvm-svn: 331824
2018-05-08 23:28:15 +00:00
Hideki Saito d722d61402 [LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body is single basic block
Summary:
Broadcast code generation emitted instructions in the pre-header, while the instructions they depend on are in the vector loop body.
This resulted in an IR verification error: value used before defined.


Reviewers: rengolin, fhahn, hfinkel

Reviewed By: rengolin, fhahn

Subscribers: dcaballe, Ka-Ka, llvm-commits

Differential Revision: https://reviews.llvm.org/D46302

llvm-svn: 331799
2018-05-08 18:57:34 +00:00
Guozhi Wei 1aea95a9ea [CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions
The CodeGenPrepare pass moves extension instructions close to load instructions in a different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in the current implementation. This patch adds that ability, so we can eliminate more extension instructions.

Differential Revision: https://reviews.llvm.org/D45537

llvm-svn: 331783
2018-05-08 17:58:32 +00:00
Bjorn Pettersson 51cebc98f3 [LCSSA] Do not remove used PHI nodes in formLCSSAForInstructions
Summary:
In formLCSSAForInstructions we speculatively add new PHI
nodes that sometimes end up without having any uses. It
has been discovered that sometimes an added PHI node can
appear as being unused in one iteration of the Worklist,
although it can end up being used by a PHI node added in
a later iteration. We now check, a second time, that the
PHI node still is unused before we remove it. This avoids
an assert about "Trying to remove a phi with uses." for the
added test case.

Reviewers: davide, mzolotukhin, mattd, dberlin

Reviewed By: mzolotukhin, dberlin

Subscribers: dberlin, mzolotukhin, davide, bjope, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D46422

llvm-svn: 331741
2018-05-08 06:59:47 +00:00
Teresa Johnson 59da890c96 [NewPM] Emit inliner NoDefinition missed optimization remark
Summary: Makes this consistent with the old PM.

Reviewers: eraman

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D46526

llvm-svn: 331709
2018-05-08 01:45:46 +00:00
Dmitry Mikulin 738bac77c1 Remove explicit setting of the CFI jumptable section name; it does not appear
to be needed: jump table sections are created with a .cfi.jumptable suffix. With
this change each jump table is placed in a separate section, which allows the
linker to re-order them.

Differential Revision: https://reviews.llvm.org/D46537

llvm-svn: 331680
2018-05-07 21:30:15 +00:00
Roman Lebedev ffec58bce6 [InstCombine][NFC] Add tests for one more masked merge pattern.
This pattern came up in D46494.
I'm pretty sure we want to canonicalize it from
	(x | ~m) & (y &  m)
to
	(x &  m) | (y & ~m)

https://rise4fun.com/Alive/TEM

llvm-svn: 331625
2018-05-07 09:42:45 +00:00
Piotr Padlewski e9832dfdf3 [CaptureTracking] Handle capturing of launder.invariant.group
Summary:
launder.invariant.group has the same rules of capturing as
bitcast, gep, etc - the original value is not captured
if the returned pointer is not captured.

With this patch, we mark 40% more functions as noalias when compiling with -fstrict-vtable-pointers;
1078 vs 1778  (39.37%)

Reviewers: sanjoy, davide, nlewycky, majnemer, mehdi_amini

Subscribers: JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D32673

llvm-svn: 331587
2018-05-05 10:23:27 +00:00
Philip Reames 5b39acd111 [LICM] Compute a must execute property for the prefix of the header as we go
Computing this property within the existing walk ensures that the cost is linear with the size of the block. If we did this from within isGuaranteedToExecute, it would be quadratic without some very fancy caching.

This allows us to reliably catch a hoistable instruction within a header which may throw at some point *after* our hoistable instruction. It doesn't do anything for non-header cases, but given how common single block loops are, this seems very worthwhile.

llvm-svn: 331557
2018-05-04 21:35:00 +00:00
Shoaib Meenai 57fadab1cb [ObjCARC] Account for catchswitch in bitcast insertion
A catchswitch is both a pad and a terminator, meaning it must be the
only non-phi instruction in its basic block. When we're inserting a
bitcast in the incoming basic block for a phi, if that incoming block is
a catchswitch, we should go up the dominator tree to find a valid
insertion point rather than attempting to insert before the catchswitch
(which would result in invalid IR).

Differential Revision: https://reviews.llvm.org/D46412

llvm-svn: 331548
2018-05-04 19:03:11 +00:00
Craig Topper a3f39ee33d [LoopIdiomRecognize] Add a test case to show incorrect transformation of an infinite loop with side effets into a countable loop using ctlz.
We currently recognize this idiom where x is signed and thus the shift in an ashr.

int cnt = 0;
while (x) {
  x >>= 1; // arithmetic shift right
  ++cnt;
}

and turn it into (bitwidth - ctlz(x)). And if there is anything else in the loop we will create a new loop that runs that many times.

If x is initially negative, the shift result will never be 0 and thus the loop is infinite. If you put something with side effects in the loop, that side effect will now only happen bitwidth times instead of an infinite number of times.

So this transform is only safe for logical shift right (which we don't currently recognize) or if we can prove that x cannot be negative before the loop.

llvm-svn: 331493
2018-05-03 23:50:29 +00:00
Sanjay Patel e7b6654711 [InstCombine] refine select-of-constants to bitwise ops
Add logic for the special case when a cmp+select can clearly be
reduced to just a bitwise logic instruction, and remove an 
over-reaching chunk of general purpose bit magic. The primary goal 
is to remove cases where we are not improving the IR instruction 
count when doing these select transforms, and in all cases here that 
is true.

In the motivating 3-way compare tests, there are further improvements
because we can combine/propagate select values (not sure if that
belongs in instcombine, but it's there for now).

DAGCombiner has folds to turn some of these selects into bit magic,
so there should be no difference in the end result in those cases.
Not all constant combinations are handled there yet, however, so it
is possible that some targets will see more cmov/csel codegen with
this change in IR canonicalization. 

Ideally, we'll go further to *not* turn selects into multiple 
logic/math ops in instcombine, and we'll canonicalize to selects.
But we should make sure that this step does not result in regressions
first (and if it does, we should fix those in the backend).

The general direction for this change was discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105373.html
http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html

Alive proofs for the new bit magic:
https://rise4fun.com/Alive/XG7

Differential Revision: https://reviews.llvm.org/D46086

llvm-svn: 331486
2018-05-03 21:58:44 +00:00
Piotr Padlewski c77ab8ef2f perform DSE through launder.invariant.group
Summary:
Alias Analysis knows that llvm.launder.invariant.group
returns a pointer that must-aliases its argument, but this information
wasn't used; therefore we didn't DSE through launder.invariant.group.

Reviewers: chandlerc, dberlin, bogner, hfinkel, efriedma

Reviewed By: dberlin

Subscribers: amharc, llvm-commits, nlewycky, rsmith

Differential Revision: https://reviews.llvm.org/D31581

llvm-svn: 331449
2018-05-03 11:03:53 +00:00
Piotr Padlewski 5dde809404 Rename invariant.group.barrier to launder.invariant.group
Summary:
This is one of the initial commit of "RFC: Devirtualization v2" proposal:
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing

Reviewers: rsmith, amharc, kuhar, sanjoy

Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45111

llvm-svn: 331448
2018-05-03 11:03:01 +00:00
Craig Topper 856fd68690 [LoopIdiomRecognize] When looking for 'x & (x -1)' for popcnt, make sure the left hand side of the 'and' matches the left hand side of the 'subtract'
llvm-svn: 331437
2018-05-03 05:48:49 +00:00
Craig Topper a0cba89f86 [LoopIdiomRecognize] Add a test case showing that we transform to ctpop without fully checking the 'x & (x-1)' part.
The code fails to check that the same value is used twice. We only make sure the left hand side of the and is part of the loop recurrence. The 'x' in the subtract can be any value.

llvm-svn: 331436
2018-05-03 05:48:48 +00:00
Max Kazantsev 58fce7e54b Re-enable "[SCEV] Make computeExitLimit more simple and more powerful"
This patch was temporarily reverted because it exposed bug 37229 on the
PowerPC platform. The bug is unrelated to the patch and was just a general
bug in an optimization done for the PowerPC platform only. The bug was fixed
by the patch rL331410.

This patch re-lands the disabled commit since the bug has been fixed.

llvm-svn: 331427
2018-05-03 02:37:55 +00:00
Chandler Carruth 71c3a3fac5 [GCOV] Emit the writeout function as nested loops of global data.
Summary:
Prior to this change, LLVM would in some cases emit *massive* writeout
functions with many 10s of 1000s of function calls in straight-line
code. This is a very wasteful way to represent what are fundamentally
loops and creates a number of scalability issues. Among other things,
register allocating these calls is extremely expensive. While D46127 makes this
less severe, we'll still run into scaling issues with this eventually. If not
in the compile time, just from the code size.

Now the pass builds up global data structures modeling the inputs to
these functions, and simply loops over the data structures calling the
relevant functions with those values. This ensures that the code size is
a fixed and only data size grows with larger amounts of coverage data.

A trivial change to IRBuilder is included to make it easier to build
the constants that make up the global data.

Reviewers: wmi, echristo

Subscribers: sanjoy, mcrosier, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D46357

llvm-svn: 331407
2018-05-02 22:24:39 +00:00
Daniel Sanders 8d0d1aa229 [reassociate] Fix excessive revisits when processing long chains of reassociatable instructions.
Summary:
Some of our internal testing detected a major compile time regression which I've
tracked down to:
    r278938 - Revert "Reassociate: Reprocess RedoInsts after each inst".
It appears that processing long chains of reassociatable instructions causes
non-linear (potentially exponential) growth in the number of times an
instruction is revisited. For example, the included test revisits instructions
220 times in a 20-instruction test.

It appears that r278938 reversed the order instructions were visited and that
this is preventing scheduled revisits from being cancelled as a result of
visiting the instructions naturally during normal processing. However, simply
reversing the order also harmed the generated code. Upon closer inspection, it
was discovered that revisits occurred in the opposite order to the first pass
(Thanks to escha for spotting that).

This patch makes the revisit order consistent with the first pass which allows
more revisits to be cancelled. This does appear to have a small impact on the
generated code in few cases but it significantly reduces compile-time.

After this patch, our internal test that was most affected by the regression
dropped from ~2 million revisits to ~4k resulting in Reassociate having 0.46%
of the runtime it had before (99.54% improvement).

Here are the summaries reported by lnt for the LLVM test-suite with --benchmarking-only:
| metric         | geomean before patch | geomean after patch | delta   |
| -----          | -----                | -----               | -----   |
| compile time   | 0.1956               | 0.1261              | -35.54% |
| execution time | 0.3240               | 0.3237              | -       |
| code size      | 7365.4459            | 7365.6079           | -       |

The results have a few wins and losses on compile-time, mostly in the +/- 2.5% range. There was one outlier though:
| Performance Regressions - compile_time | Δ | Previous | Current |
| MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk | 9.82% | 2.0473 | 2.2483 |

Reviewers: javed.absar, dberlin

Reviewed By: dberlin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45734

llvm-svn: 331381
2018-05-02 17:59:16 +00:00
Farhana Aleen e2dfe8a853 [AMDGPU] Support horizontal vectorization.
Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D46213

llvm-svn: 331313
2018-05-01 21:41:12 +00:00
Sanjay Patel d2025a2e31 [AggressiveInstCombine] convert a chain of 'or-shift' bits into masked compare
and (or (lshr X, C), ...), 1 --> (X & C') != 0
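
A concrete instance with illustrative constants (testing whether bit 3 or bit 5 of X is set):

  define i1 @bit_3_or_5_set(i32 %x) {
    ; before: and (or (lshr X, 3), (lshr X, 5)), 1, compared against 0
    %s3 = lshr i32 %x, 3
    %s5 = lshr i32 %x, 5
    %or = or i32 %s3, %s5
    %bit = and i32 %or, 1
    %r = icmp ne i32 %bit, 0
    ret i1 %r
    ; after: (%x & 0x28) != 0
  }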

I initially thought about implementing the minimal pattern in instcombine as mentioned here:
https://bugs.llvm.org/show_bug.cgi?id=37098#c6

...but we need to do better to catch the more general sequence from the motivating test 
(more than 2 bits in the compare). And a test-suite run with statistics showed that this 
pattern only happened 2 times currently. It would potentially happen more often if 
reassociation worked better (D45842), but it's probably still not too frequent?

This is small enough that I didn't see a need to create a whole new class/file within 
AggressiveInstCombine. There are likely other relatively small matchers like what was 
discussed in D44266 that would slide under foldUnusualPatterns() (name suggestions welcome). 
We could potentially also consolidate matchers for ctpop, bswap, etc under here.

Differential Revision: https://reviews.llvm.org/D45986

llvm-svn: 331311
2018-05-01 21:02:09 +00:00
Sanjay Patel f70671582d [AggressiveInstCombine] add more bitfield test patterns; NFC
Add another baseline for D45986 and a pattern that won't be
matched with that patch.

llvm-svn: 331309
2018-05-01 20:55:03 +00:00
Sanjay Patel 2b36e95d45 [PhaseOrdering] add tests for bittest patterns from bitfields; NFC
As mentioned in D45986, there's a potential ordering dependency 
between instcombine and aggressive-instcombine for detecting these,
so I'm adding a few tests to confirm that the expected folds occur
using -O3 (because aggressive-instcombine only runs at -O3 currently).

llvm-svn: 331308
2018-05-01 20:53:44 +00:00
Daniel Neilson 1091ca4640 [LV] Move test/Transforms/LoopVectorize/pr23997.ll
Summary:
This fixes a build break with r331269.

test/Transforms/LoopVectorize/pr23997.ll

should be in:

test/Transforms/LoopVectorize/X86/pr23997.ll

llvm-svn: 331281
2018-05-01 16:40:45 +00:00
Matthew Simpson 661e6a02bd [SLP] Add additional test for transposable binary operations with reuse
llvm-svn: 331274
2018-05-01 15:59:26 +00:00
Daniel Neilson 9e4bbe801a [LV] Preserve inbounds on created GEPs
Summary:
This is a fix for PR23997.

The loop vectorizer is not preserving the inbounds property of GEPs that it creates.
This is inhibiting some optimizations. This patch preserves the inbounds property in
the case where a load/store is being fed by an inbounds GEP.

Reviewers: mkuper, javed.absar, hsaito

Reviewed By: hsaito

Subscribers: dcaballe, hsaito, llvm-commits

Differential Revision: https://reviews.llvm.org/D46191

llvm-svn: 331269
2018-05-01 15:35:08 +00:00
Wei Mi eec5ba9fae Fix the issue that ComputeValueKnownInPredecessors only handles the case when
phi is on lhs of a comparison op.

For the following testcase,
L1:

  %t0 = add i32 %m, 7
  %t3 = icmp eq i32* %t2, null
  br i1 %t3, label %L3, label %L2

L2:

  %t4 = load i32, i32* %t2, align 4
  br label %L3

L3:

  %t5 = phi i32 [ %t0, %L1 ], [ %t4, %L2 ]
  %t6 = icmp eq i32 %t0, %t5
  br i1 %t6, label %L4, label %L5

We know if we go through the path L1 --> L3, %t6 should always be true. However
currently, if the rhs of the eq comparison is a phi, JumpThreading fails to
evaluate %t6 to true. And we know that InstCombine cannot guarantee always
canonicalizing phi to the left hand side of the comparison operation according
to the operand priority comparison mechanism in instcombine. The patch handles
the case when rhs of the comparison op is a phi.

Differential Revision: https://reviews.llvm.org/D46275

llvm-svn: 331266
2018-05-01 14:47:24 +00:00
Omer Paparo Bivas 54596e1c1a [InstCombine] new testcases for OverflowingBinaryOperators and PossiblyExactOperators transformations; NFC
instcombine should transform the relevant cases if the OverflowingBinaryOperator/PossiblyExactOperator can be proven to be safe.

Change-Id: I7aec62a31a894e465e00eb06aed80c3ea0c9dd45
llvm-svn: 331265
2018-05-01 14:27:10 +00:00
Omer Paparo Bivas 82ef8e19ef [InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case
Differential Revision: https://reviews.llvm.org/D45731

Change-Id: I85d4226504e954933c41598327c91b2d08192a9d
llvm-svn: 331257
2018-05-01 12:25:46 +00:00
Sanjay Patel 1934cb1c49 [InstCombine] fix test to restore intent
This test had values that differed in only in capitalization,
and that causes problems for the auto-generating check line
script. So I changed that in rL331226, but I accidentally
forgot to change a subsequent use of a param.

llvm-svn: 331228
2018-04-30 21:28:18 +00:00
Sanjay Patel dab8515370 [InstCombine] add tests, update checks; NFC
llvm-svn: 331226
2018-04-30 21:03:36 +00:00
Roman Lebedev aa4faec114 [InstCombine] Unfold masked merge with constant mask
Summary:
As discussed in D45733, we want to do this in InstCombine.

https://rise4fun.com/Alive/LGk

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: chandlerc, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D45867

llvm-svn: 331205
2018-04-30 17:59:33 +00:00
Roman Lebedev b7ae4ef82b [InstCombine][NFC] Add tests for unfolding masked merge with constant mask
Summary: As discussed in D45733, we want to do this in InstCombine.

Differential Revision: https://reviews.llvm.org/D45866

llvm-svn: 331204
2018-04-30 17:59:26 +00:00
Davide Italiano bd3bf1660b [SLPVectorizer] Debug info shouldn't impact spill cost computation.
<rdar://problem/39794738>

(Also, PR32761).

Differential Revision:  https://reviews.llvm.org/D46199

llvm-svn: 331199
2018-04-30 16:57:33 +00:00
Roman Lebedev 136867931a [InstCombine] Canonicalize variable mask in masked merge
Summary:
Masked merge has a pattern of: `((x ^ y) & M) ^ y`.
But there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`,
so we should canonicalize the pattern to the non-inverted mask.

https://rise4fun.com/Alive/Yol
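
A hedged IR sketch of the inverted-mask form and its assumed canonical replacement (value names are illustrative):

define i32 @masked_merge_inverted(i32 %x, i32 %y, i32 %m) {
  %notm = xor i32 %m, -1
  %diff = xor i32 %x, %y
  %and  = and i32 %diff, %notm
  %r    = xor i32 %and, %x             ; ((x ^ y) & ~M) ^ x
  ret i32 %r
}
; assumed canonical form:
;   %diff = xor i32 %x, %y
;   %and  = and i32 %diff, %m
;   %r    = xor i32 %and, %y           ; ((x ^ y) & M) ^ y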

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45664

llvm-svn: 331112
2018-04-28 15:45:07 +00:00
Roman Lebedev 6b1e66b188 [InstCombine][NFC] Add tests for variable mask canonicalization in masked merge
Summary:
Masked merge has a pattern of: `((x ^ y) & M) ^ y`.
But there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`,
so we should canonicalize the pattern to the non-inverted mask.

Differential Revision: https://reviews.llvm.org/D45663

llvm-svn: 331111
2018-04-28 15:45:00 +00:00
Max Kazantsev 303572f3df [NFC] Add some tests that demonstrate unrecognized three-way comparison patterns
llvm-svn: 331100
2018-04-28 04:38:21 +00:00
Philip Reames 502d4481d4 [LoopGuardWidening] Make PostDomTree optional
The effect of doing so is that the LoopPassManager is not disrupted when mixing this pass with other loop passes. This should help locality of access substantially and avoids the cost of computing PostDom.

The assumption here is that the full GuardWidening (which does use PostDom) is run as a canonicalization before loop opts and that this version is just catching cases exposed by other loop passes (e.g. LoopPredication, IndVarSimplify, LoopUnswitch, etc.).

llvm-svn: 331094
2018-04-27 23:15:56 +00:00
Adrian Prantl 210a29de7b Fix a bug in GlobalOpt's handling of DIExpressions.
This patch adds support for fragment expressions
in TryToShrinkGlobalToBoolean(), which were previously just dropped.

Thanks to Reid Kleckner for providing me a reproducer!

llvm-svn: 331086
2018-04-27 21:41:36 +00:00
Roman Lebedev 6959b8e76f [PatternMatch] Stabilize the matching order of commutative matchers
Summary:
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the `LHS` and `RHS` matchers:
1. match `RHS` matcher to the `first` operand of binary operator,
2. and then match `LHS` matcher to the `second` operand of binary operator.

This works ok.
But it complicates writing of commutative matchers, where one would like to match
(`m_Value()`) the value on one side, and use (`m_Specific()`) it on the other side.

This is additionally complicated by the fact that `m_Specific()` stores the `Value *`,
not `Value **`, so it won't work at all out of the box.

The last problem is trivially solved by adding a new `m_c_Specific()` that stores the
`Value **`, not `Value *`. I'm choosing to add a new matcher, not change the existing
one because I guess all the current users are OK with the existing behavior,
and this additional pointer indirection may have performance drawbacks.
Also, I'm storing a pointer, not a reference, because for some mysterious-to-me
reason it did not work with the reference.

The first one appears trivial, too.
With this change, we still
1. match the `LHS` matcher to the `first` operand of the binary operator,
2. and then match the `RHS` matcher to the `second` operand of the binary operator.
But if that does not match, we now swap the operands rather than the matchers:
1. match the `LHS` matcher to the `second` operand of the binary operator,
2. and then match the `RHS` matcher to the `first` operand of the binary operator.

Surprisingly, `$ ninja check-llvm` still passes with this.
But I expect the bots will disagree...

The motivational unittest is included.
I'd like to use this in D45664.

Reviewers: spatel, craig.topper, arsenm, RKSimon

Reviewed By: craig.topper

Subscribers: xbolva00, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D45828

llvm-svn: 331085
2018-04-27 21:23:20 +00:00
Sanjay Patel 2677038cc0 [Reassociate] add a test with debug info; NFC
As suggested in D45842 
(although still not sure if we're going to advance that),
we must invalidate references to instructions that have
been recycled (operands were changed, so result is different).

llvm-svn: 331083
2018-04-27 21:14:15 +00:00
Philip Reames e4ec473b3f [MustExecute/LICM] Special case first instruction in throwing header
We currently have a hard to solve analysis problem around the order of instructions within a potentially throwing block.  We can't cheaply determine whether a given instruction is before the first potential throw in the block.  While we're working on that in the background, special case the first instruction within the header.

Why this particular special case? Well, headers are guaranteed to execute if the loop does, and it turns out we tend to produce this form in practice.

In a follow-on patch, I intend to extend LICM with an alternate approach that works for any instruction in the header before the first throw, but this is the best I can come up with for other users of the analysis (such as store promotion).

Note: I can't show the difference in the analysis result since we're ORing in the expensive instruction walk used by SCEV.  Using the full walk is not suitable for a general solution.
llvm-svn: 331079
2018-04-27 20:44:01 +00:00
Philip Reames 9258e9d190 [LoopGuardWidening] Split out a loop pass version of GuardWidening
The idea is to have a pass which performs the same transformation as GuardWidening, but can be run within a loop pass manager without disrupting the pass manager structure. As demonstrated by the test case, this doesn't quite get there because of issues with post dom, but it is a good step in the right direction. The motivation is purely to reduce compile time since we can now preserve locality during the loop walk.

This patch only includes a legacy pass.  A follow up will add a new style pass as well.

llvm-svn: 331060
2018-04-27 17:29:10 +00:00
Florian Hahn f3fea0f11f [LoopInterchange] Allow some loops with PHI nodes in the exit block.
We currently support LCSSA PHI nodes in the outer loop exit, if their
incoming values do not come from the outer loop latch or if the
outer loop latch has a single predecessor. In that case, the outer loop latch
will be executed only if the inner loop gets executed. If we have multiple
predecessors for the outer loop latch, it may be executed even if the inner
loop does not get executed.

This is a first step to support the case described in
https://bugs.llvm.org/show_bug.cgi?id=30472

Reviewers: efriedma, karthikthecool, mcrosier

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D43237

llvm-svn: 331037
2018-04-27 13:52:51 +00:00
Benjamin Kramer 733c7fc55d [NVPTX] Turn on Loop/SLP vectorization
Since PTX has grown a <2 x half> datatype, vectorization has become more
important. The late LoadStoreVectorizer intentionally only does loads
and stores, but now arithmetic has to be vectorized for optimal
throughput too.

This is still very limited: SLP vectorization happily creates <2 x half>
if it's a legal type, but there's still a lot of register moving
happening to get that fed into a vectorized store. Overall it's a small
performance win from reducing the number of arithmetic instructions.

I haven't really checked what the loop vectorizer does to PTX code, the
cost model there might need some more tweaks. I didn't see it causing
harm though.

Differential Revision: https://reviews.llvm.org/D46130

llvm-svn: 331035
2018-04-27 13:36:05 +00:00
Matt Morehouse 1ae1febfde Revert "[SimplifyLibcalls] Replace locked IO with unlocked IO"
This reverts r331002 due to sanitizer bot breakage.

llvm-svn: 331011
2018-04-27 01:48:09 +00:00
Eli Friedman e06539456c [LowerTypeTests] Mark .cfi.jumptable nounwind.
It doesn't unwind, and the wrong marking leads to the creation of an
.eh_frame section when it isn't necessary.

Differential Revision: https://reviews.llvm.org/D46082

llvm-svn: 331008
2018-04-27 00:32:24 +00:00
David Bolvansky 2c9cc9c731 [SimplifyLibcalls] Replace locked IO with unlocked IO
Summary: If the file stream argument is not captured and the source is fopen, we can replace IO calls with unlocked IO ("_unlocked" function variants) to gain better speed.

Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D45736

llvm-svn: 331002
2018-04-26 22:31:43 +00:00
Roman Lebedev 33095e3610 [InstCombine][NFC] Regenerate checks in or-xor.ll
llvm-svn: 330996
2018-04-26 21:41:56 +00:00
Roman Lebedev cabaeac29c [InstCombine][NFC] Regenerate checks in and-or-not.ll
llvm-svn: 330994
2018-04-26 21:13:09 +00:00
Sanjoy Das 6f1937b10f [InstCombine] Simplify Add with remainder expressions as operands.
Summary:
Simplify integer add expression X % C0 + (( X / C0 ) % C1) * C0 to
X % (C0 * C1).  This is a common pattern seen in code generated by the XLA
GPU backend.
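
For illustration, a hedged IR sketch with C0 = 4 and C1 = 8, assuming unsigned arithmetic (names and constants are made up):

define i32 @sum_of_rems(i32 %x) {
  %rem0 = urem i32 %x, 4               ; X % C0
  %div  = udiv i32 %x, 4               ; X / C0
  %rem1 = urem i32 %div, 8             ; (X / C0) % C1
  %mul  = mul i32 %rem1, 4             ; ((X / C0) % C1) * C0
  %add  = add i32 %rem0, %mul
  ret i32 %add
}
; assumed simplified form:
;   %add = urem i32 %x, 32             ; X % (C0 * C1)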

Add test cases for this new optimization.

Patch by Bixia Zheng!

Reviewers: sanjoy

Reviewed By: sanjoy

Subscribers: efriedma, craig.topper, lebedev.ri, llvm-commits, jlebar

Differential Revision: https://reviews.llvm.org/D45976

llvm-svn: 330992
2018-04-26 20:52:28 +00:00
Sanjoy Das 0e643db48f Add test cases to prepare for the optimization that simplifies Add with
remainder expressions as operands.

Summary:
Add test cases to prepare for the new optimization that simplifies integer add
expression X % C0 + (( X / C0 ) % C1) * C0 to X % (C0 * C1).

Patch by Bixia Zheng!

Reviewers: sanjoy

Reviewed By: sanjoy

Subscribers: jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46017

llvm-svn: 330991
2018-04-26 20:52:27 +00:00
Roman Lebedev 7cc56f1599 [InstCombine][NFC] add2.ll: add a few commutative checks.
Fixes some missing test coverage in InstCombineAddSub.cpp, visitAdd()

llvm-svn: 330986
2018-04-26 20:07:17 +00:00
Roman Lebedev 1efe879641 [InstCombine][NFC] Autogenerate checks in add2.ll
llvm-svn: 330985
2018-04-26 20:07:12 +00:00
Roman Lebedev 3d7b22621c [NFC][InstCombine] rem.ll: add a few commutative tests.
This closes a gap in test coverage for
isKnownToBeAPowerOfTwo() from ValueTracking.cpp

llvm-svn: 330975
2018-04-26 18:44:37 +00:00
Roman Lebedev e117e1a440 [NFC][InstCombine] Regenerate rem.ll test
llvm-svn: 330974
2018-04-26 18:44:32 +00:00
Matthew Simpson cfdec0ff70 [SLP] Add tests for transposable binary operations
These test cases are vectorizable, but we are currently unable to vectorize
them effectively.

llvm-svn: 330945
2018-04-26 14:50:04 +00:00
Alex Bradbury 130b8b3f2b [RISCV] Implement isTruncateFree
Adapted from ARM's implementation introduced in r313533 and r314280.

llvm-svn: 330940
2018-04-26 13:37:00 +00:00
Florian Hahn fd2bc11248 [LoopInterchange] Ignore debug intrinsics during legality checks.
Reviewers: aprantl, mcrosier, karthikthecool

Reviewed By: aprantl

Subscribers: mattd, vsk, #debug-info, llvm-commits

Differential Revision: https://reviews.llvm.org/D45379

llvm-svn: 330931
2018-04-26 10:26:17 +00:00
Max Kazantsev 2c287ec9c5 Revert "[SCEV] Make computeExitLimit more simple and more powerful"
This reverts commit 023c8be90980e0180766196cba86f81608b35d38.

This patch triggers miscompile of zlib on PowerPC platform. Most likely it is
caused by some pre-backend PPC-specific pass, but we don't clearly know the
reason yet. So we temporarily revert this patch with the intention of returning it
once the problem is resolved. See bug 37229 for details.

llvm-svn: 330893
2018-04-26 02:07:40 +00:00
David Bolvansky cb8ca5f37c [SimplifyLibcalls] Atoi, strtol replacements
Reviewers: spatel, lebedev.ri, xbolva00, efriedma

Reviewed By: xbolva00, efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D45418

llvm-svn: 330860
2018-04-25 18:58:53 +00:00
Taewook Oh 923c216da5 [ICP] Do not attempt type matching for variable length arguments.
Summary:
When performing indirect call promotion, the current implementation inspects "all" parameters of the callsite and attempts to match them with the formal argument types of the callee function. However, it is not possible to find the type for variable length arguments, and the compiler crashes when it attempts to match the type for a variable length argument.

It seems that the bug was introduced with D40658. Prior to that, the type matching was performed only for the parameters whose ID is less than callee->getFunctionNumParams(). The attached test case will crash without the patch.

Reviewers: mssimpso, davidxl, davide

Reviewed By: mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46026

llvm-svn: 330844
2018-04-25 17:19:21 +00:00
Sanjay Patel 0387ceb67a [InstCombine] add tests for select to logic folds; NFC
As discussed in D45862, we want these folds sometimes
because they're good improvements.
But as we can see here, the current logic doesn't
check uses and doesn't produce optimal code in all
cases. 

llvm-svn: 330837
2018-04-25 15:59:23 +00:00
Florian Hahn 1da30c659d [LoopInterchange] Use getExitBlock()/getExitingBlock instead of manual impl.
This also means we have to check if the latch is the exiting block now,
as `transform` expects the latches to be the exiting blocks too.

https://bugs.llvm.org/show_bug.cgi?id=36586

Reviewers: efriedma, davide, karthikthecool

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D45279

llvm-svn: 330806
2018-04-25 09:35:54 +00:00
Bjorn Pettersson bec2a7c4eb [DebugInfo] Invalidate debug info in ReassociatePass::RewriteExprTree
Summary:
When Reassociate is rewriting an expression tree it may
reuse old binary expression nodes, for new expressions.
Whenever an expression node is reused, but with a non-trivial
change in the result, we need to invalidate any debug info
that is associated with the node.

If for example rewriting
  x = mul a, b
  y = mul c, x
into
  x = mul c, b
  y = mul a, x
we still get the same result for 'y', but 'x' is a new expression.
All debug info referring to 'x' must be invalidated (marked as
optimized out) since we no longer calculate the expected value.

As a side-effect this patch avoid (at least some) problems where
reassociate could end up creating IR with debug-use before def.
Earlier the dbg.value nodes were left untouched in the IR, while
the reused binary nodes were sunk to just before the root node
of the rewritten expression tree. See PR27273 for more info about
such problems.

Reviewers: dblaikie, aprantl, dexonsmith

Reviewed By: aprantl

Subscribers: JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D45975

llvm-svn: 330804
2018-04-25 09:23:56 +00:00
Sanjay Patel 54795bb16b [InstCombine] move tests for select with bit-test of condition; NFC
These are all but 1 of the select-of-constant tests that appear 
to be transformed within foldSelectICmpAnd() and the block above 
it predicated by decomposeBitTestICmp().

As discussed in D45862 (and can be seen in several tests here),
we probably want to stop doing those transforms because they
can increase the instruction count without benefitting other
passes or codegen.

The 1 test not included here is a urem test where the bit hackery
allows us to remove a urem. To preserve killing that urem, we 
should do some stronger known-bits analysis or pattern matching of 
'urem x, (select-of-pow2-constants)'.

llvm-svn: 330768
2018-04-24 21:06:06 +00:00
Florian Hahn 97ae30b8d6 [LoopInterchange] Add REQUIRES: asserts to test.
llvm-svn: 330748
2018-04-24 18:10:52 +00:00
Diego Caballero 60f2776b2f [LV][VPlan] Detect outer loops for explicit vectorization.
Patch #2 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

This patch introduces the basic infrastructure to detect, legality check
and process outer loops annotated with hints for explicit vectorization.
All these changes are protected under the feature flag
-enable-vplan-native-path. This should make this patch NFC for the existing
inner loop vectorizer.

Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso.

Differential Revision: https://reviews.llvm.org/D42447

llvm-svn: 330739
2018-04-24 17:04:17 +00:00
Florian Hahn ceee788947 [LoopInterchange] Make isProfitableForVectorization slightly more conservative.
After D43236, we started interchanging loops with empty dependence
matrices.  In isProfitableForVectorization, we try to determine if
interchanging makes the loop dependences more friendly to the
vectorizer. If there are no dependences, we should not interchange,
based on that heuristic.

Reviewers: efriedma, mcrosier, karthikthecool, blitz.opensource

Reviewed By: mcrosier

Differential Revision: https://reviews.llvm.org/D45208

llvm-svn: 330738
2018-04-24 16:55:32 +00:00
Sanjay Patel 510af48e5d [InstCombine] regenerate checks; NFC
The first step in fixing problems raised in D45862
is to make the problems visible. Now we can more easily
see/update cases where selects have been turned into 
multiple instructions with no apparent improvement in 
analysis or benefits for other passes (vectorization).

llvm-svn: 330731
2018-04-24 16:08:03 +00:00
Sanjay Patel f03ec65517 [InstCombine] regenerate checks; NFC
The current version of the script uses regex for params.
This could mask a bug (param values got wrongly swapped),
but it seems unlikely in practice, so let's just update
the whole file to reduce diffs when there is a meaningful
change here.

llvm-svn: 330729
2018-04-24 15:42:30 +00:00
Benjamin Kramer f85f5da3b2 [LoadStoreVectorize] Ignore interleaved invariant loads.
The memory location an invariant load is using can never be clobbered by
any store, so it's safe to move the load ahead of the store.

Differential Revision: https://reviews.llvm.org/D46011

llvm-svn: 330725
2018-04-24 15:28:47 +00:00
Chandler Carruth 43acdb35bc [PM/LoopUnswitch] Fix a bug in the loop block set formation of the new
loop unswitch.

This code incorrectly added the header to the loop block set early. As
a consequence we would incorrectly conclude that a nested loop body had
already been visited when the header of the outer loop was the preheader
of the nested loop. In retrospect, adding the header eagerly doesn't
really make sense. It seems nicer to let the cycle be formed naturally.
This will catch crazy bugs in the CFG reconstruction where we can't
correctly form the cycle earlier rather than later, and makes the rest
of the logic just fall out.

I've also added various asserts that make these issues *much* easier to
debug.

llvm-svn: 330707
2018-04-24 10:33:08 +00:00
Max Kazantsev 7790d6cbff [NFC] Use FileCheck in test
llvm-svn: 330684
2018-04-24 04:42:37 +00:00
Chandler Carruth 0ace148ca6 [PM/LoopUnswitch] Remove another over-aggressive assert.
This code path can very clearly be called in a context where we have
baselined all the cloned blocks to a particular loop and are trying to
handle nested subloops. There is no harm in this, so just relax the
assert. I've added a test case that will make sure we actually exercise
this code path.

llvm-svn: 330680
2018-04-24 03:27:00 +00:00
George Burgess IV 8e807bf3fa Reland r301880(!): "[InstSimplify] Handle selects of GEPs with 0 offset"
I was reminded today that this patch got reverted in r301885. I can no
longer reproduce the failure that caused the revert locally (...almost
one year later), and the patch applied pretty cleanly, so I guess we'll
see if the bots still get angry about it.

The original breakage was InstSimplify complaining (in "assertion
failed" form) about getting passed some crazy IR when running `ninja
check-sanitizer`. I'm unable to find traces of what, exactly, said crazy
IR was. I suppose we'll find out pretty soon if that's still the case.
:)

Original commit:

  Author: gbiv
  Date: Mon May  1 18:12:08 2017
  New Revision: 301880

  URL: http://llvm.org/viewvc/llvm-project?rev=301880&view=rev
  Log:
  [InstSimplify] Handle selects of GEPs with 0 offset

  In particular (since it wouldn't fit nicely in the summary):
  (select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V)

  Differential Revision: https://reviews.llvm.org/D31435

llvm-svn: 330667
2018-04-24 00:25:01 +00:00
Sanjay Patel fa8f5ad9f3 [AggressiveInstCombine] add tests for PR37098; NFC
I'm not sure if this is where we should try to fold these
patterns inspired by:
https://bugs.llvm.org/show_bug.cgi?id=37098
...if this isn't the right place, we can move the tests.

llvm-svn: 330642
2018-04-23 20:20:32 +00:00
Xin Tong 8edff27923 [CallSiteSplit] Make sure we remove nonnull if the parameter turns out to be a constant.
Summary: We do not need the nonnull attribute if we know an argument is going to be a constant.

Reviewers: junbuml, davide, fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45608

llvm-svn: 330641
2018-04-23 20:09:08 +00:00
Bjorn Pettersson 8e484dc531 [MemCpyOpt] Skip optimizing basic blocks not reachable from entry
Summary:
Skip basic blocks not reachable from the entry node
in MemCpyOptPass::iterateOnFunction.

Code that is unreachable may have properties that do not exist
for reachable code (an instruction in a basic block can, for
example, be dominated by a later instruction in the same basic
block if there is a single-block loop).
MemCpyOptPass::processStore is only safe to use for reachable
basic blocks, since it may iterate past the basic block
beginning when used for unreachable blocks. By simply skipping
optimization of unreachable basic blocks we can avoid asserts such
as "Assertion `!NodePtr->isKnownSentinel()' failed."
in MemCpyOptPass::processStore.

The problem was detected by fuzz tests.

Reviewers: eli.friedman, dneilson, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D45889

llvm-svn: 330635
2018-04-23 19:55:04 +00:00
Daniel Neilson cc45e923c5 [DSE] Teach the pass that atomic memory intrinsics are stores.
Summary:
This change teaches DSE that the atomic memory intrinsics are stores
that can be eliminated, and can allow other stores to be eliminated.
This change specifically does not teach DSE that these intrinsics
can be partially eliminated (i.e. length reduced, and dest/src changed);
that will be handled in another change.
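
As a hedged illustration of what this enables (the intrinsic signature is written
from memory and the outcome is an assumption): the element-wise atomic memset below
fully overwrites the earlier plain store, so DSE should now be able to delete it.

declare void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* nocapture writeonly, i8, i64, i32)

define void @dead_store(i8* %p) {
  store i8 0, i8* %p, align 1          ; expected to become removable
  call void @llvm.memset.element.unordered.atomic.p0i8.i64(i8* align 1 %p, i8 1, i64 16, i32 1)
  ret void
}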

Reviewers: mkazantsev, skatkov, apilipenko, efriedma, rsmith

Reviewed By: efriedma

Subscribers: dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D45535

llvm-svn: 330629
2018-04-23 19:06:49 +00:00
Max Kazantsev 91f481665e [LoopRotate] Fix incorrect SCEV invalidation in loop rotation
LoopRotate only invalidates innermost loops, while the changes that it makes may
also affect any of their parents. With patch rL329047, SCEV became much smarter
about calculating exit counts for outer loops, so we cannot assume that they are
not affected.

Differential Revision: https://reviews.llvm.org/D45945

llvm-svn: 330582
2018-04-23 12:33:31 +00:00
Max Kazantsev b1137c42fa [LoopSimplify] Fix incorrect SCEV invalidation
In the function `simplifyOneLoop` we optimistically assume that changes in the
inner loop only affect this very loop and have no impact on its parents. In fact,
after rL329047 has been merged, we can now calculate exit counts for outer
loops which may depend on inner loops. Thus, we need to invalidate all parents
when we do something to a loop.

There is evidence of incorrect behavior of `simplifyOneLoop`: when we insert an
`SE->verify()` check at the end of this function, it fails on a bunch of existing
tests, in particular:

    LLVM :: Transforms/LoopUnroll/peel-loop-not-forced.ll
    LLVM :: Transforms/LoopUnroll/peel-loop-pgo.ll
    LLVM :: Transforms/LoopUnroll/peel-loop.ll
    LLVM :: Transforms/LoopUnroll/peel-loop2.ll

Note that previously we have fixed issues of this variety, see rL328483.
This patch makes this function invalidate the outermost loop properly.

Differential Revision: https://reviews.llvm.org/D45937
Reviewed By: chandlerc

llvm-svn: 330576
2018-04-23 10:32:37 +00:00
Chandler Carruth bf7190a154 [PM/LoopUnswitch] Remove a buggy assert in the new loop unswitch.
The condition this was asserting doesn't actually hold. I've added
comments to explain why, removed the assert, and added a fun test case
reduced out of 403.gcc.

llvm-svn: 330564
2018-04-23 06:58:36 +00:00
Sanjay Patel 30be665e82 [PatternMatch] allow undef elements when matching a vector zero
This is the last step in getting constant pattern matchers to allow
undef elements in constant vectors.

I'm adding a dedicated m_ZeroInt() function and building m_Zero() from
that. In most cases, calling code can be updated to use m_ZeroInt()
directly when there's no need to match pointers, but I'm leaving that
efficiency optimization as a follow-up step because it's not always
clear when that's ok.

There are just enough icmp folds in InstSimplify that can be used for 
integer or pointer types, that we probably still want a generic m_Zero()
for those cases. Otherwise, we could eliminate it (and possibly add a
m_NullPtr() as an alias for isa<ConstantPointerNull>()).

We're conservatively returning a full zero vector (zeroinitializer) in
InstSimplify/InstCombine on some of these folds (see diffs in InstSimplify),
but I'm not sure if that's actually necessary in all cases. We may be 
able to propagate an undef lane instead. One test where this happens is 
marked with 'TODO'.
 

llvm-svn: 330550
2018-04-22 17:07:44 +00:00
Sanjay Patel c1265ab99e [InstCombine] add vector test with undef elts; NFC
llvm-svn: 330547
2018-04-22 15:59:14 +00:00
Sanjay Patel e187cd3273 [InstSimplify, InstCombine] add vector tests with undef elts; NFC
llvm-svn: 330543
2018-04-22 14:19:37 +00:00
Sanjay Patel 5f845732ed [InstSimplify] move tests for shifts; NFC
llvm-svn: 330516
2018-04-21 16:58:00 +00:00
Sanjay Patel d0b27a1156 [InstSimplify] move/add/regenerate checks for tests; NFC
llvm-svn: 330515
2018-04-21 16:23:47 +00:00
Shoaib Meenai d64b83266b [ObjCARC] Account for funclet token in storeStrong transform
When creating a call to storeStrong in ObjCARCContract, ensure the call
gets the correct funclet token, otherwise WinEHPrepare will turn the
call (and all subsequent instructions) into unreachable.

We already have logic to do this for the ARC autorelease elision marker;
factor that out into a common function that's used for both. These are
the only two places in this transform that create call instructions.

Differential Revision: https://reviews.llvm.org/D45857

llvm-svn: 330487
2018-04-20 22:11:03 +00:00
Sean Fertile 18f17333dd [PartialInlining] Fix Crash from holding a reference to a destructed ORE.
The callback used to create an ORE for the legacy PI pass caches the allocated
object in a unique_ptr in the runOnModule function, and returns a reference to
that object. Under certain circumstances we can end up holding onto that
reference after the ORE's destruction. Rather than allowing the new and legacy
passes to create the ORE object in different ways, create the ORE at the point of
use.

Differential Revision: https://reviews.llvm.org/D43219

llvm-svn: 330473
2018-04-20 19:56:26 +00:00
Florian Hahn 773872fd67 [NewGVN] Split OpPHI detection and creation.
It also adds a check making sure PHIs for operands are all in the same
block.

Patch by Daniel Berlin <dberlin@dberlin.org>

Reviewers: dberlin, davide

Differential Revision: https://reviews.llvm.org/D43865

llvm-svn: 330444
2018-04-20 16:37:13 +00:00
Michael Zolotukhin f79d15e432 Fix typo in a test.
llvm-svn: 330434
2018-04-20 13:51:36 +00:00
Michael Zolotukhin a2c9af0209 Revert "Revert r330403 and r330413."
Reapply the patches with a fix. Thanks Ilya and Hans for the reproducer!
This reverts commit r330416.

The issue was that removing predecessors invalidated uses that we stored
for rewrite. The fix is to finish manipulating the CFG before we select
uses for rewrite.

llvm-svn: 330431
2018-04-20 13:34:32 +00:00
Ilya Biryukov afe822bd6d Revert r330403 and r330413.
Revert r330413: "[SSAUpdaterBulk] Use SmallVector instead of DenseMap for storing rewrites."
Revert r330403 "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." one more time."

The r330403 commit seems to crash clang during our integration while doing a PGO build, with the following stacktrace:
      #2 llvm::SSAUpdaterBulk::RewriteAllUses(llvm::DominatorTree*, llvm::SmallVectorImpl<llvm::PHINode*>*)
      #3 llvm::JumpThreadingPass::ThreadEdge(llvm::BasicBlock*, llvm::SmallVectorImpl<llvm::BasicBlock*> const&, llvm::BasicBlock*)
      #4 llvm::JumpThreadingPass::ProcessThreadableEdges(llvm::Value*, llvm::BasicBlock*, llvm::jumpthreading::ConstantPreference, llvm::Instruction*)
      #5 llvm::JumpThreadingPass::ProcessBlock(llvm::BasicBlock*)
The crash happens while compiling 'lib/Analysis/CallGraph.cpp'.

r330413 is reverted due to conflicting changes.

llvm-svn: 330416
2018-04-20 10:52:54 +00:00
Roman Lebedev f6934d725b [NFC][InstCombine] Regenerate two tests that are affected by folding masked merge
llvm-svn: 330415
2018-04-20 10:49:19 +00:00
Michael Zolotukhin 79e4f7fadb Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." one more time.
Hopefully, changing set to vector removes nondeterminism detected by
some bots, or the new assert will catch something.

This reverts commit r330180.

llvm-svn: 330403
2018-04-20 08:01:08 +00:00
Vlad Tsyrklevich 5d15230c37 Fix build failures for r330387 on buildbots that don't build the X86 target
llvm-svn: 330388
2018-04-20 02:26:12 +00:00
Vlad Tsyrklevich 230b256783 LowerTypeTests: Propagate symver directives
Summary:
This change fixes https://crbug.com/834474, a build failure caused by
LowerTypeTests not preserving .symver symbol versioning directives for
exported functions. Emit symver information to ThinLTO summary data and
then propagate symver directives for exported functions to the merged
module.

Emitting symver information to the summaries increases the size of
intermediate build artifacts for a Chromium build by less than 0.2%.

Reviewers: pcc

Reviewed By: pcc

Subscribers: tejohnson, mehdi_amini, eraman, llvm-commits, eugenis, kcc

Differential Revision: https://reviews.llvm.org/D45798

llvm-svn: 330387
2018-04-20 01:36:48 +00:00
Sanjay Patel ad8976db16 [Reassociate] add baseline tests for binop swapping; NFC
Similar to rL330086, I don't know if we want to do these 
transforms here, but we might as well have the tests
here either way to show that this pass is missing 
potential functionality (intentionally or not).

llvm-svn: 330368
2018-04-19 21:56:17 +00:00
Chandler Carruth 32e62f9c5b [PM/LoopUnswitch] Detect irreducible control flow within loops and skip unswitching non-trivial edges.
Summary:
This fixes the bug pointed out in review with non-trivial unswitching.

This also provides a basis that should make it pretty easy to finish
fleshing out a routine to scan an entire function body for irreducible
control flow, but this patch remains minimal for disabling loop
unswitch.

Reviewers: sanjoy, fedor.sergeev

Subscribers: mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45754

llvm-svn: 330357
2018-04-19 18:44:25 +00:00
Florian Hahn b789165e6b [NewGVN] Add ops as dependency if we cannot find a leader for ValueOp.
If those operands change, we might find a leader for ValueOp, which
could enable new phi-of-op creation.

This fixes a case where we missed creating a phi-of-ops node. With D43865
and this patch, bootstrapping clang/llvm works with -enable-newgvn, whereas
without it, the "value changed after iteration" assertion is triggered.

Reviewers: dberlin, davide

Reviewed By: dberlin

Differential Revision: https://reviews.llvm.org/D42180

llvm-svn: 330334
2018-04-19 15:05:47 +00:00
Roman Lebedev d536de1e7b [NFC][InstCombine] A few more tests for masked merge add/xor -> or with constant mask
llvm-svn: 330325
2018-04-19 13:02:17 +00:00
Florian Hahn 9a175bc1bc Remove file accidentally added in r330320.
llvm-svn: 330321
2018-04-19 12:09:05 +00:00
Florian Hahn 2342533e1a [IR/BasicBlockTest] Fix asan failure introduced in rL330316.
The argument has to be deleted after the module containing the function
gets deleted.

llvm-svn: 330320
2018-04-19 12:06:26 +00:00
Sanjay Patel b2ab3f28d5 [SimplifyLibcalls] Realloc(null, N) -> Malloc(N)
Patch by Dávid Bolvanský!
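
A hedged IR sketch of the transform (the replacement is the assumed result; since
realloc(NULL, n) behaves like malloc(n), the call can be rewritten when the pointer
argument is known null):

declare i8* @realloc(i8*, i64)

define i8* @grow_from_null(i64 %n) {
  %p = call i8* @realloc(i8* null, i64 %n)
  ret i8* %p
}
; assumed replacement:
;   %p = call i8* @malloc(i64 %n)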

Differential Revision: https://reviews.llvm.org/D45413

llvm-svn: 330259
2018-04-18 14:21:31 +00:00
Sam Parker 3c19051bf0 [IRCE] Only check for NSW on equality predicates
After investigation discussed in D45439, it would seem that the nsw
flag restriction is unnecessary in most cases. So the IsInductionVar
lambda has been removed, the functionality extracted, and nsw is now
only required when using eq/ne predicates.

Differential Revision: https://reviews.llvm.org/D45617

llvm-svn: 330256
2018-04-18 13:50:28 +00:00
Florian Hahn ac27758895 [LoopUnroll] Only peel if a predicate becomes known in the loop body.
If a predicate does not become known after peeling, peeling is unlikely
to be beneficial.

Reviewers: mcrosier, efriedma, mkazantsev, junbuml

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D44983

llvm-svn: 330250
2018-04-18 12:29:24 +00:00
Bjorn Pettersson bc4f19b6bd [DebugInfo] Sink related dbg users when sinking in InstCombine
Summary:
When sinking an instruction in InstCombine we now also sink
the DbgInfoIntrinsics that are using the sunken value.

Example)

When sinking the load in this input

bb.X:
  %0 = load i64, i64* %start, align 4, !dbg !31
  tail call void @llvm.dbg.value(metadata i64 %0, ...)
  br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
  br label %for.body

we now also move the dbg.value, like this

bb.X:
  br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
  %0 = load i64, i64* %start, align 4, !dbg !31
  tail call void @llvm.dbg.value(metadata i64 %0, ...)
  br label %for.body

In the past we haven't moved the dbg.value so we got

bb.X:
  tail call void @llvm.dbg.value(metadata i64 %0, ...)
  br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
  %0 = load i64, i64* %start, align 4, !dbg !31
  br label %for.body


So in the past we got a debug-use before the def of %0.
And that dbg.value was also on the path jumping to %for.end, for
which %0 never was defined.

CodeGenPrepare normally comes to the rescue later (when not moving
the dbg.value), since it moves dbg.value intrinsics quite
brutally, without really analysing if it is correct to move
the intrinsic (see PR31878).
So at the moment this patch isn't expected to have much impact,
besides that it is moving the dbg.value already in opt, making
the IR look more sane directly.

This can be seen as a preparation to (hopefully) make it possible
to turn off CodeGenPrepare::placeDbgValues later as a solution
to PR31878.

I also adjusted test/DebugInfo/X86/sdagsplit-1.ll to make the
IR in the test case up-to-date with this behavior in InstCombine.

Reviewers: rnk, vsk, aprantl

Reviewed By: vsk, aprantl

Subscribers: mattd, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D45425

llvm-svn: 330243
2018-04-18 08:08:04 +00:00
Sanjay Patel aea15131db [InstCombine] peek through bitcasted vector/array pointer GEP operand
The bitcast may be interfering with other combines or vectorization 
as shown in PR16739:
https://bugs.llvm.org/show_bug.cgi?id=16739

Most pointer-related optimizations are probably able to look through 
this bitcast, but removing the bitcast shrinks the IR, so it's at
least a size savings.

Differential Revision: https://reviews.llvm.org/D44833

llvm-svn: 330237
2018-04-18 00:36:40 +00:00
Vedant Kumar b0585893cc [Mem2Reg] Create merged debug locations for inserted phis
Track the debug locations of the incoming values to newly-created phis,
and apply merged debug locations to the phis.

A merged location will be on line 0, but will have the correct scope
set. This improves crash reporting when an inlined instruction with a
merged location triggers a machine exception. A debugger will be able to
narrow down the crash to the correct inlined scope, instead of simply
pointing to the outer scope of the caller.

Taken together with a change that allows generating merged line-0 locations
for instructions which aren't calls, this results in a 0.5% increase in
the uncompressed size of the .debug_line section of a stage2+Release
build of clang (-O3 -g).

rdar://33858697

Differential Revision: https://reviews.llvm.org/D45397

llvm-svn: 330227
2018-04-17 22:03:08 +00:00
Michael Zolotukhin 21458fdc55 Revert "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." again."
This reverts r330175. There are still stage3/stage4 miscompares.

llvm-svn: 330180
2018-04-17 07:31:27 +00:00
Michael Zolotukhin 3f5fd1b129 Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." again.
One more, hopefully the last, bug is fixed: when forming UsesToRewrite
we should ignore phi operands coming from edges that we want to delete.

This reverts r329910.

llvm-svn: 330175
2018-04-17 04:45:22 +00:00
Craig Topper 60c7e0d587 [X86] Remove unnecessary -mattr to enable avx512bw when the -mcpu already enabled it. NFC
This makes the test similar to the arith-sub.ll and arith-mul.ll tests.

llvm-svn: 330144
2018-04-16 18:14:19 +00:00
Haicheng Wu f7466f3164 [SLP] Use getExtractWithExtendCost() to compute the scalar cost of extractelement/ext pair
We use getExtractWithExtendCost to calculate the cost of extractelement and
s|zext together when computing the extract cost after vectorization, but we
calculate the cost of extractelement and s|zext separately when computing the
scalar cost, which is then larger than it should be.

Differential Revision: https://reviews.llvm.org/D45469

llvm-svn: 330143
2018-04-16 18:09:49 +00:00
Sanjay Patel 1170daa277 [InstCombine] simplify fneg+fadd folds; NFC
Two cleanups:
1. As noted in D45453, we had tests that don't need FMF that were misplaced in the 'fast-math.ll' test file.
2. This removes the final uses of dyn_castFNegVal, so that can be deleted. We use 'match' now.

llvm-svn: 330126
2018-04-16 14:13:57 +00:00
Roman Lebedev f84bfb2147 [InstCombine] Simplify 'xor' to 'or' if no common bits are set.
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple, straightforward things.
Let's start with the `xor` -> `or` simplification.

The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so I've left that for later.

https://rise4fun.com/Alive/Pkmg
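
A minimal, hedged IR sketch (the masks guarantee the operands share no set bits; names and constants are made up):

define i32 @xor_no_common_bits(i32 %x, i32 %y) {
  %lo = and i32 %x, 15                 ; only the low 4 bits can be set
  %hi = shl i32 %y, 4                  ; the low 4 bits are known zero
  %r  = xor i32 %lo, %hi
  ret i32 %r
}
; with no common bits set, the xor is assumed to become:
;   %r = or i32 %lo, %hi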

Reviewers: spatel, craig.topper, eli.friedman, jingyue

Reviewed By: spatel

Subscribers: llvm-commits

Was reviewed as part of https://reviews.llvm.org/D45631

llvm-svn: 330103
2018-04-15 18:59:44 +00:00
Roman Lebedev 620b3da38f [InstCombine] Simplify 'add' to 'or' if no common bits are set.
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple, straightforward things.
Let's start with the `add` -> `or` simplification.

The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so I've left that for later.

https://rise4fun.com/Alive/Pkmg
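
A minimal, hedged IR sketch (the masks guarantee the operands share no set bits; names and constants are made up):

define i32 @add_no_common_bits(i32 %x, i32 %y) {
  %lo = and i32 %x, 15                 ; only the low 4 bits can be set
  %hi = shl i32 %y, 4                  ; the low 4 bits are known zero
  %r  = add i32 %lo, %hi
  ret i32 %r
}
; with no common bits set, the add is assumed to become:
;   %r = or i32 %lo, %hi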

Reviewers: spatel, craig.topper, eli.friedman, jingyue

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45631

llvm-svn: 330101
2018-04-15 18:59:33 +00:00
Warren Ristow 8b2f27ce3a [InstCombine] Enable Add/Sub simplifications with only 'reassoc' FMF
These simplifications were previously enabled only with isFast(), but that
is more restrictive than required. Since r317488, FMF has 'reassoc' to
control these cases at a finer level.

llvm-svn: 330089
2018-04-14 19:18:28 +00:00
Sanjay Patel 713f09d014 [InstCombine] add shift+logic tests (PR37098); NFC
It is debatable whether instcombine should be in the business of
reassociation, but it is currently. 

These tests and PR37098 demonstrate a missing ability to do a 
simple reassociation that allows eliminating shifts. 

If we decide that functionality belongs somewhere else, then we 
should still have some tests to show that we've intentionally 
limited instcombine to *not* include this ability.

llvm-svn: 330086
2018-04-14 13:39:02 +00:00
Roman Lebedev 8db3e115e7 [InstCombine][NFC] masked-merge: add 'and' tests, too.
(and plain 'or', for completeness sake.)

After submitting D45631, I realized that it will *already*
affect the 'and' pattern, and it was obvious that there were no
good test patterns to show that.

Since the masked-merge.ll is getting kinda big,
unify naming schemes a bit, and split into 'xor'/'and'/'or'
testfiles, with the only difference being the last operation.

llvm-svn: 330072
2018-04-13 21:57:01 +00:00
Krzysztof Parzyszek dfed941eec [LV] Introduce TTI::getMinimumVF
The function getMinimumVF(ElemWidth) will return the minimum VF for
a vector with elements of size ElemWidth bits. This value will only
apply to targets for which TTI::shouldMaximizeVectorBandwidth returns
true. The value of 0 indicates that there is no minimum VF.

Differential Revision: https://reviews.llvm.org/D45271

llvm-svn: 330062
2018-04-13 20:16:32 +00:00
Roman Lebedev fe6a0b9a65 [InstCombine][NFC] masked-merge: commutativity tests: ensure the ordering.
This was intended from the start, but I did not really think
about it, and did not know how to force that. Now that the
xor->or fold is working (patch upcoming), this came up
to improve the test coverage.

A followup for rL330003, rL330007
https://bugs.llvm.org/show_bug.cgi?id=6773

llvm-svn: 330039
2018-04-13 17:15:55 +00:00
Roman Lebedev 4899a9cc89 [InstCombine][NFC] Regenerate logical-select.ll test
llvm-svn: 330017
2018-04-13 14:07:29 +00:00
Roman Lebedev 53e423ed1e [InstCombine][NFC] Add last few tests with constant mask for masked merge folding.
A followup for rL330003
https://bugs.llvm.org/show_bug.cgi?id=6773

llvm-svn: 330007
2018-04-13 12:00:00 +00:00
Roman Lebedev 038d996c80 [InstCombine][NFC] Add tests for masked merge folding.
https://bugs.llvm.org/show_bug.cgi?id=6773

As discussed there, some backends may want to undo this fold
(x86+bmi for scalars, x86+sse for vectors, ...)
https://bugs.llvm.org/show_bug.cgi?id=37104

https://rise4fun.com/Alive/JXt

llvm-svn: 330003
2018-04-13 10:56:35 +00:00
Roman Lebedev c00659328a [InstCombine]: foldSelectICmpAndAnd(): and is commutative
Summary:
The fold added in D45108 did not account for the fact that
the and instruction is commutative, and if the mask is a variable,
the mask variable and the fold variable may be swapped.

I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]]

This extends/generalizes that fold, so it is handled too.

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45539

llvm-svn: 330001
2018-04-13 09:57:57 +00:00
Craig Topper 254ed028a4 [X86] Remove the pmuldq/pmuldq intrinsics and replace with native IR.
This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics.

We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well.

llvm-svn: 329990
2018-04-13 06:07:18 +00:00
Vedant Kumar 65b0d4df20 [DebugInfo] Create merged locations for instructions other than calls
This lifts a restriction on DILocation::getMergedLocation(), allowing it
to create merged locations for instructions other than calls.

Instruction::applyMergedLocation() now defaults to creating merged
locations for all instructions.

The default behavior of getMergedLocation() is unchanged: callers which
invoke it directly are unaffected.

This change will enable a follow-up Mem2Reg fix which improves crash
reporting.

Differential Revision: https://reviews.llvm.org/D45396

llvm-svn: 329955
2018-04-12 20:58:24 +00:00
Sam Parker 9737535943 [IRCE] isKnownNonNegative helper function
Created a helper function to query for non-negative SCEVs. Uses the
SGE predicate to catch constants that could be interpreted as
negative.

Differential Revision: https://reviews.llvm.org/D45481

llvm-svn: 329907
2018-04-12 12:49:40 +00:00
Roman Lebedev 53271ba1d2 [InstCombine][NFC]: Add tests: foldSelectICmpAndAnd(): and is commutative
Summary:
The fold added in D45108 did not account for the fact that
the and instruction is commutative, and if the mask is a variable,
the mask variable and the fold variable may be swapped.

I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]]

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45538

llvm-svn: 329901
2018-04-12 12:04:57 +00:00
Hiroshi Inoue bcadfee2ad [NFC] fix trivial typos in documents and comments
"is is" -> "is", "if if" -> "if", "or or" -> "or"

llvm-svn: 329878
2018-04-12 05:53:20 +00:00
George Burgess IV 48ee59b6f0 [DeadArgElim] Remove allocsize attributes on callsites
We're already removing allocsize attributes from Functions that we
remove args from, since removing arguments from a function may make the
allocsize attribute incorrect. It appears we forgot to also remove them
from callsites.

Without this, I get verifier errors on `@Test2`.

It probably wouldn't be too hard to make DAE properly update allocsize
attributes instead of dropping them, but I can't think of a scenario
where that'd be useful in practice.

llvm-svn: 329868
2018-04-12 02:06:01 +00:00
Daniel Neilson 381cdf3e07 [DSE] Add tests for atomic memory intrinsics (NFC)
Summary:
These tests show that DSE currently does nothing with the atomic memory
intrinsics. Future work will teach DSE how to simplify these.

llvm-svn: 329845
2018-04-11 19:46:02 +00:00
Daniel Neilson 9cfa786faa [DSE] Regenerate tests with update_test_checks.py (NFC)
Summary:
In preparation for a future commit, this regenerates the test checks for
test/Transforms/DeadStoreElimination/OverwriteStoreBegin.ll
test/Transforms/DeadStoreElimination/OverwriteStoreEnd.ll

llvm-svn: 329839
2018-04-11 18:43:10 +00:00
Daniel Neilson 7e2e5c3c58 [DSE] Regenerate tests with update_test_checks.py (NFC)
Summary:
In preparation for a future commit, this regenerates the test checks for
test/Transforms/DeadStoreElimination/simple.ll
test/Transforms/DeadStoreElimination/memintrinsics.ll

llvm-svn: 329824
2018-04-11 16:50:04 +00:00
Sanjay Patel ff98682c9c [InstCombine] limit X - (cast(-Y) --> X + cast(Y) with hasOneUse()
llvm-svn: 329821
2018-04-11 15:57:18 +00:00
Haicheng Wu 5ba379557d [SLP] update a test case. NFC.
llvm-svn: 329818
2018-04-11 15:09:49 +00:00
Artur Gainullin d928201ac5 Eliminate a bitwise 'not' op of 'not' min/max by inverting the min/max.
Bitwise 'not' of the min/max could be eliminated in the pattern:

%notx = xor i32 %x, -1
%cmp1 = icmp sgt[slt/ugt/ult] i32 %notx, %y
%smax = select i1 %cmp1, i32 %notx, i32 %y
%res = xor i32 %smax, -1

https://rise4fun.com/Alive/lCN
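
For the 'sgt' (smax) case above, a hedged sketch of the assumed replacement
(since ~smax(~x, y) == smin(x, ~y); names are illustrative):

%noty = xor i32 %y, -1
%cmp2 = icmp slt i32 %x, %noty
%res  = select i1 %cmp2, i32 %x, i32 %noty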

Reviewers: spatel

Reviewed by: spatel

Subscribers: a.elovikov, llvm-commits

Differential Revision: https://reviews.llvm.org/D45317

llvm-svn: 329791
2018-04-11 10:29:37 +00:00
Marek Olsak a9a58fa236 AMDGPU: enable 128-bit for local addr space under an option
Author: Samuel Pitoiset

ds_read_b128 and ds_write_b128 have recently been made available
under the amdgpu-ds128 option (rather than enabled by default)
because the performance benefit is unclear.

Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).

v2: - fix regressions in merge-stores.ll and multiple_tails.ll

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
2018-04-10 22:48:23 +00:00
Sanjay Patel 3b6d46761f [CVP] simplify phi with constant incoming values that match common variable edge values
This is based on an example that was recently posted on llvm-dev:

void *propagate_null(void* b, int* g) {
  if (!b) {
    return 0;
  }
  (*g)++;
  return b;
}

https://godbolt.org/g/xYk3qG

The original code or constant propagation in other passes has obscured the fact 
that the phi can be removed completely.

Differential Revision: https://reviews.llvm.org/D45448

llvm-svn: 329755
2018-04-10 20:42:39 +00:00
Michael Zolotukhin aa7868594e [SSAUpdaterBulk] Handle CFG with unreachable from entry blocks.
llvm-svn: 329660
2018-04-10 02:16:29 +00:00
Zhaoshi Zheng 43af17be41 [MemorySSAUpdater] Mark Phi users of a node being moved as non-optimize
Fix PR36484, as suggested:

<quote>
during moves, mark the direct users of the erased things that were phis as "not to be optimized"
</quote>

llvm-svn: 329621
2018-04-09 20:55:37 +00:00
Alexey Bataev 2f67dbb73e [SLP] Additional tests for reorder reuse vectorization, NFC.
llvm-svn: 329603
2018-04-09 19:02:34 +00:00
Xin Tong 0efadbbcde [MergeICmp] Split blocks that do other work.
Summary:
We do not try to move the instructions and split the block till we
know the blocks can be split, i.e. BCE-cmp-insts can be separated from
non-BCE-cmp-insts.

Reviewers: davide, courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44443

llvm-svn: 329564
2018-04-09 13:14:06 +00:00
Max Kazantsev 8624a4786a [IRCE] Relax restriction on collected range checks
In IRCE, we have a very old legacy check that works when we collect comparisons that we
treat as range checks. It ensures that the value against which the indvar is compared is
loop invariant and is also positive.

This latter condition remained there since the times when IRCE was only able to handle
signed latch comparison. As the optimization evolved, it now learned how to intersect
signed or unsigned ranges, and this logic has no reliance on the fact that the right border
of each range should be positive.

The old implementation of this non-negativity check was also quite naive and just looked
at ranges (while most of the other IRCE logic tries to use the power of SCEV implications), so this
check did not allow us to deal with the simplest case, which looks as follows:

  int size; // not known non-negative
  int length; //known non-negative;
  i = 0;
  if (size != 0) {
    do {
      range_check(i < size);
      range_check(i < length);
      ++i;
    } while (i < size);
  }

In this case, even when IRCE could parse the loop structure from some dominating
conditions, it could only remove the range check against `length` and simply
ignored the check against `size`.

In this patch we remove this obsolete check. It will allow IRCE to pick comparison
against `size` as a potential range check and then let Range Intersection logic
decide whether it is OK to eliminate it or not.

Differential Revision: https://reviews.llvm.org/D45362
Reviewed By: samparker

llvm-svn: 329547
2018-04-09 06:01:22 +00:00
Piotr Padlewski ac5abbfdd9 NFC: Update NewGVN invariant.group test
llvm-svn: 329533
2018-04-08 16:04:09 +00:00
Sanjay Patel de9f7458a4 [InstCombine] add/move tests for fsub folds; NFC
There are a pair of folds that try to merge fneg into fsub
with an intervening cast, but as shown in the FIXME tests,
they can create extra instructions.

llvm-svn: 329501
2018-04-07 14:07:58 +00:00
Roman Lebedev 41922f1a6d [InstCombine] Get rid of select of bittest (PR36950 / PR17564)
Summary:
See [[ https://bugs.llvm.org/show_bug.cgi?id=36950 | PR36950 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=17564 | PR17564 ]], D45065, D45107
https://godbolt.org/g/iAYRup

Alive proof: https://rise4fun.com/Alive/uiH
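
One concrete shape of the pattern, as a hedged IR sketch (constants are made up;
the select-free form is an assumption: select((x & 1) == 0, 0, 4) == (x & 1) << 2):

define i32 @select_of_bittest(i32 %x) {
  %bit = and i32 %x, 1
  %cmp = icmp eq i32 %bit, 0
  %sel = select i1 %cmp, i32 0, i32 4
  ret i32 %sel
}
; assumed select-free form:
;   %bit = and i32 %x, 1
;   %sel = shl i32 %bit, 2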

Testing: `ninja check-llvm`

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45108

llvm-svn: 329492
2018-04-07 10:37:24 +00:00
Vitaly Buka 66f53d71f7 Runtime flag to control branch funnel threshold
Reviewers: pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45193

llvm-svn: 329459
2018-04-06 21:32:36 +00:00
Sanjay Patel a9ca709011 [InstCombine] limit nsz: -(X - Y) --> Y - X to hasOneUse()
As noted in the post-commit discussion for r329350, we shouldn't
generally assume that fsub is the same cost as fneg.

llvm-svn: 329429
2018-04-06 17:24:08 +00:00
Sanjay Patel a6823f0e67 [InstCombine] add test for fsub+fneg with extra use; NFC
llvm-svn: 329418
2018-04-06 16:30:52 +00:00
Sanjay Patel bafdf97632 [InstCombine] add potential calloc tests and regenerate checks; NFC
D45344 is proposing to remove the use restriction that made the calloc
transform safe, but it doesn't currently address the problematic example 
given in D16337. Add a test to make sure that doesn't break.

llvm-svn: 329412
2018-04-06 16:06:08 +00:00
Mircea Trofin aa3fea6cb0 [GlobalOpt] Fix support for casts in ctors.
Summary:
Fixing an issue where initializations of globals whose constructors use
casts were silently translated to 0-initialization.

Reviewers: davidxl, evgeny777

Reviewed By: evgeny777

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45198

llvm-svn: 329409
2018-04-06 15:54:47 +00:00
Chad Rosier 45735b8e40 [LoopUnroll] Make LoopPeeling respect the AllowPeeling preference.
The SimpleLoopUnrollPass isn't supposed to perform loop peeling.

Differential Revision: https://reviews.llvm.org/D45334

llvm-svn: 329395
2018-04-06 13:57:21 +00:00
Hans Wennborg b230c763a4 EntryExitInstrumenter: Handle musttail calls
Inserting instrumentation between a musttail call and ret instruction
would create invalid IR. Instead, treat musttail calls as function
exits.
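
A hedged IR sketch of why this matters (the callee and the instrumentation routine
named in the comment are illustrative):

declare i32 @callee(i32)

define i32 @tail_wrapper(i32 %x) {
  ; exit instrumentation (e.g. a call to __cyg_profile_func_exit) must be placed
  ; before the musttail call; nothing may sit between a musttail call and its ret.
  %r = musttail call i32 @callee(i32 %x)
  ret i32 %r
}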

llvm-svn: 329385
2018-04-06 10:14:09 +00:00
Sanjay Patel 04683de82f [InstCombine] FP: Z - (X - Y) --> Z + (Y - X)
This restores what was lost with rL73243 but without
re-introducing the bug that was present in the old code.

Note that we already have these transforms if the ops are 
marked 'fast' (and I assume that's happening somewhere in
the code added with rL170471), but we clearly don't need 
all of 'fast' for these transforms.

llvm-svn: 329362
2018-04-05 23:21:15 +00:00
Sanjay Patel 715ba65317 [InstCombine] add FP tests for Z - (X - Y); NFC
A fold for this pattern was removed at rL73243 to fix PR4374:
https://bugs.llvm.org/show_bug.cgi?id=4374
...and apparently there were no tests that went with that fold.

llvm-svn: 329360
2018-04-05 22:56:54 +00:00
Sanjay Patel 03e2526728 [InstCombine] nsz: -(X - Y) --> Y - X
This restores part of the fold that was removed with rL73243 (PR4374).
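
A hedged IR sketch (fneg is spelled as an fsub from -0.0, as is conventional here;
the folded form is an assumption and relies on 'nsz'):

define float @neg_of_sub(float %x, float %y) {
  %sub = fsub float %x, %y
  %neg = fsub nsz float -0.0, %sub     ; -(x - y)
  ret float %neg
}
; assumed fold:
;   %neg = fsub nsz float %y, %x       ; y - x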

llvm-svn: 329350
2018-04-05 21:37:17 +00:00
Roman Lebedev daa8da1ff4 [InstCombine][NFC] Regenerate select-of-bittest.ll with instnamer pass
As requested by spatel in https://reviews.llvm.org/D45329

llvm-svn: 329349
2018-04-05 21:34:59 +00:00
Roman Lebedev be9a226e21 [InstCombine] [NFC] Add more tests for getting rid of select of bittest (D45108, PR36950 / PR17564)
Summary:
More tests for D45108:
* One use tests
* allow shift to be a variable, too

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45329

llvm-svn: 329348
2018-04-05 21:34:53 +00:00
Daniel Neilson 367c2aea4e [InstCombine] Properly change GEP type when reassociating loop invariant GEP chains
Summary:
This is a fix to PR37005.

Essentially, rL328539 ([InstCombine] reassociate loop invariant GEP chains to enable LICM) contains a bug
whereby it will convert:
%src = getelementptr inbounds i8, i8* %base, <2 x i64> %val
%res = getelementptr inbounds i8, <2 x i8*> %src, i64 %val2
into:
%src = getelementptr inbounds i8, i8* %base, i64 %val2
%res = getelementptr inbounds i8, <2 x i8*> %src, <2 x i64> %val

By swapping the index operands if the GEPs are in a loop, and %val is loop variant while %val2
is loop invariant.

This fix recreates new GEP instructions if the index operand swap would result in the type
of %src changing from vector to scalar, or vice versa.

Reviewers: sebpop, spatel

Reviewed By: sebpop

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45287

llvm-svn: 329331
2018-04-05 18:51:45 +00:00
Sanjay Patel 37248d35c3 [InstCombine] add test for fneg+fsub with nsz; NFC
There used to be a fold that would handle this case more generally,
but it was removed at rL73243 to fix PR4374:
https://bugs.llvm.org/show_bug.cgi?id=4374

llvm-svn: 329322
2018-04-05 17:40:51 +00:00
Sanjay Patel deaf4f354e [InstCombine] use pattern matchers for fsub --> fadd folds
This allows folding for vectors with undef elements.

llvm-svn: 329316
2018-04-05 17:06:45 +00:00
Sanjay Patel 7becb3ae4b [InstCombine] add tests for fsub --> fadd; NFC
llvm-svn: 329313
2018-04-05 16:51:09 +00:00
Sanjay Patel 2204520e49 [PatternMatch] define m_FNeg using m_FSub
Using cstfp_pred_ty in the definition allows us to match vectors with undef elements.

This replicates the change for m_Not from D44076 / rL326823 and continues
towards making all pattern matchers allow undef elements in vectors.

llvm-svn: 329303
2018-04-05 15:36:55 +00:00
Sanjay Patel 2eaa2a43f8 [InstCombine] add vector and vector undef tests for FP folds; NFC
llvm-svn: 329294
2018-04-05 15:07:35 +00:00
Florian Hahn 8eda4b0dc0 [LoopInterchange] Require asserts for test using -stats (NFC)
This fixes a buildbot failure.

llvm-svn: 329279
2018-04-05 13:07:39 +00:00
Florian Hahn 6e0043365b [LoopInterchange] Add stats counter for number of interchanged loops.
Reviewers: samparker, karthikthecool, blitz.opensource

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D45209

llvm-svn: 329269
2018-04-05 10:39:23 +00:00
Florian Hahn 831a757728 [LoopInterchange] Preserve LoopInfo after interchanging.
LoopInterchange relies on LoopInfo being up-to-date, so we should
preserve it after interchanging. This patch updates restructureLoops to
move the BBs of the interchanged loops to the right place.

Reviewers: davide, efriedma, karthikthecool, mcrosier

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D45278

llvm-svn: 329264
2018-04-05 09:48:45 +00:00
Taewook Oh e0db533feb [CallSiteSplitting] Do not perform callsite splitting inside landing pad
Summary:
If the callsite is inside landing pad, do not perform callsite splitting.

Callsite splitting uses the utility function llvm::DuplicateInstructionsInSplitBetween, which eventually calls llvm::SplitEdge. llvm::SplitEdge calls llvm::SplitCriticalEdge with an assumption that the function returns nullptr only when the target edge is not a critical edge (and further assumes that if the return value was not nullptr, the predecessor of the original target edge always has a single successor because critical edge splitting was successful). However, this assumption is not true because SplitCriticalEdge returns nullptr if the destination block is a landing pad. This invalid assumption results in an assertion failure.

The fundamental solution would be to fix llvm::SplitEdge not to rely on the invalid assumption. However, that would involve a lot of work because the current API assumes that llvm::SplitEdge never fails. Instead, this patch makes callsite splitting not attempt splitting if the callsite is in a landing pad.

Attached test case will crash with assertion failure without the fix.

Reviewers: fhahn, junbuml, dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45130

llvm-svn: 329250
2018-04-05 04:16:23 +00:00
Vitaly Buka 4296ea72ff Don't inline @llvm.icall.branch.funnel
Summary: @llvm.icall.branch.funnel is musttail with a variable number of
arguments. After inlining, the current backend can't separate call targets from call
arguments.

Reviewers: pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45116

llvm-svn: 329235
2018-04-04 21:46:27 +00:00
Eric Fiselier 96bbec79b4 [Analysis] Support aligned new/delete functions.
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.

Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer

Reviewed By: bkramer

Subscribers: ckennelly, llvm-commits

Differential Revision: https://reviews.llvm.org/D44769

llvm-svn: 329218
2018-04-04 19:01:51 +00:00
Eric Fiselier e03d45fa8e Revert "[Analysis] Support aligned new/delete functions."
This reverts commit bee3bbd9bdd3ab3364b8fb0cdb6326bc1ae740e0.

llvm-svn: 329217
2018-04-04 18:23:00 +00:00
Eric Fiselier 0d5f3b0281 [Analysis] Support aligned new/delete functions.
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.

Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer

Reviewed By: bkramer

Subscribers: ckennelly, llvm-commits

Differential Revision: https://reviews.llvm.org/D44769

llvm-svn: 329215
2018-04-04 18:12:01 +00:00
Roman Lebedev c0c9ba7ee0 [InstCombine] [NFC] Add tests for getting rid of select of bittest (PR36950 / PR17564)
Summary: See [[ https://bugs.llvm.org/show_bug.cgi?id=36950 | PR36950 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=17564 | PR17564 ]], D45065, D45108

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45107

llvm-svn: 329198
2018-04-04 14:10:13 +00:00
Simon Pilgrim f1e668830f [SLPVectorizer][X86] Regenerate some tests. NFCI
llvm-svn: 329196
2018-04-04 13:53:51 +00:00
Nicolai Haehnle eb7311ffb1 StructurizeCFG: Test for branch divergence correctly
Fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform, so the branch is
non-uniform.

This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.

As discovered after committing an earlier version of this change, this
exposes a subtle interaction between this pass and DivergenceAnalysis:
since we remove and re-create branch instructions, we can no longer rely
on DivergenceAnalysis for branches in subregions that were already
processed by the pass.

Explicitly remove branch instructions from DivergenceAnalysis to
avoid dangling pointers as a matter of defensive programming, and
change how we detect non-uniform subregions.

Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4

Differential Revision: https://reviews.llvm.org/D43743

llvm-svn: 329165
2018-04-04 10:58:15 +00:00
Max Kazantsev 613af1f7ca [SCEV] Prove implications for SCEVUnknown Phis
This patch teaches SCEV how to prove implications for SCEVUnknown nodes that are Phis.
If we need to prove `Pred` for `LHS, RHS`, and `LHS` is a Phi with possible incoming values
`L1, L2, ..., LN`, then if we prove `Pred` for `(L1, RHS), (L2, RHS), ..., (LN, RHS)` then we can also
prove it for `(LHS, RHS)`. If both `LHS` and `RHS` are Phis from the same block, it is sufficient
to prove the predicate for values that come from the same predecessor block.

The typical case that it handles is that we sometimes need to prove that `Phi(Len, Len - 1) >= 0`
given that `Len > 0`. The new logic was added to `isImpliedViaOperations` and only uses it and
non-recursive reasoning to prove the facts we need, so it should not hurt compile time a lot.
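
A hypothetical IR shape of that case (all names invented for illustration): once
the `%len > 0` guard holds, both incoming values of `%phi` are provably
non-negative, so SCEV can now prove `%phi >= 0`.

```
define i32 @guarded_phi(i32 %len, i1 %c) {
entry:
  %guard = icmp sgt i32 %len, 0
  br i1 %guard, label %a, label %exit
a:
  br i1 %c, label %b, label %merge
b:
  %lenm1 = add nsw i32 %len, -1
  br label %merge
merge:
  %phi = phi i32 [ %len, %a ], [ %lenm1, %b ]
  ret i32 %phi
exit:
  ret i32 0
}
```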

Differential Revision: https://reviews.llvm.org/D44001
Reviewed By: anna

llvm-svn: 329150
2018-04-04 05:46:47 +00:00
Craig Topper 7d3aba6687 [SimplifyCFG] Teach merge conditional stores to handle cases where the PostBB has more than 2 predecessors by inserting a new block for the store.
Summary:
Currently merge conditional stores can't handle cases where PostBB (the block we need to move the store to) has more than 2 predecessors.

This patch removes that restriction by creating a new block with only the 2 predecessors we care about and an unconditional branch to the original block. This provides a place to put the store.

Reviewers: efriedma, jmolloy, ABataev

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39760

llvm-svn: 329142
2018-04-04 03:47:17 +00:00
Sanjay Patel 81b3b10a95 [InstCombine] allow more fmul folds with 'reassoc'
The tests marked with 'FIXME' require loosening the check
in SimplifyAssociativeOrCommutative() to optimize completely;
that's still checking isFast() in Instruction::isAssociative().

llvm-svn: 329121
2018-04-03 22:19:19 +00:00
Gor Nishanov d4712715dd [coroutines] Respect alloca alignment requirements when building coroutine frame
Summary:
If an alloca needs to be stored in the coroutine frame, and it has an alignment specified that does not match the natural alignment of the alloca type, insert appropriate padding into the coroutine frame to make sure that it gets the requested alignment.

For example, for a packed type (whose natural alignment is 1) where the alloca alignment is 8, we may need to insert a padding field with the required number of bytes to make sure it is properly aligned.

```
%PackedStruct = type <{ i64 }>
...
  %data = alloca %PackedStruct, align 8
```

If the previous field in the coroutine frame had alignment 2, we would have [6 x i8] inserted before %PackedStruct in the coroutine frame:

```
%f.Frame = type { ..., i16, [6 x i8], %PackedStruct }
```

Reviewers: rnk, lewissbaker, modocache

Reviewed By: modocache

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D45221

llvm-svn: 329112
2018-04-03 20:54:20 +00:00
Florian Hahn 9467ccf447 [LoopInterchange] Add remark for calls preventing interchanging.
It also updates test/Transforms/LoopInterchange/call-instructions.ll
to use accesses where we can prove dependence after D35430.


Reviewers: sebpop, karthikthecool, blitz.opensource

Reviewed By: sebpop

Differential Revision: https://reviews.llvm.org/D45206

llvm-svn: 329111
2018-04-03 20:54:04 +00:00
Daniel Neilson 901acfab0c [InstCombine] Fold compare of int constant against a splatted vector of ints
Summary:
Folding patterns like:
  %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
  %cast = bitcast <4 x i8> %vec to i32
  %cond = icmp eq i32 %cast, 0
into:
  %ext = extractelement <4 x i8> %insvec, i32 0
  %cond = icmp eq i32 %ext, 0

Combined with existing rules, this allows us to fold patterns like:
  %insvec = insertelement <4 x i8> undef, i8 %val, i32 0
  %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
  %cast = bitcast <4 x i8> %vec to i32
  %cond = icmp eq i32 %cast, 0
into:
  %cond = icmp eq i8 %val, 0

When we construct a splat vector via a shuffle and bitcast the vector into an integer type for comparison against an integer constant, we can simplify the comparison to compare the splatted value against the integer constant.

Reviewers: spatel, anna, mkazantsev

Reviewed By: spatel

Subscribers: efriedma, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D44997

llvm-svn: 329087
2018-04-03 17:26:20 +00:00
Alexey Bataev 428e9d9d87 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

The patch does not support reordering of repeated instructions; this must
be handled in a separate patch.

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

llvm-svn: 329085
2018-04-03 17:14:47 +00:00
Florian Hahn b79217077d [LoopInterchange] Update tests so DA can handle access after D35430.
I have taken the opportunity to simplify some tests slightly and move
parts around.

It also brings back a few IR checks for interchangable loops.

Reviewers: karthikthecool, sebpop, grosser

Reviewed By: sebpop

Differential Revision: https://reviews.llvm.org/D45207

llvm-svn: 329081
2018-04-03 16:37:58 +00:00
Alexey Bataev 976aff148a [SLP] Added tests for checks of reordering of the repeated instructions,
NFC.

llvm-svn: 329080
2018-04-03 16:31:26 +00:00
Benjamin Kramer 2fc3b18922 Revert "[SLP] Fix PR36481: vectorize reassociated instructions."
This reverts commit r328980 and r329046. Makes the vectorizer crash.

llvm-svn: 329071
2018-04-03 14:40:33 +00:00
Ikhlas Ajbar b7322e8ac7 peel loops with runtime small trip counts
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.

Differential Revision: https://reviews.llvm.org/D44880

llvm-svn: 329042
2018-04-03 03:39:43 +00:00
Haicheng Wu 7f0daaeb86 [SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth
We use two approaches for determining the minimum bitwidth.

   * Demanded bits
   * Value tracking

If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.

But there is a missing piece though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in

%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i

are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in

%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0

it doesn't make sense to shrink %tmp15 and we can skip the value tracking.

Ideas are from Matthew Simpson!

Differential Revision: https://reviews.llvm.org/D44868

llvm-svn: 329035
2018-04-03 00:05:10 +00:00
Brian Gesiak 64521bed0d [Coroutines] Avoid assert splitting hidden coros
Summary:
When attempting to split a coroutine with 'hidden' visibility (for
example, a C++ coroutine that is inlined when compiled with the option
'-fvisibility-inlines-hidden'), LLVM would hit an assertion in
include/llvm/IR/GlobalValue.h:240: "local linkage requires default
visibility". The issue is that the visibility is copied from the source
of the function split in the `CloneFunctionInto` function, but the linkage
is not. To fix, create the new function first with external linkage,
then copy the linkage from the original function *after* `CloneFunctionInto`
is called.

Since `GlobalValue::setLinkage` in turn calls `maybeSetDsoLocal`, the
explicit call to `setDSOLocal` can be removed in CoroSplit.cpp.

Test Plan: check-llvm

Reviewers: GorNishanov, lewissbaker, EricWF, majnemer, rnk

Reviewed By: rnk

Subscribers: llvm-commits, eric_niebler

Differential Revision: https://reviews.llvm.org/D44185

llvm-svn: 329033
2018-04-02 23:39:40 +00:00
Reid Kleckner 298ffc609b [InstCombine] Don't strip function type casts from musttail calls
Summary:
The cast simplifications that instcombine does here do not make any
attempt to obey the verifier rules for musttail calls. Therefore we have
to disable them.

Reviewers: efriedma, majnemer, pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45186

llvm-svn: 329027
2018-04-02 22:49:44 +00:00
Reid Kleckner a9e9918ee4 Treat inlining a notail call as a regular, non-tail call
Otherwise, we end up inlining a musttail call into a non-tail position,
which breaks verifier invariants.

Fixes PR31014

llvm-svn: 329015
2018-04-02 21:23:16 +00:00
Sanjay Patel cbb0450540 [InstCombine] add folds for icmp + sub (PR36969)
(A - B) >u A --> A <u B
C <u (C - D) --> C <u D

https://rise4fun.com/Alive/e7j

Name: ugt
  %sub = sub i8 %x, %y
  %cmp = icmp ugt i8 %sub, %x
=>
  %cmp = icmp ult i8 %x, %y
  
Name: ult
  %sub = sub i8 %x, %y
  %cmp = icmp ult i8 %x, %sub
=>
  %cmp = icmp ult i8 %x, %y

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=36969

llvm-svn: 329011
2018-04-02 20:37:40 +00:00
Sanjay Patel be0442eeaa [InstCombine] add tests for icmp (sub x, y), x (PR36969); NFC
llvm-svn: 329010
2018-04-02 20:23:54 +00:00
Rong Xu 5a8d4c3357 [DeadArgumentElim] Clone function level metadatas
Some Function level metadatas, such as function entry count, are not cloned in
DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim
after internalization.

This patch clones the metadatas in the original function to the new function.

Differential Revision: https://reviews.llvm.org/D44127

llvm-svn: 328991
2018-04-02 17:27:38 +00:00
Gor Nishanov b0316d96ae [coroutines] Add support for llvm.coro.noop intrinsics
Summary:
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing.
To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed.

Reviewers: EricWF, modocache, rnk, lewissbaker

Reviewed By: modocache

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45114

llvm-svn: 328986
2018-04-02 16:55:12 +00:00
Alexey Bataev 3decaf4275 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

llvm-svn: 328980
2018-04-02 14:51:37 +00:00
Teresa Johnson 974706ebf7 [ThinLTO] Add an import cutoff for debugging/triaging
Summary:
Adds -import-cutoff=N which will stop importing during the thin link
after N imports. Default is -1 (no limit).

Reviewers: wmi

Subscribers: inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D45127

llvm-svn: 328934
2018-04-01 15:54:40 +00:00
David Green f80ebc8d21 [LoopRotate] Rotate loops with loop exiting latches
If a loop has a loop exiting latch, it can be profitable
to rotate the loop if it leads to the simplification of
a phi node. Perform rotation in these cases even if loop
rotate itself didn't simplify the loop to get there.

Differential Revision: https://reviews.llvm.org/D44199

llvm-svn: 328933
2018-04-01 12:48:24 +00:00
Teresa Johnson db83aceb06 [ThinLTO] Add an option to force summary call edges cold for debugging
Summary:
Useful to selectively disable importing into specific modules for
debugging/triaging/workarounds.

Reviewers: eraman

Subscribers: inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D45062

llvm-svn: 328909
2018-03-31 00:18:08 +00:00
Krzysztof Parzyszek fce30c2ba3 Revert "peel loops with runtime small trip counts"
This reverts commit r328854, it breaks some Hexagon tests.

llvm-svn: 328875
2018-03-30 16:55:44 +00:00
Ikhlas Ajbar a7d614c3e5 [Hexagon] add missing lit config file
llvm-svn: 328855
2018-03-30 03:32:24 +00:00
Ikhlas Ajbar 66c8ba5a50 peel loops with runtime small trip counts
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.

Differential Revision: https://reviews.llvm.org/D44880

llvm-svn: 328854
2018-03-30 03:05:34 +00:00
Dinar Temirbulatov c326c1c582 [SLPVectorizer] Add tests related to PR30787, NFCI.
llvm-svn: 328813
2018-03-29 18:57:03 +00:00
Haicheng Wu c7cc87922e [JumpThreading] Don't select an edge that we know we can't thread
In r312664 (D36404), JumpThreading stopped threading edges into
loop headers. Unfortunately, I observed a significant performance
regression as a result of this change. Upon further investigation,
the problematic pattern looked something like this (after
many high level optimizations):

while (true) {
    bool cond = ...;
    if (!cond) {
        <body>
    }
    if (cond)
        break;
}

Now, naturally we want jump threading to essentially eliminate the
second if check and hook up the edges appropriately. However, the
above mentioned change, prevented it from doing this because it would
have to thread an edge into the loop header.

Upon further investigation, what is happening is that since both branches
are threadable, JumpThreading picks one of them arbitrarily. In my
case, because of the way that the IR ended up, it tended to pick
the one to the loop header, bailing out immediately after. However,
if it had picked the one to the exit block, everything would have
worked out fine (because the only remaining branch would then be folded,
not thraded which is acceptable).

Thus, to fix this problem, we can simply eliminate loop headers from
consideration as possible threading targets earlier, to make sure that
if there are multiple eligible branches, we can still thread one of
the ones that don't target a loop header.

Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42260

llvm-svn: 328798
2018-03-29 16:01:26 +00:00
Rong Xu 662f38b16f [PGO] Fix branch probability remarks assert
Fixed counter/weight overflow that leads to an assertion. Also fixed the help
string for pgo-emit-branch-prob option.

Differential Revision: https://reviews.llvm.org/D44809

llvm-svn: 328653
2018-03-27 18:55:56 +00:00
Sam Parker 90b7f4f72c [IRCE] Enable decreasing loops of non-const bound
As a follow-up to r328480, this updates the logic for the decreasing
safety checks in a similar manner:
- CanBeMax is replaced by CannotBeMaxInLoop which queries
  isLoopEntryGuardedByCond on the maximum value.
- SumCanReachMin is replaced by isSafeDecreasingBound which includes
  some logic from parseLoopStructure and, again, has been updated to
  use isLoopEntryGuardedByCond on the given bounds.

Differential Revision: https://reviews.llvm.org/D44776

llvm-svn: 328613
2018-03-27 08:24:53 +00:00
Max Kazantsev 7094c8deb2 [SCEV] Make exact taken count calculation more optimistic
Currently, `getExact` fails if it sees two exit counts in different blocks. There is
no solid reason to do so, given that we only calculate exact non-taken count
for exiting blocks that dominate the latch. Using this fact, we can simply take the minimum
over all exits of all blocks to get the exact taken count.

This patch makes the calculation more optimistic with enforcing our assumption
with asserts. It allows us to calculate exact backedge taken count in trivial loops
like

  for (int i = 0; i < 100; i++) {
    if (i > 50) break;
    . . .
  }

Differential Revision: https://reviews.llvm.org/D44676
Reviewed By: fhahn

llvm-svn: 328611
2018-03-27 07:30:38 +00:00
Max Kazantsev a63d333881 [SCEV] Add one more case in computeConstantDifference
This patch teaches `computeConstantDifference` handle calculation of constant
difference between `(X + C1)` and `(X + C2)` which is `(C2 - C1)`.
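
A tiny IR illustration (values chosen arbitrarily): the SCEVs of `%a` and `%b`
below are `(%x + 2)` and `(%x + 7)`, so their constant difference is now
computed as 5.

```
define i32 @diff(i32 %x) {
  %a = add i32 %x, 2
  %b = add i32 %x, 7
  %d = sub i32 %b, %a
  ret i32 %d
}
```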

Differential Revision: https://reviews.llvm.org/D43759
Reviewed By: anna

llvm-svn: 328609
2018-03-27 04:54:00 +00:00
Eli Friedman 88e2bac94d [MemorySSA] Fix exponential compile-time updating MemorySSA.
MemorySSAUpdater::getPreviousDefRecursive is a recursive algorithm, for
each block, it computes the previous definition for each predecessor,
then takes those definitions and combines them. But currently it doesn't
remember results which it already computed; this means it can visit the
same block multiple times, which adds up to exponential time overall.

To fix this, this patch adds a cache. If we computed the result for a
block already, we don't need to visit it again because we'll come up
with the same result. Well, unless we RAUW a MemoryPHI; in that case,
the TrackingVH will be updated automatically.

This matches the original source paper for this algorithm.

The testcase isn't really a test for the bug, but it adds coverage for
the case where tryRemoveTrivialPhi erases an existing PHI node. (It's
hard to write a good regression test for a performance issue.)

Differential Revision: https://reviews.llvm.org/D44715

llvm-svn: 328577
2018-03-26 19:52:54 +00:00
Haicheng Wu b45f921678 [SLP] Add more checks to a test case. NFC.
llvm-svn: 328572
2018-03-26 18:59:28 +00:00
Haicheng Wu 0ec1dbe417 [SLP] Add a test case. NFC.
llvm-svn: 328546
2018-03-26 16:47:37 +00:00
Sebastian Pop d870aea03e [InstCombine] reassociate loop invariant GEP chains to enable LICM
This change brings performance of zlib up by 10%. The example below is from a
hot loop in longest_match() from zlib.

do.body:
  %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
  %idx.ext = zext i32 %cur_match.addr.0 to i64
  %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
  %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1
  %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1

In this example %idx.ext1 is a loop invariant. It will be moved above the use of
loop induction variable %idx.ext such that it can be hoisted out of the loop by
LICM. The operands that have dependences carried by the loop will be sunk down
in the GEP chain. This patch will produce the following output:

do.body:
  %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
  %idx.ext = zext i32 %cur_match.addr.0 to i64
  %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1
  %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1
  %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext

llvm-svn: 328539
2018-03-26 16:19:31 +00:00
Sanjay Patel 4fd4fd610c [InstCombine] distribute fmul over fadd/fsub
This replaces a large chunk of code that was looking for compound
patterns that include these sub-patterns. Existing tests ensure that
all of the previous examples are still folded as expected.

We still need to loosen the FMF check.
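
One illustrative instance of the distribution (a sketch shown with 'fast'; the
exact FMF and operand patterns accepted are as implemented in the patch):

```
define float @distribute(float %x) {
  %add = fadd fast float %x, 2.0
  %mul = fmul fast float %add, 4.0
  ret float %mul
}
; distributed: %t   = fmul fast float %x, 4.0
;              %mul = fadd fast float %t, 8.0
```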

llvm-svn: 328502
2018-03-26 15:03:57 +00:00
Sanjay Patel 2455fef497 [InstCombine] check uses before creating instructions for fmul distribution
As the tests show, we could create extra instructions without any obvious benefit.

llvm-svn: 328498
2018-03-26 14:25:43 +00:00
Max Kazantsev a55749312b [LoopUnroll] Fix dangling pointers in SCEV
The current logic of loop SCEV invalidation in the Loop Unroller implicitly relies on
the fact that the exit count of outer loops cannot depend on exiting blocks of
inner loops, which is true in the current implementation of backedge taken count
calculation but is wrong in general. As a result, when we only forget the loop that
we have just unrolled, we may still have cached data for its outer loops (in particular,
exit counts) which keeps references to blocks of the inner loop that could have been
changed or even deleted.

The attached test demonstrates a situation where, after unrolling of the innermost loop,
the outermost loop contains a dangling pointer to a non-existent block. The problem
shows up when we apply patch https://reviews.llvm.org/D44677 that makes SCEV
smarter about exit count calculation. I am not sure if the bug exists without this patch,
it appears that now it is accidentally correct just because in practice exact backedge
taken count for outer loops with complex control flow inside is never calculated.
But when SCEV learns to do so, this problem shows up.

This patch replaces the existing logic of SCEV loop invalidation with a correct one, which
happens to be invalidation of the outermost loop (which also leads to invalidation of all
loops inside of it). It is the only way to ensure that no outer loop keeps dangling pointers
to removed blocks, or just outdated information that has changed after unrolling.

Differential Revision: https://reviews.llvm.org/D44818
Reviewed By: samparker

llvm-svn: 328483
2018-03-26 11:31:46 +00:00
Benjamin Kramer 8840f644b4 [DeadArgElim] Strip allocsize attributes when deleting an argument.
Since allocsize refers to the argument number it gets invalidated when
an argument is removed and the numbers shift.

llvm-svn: 328481
2018-03-26 09:44:24 +00:00
Sam Parker 53a423a417 [IRCE] Enable increasing loops of variable bounds
CanBeMin is currently used which will report true for any unknown
values, but often a check is performed outside the loop which covers
this situation:
    
for (int i = 0; i < N; ++i)
  ...
    
if (N > 0)
  for (int i = 0; i < N; ++i)
    ...
    
So I've added 'LoopGuardedAgainstMin', which reports whether N is
greater than the minimum value, which then allows loops with a variable
loop count to be optimised. I've also moved the increasing bound
checking into its own function and replaced SumCanReachMax with another
isLoopEntryGuardedByCond function.

llvm-svn: 328480
2018-03-26 09:29:42 +00:00
Sanjay Patel 93e64dd9a1 [PatternMatch] allow undef elements when matching vector FP +0.0
This continues the FP constant pattern matching improvements from:
https://reviews.llvm.org/rL327627
https://reviews.llvm.org/rL327339
https://reviews.llvm.org/rL327307

Several integer constant matchers also have this ability. I'm
separating matching of integer/pointer null from FP positive zero
and renaming/commenting to make the functionality clearer.
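
For example (a sketch, not a test from the patch), a fold that looks for a +0.0
splat can now also fire when a lane is undef:

```
define <2 x float> @fsub_zero_with_undef_lane(<2 x float> %x) {
  %r = fsub <2 x float> %x, <float 0.0, float undef>
  ret <2 x float> %r   ; X - (+0.0) can fold to X
}
```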

llvm-svn: 328461
2018-03-25 21:16:33 +00:00
Sanjay Patel c84b48ec29 [InstSimplify, InstCombine] add/update tests with FP +0.0 vector with undef; NFC
llvm-svn: 328455
2018-03-25 17:48:20 +00:00
Sanjay Patel e94a85533f [InstCombine] adjust test comments; NFC
llvm-svn: 328450
2018-03-25 14:24:32 +00:00
Sanjay Patel 60d0e119c7 [InstCombine] consolidate casted icmp vector tests
We have thorough coverage of predicates and scalar types,
so we just need a sampling of vector tests to show that
things are working or not with vectors types.

llvm-svn: 328449
2018-03-25 14:19:25 +00:00
Sanjay Patel 841aac04d4 [InstCombine] peek through more icmp of FP cast + bitcast
This is an extension of rL328426 as noted in D44367. 

llvm-svn: 328448
2018-03-25 14:01:42 +00:00
Sanjay Patel 745a9c62c2 [InstCombine] peek through FP casts for sign-bit compares (PR36682)
This pattern came up in PR36682:
https://bugs.llvm.org/show_bug.cgi?id=36682
https://godbolt.org/g/LhuD9A

Equality checks are planned as a follow-up enhancement.
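
A sketch of the kind of pattern handled (function name invented): the sign of X
and of (float)X agree, so the sign-bit test can be done on X directly.

```
define i1 @signbit_of_sitofp(i32 %x) {
  %f = sitofp i32 %x to float
  %b = bitcast float %f to i32
  %c = icmp slt i32 %b, 0
  ret i1 %c
}
; --> %c = icmp slt i32 %x, 0
```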

Differential Revision: https://reviews.llvm.org/D44367

llvm-svn: 328426
2018-03-24 15:45:02 +00:00
Sanjay Patel 6a176d527c [InstCombine] add multi-use/vector tests for intrinsic shrinking; NFC
llvm-svn: 328422
2018-03-24 14:45:41 +00:00
Fedor Sergeev 6660fd0f95 [PM][FunctionAttrs] add NoUnwind attribute inference to PostOrderFunctionAttrs pass
Summary:
This was motivated by the absence of PruneEH functionality in the new PM.
It was decided that a proper way to do PruneEH is to add NoUnwind inference
into PostOrderFunctionAttrs and then perform normal SimplifyCFG on top.

This change generalizes attribute handling implemented for (a removal of)
Convergent attribute, by introducing a generic builder-like class
   AttributeInferer

It registers all the attribute inference requests, storing per-attribute
predicates into a vector, and then goes through an SCC Node, scanning all
the instructions to check that they do not break the attribute assumptions.

The main idea is that as soon as all the instructions from all the functions
of the SCC Node conform to the attribute assumptions, we are free to infer
the attribute as set for all the functions of the SCC Node.

It handles two distinct cases of attributes:
   - those that might break due to derefinement of the function code

     for these attributes we are allowed to apply inference only if all the
     functions are "exact definitions". Example - NoUnwind.

   - those that do not care about derefinement

     for these attributes we are allowed to apply inference as soon as we see
     any function definition. Example - removal of Convergent attribute.

Also in this commit:
* Converted all the FunctionAttrs tests to use FileCheck and added new-PM
  invocations to them

* FunctionAttrs/convergent.ll test demonstrates a difference in behavior between
   new and old PM implementations. Marked with FIXME.

* PruneEH tests were converted to new-PM as well, using function-attrs+simplify-cfg
  combo as intended

* some of "other" tests were updated since function-attrs now infers 'nounwind'
  even for old PM pipeline

* -disable-nounwind-inference hidden option added as a possible workaround for the presumably
  rare case where nounwind being inferred by default presents a problem

Reviewers: chandlerc, jlebar

Reviewed By: jlebar

Subscribers: eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D44415

llvm-svn: 328377
2018-03-23 21:46:16 +00:00
Sanjay Patel 9b09fe9f66 [InstCombine] increase test coverage for intrinsic shrinking; NFC
There were no tests with vector types before this.

llvm-svn: 328371
2018-03-23 21:13:53 +00:00
Sanjay Patel cd1f3e7a78 [InstCombine] auto-generate checks; NFC
llvm-svn: 328329
2018-03-23 15:39:03 +00:00
Sanjay Patel 3547dcb3ae [InstSimplify] regenerate checks, move tests; NFC
llvm-svn: 328327
2018-03-23 15:31:31 +00:00
Sanjay Patel d189b596ae [InstCombine] regenerate test checks; NFC
llvm-svn: 328325
2018-03-23 15:19:35 +00:00
Matthew Simpson 6c289a1c74 [SLP] Stop counting cost of gather sequences with multiple uses
When building the SLP tree, we look for reuse among the vectorized tree
entries. However, each gather sequence is represented by a unique tree entry,
even though the sequence may be identical to another one. This means, for
example, that a gather sequence with two uses will be counted twice when
computing the cost of the tree. We should only count the cost of the definition
of a gather sequence rather than its uses. During code generation, the
redundant gather sequences are emitted, but we optimize them away with CSE. So
it looks like this problem just affects the cost model.

Differential Revision: https://reviews.llvm.org/D44742

llvm-svn: 328316
2018-03-23 14:18:27 +00:00
Florian Hahn f73c3ece7f Revert r328307: [IPSCCP] Use constant range information for comparisons of parameters.
Reverted for now, due to it causing verifier failures.

llvm-svn: 328312
2018-03-23 12:49:39 +00:00
Florian Hahn b1feec087e [IPSCCP] Use constant range information for comparisons of parameters.
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 328307
2018-03-23 11:56:00 +00:00
Florian Hahn 52436a587e [LoopUnroll] Simplify induction variables after peeling too.
Loop peeling also has an impact on the induction variables, so we should
benefit from induction variable simplification after peeling too.

Reviewers: sanjoy, bogner, mzolotukhin, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D43878

llvm-svn: 328301
2018-03-23 10:38:12 +00:00
Evgeny Stupachenko 579507a53a Revert r325687 (workaround for PR36032).
Summary:
Revert r325687 workaround for PR36032 since
 a fix was committed in r326154.

Reviewers: sbaranga

Differential Revision: http://reviews.llvm.org/D44768

From: Evgeny Stupachenko <evstupac@gmail.com>
                         <evgeny.v.stupachenko@intel.com>
llvm-svn: 328257
2018-03-22 22:04:39 +00:00
Matt Morehouse 236cdaf84c [SimplifyCFG] Create attribute for fuzzing-specific optimizations.
Summary:
When building with libFuzzer, converting control flow to selects or
obscuring the original operands of CMPs reduces the effectiveness of
libFuzzer's heuristics.

This patch provides an attribute to disable or modify certain optimizations
for optimal fuzzing signal.

Provides a less aggressive alternative to https://reviews.llvm.org/D44057.

Reviewers: vitalybuka, davide, arsenm, hfinkel

Reviewed By: vitalybuka

Subscribers: junbuml, mehdi_amini, wdng, javed.absar, hiraditya, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D44232

llvm-svn: 328214
2018-03-22 17:07:51 +00:00
Anna Thomas 9b1176b0ef [LoopPredication] Add profitability check based on BPI
Summary:
LoopPredication is not profitable when the loop is known to always exit
through some block other than the latch block.
A coarse grained latch check can cause loop predication to predicate the
loop, and unconditionally deoptimize.

However, without predicating the loop, the guard may never fail within the
loop during the dynamic execution because the non-latch loop termination
condition exits the loop before the latch condition causes the loop to
exit.
We teach LP about this using the BranchProbabilityInfo pass.

Reviewers: apilipenko, skatkov, mkazantsev, reames

Reviewed by: skatkov

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44667

llvm-svn: 328210
2018-03-22 16:03:59 +00:00
Sanjay Patel 94c91b78e7 [InstCombine] add folds for xor-of-icmp signbit tests (PR36682)
This is a retry of r328119 which was reverted at r328145 because
it could crash by trying to combine icmps with different operand
types. This version has a check for that and additional tests.

Original commit message:

This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=36682

There's also a leftover improvement from the long-ago-closed:
https://bugs.llvm.org/show_bug.cgi?id=5438

https://rise4fun.com/Alive/dC1
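
One instance of the folded family (a sketch; the full set of predicate variants
is in the patch): a signs-differ test becomes a single sign-bit test of the xor.

```
define i1 @signs_differ(i32 %x, i32 %y) {
  %a = icmp slt i32 %x, 0
  %b = icmp slt i32 %y, 0
  %r = xor i1 %a, %b
  ret i1 %r
}
; --> %t = xor i32 %x, %y
;     %r = icmp slt i32 %t, 0
```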

llvm-svn: 328197
2018-03-22 14:08:16 +00:00
Sanjay Patel 1d0832d9a3 [InstCombine] move/add tests for fmul distribution; NFC
There are at least 3 problems:
1. We're distributing across large patterns, but fail to do that for the minimal patterns.
2. We're not checking uses, so we may create more instructions than we eliminate.
3. We should be able to do these transforms with less than full 'fast' fmuls.

llvm-svn: 328152
2018-03-21 21:28:19 +00:00
Reid Kleckner 762331be07 Revert r328119 "[InstCombine] add folds for xor-of-icmp signbit tests (PR36682)"
This asserts when compiling safe_numerics_unittest.cpp in Chromium with
MSan.

llvm-svn: 328145
2018-03-21 20:35:36 +00:00
Sanjay Patel e235942a1e [InstSimplify] fp_binop X, NaN --> NaN
We propagate the existing NaN value when possible.
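
For example (a sketch with an arbitrary quiet-NaN payload):

```
define double @fadd_nan(double %x) {
  %r = fadd double %x, 0x7FF8000000000000
  ret double %r   ; simplifies to the NaN constant
}
```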

Differential Revision: https://reviews.llvm.org/D44521

llvm-svn: 328140
2018-03-21 19:31:53 +00:00
Matthew Simpson b17fff79f0 [SLP] Add test case for a gather sequence with multiple uses
llvm-svn: 328133
2018-03-21 19:13:14 +00:00
Sanjay Patel 778032f39d [InstCombine] add folds for xor-of-icmp signbit tests (PR36682)
This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=36682

There's also a leftover improvement from the long-ago-closed:
https://bugs.llvm.org/show_bug.cgi?id=5438

https://rise4fun.com/Alive/dC1

llvm-svn: 328119
2018-03-21 17:17:13 +00:00
Sanjay Patel 3da85ae5a5 [InstCombine] move/add tests for xor-of-icmps (PR36682); NFC
llvm-svn: 328109
2018-03-21 15:54:48 +00:00
Daniel Neilson 6f1eb58e92 [MemCpyOpt] Update to new API for memory intrinsic alignment
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
MemCpyOpt pass to cease using:
1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific
alignments through the new API.
2) The old IRBuilder CreateMemCpy/CreateMemMove single-alignment APIs in favour of the new
API that allows setting source and destination alignments independently.

We also add a few tests to fill gaps in the testing of this pass.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816, rL327398, rL327421 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 328097
2018-03-21 14:14:55 +00:00
Justin Lebar 038cbc5c13 Re-re-land: Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions.
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.
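
A sketch of the kind of input that can now be narrowed (masks chosen so both
operands provably fit in 8 bits; names invented):

```
define i64 @narrow_udiv(i64 %a, i64 %b) {
  %x = and i64 %a, 255
  %y = and i64 %b, 255
  %d = udiv i64 %x, %y
  ret i64 %d
}
; the udiv can be rewritten on a narrower power-of-two type,
; e.g. trunc both operands to i8, udiv, then zext the result back to i64
```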

Backed out for causing performance regressions.  Re-landing
because we've determined that these regressions were noise.

Original Differential Revision: https://reviews.llvm.org/D44102

llvm-svn: 328096
2018-03-21 14:08:21 +00:00
Shoaib Meenai 3f689c8632 [ObjCARC] Add funclet token to ARC marker
The inline assembly generated for the ARC autorelease elision marker
must have a funclet token if it's emitted inside a funclet, otherwise
the inline assembly (and all subsequent code in the funclet) will be
marked unreachable by WinEHPrepare.

Note that this only applies for the non-O0 case, since at O0, clang
emits the autorelease elision marker itself rather than deferring to the
backend. The fix for clang is handled in a separate change.

Differential Revision: https://reviews.llvm.org/D44641

llvm-svn: 328042
2018-03-20 20:45:41 +00:00
Xin Tong bdbd97ed9a [MergeICmp] Fix a bug in entry block shuffled to middle of the chain
Summary: Fix a bug in entry block shuffled to middle of the chain.

Reviewers: davide, courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44642

llvm-svn: 327971
2018-03-20 11:57:54 +00:00
Andrei Elovikov 8b8253fdc7 [LV] Let recordVectorLoopValueForInductionCast to check if IV was created from the cast.
Summary:
It turned out to be error-prone to expect the callers to handle that - better to
leave the decision to this routine and make the required data to be explicitly
passed to the function.

This handles the case that was missed in the r322473 and fixes the assert
mentioned in PR36524.

Reviewers: dorit, mssimpso, Ayal, dcaballe

Reviewed By: dcaballe

Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits

Differential Revision: https://reviews.llvm.org/D43812

llvm-svn: 327960
2018-03-20 09:04:39 +00:00
Sanjay Patel 0ce3086777 [InstCombine] canonicalize fcmp+select to fabs
This is complicated by -0.0 and NaN. This is based on the DAG patterns 
as shown in D44091. I'm hoping that we can just remove those DAG folds 
and always rely on IR canonicalization to handle the matching to fabs.

We would still need to delete the broken code from DAGCombiner to fix 
PR36600:
https://bugs.llvm.org/show_bug.cgi?id=36600
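
A sketch of one of the select shapes involved (plain IR; the fast-math flags the
patch actually requires are not restated here):

```
define double @select_to_fabs(double %x) {
  %cmp = fcmp ogt double %x, 0.0
  %neg = fsub double -0.0, %x
  %sel = select i1 %cmp, double %x, double %neg
  ret double %sel
}
; with the required FMF (to ignore -0.0 and NaN concerns), this becomes:
;   %sel = call double @llvm.fabs.f64(double %x)
```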

Differential Revision: https://reviews.llvm.org/D44550

llvm-svn: 327858
2018-03-19 15:14:30 +00:00
Serguei Katkov 529f42331e [SCEV] Re-land: Fix isKnownPredicate
This is re-land of https://reviews.llvm.org/rL327362 with a fix
and regression test.

The crash was due to it is possible that for found MDL loop,
LHS or RHS may contain an invariant unknown SCEV which
does not dominate the MDL. Please see regression
test for an example.

Reviewers: sanjoy, mkazantsev, reames
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44553

llvm-svn: 327822
2018-03-19 06:35:30 +00:00
Anastasis Grammenos 3a589103a4 [LICM] Salvage DI from dying Instructions
LICM deletes trivially dead instructions which it won't attempt to sink.
Attempt to salvage debug values which reference these instructions.

llvm-svn: 327800
2018-03-18 15:59:19 +00:00
Roman Lebedev e6da3063a5 [InstCombine] peek through unsigned FP casts for zero-equality compares (PR36682)
Summary:
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682
https://reviews.llvm.org/D44390
https://godbolt.org/g/oKvT5H

See also D44416

Reviewers: spatel, majnemer, efriedma, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D44424

llvm-svn: 327799
2018-03-18 15:53:02 +00:00
Sanjay Patel 63b1028953 [InstCombine] add nnan requirement for sqrt(x) * sqrt(y) -> sqrt(x*y)
This is similar to D43765.

llvm-svn: 327797
2018-03-18 14:32:54 +00:00
Sanjay Patel 95ec4a4dfe [InstSimplify] loosen FMF for sqrt(X) * sqrt(X) --> X
As shown in the code comment, we don't need all of 'fast', 
but we do need reassoc + nsz + nnan.
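
A sketch of the now-simplified form (the flags are shown on the fmul; exactly
where the code checks them is per the patch):

```
declare double @llvm.sqrt.f64(double)

define double @sqrt_squared(double %x) {
  %s = call double @llvm.sqrt.f64(double %x)
  %r = fmul reassoc nnan nsz double %s, %s
  ret double %r   ; can now simplify to %x
}
```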

Differential Revision: https://reviews.llvm.org/D43765

llvm-svn: 327796
2018-03-18 14:12:25 +00:00
Sanjay Patel 5a5c33d8b5 [InstSimplify] add NaN constant diversity; NFC
llvm-svn: 327743
2018-03-16 20:55:55 +00:00
Philip Reames 8a106272e8 [LICM/mustexec] Extend first iteration must execute logic to fcmps
This builds on the work from https://reviews.llvm.org/D44287. It turned out supporting fcmp was much easier than I realized, so let's do that now.

As an aside, our -O3 handling of floating point IVs leaves a lot to be desired. We do convert the float IV to an integer IV, but do so late enough that many other optimizations are missed (e.g. we don't vectorize).

Differential Revision: https://reviews.llvm.org/D44542

llvm-svn: 327722
2018-03-16 16:33:49 +00:00
Sanjay Patel 2b94927f0d [InstCombine] add nnan requirement to potential fabs folds tests; NFC
As noted in D44550, we can't guarantee preserving the sign-bit of NaN 
if we convert these to fabs().

llvm-svn: 327718
2018-03-16 15:27:39 +00:00
Brian M. Rzycki f65ddc5fa2 [JumpThreading] Track unreachable BBs to avoid processing
JumpThreading iterates over F until the IR quiesces. Transforming
unreachable BBs increases compile time, and it is also possible for the IR to
never stabilize, causing JumpThreading to hang. An older attempt at
fixing this problem was D3991 where removeUnreachableBlocks(F)
was called before JumpThreading began. This has a few drawbacks:
 * expensive - the routine attempts to fix up the IR to identify
   additional BBs that can be removed along with unreachable BBs.
 * aggressive - does not identify and preserve the shape of the IR.
   At a minimum it does not preserve loop hierarchies.
 * invasive - altering reachable blocks it may disrupt IR shapes
   that could have otherwise been JumpThreaded.

This patch avoids removeUnreachableBlocks(F) and instead tracks
unreachable BBs in a SmallPtrSet using DominatorTree to validate the
initial state of all BBs. We then rely on subsequent passes to identify
and remove these unreachable blocks from F.

Reviewers: dberlin, sebpop, kuhar, dinesh.d

Reviewed by: sebpop, kuhar

Subscribers: hiraditya, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D44177

llvm-svn: 327713
2018-03-16 15:13:47 +00:00
Matthew Simpson eacfefd056 [AArch64] Implement getArithmeticReductionCost
This patch provides an implementation of getArithmeticReductionCost for
AArch64. We can specialize the cost of add reductions since they are computed
using the 'addv' instruction.

Differential Revision: https://reviews.llvm.org/D44490

llvm-svn: 327702
2018-03-16 11:34:15 +00:00
Sanjay Patel 728269c10d [InstCombine] add more tests for fcmp+select -> fabs; NFC
This should correspond to the patterns in D44091 and might
make handling these in the DAG unnecessary.

llvm-svn: 327689
2018-03-16 01:06:33 +00:00
Sanjay Patel 2d568ec0e4 [InstCombine] add tests for fcmp+select -> fabs; NFC
llvm-svn: 327680
2018-03-15 22:48:23 +00:00
Florian Hahn fc97b6173f [LoopUnroll] Peel off iterations if it makes conditions true/false.
If the loop body contains conditions of the form IndVar < #constant, we
can remove the checks by peeling off #constant iterations.

This improves codegen for PR34364.

Reviewers: mkuper, mkazantsev, efriedma

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D43876

llvm-svn: 327671
2018-03-15 21:34:43 +00:00
Philip Reames a21d5f1e18 [LICM] Ignore exits provably not taken on first iteration when computing must execute
It is common to have conditional exits within a loop which are known not to be taken on some iterations, but not necessarily all. This patch extends our reasoning around guaranteed-to-execute (used when establishing whether it's safe to dereference a location from the preheader) to handle the case where an exit is known not to be taken on the first iteration and the instruction of interest *is* known to be taken on the first iteration.

This case comes up in two major ways:
* If we have a range check which we've been unable to eliminate, we frequently know that it doesn't fail on the first iteration.
* Pass ordering. We may have a check which will be eliminated through some sequence of other passes, but depending on the exact pass sequence we might never actually do so or we might miss other optimizations from passes run before the check is finally eliminated.

The initial version (here) is implemented via InstSimplify. At the moment, it catches a few cases, but misses a lot too. I added test cases for missing cases in InstSimplify which I'll follow up on separately. Longer term, we should probably wire SCEV through to here to get much smarter loop aware simplification of the first iteration predicate.

Differential Revision: https://reviews.llvm.org/D44287

llvm-svn: 327664
2018-03-15 21:04:28 +00:00
Philip Reames 422024a1b7 [EarlyCSE] Don't hide earlier invariant.scopes
If we've already established an invariant scope with an earlier generation, we don't want to hide it in the scoped hash table with one with a later generation.  I noticed this when working on the invariant-load handling, but it also applies to the invariant.start case as well.

Without this change, my previous patch for invariant-load regresses some cases, so I'm pushing this without waiting for review.  This is why you don't make last minute tweaks to patches to catch "obvious cases" after it's already been reviewed.  Bad Philip!

llvm-svn: 327655
2018-03-15 18:12:27 +00:00
Philip Reames ca587fe0b4 [EarlyCSE] Reuse invariant scopes for invariant load
This is a follow up to https://reviews.llvm.org/D43716 which rewrites the invariant load handling using the new infrastructure. It's slightly more powerful, but only in somewhat minor ways for the moment. It's not clear that DSE of stores to invariant locations is actually interesting since why would your IR have such a construct to start with?

Note: The submitted version is slightly different than the reviewed one.  I realized the scope could start for an invariant load which was proven redundant and removed.  Added a test case to illustrate that as well.

Differential Revision: https://reviews.llvm.org/D44497

llvm-svn: 327646
2018-03-15 17:29:32 +00:00
Roman Lebedev 6aca33534b [InstSimplify] peek through unsigned FP casts for sign-bit compares (PR36682)
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682
https://reviews.llvm.org/D44390
https://godbolt.org/g/oKvT5H

See also D44421, D44424
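
A sketch of one of the handled shapes: uitofp never produces a negative value,
so the sign-bit test below folds to false.

```
define i1 @signbit_of_uitofp(i32 %x) {
  %f = uitofp i32 %x to float
  %b = bitcast float %f to i32
  %c = icmp slt i32 %b, 0
  ret i1 %c   ; --> false
}
```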

Reviewers: spatel, majnemer, efriedma, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D44425

llvm-svn: 327642
2018-03-15 16:17:46 +00:00
Matthew Simpson c1c4ad6e64 [ConstantFolding, InstSimplify] Handle more vector GEPs
This patch addresses some additional cases where the compiler crashes upon
encountering vector GEPs. This should fix PR36116.

Differential Revision: https://reviews.llvm.org/D44219
Reference: https://bugs.llvm.org/show_bug.cgi?id=36116

llvm-svn: 327638
2018-03-15 16:00:29 +00:00
Sanjay Patel 43f71eade0 [InstSimplify] add tests with NaN operand for fp binops; NFC
llvm-svn: 327631
2018-03-15 14:48:39 +00:00
Sanjay Patel a4f42f2cfd [PatternMatch, InstSimplify] allow undef elements when matching any vector FP zero
This matcher implementation appears to be slightly more efficient than 
the generic constant check that it is replacing because every use was 
for matching FP patterns, but the previous code would check int and 
pointer type nulls too. 

llvm-svn: 327627
2018-03-15 14:29:27 +00:00
Sanjay Patel 8f063d0c70 [InstSimplify] remove 'nsz' requirement for frem 0, X
From the LangRef definition for frem: 
"The value produced is the floating-point remainder of the two operands. 
This is the same output as a libm ‘fmod‘ function, but without any 
possibility of setting errno. The remainder has the same sign as the 
dividend. This instruction is assumed to execute in the default 
floating-point environment."

llvm-svn: 327626
2018-03-15 14:04:31 +00:00
Ulrich Weigand f4ceef8d3f [Debug] Retain both copies of debug intrinsics in HoistThenElseCodeToIf
When hoisting common code from the "then" and "else" branches of a condition
to before the "if", the HoistThenElseCodeToIf routine will attempt to merge
the debug location associated with the two original copies of the hoisted
instruction.

This is a problem in the special case where the hoisted instruction is a
debug info intrinsic, since for those the debug location is considered
part of the intrinsic and attempting to modify it may resut in invalid
IR.  This is the underlying cause of PR36410.

This patch fixes the problem by handling debug info intrinsics specially:
instead of hoisting one copy and merging the two locations, the code now
simply hoists both copies, each with its original location intact.  Note
that this is still only done in the case where both original copies are
otherwise (i.e. apart from location metadata) identical.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D44312

llvm-svn: 327622
2018-03-15 12:28:48 +00:00
Fedor Sergeev 194a407bda [New PM][IRCE] port of Inductive Range Check Elimination pass to the new pass manager
There are two nontrivial details here:
* Loop structure update interface is quite different with new pass manager,
  so the code to add new loops was factored out

* BranchProbabilityInfo is not a loop analysis, so it can not be just getResult'ed from
  within the loop pass. It can't even be queried through getCachedResult, as the LoopCanonicalization
  sequence (e.g. LoopSimplify) might invalidate BPI results.

  A complete solution for BPI will likely take some time to discuss and figure out,
  so for now this was partially solved by making BPI optional in IRCE
  (skipping a couple of profitability checks if it is absent).

Most of the IRCE tests got their corresponding new-pass-manager variant enabled.
Only two of them depend on BPI, both marked with TODO, to be turned on when BPI
starts being available for loop passes.

Reviewers: chandlerc, mkazantsev, sanjoy, asbirlea
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D43795

llvm-svn: 327619
2018-03-15 11:01:19 +00:00
Andrei Elovikov f9b8035f3c [LoopUnroll] Ignore ephemeral values when checking full unroll profitability.
Summary:
Before this patch call graph is like this in the LoopUnrollPass:

  tryToUnrollLoop
    ApproximateLoopSize
      collectEphemeralValues
      /* Use collected ephemeral values */
    computeUnrollCount
      analyzeLoopUnrollCost
        /* Bail out from the analysis if loop contains CallInst */

This patch moves collection of the ephemeral values to the tryToUnrollLoop
function and passes the collected values into both ApproximateLoopSize (as
before) and additionally starts using them in analyzeLoopUnrollCost:

  tryToUnrollLoop
    collectEphemeralValues
    ApproximateLoopSize(EphValues)
      /* Use EphValues */
    computeUnrollCount(EphValues)
      analyzeLoopUnrollCost(EphValues)
        /* Ignore ephemeral values - they don't contribute to the final cost */
        /* Bail out from the analysis if loop contains CallInst */

Reviewers: mzolotukhin, evstupac, sanjoy

Reviewed By: evstupac

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43931

llvm-svn: 327617
2018-03-15 09:59:15 +00:00
Sanjay Patel e4e3f79b83 [InstSimplify] add tests for frem and vectors with undef; NFC
These should all be folded. The vector tests need to have
m_AnyZero updated to ignore undef elements, but we need to
be careful not to return the existing value in that case
and unintentionally propagate undef.

llvm-svn: 327585
2018-03-14 22:45:58 +00:00
Philip Reames 0adbb19409 [EarlyCSE] Exploit open ended invariant.start scopes
If we have an invariant.start with no corresponding invariant.end, then the
memory location becomes invariant indefinitely after the invariant.start. As a
result, anything dominated by the start is guaranteed to see the value the
memory location had when the invariant.start executed.

This patch adds an AvailableInvariants table which tracks the generation at
which a particular memory location became invariant, and then uses that
information to allow value forwarding that would otherwise be disallowed by
potentially aliasing stores. (Reminder: in EarlyCSE everything clobbers
everything by default.)
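
A sketch of the kind of forwarding this enables (hypothetical IR, not taken
from the committed tests): %p is invariant from the invariant.start onward and
there is no invariant.end, so the second load can reuse %v1 even though the
intervening store would normally be treated as a clobber.

  declare {}* @llvm.invariant.start.p0i8(i64, i8* nocapture)

  define i32 @f(i32* %p, i32* %q) {
    %c = bitcast i32* %p to i8*
    %inv = call {}* @llvm.invariant.start.p0i8(i64 4, i8* %c)
    %v1 = load i32, i32* %p
    store i32 0, i32* %q                ; may alias %p, but %p is invariant now
    %v2 = load i32, i32* %p             ; can be forwarded from %v1
    %r = sub i32 %v1, %v2
    ret i32 %r
  }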

This should be compatible with the MemorySSA variant, but the design is
generational. We can and should add first-class support for invariant.start
within MemorySSA at a later time. I took a quick look at doing so, but we
probably need some input from a MemorySSA expert.

Differential Revision: https://reviews.llvm.org/D43716

llvm-svn: 327577
2018-03-14 21:35:06 +00:00
Sanjay Patel 11f7f9908b [InstSimplify] fix folds for (0.0 - X) + X --> 0 (PR27151)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=27151
...the existing fold could miscompile when X is NaN.

The fold was also dependent on 'ninf' but that's not necessary.

From IEEE-754 (with default rounding which we can assume for these opcodes):
"When the sum of two operands with opposite signs (or the difference of two 
operands with like signs) is exactly zero, the sign of that sum (or difference) 
shall be +0...However, x + x = x − (−x) retains the same sign as x even when 
x is zero."
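
A sketch of why the flag matters (illustrative IR only, not the exact
predicate the code checks): if %x is a NaN then %r is a NaN, so folding %r to
0.0 is only justified when the fadd carries 'nnan'.

  define float @sub_then_add(float %x) {
    %s = fsub float 0.0, %x
    %r = fadd nnan float %s, %x         ; with 'nnan' this may fold to 0.0
    ret float %r
  }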

llvm-svn: 327575
2018-03-14 21:23:27 +00:00
Sanjay Patel fd82fd000c [InstSimplify] add tests to show missing/broken fadd folds (PR27151, PR26958); NFC
llvm-svn: 327554
2018-03-14 18:52:40 +00:00
Sanjay Patel e011d7964d [InstSimplify] regenerate checks; NFC
llvm-svn: 327553
2018-03-14 18:49:57 +00:00
Roman Lebedev 60d24445dd [InstSimplify] [NFC] cast-unsigned-icmp-cmp-0.ll - don't run instcombine
As discussed in the post-commit review of D44421, there is simply
no reason to run instcombine on this testcase.

llvm-svn: 327541
2018-03-14 17:59:12 +00:00
Roman Lebedev 978aae7614 [InstSimplify] [NFC] Add tests for peeking through unsigned FP casts for sign compares (PR36682)
Summary:
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682
https://reviews.llvm.org/D44390
https://godbolt.org/g/oKvT5H

Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations I checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `uitofp`+`bitcast`+`icmp` can be evaluated to a boolean:
* `slt 0`
* `sgt -1`
I did not check vectors, but I'm guessing it's the same there.
{F5889242}

Thus all these cases are in the testcase (along with the vector variant with an
additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken); all of
these should be optimized.
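
A scalar sketch of the pattern (hypothetical function; it mirrors the shape of
the pattern rather than any particular new test):

  define i1 @uitofp_slt_zero(i32 %x) {
    %f = uitofp i32 %x to float
    %b = bitcast float %f to i32
    %c = icmp slt i32 %b, 0             ; uitofp never sets the sign bit,
    ret i1 %c                           ; so this should simplify to false
  }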

Reviewers: spatel, majnemer, efriedma, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D44421

llvm-svn: 327535
2018-03-14 17:31:08 +00:00
Roman Lebedev 6ab60358ca [InstCombine] [NFC] Add tests for peeking through unsigned FP casts for zero-equality compares (PR36682)
Summary:
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682
https://reviews.llvm.org/D44390
https://godbolt.org/g/oKvT5H

Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations I checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `uitofp`+`bitcast` can be dropped:
* `eq 0`
* `ne 0`
I did not check vectors, but I'm guessing it's the same there.
{F5889189}

Thus all these cases are in the testcase (along with the vector variant with an
additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken); all of
these should be optimized.
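
A scalar sketch of the eq-0 case (hypothetical function; it mirrors the shape
of the pattern rather than any particular new test):

  define i1 @uitofp_eq_zero(i32 %x) {
    %f = uitofp i32 %x to float
    %b = bitcast float %f to i32
    %c = icmp eq i32 %b, 0              ; only %x == 0 produces the +0.0 bit
    ret i1 %c                           ; pattern, so this is icmp eq i32 %x, 0
  }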

Generated with
{F5889196}

Reviewers: spatel, majnemer, efriedma, arsenm

Reviewed By: spatel

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D44416

llvm-svn: 327534
2018-03-14 17:31:03 +00:00
Hiroshi Yamauchi e6a3dc7699 Simplify more cases of logical ops of masked icmps.
Summary:
For example,

((X & 255) != 0) && ((X & 15) == 8) -> ((X & 15) == 8).
((X & 7) != 0) && ((X & 15) == 8) -> false.
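
In IR, the first example corresponds to a pattern like this (a sketch; the
committed tests use their own naming and masks):

  define i1 @masked_and(i32 %x) {
    %a = and i32 %x, 255
    %c1 = icmp ne i32 %a, 0
    %b = and i32 %x, 15
    %c2 = icmp eq i32 %b, 8
    %r = and i1 %c1, %c2                ; (X & 15) == 8 implies (X & 255) != 0,
    ret i1 %r                           ; so this simplifies to %c2
  }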

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43835

llvm-svn: 327450
2018-03-13 21:13:18 +00:00
Brian M. Rzycki 252165b27a [LazyValueInfo] PR33357 prevent infinite recursion on BinaryOperator
Summary:
It is possible for LVI to encounter instructions that are not in valid
SSA form and reference themselves. One example is the following:
  %tmp4 = and i1 %tmp4, undef
Before this patch, LVI would recurse until running out of stack memory
and crash. This patch marks such self-referential instructions as
Overdefined and aborts analysis on them.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33357

Reviewers: craig.topper, anna, efriedma, dberlin, sebpop, kuhar

Reviewed by: dberlin

Subscribers: uabelho, spatel, a.elovikov, fhahn, eli.friedman, mzolotukhin, spop, evandro, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D34135

llvm-svn: 327432
2018-03-13 18:14:10 +00:00
Sanjay Patel 204edeca56 [InstCombine] fix fmul reassociation to avoid creating an extra fdiv
This was supposed to be an NFC refactoring that would eventually allow
eliminating the isFast() predicate, but there's a rare possibility
that we would pessimize the code as shown in the test case because
we failed to check 'hasOneUse()' properly. This version also removes
an inefficiency of the old code; we would look for: 
(X * C) * C1 --> X * (C * C1)
...but that pattern is always handled by 
SimplifyAssociativeOrCommutative().
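
For reference, that (X * C) * C1 shape in IR form (a sketch with arbitrary
constants):

  define float @mul_mul(float %x) {
    %m1 = fmul fast float %x, 2.0
    %m2 = fmul fast float %m1, 4.0      ; expected to fold to fmul fast float %x, 8.0
    ret float %m2
  }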

llvm-svn: 327404
2018-03-13 14:46:32 +00:00
Daniel Neilson 41e781d5f1 [SROA] Take advantage of separate alignments for memcpy source and destination
Summary:
This change is part of step five in the series of changes to remove the
alignment argument from memcpy/memmove/memset in favour of alignment
attributes. In particular, this changes the SROA pass to cease using the old
getAlignment() & setAlignment() APIs of MemoryIntrinsic in favour of getting
source- and dest-specific alignments through the new API. This allows us to
enhance visitMemTransferInst to be more aggressive in setting the alignment of
the memcpy calls that it creates, as well as to only change the alignment of a
memcpy/memmove argument that it replaces.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

Reviewers: chandlerc, bollu, efriedma

Reviewed By: efriedma

Subscribers: efriedma, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D42974

llvm-svn: 327398
2018-03-13 14:25:33 +00:00
Eugene Leviant 6f42a2cd91 [Evaluator] Evaluate load/store with bitcast
Differential Revision: https://reviews.llvm.org/D43457

llvm-svn: 327381
2018-03-13 10:19:50 +00:00
Clement Courbet 9f0b3170bc [MergeICmps] Make sure that the comparison only has one use.
Summary: Fixes PR36557.

Reviewers: trentxintong, spatel

Subscribers: mstorsjo, llvm-commits

Differential Revision: https://reviews.llvm.org/D44083

llvm-svn: 327372
2018-03-13 07:05:55 +00:00
Serguei Katkov bbfbf21ddc Revert [SCEV] Fix isKnownPredicate
This is a revert of rL327362, which causes buildbot failures with asserts like

Assertion `isAvailableAtLoopEntry(RHS, L) && "RHS is not available at Loop Entry"' failed.

llvm-svn: 327363
2018-03-13 06:36:00 +00:00
Serguei Katkov b05574c0d3 [SCEV] Fix isKnownPredicate
isKnownPredicate is updated to implement the following algorithm,
proposed by @sanjoy and @mkazantsev:
isKnownPredicate(Pred, LHS, RHS) {
  Collect set S all loops on which either LHS or RHS depend.
  If S is non-empty
    a. Let PD be the element of S which is dominated by all other elements of S
    b. Let E(LHS) be the value of LHS on entry of PD.
       To get E(LHS), we should just take LHS and replace all AddRecs that
       are attached to PD with their entry values.
       Define E(RHS) in the same way.
    c. Let B(LHS) be the value of LHS on the backedge of PD.
       To get B(LHS), we should just take LHS and replace all AddRecs that
       are attached to PD with their backedge values.
       Define B(RHS) in the same way.
    d. Note that E(LHS) and E(RHS) are automatically available on entry of PD,
       so we can assert on that.
    e. Return true if isLoopEntryGuardedByCond(Pred, E(LHS), E(RHS)) &&
                      isLoopBackedgeGuardedByCond(Pred, B(LHS), B(RHS))
  Return true if (Pred, LHS, RHS) is known from ranges, splitting, etc.
}
This is follow-up for https://reviews.llvm.org/D42417.

Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43507

llvm-svn: 327362
2018-03-13 06:10:27 +00:00
Vlad Tsyrklevich aab6000684 Reland r327041: [ThinLTO] Keep available_externally symbols live
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.

Marking available_externally functions dead is incorrect: the functions
are used even though their definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.

(Reland with a suspected fix for a unit test failure I haven't been able
to reproduce locally)

Reviewers: pcc, tejohnson

Reviewed By: tejohnson

Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D43690

llvm-svn: 327360
2018-03-13 05:08:48 +00:00
Sanjay Patel a71f95404f [InstCombine] add test to show fmul transform creates extra fdiv; NFC
Also, move fmul reassociation tests to the same file as other fmul transforms.

llvm-svn: 327342
2018-03-12 23:10:08 +00:00
Volkan Keles 4ecdb44a64 BlockExtractor: Don’t delete functions directly
Blocks may have function calls, so don’t erase functions
directly to avoid erasing a function that has a user.

llvm-svn: 327340
2018-03-12 22:28:18 +00:00
Sanjay Patel 1dbb6b7282 [PatternMatch] enhance m_NaN() to ignore undef elements in vectors
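
For example (a hypothetical constant, not from a specific test), a vector like
this is now matched by m_NaN() even though one lane is undef:

  define <2 x float> @nan_with_undef_lane() {
    ret <2 x float> <float 0x7FF8000000000000, float undef>
  }
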
llvm-svn: 327339
2018-03-12 22:18:47 +00:00
Saleem Abdulrasool 8b342680bf ObjCARC: teach the cloner about funclets
In the case that the CallInst being moved has an associated operand
bundle which is a funclet, the move will construct an invalid
instruction: the new site has a different funclet token, which needs to
be reassociated with the new instruction.

Unfortunately, there is no way to alter the bundle after the
construction of the instruction.  Replace the call instruction cloning
with a custom helper to clone the instruction and reassociate the
funclet token.

llvm-svn: 327336
2018-03-12 21:46:09 +00:00
Sanjay Patel 5b034c83d6 [InstSimplify] add fcmp tests for constant NaN vector with undef elt; NFC
llvm-svn: 327335
2018-03-12 21:44:17 +00:00
Sanjay Patel a0d8d127c6 [PatternMatch, InstSimplify] allow undef elements when matching vector -0.0
This is the FP equivalent of D42818. Use it for the few cases in InstSimplify 
with -0.0 folds (that's the only current use of m_NegZero()).
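
A sketch of the kind of vector case this now covers (hypothetical test): with
the undef lane ignored, the constant still matches m_NegZero(), so the
'X + -0.0 --> X' fold can fire.

  define <2 x float> @fadd_negzero_undef(<2 x float> %x) {
    %r = fadd <2 x float> %x, <float -0.0, float undef>
    ret <2 x float> %r                  ; can now simplify to %x
  }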

Differential Revision: https://reviews.llvm.org/D43792

llvm-svn: 327307
2018-03-12 18:17:01 +00:00
Roman Lebedev 92e359ed67 [InstCombine] [NFC] Add tests for peeking through FP casts for sign-bit compares (PR36682)
Summary:
This pattern came up in PR36682:
https://bugs.llvm.org/show_bug.cgi?id=36682
https://godbolt.org/g/LhuD9A

Tests for proposed fix in D44367.

Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations I checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `sitofp`+`bitcast` can be dropped:
* `eq 0`
* `ne 0`
* `slt 0`
* `sle 0`
* `sge 0`
* `sgt 0`
* `slt 1`
* `sge 1`
* `sle -1`
* `sgt -1`
I did not check vectors, but I'm guessing it's the same there.
{F5887419}

Thus all these cases are in the testcase (along with the vector variant with an
additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken); all of
these should be optimized.
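
A scalar sketch of one of the listed cases (hypothetical function; it mirrors
the shape of the pattern rather than any particular new test):

  define i1 @sitofp_slt_zero(i32 %x) {
    %f = sitofp i32 %x to float
    %b = bitcast float %f to i32
    %c = icmp slt i32 %b, 0             ; the sign bit is set exactly when %x < 0,
    ret i1 %c                           ; so this is icmp slt i32 %x, 0
  }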

Generated with {F5887551}

Reviewers: spatel, majnemer, efriedma, arsenm

Reviewed By: spatel

Subscribers: nlopes, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D44390

llvm-svn: 327301
2018-03-12 17:43:02 +00:00