Commit Graph

8347 Commits

Author SHA1 Message Date
Sanjay Patel 6b029a5380 [DAG] fix formatting; NFC
llvm-svn: 297015
2017-03-06 15:27:57 +00:00
Sanjay Patel 5273afd4bb [DAG] fix typo in comment; NFC
llvm-svn: 297011
2017-03-06 15:07:43 +00:00
Simon Pilgrim 584d6d9d91 [SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions
Found by fuzz testing after rL296985 landed

llvm-svn: 296989
2017-03-05 15:52:18 +00:00
Simon Pilgrim 9f5c251d57 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

llvm-svn: 296985
2017-03-05 09:57:20 +00:00
Craig Topper 6ffc044b2f [DAGCombine] Use APInt::operator|(uint64_t) instead of creating a temporary APInt and calling APInt::Or. NFC
This is more efficient by itself. But this is prep for a future patch that may remove APInt::Or while making operator| support rvalue references similar to add/sub.

llvm-svn: 296981
2017-03-05 01:08:16 +00:00
Sanjay Patel 066f3208bf [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537

llvm-svn: 296977
2017-03-04 19:18:09 +00:00
Florian Hahn 6406f98342 [legalize-types] Remove stale entries from SoftenedFloats.
Summary:
When replacing a SDValue, we should remove the replaced value from
SoftenedFloats (and possibly the other maps as well?).

When we revisit a Node because it needs analyzing again, we have to
remove all result values from SoftenedFloats (and possibly other maps?).

This fixes the fp128 test failures with expensive checks for X86.

I think we probably should also remove the values from the other maps
(PromotedIntegers and so on), let me know what you think.

Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon

Reviewed By: chh

Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits

Differential Revision: https://reviews.llvm.org/D29265

llvm-svn: 296964
2017-03-04 12:00:35 +00:00
Simon Pilgrim 6dfab414db Use APInt::setBits instead of OR'ing in a separate APInt::getBitsSet call
llvm-svn: 296886
2017-03-03 17:03:52 +00:00
Simon Pilgrim cf12b5e1a6 Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation
Avoids all the unnecessary extra bitrange creation/shift stages.

llvm-svn: 296879
2017-03-03 16:35:57 +00:00
Simon Pilgrim 10754abe7e Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation
Avoids all the unnecessary extra bitrange creation/shift stages.

llvm-svn: 296871
2017-03-03 14:25:46 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Taewook Oh 96c6415697 [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below

```
extern int bar();
extern int baz();

int foo(int x, int y) {
  if (x != y)
    return bar();
  else
    return baz();
}
```

, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:

```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
  tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
  %cmp = icmp ne i32 %x, %y, !dbg !14
  br i1 %cmp, label %if.then, label %if.else, !dbg !16

if.then:                                          ; preds = %entry
  %call = tail call i32 (...) @bar() #3, !dbg !17
  br label %return, !dbg !18

if.else:                                          ; preds = %entry
  %call1 = tail call i32 (...) @baz() #3, !dbg !19
  br label %return, !dbg !20

return:                                           ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  ret i32 %retval.0, !dbg !21
}

!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)

```

As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.

Reviewers: atrick, bogner, andreadb, craig.topper, aprantl

Reviewed By: andreadb

Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29813

llvm-svn: 296825
2017-03-02 21:58:35 +00:00
Sanjay Patel 7884dcb788 [DAG] early exit to improve readability and formatting of visitMemCmpCall(); NFCI
llvm-svn: 296824
2017-03-02 21:56:43 +00:00
Sanjay Patel 209b0f9aad [DAG] improve documentation comments; NFC
llvm-svn: 296808
2017-03-02 20:48:08 +00:00
Sanjay Patel fffa179837 [DAGCombiner] avoid assertion when folding binops with opaque constants
This bug was introduced with:
https://reviews.llvm.org/rL296699

There may be a way to loosen the restriction, but for now just bail out
on any opaque constant.

The tests show that opacity is target-specific. This goes back to cost
calculations in ConstantHoisting based on TTI->getIntImmCost().

llvm-svn: 296768
2017-03-02 17:18:56 +00:00
Sanjay Patel f7aba7ba22 fix typo in comment; NFC
llvm-svn: 296760
2017-03-02 16:37:24 +00:00
Amaury Sechet 71f511fd1e [DAGCombiner] mulhi + 1 never overflow.
Summary:
This can be used to optimize large multiplications after legalization.

Depends on D29565

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29587

llvm-svn: 296711
2017-03-01 23:44:17 +00:00
Sanjay Patel 92938657a0 [DAGCombiner] fold binops with constant into select-of-constants
This is part of the ongoing attempt to improve select codegen for all targets and select 
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().

I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that 
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621 
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).

The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105

Differential Revision: https://reviews.llvm.org/D30502

llvm-svn: 296699
2017-03-01 22:51:31 +00:00
Benjamin Kramer 0e429606b0 [DAGCombiner] Remove non-ascii character and reflow comment.
llvm-svn: 296690
2017-03-01 22:10:43 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Nirav Dave 0a4703b5ec [DAG] Prevent Stale nodes from entering worklist
Add check that deleted nodes do not get added to worklist. This can
occur when a node's operand is simplified to an existing node.

This fixes PR32108.

Reviewers: jyknight, hfinkel, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30506

llvm-svn: 296668
2017-03-01 20:19:38 +00:00
Artur Pilipenko e1b2d31468 [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 296651
2017-03-01 18:12:29 +00:00
Ahmed Bougacha 20b3e9a835 [CodeGen] Remove dead FastISel code after SDAG emitted a tailcall.
When SDAGISel (top-down) selects a tail-call, it skips the remainder
of the block.

If, before that, FastISel (bottom-up) selected some of the (no-op) next
few instructions, we can end up with dead instructions following the
terminator (selected by SDAGISel).

We need to erase them, as we know they aren't necessary (in addition to
being incorrect).

We already do this when FastISel falls back on the tail-call itself.
Also remove the FastISel-emitted code if we fallback on the
instructions between the tail-call and the return.

llvm-svn: 296552
2017-03-01 00:43:42 +00:00
Sanjay Patel ea61ea9f19 [DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFC
llvm-svn: 296502
2017-02-28 18:41:49 +00:00
Craig Topper 419f145ebb [DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.

llvm-svn: 296486
2017-02-28 16:52:05 +00:00
Nirav Dave f830dec3f2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Eugene Zelenko fa912a7151 [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 296404
2017-02-27 22:45:06 +00:00
Arnold Schwaighofer b2605f31ed ISel: We need to notify FastIS of the IMPLICIT_DEF we created in createSwiftErrorEntriesInEntryBlock
Otherwise, it will insert instructions before it.

rdar://30536186

llvm-svn: 296395
2017-02-27 22:12:06 +00:00
Matt Arsenault 4a7cc16e89 Revert "DAG: Check if extract_vector_elt is legal or custom"
This reverts r295782. This could potentially result in some
legalization loops and I avoided the need for this.

llvm-svn: 296393
2017-02-27 21:59:07 +00:00
Simon Pilgrim 5c4efcdddf [X86][SSE] Attempt to extract vector elements through target shuffles
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.

Differential Revision: https://reviews.llvm.org/D30176

llvm-svn: 296381
2017-02-27 21:01:57 +00:00
Artur Pilipenko f7196c8d9e [DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets
This pattern is essentially a i16 load from p+1 address:

  %p1.i16 = bitcast i8* %p to i16*
  %p2.i8 = getelementptr i8, i8* %p, i64 2
  %v1 = load i16, i16* %p1.i16
  %v2.i8 = load i8, i8* %p2.i8
  %v2 = zext i8 %v2.i8 to i16
  %v1.shl = shl i16 %v1, 8
  %res = or i16 %v1.shl, %v2

Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining.

llvm-svn: 296336
2017-02-27 13:04:23 +00:00
Artur Pilipenko c43b20a43b [DAGCombine] NFC. MatchLoadCombine extract MemoryByteOffset lambda helper
This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets.

llvm-svn: 296332
2017-02-27 11:42:54 +00:00
Artur Pilipenko f2c26e0bf2 [DAGCombine] NFC. MatchLoadCombine remember the first byte provider, not the load node
This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets.

llvm-svn: 296331
2017-02-27 11:40:14 +00:00
Nirav Dave 73cd0194cf Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Artyom Skrobov ac56719231 No need to copy the variable [NFC]
llvm-svn: 296259
2017-02-25 17:18:09 +00:00
Nirav Dave beabf456df In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
Sanjay Patel 832b1622d8 [DAGCombiner] add missing folds for scalar select of {-1,0,1}
The motivation for filling out these select-of-constants cases goes back to D24480, 
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531
https://reviews.llvm.org/rL159230

The idea is that we should always canonicalize patterns like this to a select-of-constants 
in IR because that's the smallest IR and the best for value tracking. Note that we currently 
do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in 
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151

As this patch shows, most targets generate better machine code for simple ext/add/not ops 
rather than a select of constants. So the follow-up steps to make this less of a patchwork 
of special-case folds and missing IR canonicalization:

1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2  Have InstCombine canonicalize in the other direction (create more selects).

Differential Revision: https://reviews.llvm.org/D30180

llvm-svn: 296137
2017-02-24 17:17:33 +00:00
Sanjay Patel 4a4fbe162f [DAG] add convenience function to get -1 constant; NFCI
llvm-svn: 296004
2017-02-23 19:02:33 +00:00
Bill Seurer 8e48f416ad [DAGCombiner] revert r295336
r295336 causes a bootstrapped clang to fail for many compilations on
powerpc BE.  See 
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315
for example.

Reverting as per the developer's request.

llvm-svn: 295849
2017-02-22 16:27:33 +00:00
Matt Arsenault f0a4823b91 DAG: Check if extract_vector_elt is legal or custom
Avoids test regressions in future AMDGPU commits when
more vector types are custom lowered.

llvm-svn: 295782
2017-02-21 22:47:27 +00:00
Simon Pilgrim c0dc9a4913 Strip trailing whitespace.
llvm-svn: 295653
2017-02-20 11:56:43 +00:00
Simon Pilgrim 50b958c07a [SelectionDAG] Add scalarization support for ISD::*_EXTEND_VECTOR_INREG opcodes.
Thanks to Mikael Holmén for the initial test case

llvm-svn: 295652
2017-02-20 11:55:58 +00:00
Artyom Skrobov be31754094 Remove redundant call to GluedNodes.back() [NFC]
llvm-svn: 295607
2017-02-19 16:56:18 +00:00
Sanjay Patel 7f2e58972c [DAGCombiner] split i1 select-of-constants from non-i1 case; NFCI
I can't find any tests of the non-i1 code path, so it may be unnecessary at this point.

llvm-svn: 295463
2017-02-17 17:13:27 +00:00
Simon Pilgrim 0429c0cf8b Fix signed/unsigned comparison warning.
llvm-svn: 295453
2017-02-17 16:01:16 +00:00
Simon Pilgrim 511d788a95 [DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle masks
During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing).

This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types.

The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712).

Differential Revision: https://reviews.llvm.org/D29454

llvm-svn: 295451
2017-02-17 15:14:48 +00:00
Sanjay Patel 5573042035 [DAGCombiner] improve readability; NFCI
llvm-svn: 295447
2017-02-17 14:21:59 +00:00
Artur Pilipenko 85d758299e [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit -r295314 with PowerPC and AMDGPU tests updated.

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 295336
2017-02-16 17:07:27 +00:00
Artur Pilipenko a1b384c4ce Rever -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine"
This change causes some of AMDGPU and PowerPC tests to fail.

llvm-svn: 295316
2017-02-16 13:04:46 +00:00
Artur Pilipenko daaa0c0f7d [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 295314
2017-02-16 12:53:26 +00:00
Matt Arsenault 5de8dc9cf5 DAG: Do not scalarize fsub if fneg is legal
Tests will be included with future commit.

llvm-svn: 295242
2017-02-15 22:02:42 +00:00
Michael Kuperstein ba80db39d7 [DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.

This fixes PR311956.

Differential Revision: https://reviews.llvm.org/D29961

llvm-svn: 295213
2017-02-15 18:37:26 +00:00
Craig Topper 96ec7a23e3 [SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where inputs are larger than the mask
Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in a extract and creates a properly aligned start index for the extract.

This range finding is unnecessary, we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each indice we can't use an extract.

Reviewers: zvi, RKSimon

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29926

llvm-svn: 295152
2017-02-15 05:57:16 +00:00
Aditya Nandakumar bb0483bc8e [Tablegen] Instrumenting table gen DAGGenISelDAG
To help assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing include source location of the patterns
Enabled through CMake varialbe LLVM_ENABLE_DAGISEL_COV

llvm-svn: 295081
2017-02-14 18:32:41 +00:00
Artyom Skrobov dc66a82dc7 Removing a redundant assignment
llvm-svn: 295055
2017-02-14 14:44:01 +00:00
Arnold Schwaighofer 8f3df731dc swiftcc: Don't emit tail calls from callers with swifterror parameters
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.

rdar://30495920

llvm-svn: 294982
2017-02-13 19:58:28 +00:00
Quentin Colombet fbae5fcb96 [FastISel] Add a diagnostic to warm on fallback.
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.

llvm-svn: 294970
2017-02-13 17:38:59 +00:00
Craig Topper 3668bde371 [DAGCombiner] Teach DAG combine that inserting an extract_subvector result into the same location of a an undef vector can just use the original input to the extract.
llvm-svn: 294932
2017-02-13 04:53:33 +00:00
Craig Topper aa46204ed9 [DAGCombiner] Remove the half vector width check for the combine of EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR.
This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors.

llvm-svn: 294930
2017-02-12 23:49:49 +00:00
Sanjay Patel 0557a44287 [TargetLowering] fix SETCC SETLT folding with FP types
The bug was introduced with:
https://reviews.llvm.org/rL294863

...and manifests as a selection failure in x86, but that's actually
another bug. This fix prevents wrong codegen with -0.0, but in the
more common case when we have NSZ and NNAN (-ffast-math), we should 
still be able to fold this setcc/compare.

llvm-svn: 294924
2017-02-12 23:07:52 +00:00
Craig Topper b633adedc7 [DAGCombiner] Make the combine of INSERT_SUBVECTOR into a CONCAT_VECTOR more generic to support larger concats.
llvm-svn: 294875
2017-02-11 22:57:09 +00:00
Sanjay Patel 63499b61c9 [TargetLowering] check for sign-bit comparisons in SimplifyDemandedBits
I don't know if anything other than x86 vectors is affected by this change, but this may allow 
us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises
from the fact that blendv* instructions only use the sign-bit when deciding which vector element
to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already
exists in x86 lowering; this demanded bits change just enables the transform to fire more often.

The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, 
but I've explained the likely steps in this series here:
https://llvm.org/bugs/show_bug.cgi?id=11210

Differential Revision: https://reviews.llvm.org/D29687

llvm-svn: 294863
2017-02-11 18:01:55 +00:00
Simon Pilgrim bfb1747806 [DAGCombine] Allow vector constant folding of any value type before type legalization
The patch comes in 2 parts:

1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.

2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.

Fix for PR30760

Differential Revision: https://reviews.llvm.org/D29568

llvm-svn: 294749
2017-02-10 14:37:25 +00:00
Craig Topper a9f1121896 [SelectionDAG] Dump the DAG after legalizing vector ops and after the second type legalization
Summary:
With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization.

I'm only dumping if the legalize vector ops changes something since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump happens. So if nothing changed this second dump is unnecessary.

Having said that though, I think we should probably fix legalize vector ops to log what its doing.

Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc

Reviewed By: RKSimon

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D29554

llvm-svn: 294711
2017-02-10 05:05:57 +00:00
Geoff Berry 7e320c2485 [SelectionDAG] Fix bugs in inverted condition splitting code.
Summary:
Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by
Mikael Holmen.  Handle non-canonicalized xor not operation
correctly (was assuming operand 0 was always the non-constant operand)
and check that the negated condition is also in the same block as the
original and/or instruction (as is done for and/or operands already)
before proceeding with optimization.

Reviewers: bogner, MatzeB, qcolombet

Subscribers: mcrosier, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D29680

llvm-svn: 294605
2017-02-09 18:28:17 +00:00
Artur Pilipenko 4a64031954 [DAGCombiner] Support non-zero offset in load combine
Enable folding patterns which load the value from non-zero offset:

  i8 *a = ...
  i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24)
=>
  i32 val = *((i32*)(a+4))

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D29394

llvm-svn: 294582
2017-02-09 12:06:01 +00:00
Artur Pilipenko 045ab08252 [DAGCombiner] NFC. Mark ByteProvider accessors as const
llvm-svn: 294494
2017-02-08 17:59:34 +00:00
Amaury Sechet 4b946916ac [DAGCombiner] Push truncate through adde when the carry isn't used.
Summary: As per title.

Reviewers: mkuper, spatel, bkramer, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29528

llvm-svn: 294394
2017-02-08 00:32:36 +00:00
Reid Kleckner 0887d44a61 [SDAGISel] Simplify some SDAGISel code, NFC
Hoist entry block code for arguments and swift error values out of the
basic block instruction selection loop. Lowering arguments once up front
seems much more readable than doing it conditionally inside the loop. It
also makes it clear that argument lowering can update StaticAllocaMap
because no instructions have been selected yet.

Also use range-based for loops where possible.

llvm-svn: 294329
2017-02-07 18:42:53 +00:00
Sanjay Patel 8c99ca3df0 [TargetLowering] fix formatting and comments for ShrinkDemandedConstant; NFC
llvm-svn: 294325
2017-02-07 18:04:26 +00:00
Daniel Jasper 84b3cc394d Revert "[DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)"
This reverts commit r294186.

On an internal test, this triggers an out-of-memory error on PPC,
presumably because there is another dagcombine that does the exact
opposite triggering and endless loop consuming more and more memory.

Chandler has started at creating a reduced test case and we'll attach it
as soon as possible.

llvm-svn: 294288
2017-02-07 08:57:50 +00:00
Artur Pilipenko d3464bf9ad [DAGCombiner] Support bswap as a part of load combine patterns
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D29397

llvm-svn: 294201
2017-02-06 17:48:08 +00:00
Amaury Sechet e674f5c758 Add ADDC to SelectionDAG::computeKnownBits and ComputeNumSignBits.
Summary: As per title.

Reviewers: bkramer, sunfish, lattner, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29521

llvm-svn: 294188
2017-02-06 14:59:06 +00:00
Amaury Sechet 8a3b32941d [DAGCombiner] Make DAGCombiner smarter about overflow
Summary: Leverage it to transform addc into add.

Reviewers: mkuper, spatel, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29524

llvm-svn: 294187
2017-02-06 14:54:49 +00:00
Amaury Sechet 1d466f598e [DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)
Summary: This is extracted from D29443 .

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29564

llvm-svn: 294186
2017-02-06 14:28:39 +00:00
Simon Pilgrim bfd4495512 [X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined.
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.

We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.

This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.

Differential Revision: https://reviews.llvm.org/D29399

llvm-svn: 294183
2017-02-06 13:44:45 +00:00
Geoff Berry 76ca8c2b34 [SelectionDAG] In InstrEmitter, handle EXTRACT_SUBREG of a physical register.
Summary:
Without this change, the getVR() call would hit an assert since it was
being passed a physical register.

Update the AArch64/ldst-opt.ll test with a case that triggers this
behavior by adding a run with strict-align, which causes an unaligned
STR XZR instruction to be split into byte stores, creating an
EXTRACT_SUBREG of XZR that triggers the original problem.

Reviewers: bogner, qcolombet, MatzeB, atrick

Subscribers: aemerson, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D29495

llvm-svn: 294129
2017-02-05 18:28:14 +00:00
Amaury Sechet 143902c29f [DAGCombiner] Leverage add's commutativity
Summary: This avoid the need to duplicate all pattern and actually end up exposing some opportunity to optimize existing pattern that did not exists in both directions on an existing test case.

Reviewers: mkuper, spatel, bkramer, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29541

llvm-svn: 294125
2017-02-05 14:22:20 +00:00
Craig Topper 42b83f8d6e [DAGCombiner] Canonicalize the order of a chain of INSERT_SUBVECTORs.
Based on similar code for INSERT_VECTOR_ELT.

llvm-svn: 294110
2017-02-04 23:26:39 +00:00
Craig Topper 04dce84ead [DAGCombiner] Use DAG.getAnyExtOrTrunc to simplify some code. NFC
llvm-svn: 294109
2017-02-04 23:26:37 +00:00
Craig Topper ceaf9c1633 [DAGCombiner] In visitINSERT_VECTOR_ELT, move check for BUILD_VECTOR being legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS.
llvm-svn: 294108
2017-02-04 23:26:34 +00:00
Amaury Sechet 6e2d8e49ec Formatting in DAGCombiner. NFC
llvm-svn: 294091
2017-02-04 13:01:53 +00:00
Eugene Zelenko 502d0bc28e [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce TargetInstrInfo.h dependencies.

llvm-svn: 294084
2017-02-04 02:00:53 +00:00
Ahmed Bougacha 9677cc6fb7 [TLI] Robustize SDAG LibFunc proto checking by merging it into TLI.
This re-applies commit r292189, reverted in r292191.

SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to a few tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.
- a handful of SystemZ tests check the SDAG support for lax prototype
  checking: Ulrich agrees on removing them.

I don't think it's worth supporting any of these (IMO) invalid
testcases.  Instead, fix them to be more meaningful.

llvm-svn: 294028
2017-02-03 19:11:19 +00:00
Alexey Bataev a0d9f2582b [SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() !=
ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.

NodeToMatch can be modified during matching, but code does not handle
this situation.

Differential Revision: https://reviews.llvm.org/D29292

llvm-svn: 294003
2017-02-03 12:28:40 +00:00
Nirav Dave 93f9d5ce04 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Amaury Sechet f3e421d6e9 Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC
llvm-svn: 293903
2017-02-02 16:07:44 +00:00
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Florian Hahn 7a5ec55fb3 [legalizetypes] Push fp16 -> fp32 extension node to worklist.
Summary:
This way, the type legalization machinery will take care of registering
the result of this node properly.

This patches fixes all failing fp16 test cases  with expensive checks.
(CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll
CodeGen/X86/soft-fp.ll) 


Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel

Reviewed By: hfinkel

Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28195

llvm-svn: 293765
2017-02-01 13:01:33 +00:00
Nicolai Haehnle 8813d5d221 [DAGCombine] require UnsafeFPMath for re-association of addition
Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD

Subscribers: jholewinski, nemanjai, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28675

llvm-svn: 293635
2017-01-31 14:35:37 +00:00
Simon Pilgrim ffe2535cf6 Use SelectionDAG::getBuildVector helper function where possible. NFCI.
llvm-svn: 293532
2017-01-30 18:53:45 +00:00
Justin Bogner 8f520a73b2 SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced
Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we
happened to replace a node during UpdateChains, because it would be
left in the list we were iterating over. This nulls out the pointer
when that happens so that we can avoid the issue.

Fixes llvm.org/PR31710

llvm-svn: 293522
2017-01-30 18:29:46 +00:00
Simon Pilgrim 0a5ab5c4db Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI.
llvm-svn: 293520
2017-01-30 18:20:42 +00:00
Matt Arsenault 32e6bfa20f DAG: Fold fneg into compare with constant into the constant
fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred)

InstCombine already does this.

llvm-svn: 293512
2017-01-30 17:57:28 +00:00
Matt Arsenault 0c687390fe DAG: Constant fold fp16_to_fp/fp16_to_fp
This fixes emitting conversions of constants on targets
without legal f16 that need to use these for legalization.

llvm-svn: 293499
2017-01-30 16:57:41 +00:00
Craig Topper 135da1faf5 [SelectionDAG] Make SDNode::getConstantOperandVal an inline method.
It's operation already exists manually in many places without using the method.

llvm-svn: 293421
2017-01-29 06:08:02 +00:00
Craig Topper 4753736abf [DAGCombiner] Use unsigned for a constant vector index instead of APInt.
The type system requires that the number of vector elements should fit in 32-bits so this should be safe.

llvm-svn: 293414
2017-01-29 04:38:21 +00:00
Craig Topper d15730902b [DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR.
The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue.

llvm-svn: 293413
2017-01-29 04:38:19 +00:00
Craig Topper 24cdbe8fa6 [DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal.
llvm-svn: 293412
2017-01-29 04:38:16 +00:00
Matthias Braun 8c209aa877 Cleanup dump() functions.
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359
2017-01-28 02:02:38 +00:00
Jonas Paulsson bb0ed3e732 [DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert().
In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a
partial number of available vector elements), WidenVecRes_Convert() used to
resort to scalarization.

This patch adds a handling of the (common) case where an input vector can be
found of same width as the widened result vector, by converting the node to
SIGN/ZERO_EXTEND_VECTOR_INREG.

Review: Eli Friedman
llvm-svn: 293268
2017-01-27 07:46:26 +00:00
Andrew Kaylor a0a1164ce4 Add intrinsics for constrained floating point operations
This commit introduces a set of experimental intrinsics intended to prevent
optimizations that make assumptions about the rounding mode and floating point
exception behavior.  These intrinsics will later be extended to specify
flush-to-zero behavior.  More work is also required to model instruction
dependencies in machine code and to generate these instructions from clang
(when required by pragmas and/or command line options that are not currently
supported).

Differential Revision: https://reviews.llvm.org/D27028

llvm-svn: 293226
2017-01-26 23:27:59 +00:00
Nirav Dave d32a421f75 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
2017-01-26 16:46:13 +00:00
Nirav Dave de6516c466 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184
2017-01-26 16:02:24 +00:00
Craig Topper 001aad7da7 [DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors.
llvm-svn: 293152
2017-01-26 05:38:46 +00:00
Tim Northover 470f070b7d SDag: fix how initial loads are formed when splitting vector ops.
Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.

llvm-svn: 293088
2017-01-25 20:58:26 +00:00
Krzysztof Parzyszek ee9aa3ffee Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFC
llvm-svn: 293077
2017-01-25 19:29:04 +00:00
Artur Pilipenko bc93452420 Fix buildbot failures introduced by 293036
Fix unused variable, specify types explicitly to make VC compiler happy.

llvm-svn: 293039
2017-01-25 09:10:07 +00:00
Artur Pilipenko 41c0005aa3 [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2.
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html

This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used.

From the original commit:

Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: RKSimon, filcab, chandlerc 

Differential Revision: https://reviews.llvm.org/D27861

llvm-svn: 293036
2017-01-25 08:53:31 +00:00
Matt Arsenault 732a531506 DAG: Recognize no-signed-zeros-fp-math attribute
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.

llvm-svn: 293024
2017-01-25 06:08:42 +00:00
Matt Arsenault 8a27aee6ae DAGCombiner: Allow negating ConstantFP after legalize
llvm-svn: 293019
2017-01-25 04:54:34 +00:00
Geoff Berry 92a286ae5a [SelectionDAG] Handle inverted conditions when splitting into multiple branches.
Summary:
When conditional branches with complex conditions are split into
multiple branches in SelectionDAGBuilder::FindMergedConditions, also
handle inverted conditions.  These may sometimes appear without having
been optimized by InstCombine when CodeGenPrepare decides to sink and
duplicate cmp instructions, causing them to have only one use.  This
problem can be increased by e.g. GVNHoist hiding more cmps from
InstCombine by combining equivalent cmps from different blocks.

For example codegen X & !(Y | Z) as:
    jmp_if_X TmpBB
    jmp FBB
  TmpBB:
    jmp_if_notY Tmp2BB
    jmp FBB
  Tmp2BB:
    jmp_if_notZ TBB
    jmp FBB

Reviewers: bogner, MatzeB, qcolombet

Subscribers: llvm-commits, hiraditya, mcrosier, sebpop

Differential Revision: https://reviews.llvm.org/D28380

llvm-svn: 292944
2017-01-24 16:36:07 +00:00
Craig Topper ff272ad4f3 [SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR
Summary:
This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These seem like simple non-target specific optimizations.

For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there.

Ultimately, my motivation here is to simplify extractSubvector and remove custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle undef and BUILD_VECTOR optimizations, but those should be DAG combines.

Reviewers: RKSimon, delena

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29000

llvm-svn: 292876
2017-01-24 02:36:59 +00:00
David L. Jones d21529fa0d [Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).

Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.

The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)

There are additional changes required in clang.

Reviewers: rsmith

Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28476

llvm-svn: 292848
2017-01-23 23:16:46 +00:00
Matt Arsenault 4e305c6c1e DAG: Don't fold vector extract into load if target doesn't want to
Fixes turning a 32-bit scalar load into an extending vector load
for AMDGPU when dynamically indexing a vector.

llvm-svn: 292842
2017-01-23 22:48:53 +00:00
Matt Arsenault 7030661364 DAG: Allow legalization of fcanonicalize vector types
llvm-svn: 292814
2017-01-23 18:52:26 +00:00
Simon Pilgrim fb32eea1b4 [SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293)
This patch improves the knownbits logic for unsigned integer min/max opcodes.

For UMIN we know that the result will have the maximum of the inputs' known leading zero bits in the result, similarly for UMAX the maximum of the inputs' leading one bits.

This is particularly useful for simplifying clamping patterns,. e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set.

Differential Revision: https://reviews.llvm.org/D28853

llvm-svn: 292528
2017-01-19 22:41:22 +00:00
Mikael Holmen 2074e7497b [DAG] Don't increase SDNodeOrder for dbg.value/declare.
Summary:
The SDNodeOrder is saved in the IROrder field in the SDNode, and this
field may affects scheduling. Thus, letting dbg.value/declare increase
the order numbers may in turn affect scheduling.

Because of this change we also need to update the code deciding when
dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues.

Dbg values now have the same order as the SDNode they are connected to,
not the following orders.

Test cases provided by Florian Hahn.

Reviewers: bogner, aprantl, sunfish, atrick

Reviewed By: atrick

Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D25318

llvm-svn: 292485
2017-01-19 13:55:55 +00:00
Matt Arsenault f411071d63 DAG: Consider nnan in isKnownNeverNaN
llvm-svn: 292328
2017-01-18 02:10:08 +00:00
Ahmed Bougacha 9e5a085cf1 Revert "[TLI] Robustize SDAG proto checking by merging it into TLI."
This reverts commit r292189, as it causes issues on SystemZ bots.

llvm-svn: 292191
2017-01-17 03:31:00 +00:00
Ahmed Bougacha c018efd680 [TLI] Robustize SDAG proto checking by merging it into TLI.
SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to two tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.

I don't think it's worth supporting either of these (IMO) broken
testcases.  Instead, fix them to be more correct.

llvm-svn: 292189
2017-01-17 03:10:06 +00:00
Simon Pilgrim 3e91519a1c [SelectionDAG] Add knownbits support for BITREVERSE
llvm-svn: 292130
2017-01-16 14:49:26 +00:00
Simon Pilgrim db73dbcc7c [SelectionDAG] Add support for BITREVERSE constant folding
We were relying on constant folding of the legalized instructions to do what constant folding we had previously

llvm-svn: 292114
2017-01-16 13:39:00 +00:00
Malcolm Parsons 17d266bc96 Remove unused lambda captures. NFC
llvm-svn: 291916
2017-01-13 17:12:16 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Diana Picus 116bbab4e4 [CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.

See https://reviews.llvm.org/D28057 for the whole discussion.

Differential Revision: https://reviews.llvm.org/D28556

llvm-svn: 291891
2017-01-13 09:58:52 +00:00
Craig Topper 7af39837a9 Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent."
Some test appears to be hanging on the build bots.

llvm-svn: 291650
2017-01-11 04:59:25 +00:00
Craig Topper 577d258569 [DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent.
llvm-svn: 291645
2017-01-11 04:02:23 +00:00
Matt Arsenault e482403e1c DAGCombiner: Add hasOneUse checks to fadd/fma combine
Even with aggressive fusion enabled, this requires duplicating
the fmul, or increases an fadd to another fma which is not an
improvement.

llvm-svn: 291642
2017-01-11 02:02:12 +00:00
Eugene Zelenko c4ad1ce068 [Target] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291641
2017-01-11 01:45:03 +00:00
Matt Arsenault def496c04b Remove unused CONVERT_RNDSAT intrinsics
llvm-svn: 291607
2017-01-10 22:38:02 +00:00
Matt Arsenault 0b382a7cb8 DAG: Avoid OOB when legalizing vector indexing
If a vector index is out of bounds, the result is supposed to be
undefined but is not undefined behavior. Change the legalization
for indexing the vector on the stack so that an out of bounds
index does not create an out of bounds memory access.

llvm-svn: 291604
2017-01-10 22:02:30 +00:00
Simon Dardis 548a53f5ee [mips] Fix Mips MSA instrinsics
The usage of some MIPS MSA instrinsics that took immediates could crash LLVM
during lowering. This patch addresses that behaviour. Crucially this patch
also makes the use of intrinsics with out of range immediates as producing an
internal error.

The ld,st instrinsics would trigger an assertion failure for MIPS64 as their
lowering would attempt to add an i32 offset to a i64 pointer.

Reviewers: vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D25438

llvm-svn: 291571
2017-01-10 16:40:57 +00:00
Craig Topper 588c734b0f [DAGCombiner] Merge together duplicate checks for folding fold (select C, 1, X) -> (or C, X) and folding (select C, X, 0) -> (and C, X). Also be consistent about checking that both the condition and the result type are i1. NFC
I guess previously we just assumed if the result type was i1, then the condition type must also be i1?

llvm-svn: 291548
2017-01-10 07:42:57 +00:00
Craig Topper d915d6ba57 [DAGCombiner] Remove code for optimizing select (xor Cond, 0), X, Y -> select Cond, X, Y. Just let combine on the xor itself take care of it.
llvm-svn: 291534
2017-01-10 04:12:19 +00:00
Bjorn Pettersson b14afd452d [SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486.
Summary:
Originally

 i64 = umax t8, Constant:i64<4>

was expanded into

 i32,i32 = umax Constant:i32<0>, Constant:i32<0>
 i32,i32 = umax t7, Constant:i32<4>

Now instead the two produced umax:es return i32 instead of i32, i32.

Thanks to Jan Vesely for help with the test case.

Patch by mikael.holmen at ericsson.com

Reviewers: bogner, jvesely, tstellarAMD, arsenm

Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D28135

llvm-svn: 291441
2017-01-09 12:03:50 +00:00
David Majnemer 9e04befb09 [SelectionDAG] Rework lowerRangeToAssertZExt
Utilize ConstantRange to make it easier to interpret range metadata.

llvm-svn: 291211
2017-01-06 02:43:28 +00:00
David Majnemer eaba06cffa [SelectionDAG] Correctly transform range metadata to AssertZExt
We used the logBase2 of the high instead of the ceilLogBase2 resulting
in the wrong result for certain values.  For example, it resulted in an
i1 AssertZExt when the exclusive portion of the range was 3.

llvm-svn: 291196
2017-01-06 00:11:46 +00:00
Tim Shen 5480eb8445 [Legalizer] Fix fp-to-uint to fp-tosint promotion assertion.
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:

  fp-to-uint16: 65534.0 -> 0xfffe

With the legalization:

  fp-to-sint32: 65534.0 -> 0x0000fffe

Without this patch, legalization wrongly emits a signed extend assertion,
which is consumed by later icmp instruction, and cause miscompile.

Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.

This patch reverts r279223 behavior and adds more tests and
documentations.

In PR29041's context, James Molloy mentioned that:

  We don't need to mask because conversion from float->uint8_t is
  undefined if the integer part of the float value is not representable in
  uint8_t. Therefore we can assume this doesn't happen!

which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
the advantage of this behavior so that we can save unnecessary mask
instructions.

Reviewers: jmolloy, nadav, echristo, kbarton

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28284

llvm-svn: 291015
2017-01-04 22:11:42 +00:00
Evgeny Stupachenko c88697dc16 The patch fixes (base, index, offset) match.
Summary:
Instead of matching:
  (a + i) + 1 -> (a + i, undef, 1)
Now it matches:
  (a + i) + 1 -> (a, i, 1)

Reviewers: rengolin

Differential Revision: http://reviews.llvm.org/D26367

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
2017-01-04 21:43:39 +00:00
Florian Hahn f872d230ad [selectiondag] Check PromotedFloats map during expansive checks.
Summary:
`PromotedFloats` needs to be checked in 
`DAGTypeLegalizer::PerformExpensiveChecks`. This patch fixes a few type
legalization failures with expansive checks for ARM fp16 tests.

Reviewers: baldrick, bogner, arsenm

Subscribers: arsenm, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28187

llvm-svn: 290796
2017-01-01 13:58:27 +00:00
Reid Kleckner 0e7c84c682 Simplify FunctionLoweringInfo.cpp with range for loops
I'm preparing to add some pattern matching code here, so simplify the
code before I do. NFC

llvm-svn: 290731
2016-12-30 00:21:38 +00:00
Igor Laevsky 4f31e52f94 Introduce element-wise atomic memcpy intrinsic
This change adds a new intrinsic which is intended to provide memcpy functionality
with additional atomicity guarantees. Please refer to the review thread
or language reference for further details.

Differential Revision: https://reviews.llvm.org/D27133

llvm-svn: 290708
2016-12-29 14:31:07 +00:00
Simon Pilgrim 0d66d29678 [SelectionDAG] Early out from computeKnownBits when we know we will have no common bits.
Avoid extra (recursive) calls to computeKnownBits if we already know that there are no common known bits.

llvm-svn: 290490
2016-12-24 12:59:35 +00:00
Zijiao Ma bf6007bd1b Make the canonicalisation on shifts benifit to more case.
1.Fix pessimized case in FIXME.
2.Add tests for it.
3.The canonicalisation on shifts results in different sequence for
  tests of machine-licm.Correct some check lines.

Differential Revision: https://reviews.llvm.org/D27916

llvm-svn: 290410
2016-12-23 02:56:07 +00:00
Wei Mi f3f01aba48 Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.
This is for splitMergedValStore in DAG Combine to share the target query interface
with similar logic in CodeGenPrepare.

Differential Revision: https://reviews.llvm.org/D24707

llvm-svn: 290363
2016-12-22 19:38:22 +00:00
Matt Arsenault 485dacd90c DAG: Add helper for testing constant values
There are helpers for testing for constant or constant build_vector,
and for splat ConstantFP vectors, but not for a constantfp or
non-splat ConstantFP vector.

llvm-svn: 290317
2016-12-22 04:39:45 +00:00
Oren Ben Simhon 3b95157090 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. 
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.

The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it.
This aubmit also includes additional lit tests to cover better HVAs corner cases.

Differential Revision: https://reviews.llvm.org/D27392

llvm-svn: 290240
2016-12-21 08:31:45 +00:00
Joel Jones 8980ba643e Fix name typo in SelectonDAG
llvm-svn: 289969
2016-12-16 18:22:54 +00:00
Chandler Carruth ba5de63bc3 Add extra headers that got deleted by my revert in r289916 but for which
new usage had already grown in the file.

llvm-svn: 289917
2016-12-16 04:08:31 +00:00
Chandler Carruth 4154062b69 Revert patch series introducing the DAG combine to match a load-by-bytes
idiom.

r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...

This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.

However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.

Sorry for any trouble or disruption!

llvm-svn: 289916
2016-12-16 04:05:22 +00:00
Eli Friedman 379294676d Don't combine splats with other shuffles.
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .

Differential Revision: https://reviews.llvm.org/D27793

llvm-svn: 289882
2016-12-15 22:41:40 +00:00
Eli Friedman 34505083c6 Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.

Differential Revision: https://reviews.llvm.org/D27787

llvm-svn: 289874
2016-12-15 21:36:59 +00:00
Sanjay Patel afee21a5b2 [DAG] allow more select folding for targets that have 'and not' (PR31175)
The original motivation for this patch comes from wanting to canonicalize 
more IR to selects and also canonicalizing min/max.

If we're going to do that, we need more backend fixups to undo select codegen 
when simpler ops will do. I chose AArch64 for the tests because that shows the
difference in the simplest way. This should fix:
https://llvm.org/bugs/show_bug.cgi?id=31175

Differential Revision: https://reviews.llvm.org/D27489

llvm-svn: 289738
2016-12-14 22:59:14 +00:00
Nirav Dave f5bf03c7ef Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Nirav Dave 8527ab0ad2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Simon Pilgrim 05ab8ffc7e [DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of just APInt::isPowerOf2
Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases.

Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value.

Differential Revision: https://reviews.llvm.org/D27714

llvm-svn: 289654
2016-12-14 15:08:13 +00:00
Stephan Bergmann 17c7f70362 Replace APFloatBase static fltSemantics data members with getter functions
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.

Differential Revision: https://reviews.llvm.org/D26671

llvm-svn: 289647
2016-12-14 11:57:17 +00:00
Artur Pilipenko f3ee444010 Add a couple of assertions to the load combine code introduced by r289538
llvm-svn: 289646
2016-12-14 11:55:47 +00:00
Artur Pilipenko 469fcd2afd Use more detailed assertion messages in the code introduced by r289538
llvm-svn: 289545
2016-12-13 16:26:15 +00:00
Artur Pilipenko 79d1255e26 Fix a buildbot failure introduced by r289538
Build failed because of unused variable in product mode.

llvm-svn: 289540
2016-12-13 14:55:31 +00:00
Artur Pilipenko c93cc5955f [DAGCombiner] Match load by bytes idiom and fold it into a single load
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: hfinkel, RKSimon, filcab

Differential Revision: https://reviews.llvm.org/D26149

llvm-svn: 289538
2016-12-13 14:21:14 +00:00
Artur Pilipenko 01e86444a0 Move BaseIndexOffset in DAGCombiner.cpp so it will be available for the upcoming user
llvm-svn: 289537
2016-12-13 14:16:02 +00:00
Simon Pilgrim 9dc67c0101 [SelectionDAG] computeKnownBits - simplified knownbits sign extension. NFCI.
We don't need to extract+test the sign bit of the known ones/zeros, we can use sext which will handle all of this.

llvm-svn: 289534
2016-12-13 13:36:27 +00:00
Philip Reames 51387a8c28 [Statepoints] Reuse stack slots more than once within a basic block
The stack slot reuse code had a really amusing bug. We ended up only reusing a stack slot exact once (initial use + reuse) within a basic block. If we had a third statepoint to process, we ended up allocating a new set of stack slots. If we crossed a basic block boundary, the set got cleared. As a result, code which is invoke heavy doesn't see the problem, but multiple calls within a basic block does. Net result: as we optimize invokes into calls, lowering gets worse.

The root error here is that the bitmap uses by the custom allocator wasn't kept in sync. The result was that we ended up resizing the bitmap on the next statepoint (to handle the cross block case), reset the bit once, but then never reset it again.

Differential Revision: https://reviews.llvm.org/D25243

llvm-svn: 289509
2016-12-13 01:21:15 +00:00
Simon Pilgrim 040a36c176 [SelectionDAG] Add support for EXTRACT_SUBVECTOR to ComputeNumSignBits
Pre-commit as discussed on D27657

llvm-svn: 289425
2016-12-12 10:29:43 +00:00
Simon Pilgrim 54945a12ec [SelectionDAG] Add ability for computeKnownBits to peek through bitcasts from 'large element' scalar/vector to 'small element' vector.
Extension to D27129 which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types.

llvm-svn: 289329
2016-12-10 17:00:00 +00:00
Simon Pilgrim 017b7a71d8 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.

llvm-svn: 289232
2016-12-09 17:53:11 +00:00
Matt Arsenault 38d8ed2b75 AMDGPU: Fix i128 mul
llvm-svn: 289231
2016-12-09 17:49:14 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00
Simon Pilgrim b9eb99f570 Use SelectionDAG.getSplatBuildVector helper. NFCI.
llvm-svn: 289220
2016-12-09 16:01:50 +00:00
Simon Pilgrim bf9c0e7434 [SelectionDAG] Use SelectionDAG.getBuildVector helper. NFCI.
Makes interception of BUILD_VECTOR creation easier for debugging.

llvm-svn: 289218
2016-12-09 15:23:41 +00:00
Simon Pilgrim 15f1f828b5 [SelectionDAG] Add additional checks to CONCAT_VECTORS creation
Part of the work for PR31323 - add extra asserts checking that the input vectors are of consistent type and result in the correct number of vector elements.

llvm-svn: 289214
2016-12-09 14:27:52 +00:00
Simon Pilgrim e4050a2961 [SelectionDAG] Add partial BITCAST support to computeKnownBits
Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together.

We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.

Differential Revision: https://reviews.llvm.org/D27129

llvm-svn: 289200
2016-12-09 10:13:45 +00:00
Daniel Jasper f51e05ffbc Revert "[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes"
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that halide is
generating invalid IR, llc shouldn't crash.

llvm-svn: 289194
2016-12-09 09:04:51 +00:00
Nicolai Haehnle f08dc90253 [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI
Summary:
Most targets set the action for these nodes to Expand even though there
isn't actually any code for them in ExpandNode. Instead, targets simply
relied on the fact that no code generates these nodes as long as the
nodes aren't legal or custom.

However, generating these nodes can be useful e.g. for divide-by-constant
in wider integer types.

Expand of [US]MUL_LOHI will use MULH[US] when legal or custom, and
a sequence of half-width multiplications otherwise. Promote uses a wider
multiply.

This patch intends to not change the generated code, but indirect effects
are possible since expansions/promotions that were previously done in
DAGCombine may now be done in LegalizeDAG.

See D24822 for a change that actually uses the new expansion.

Reviewers: spatel, bkramer, venkatra, efriedma, hfinkel, ast, nadav, tstellarAMD

Subscribers: arsenm, jyknight, nemanjai, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24956

llvm-svn: 289050
2016-12-08 14:08:14 +00:00
Simon Pilgrim ba05d41095 [SelectionDAG] Add knownbits support for vector demandedelts in SMAX/SMIN/UMAX/UMIN opcodes
llvm-svn: 288926
2016-12-07 17:54:00 +00:00
Simon Pilgrim 967325b373 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes
llvm-svn: 288916
2016-12-07 16:28:21 +00:00
Simon Pilgrim ff79f31328 [SelectionDAG] Removed old knownbits TODO comment. NFCI.
EXTRACT_VECTOR_ELT does support demanded elts if the element index is known and in range.

llvm-svn: 288913
2016-12-07 15:31:12 +00:00
Eli Friedman 0a76e3241f [CodeGen] Fix result type for SMULO/UMULO legalization
On some platforms (like MSP430) the second element of the result
structure for SMULO/UMULO may have a shorter type than the one
returned by SetCC. We need to truncate it to the right type, or
else some incorrect code may be generated later on.

This fixes issue https://github.com/rust-lang/rust/issues/37829

Patch by Vadzim Dambrouski!

Differential Revision: https://reviews.llvm.org/D27154

llvm-svn: 288857
2016-12-06 22:49:36 +00:00
Simon Pilgrim dd6ca639d5 [DAGCombine] Add (sext_in_reg (zext x)) -> (sext x) combine
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.

Differential Revision: https://reviews.llvm.org/D27461

llvm-svn: 288842
2016-12-06 19:09:37 +00:00
Simon Pilgrim 1577b39f51 [SelectionDAG] We can ignore knownbits from an undef shuffle vector index if we don't actually demand that element
llvm-svn: 288839
2016-12-06 18:58:25 +00:00
Simon Pilgrim 29c17f3f58 Avoid repeated calls to Op.getOpcode(). NFCI.
llvm-svn: 288814
2016-12-06 14:50:09 +00:00
Sanjay Patel 1f158d6955 [TargetLowering] add special-case for demanded bits analysis of 'not'
We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask. 
Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get
folded into another logic op if the target has those. However, if we can remove a logic
instruction by changing the xor's constant mask value, that should always be a win.

Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case
currently (although that's marked with a FIXME). So if you run this IR through -instcombine,
you should get the same end result. I'm hoping to add a different backend transform that 
will expose this problem though, so I need to solve this first.

Differential Revision: https://reviews.llvm.org/D27356

llvm-svn: 288676
2016-12-05 15:58:21 +00:00
Matt Arsenault 92fede361f DAG: Fold out out of bounds insert_vector_elt
getNode already prevents formation of out of bounds constant
extract_vector_elts. Do the same for insert_vector_elt.

llvm-svn: 288603
2016-12-03 23:03:26 +00:00
Nicolai Haehnle 33ca182c91 [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.

Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:

  x = 1.01
  y = 111

  x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)

  x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)

The example relies on rounding towards zero at least in the second step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578

Reviewers: RKSimon, tstellarAMD, spatel, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26602

llvm-svn: 288506
2016-12-02 16:06:18 +00:00
Peter Collingbourne ab85225be4 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00
Justin Bogner 35c5e58f8c SDAG: Avoid a large, usually empty SmallVector in a recursive function
This SmallVector is using up 128 bytes on the stack every time despite
almost always being empty[1], and since this function can recurse quite
deeply that adds up to a lot of overhead. We've seen this run afoul of
ulimits in some cases with ASAN on.

Replacing the SmallVector with a std::vector trades an occasional heap
allocation for vastly less stack usage.

[1]: I gathered some stats on an internal test suite and the vector
was non-empty in only 45,000 of 10,000,000 calls to this function.

llvm-svn: 288441
2016-12-02 00:11:01 +00:00
Matthias Braun d0ee66c2e9 Move most EH from MachineModuleInfo to MachineFunction
Recommitting r288293 with some extra fixes for GlobalISel code.

Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.

This is a necessary step to have machine module passes work properly.

Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
  where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
  because the available MachineFunction pointers are const, but the code
  wants to call tidyLandingPads() in between
  (markFunctionEnd()/endFunction()).

Differential Revision: https://reviews.llvm.org/D27227

llvm-svn: 288405
2016-12-01 19:32:15 +00:00
Nicolai Haehnle da7e4017c6 [SelectionDAG] Rename and clarify visitFMULForFMADCombine (NFC)
Summary: Suggested by @spatel in D26602.

Reviewers: spatel, hfinkel

Subscribers: spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D27260

llvm-svn: 288336
2016-12-01 14:04:13 +00:00
Eric Christopher e70b7c3dfb Temporarily Revert "Move most EH from MachineModuleInfo to MachineFunction"
This apprears to have broken the global isel bot:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel_build/5174/console

This reverts commit r288293.

llvm-svn: 288322
2016-12-01 07:50:12 +00:00
Matthias Braun ed14cb0604 Move most EH from MachineModuleInfo to MachineFunction
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.

This is a necessary step to have machine module passes work properly.

Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
  where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
  because the available MachineFunction pointers are const, but the code
  wants to call tidyLandingPads() in between
  (markFunctionEnd()/endFunction()).

Differential Revision: https://reviews.llvm.org/D27227

llvm-svn: 288293
2016-11-30 23:49:01 +00:00
Matthias Braun ef331eff5a Move VariableDbgInfo from MachineModuleInfo to MachineFunction
VariableDbgInfo is per function data, so it makes sense to have it with
the function instead of the module.

This is a necessary step to have machine module passes work properly.

Differential Revision: https://reviews.llvm.org/D27186

llvm-svn: 288292
2016-11-30 23:48:50 +00:00
Nicolai Haehnle 73a9a27b5a [SelectionDAG] Refactor TargetLowering::expandMUL (NFC)
Summary: Further preparation for the expansion of MUL_LOHI added in D24956.

Reviewers: efriedma, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27064

llvm-svn: 288248
2016-11-30 16:26:33 +00:00
Warren Ristow d9777c1dbb Test commit. Comment changes. NFC.
llvm-svn: 288100
2016-11-29 02:37:13 +00:00
Sanjay Patel 2bd32b05fb [DAG] clean up foldSelectCCToShiftAnd(); NFCI
llvm-svn: 288088
2016-11-28 23:05:55 +00:00
Sanjay Patel 1cf9aff659 [DAG] add helper function for selectcc --> and+shift transforms; NFC
llvm-svn: 288073
2016-11-28 21:47:41 +00:00
Nirav Dave a413361798 Revert "[DAG] Improve loads-from-store forwarding to handle TokenFactor"
This reverts commit r287773 which caused issues with ppc64le builds.

llvm-svn: 288035
2016-11-28 14:30:29 +00:00
Simon Pilgrim c5fb167df0 Use SDValue helpers instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287941
2016-11-25 17:25:21 +00:00
Craig Topper 8c4cdf06db [DAGCombine] Teach DAG combine that if both inputs of a vselect are the same, then the condition doesn't matter and the vselect can be removed.
Selects with scalar condition already handle this correctly.

llvm-svn: 287904
2016-11-24 21:48:52 +00:00
Nicolai Haehnle 934470f536 [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC)
Summary: Reduce indentation level; preparation for D24956.

Reviewers: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27063

llvm-svn: 287831
2016-11-23 22:14:20 +00:00
Nirav Dave cf34556330 [DAG] Improve loads-from-store forwarding to handle TokenFactor
Forward store values to matching loads down through token
factors. Factored from D14834.

Reviewers: jyknight, hfinkel

Subscribers: hfinkel, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D26080

llvm-svn: 287773
2016-11-23 16:48:35 +00:00
John Brawn 150addb45c [DAGCombiner] Fix infinite loop in vector mul/shl combining
We have the following DAGCombiner transformations:
 (mul (shl X, c1), c2) -> (mul X, c2 << c1)
 (mul (shl X, C), Y) -> (shl (mul X, Y), C)
 (shl (mul x, c1), c2) -> (mul x, c1 << c2)
Usually the constant shift is optimised by SelectionDAG::getNode when it is
constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing
with vectors and one of those vector constants contains an undef element
FoldConstantArithmetic does not fold and we enter an infinite loop.

Fix this by making FoldConstantArithmetic use getNode to decide how to fold each
vector element, the same as FoldConstantVectorArithmetic does, and rather than
adding the constant shift to the work list instead only apply the transformation
if it's already been folded into a constant, as if it's not we're going to loop
endlessly. Additionally add missing NoOpaques to one of those transformations,
which I noticed when writing the tests for this.

Differential Revision: https://reviews.llvm.org/D26605

llvm-svn: 287766
2016-11-23 16:05:51 +00:00
Elena Demikhovsky 09375d98b8 Type legalization for compressstore and expandload intrinsics.
Implemented widening (v2f32) and splitting (v16f64).
On splitting, I use "popcnt" to calculate memory increment. 
More type legalization work will come in the next patches.

llvm-svn: 287761
2016-11-23 13:58:24 +00:00
Simon Pilgrim 72e43570b7 [SelectionDAG] ComputeNumSignBits of TRUNCATE operations
Add basic ComputeNumSignBits support for TRUNCATE ops for cases where the source's number of sign bits overlaps with the truncated size.

Improves X86 SIGN_EXTEND_IN_REG vector cases which were needlessly sign extending boolean vector results.

Differential Revision: https://reviews.llvm.org/D26851

llvm-svn: 287635
2016-11-22 11:29:19 +00:00
Matt Arsenault b30d2aca58 DAG: Ignore call site attributes when emitting target intrinsic
A target intrinsic may be defined as possibly reading memory,
but the call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic
assumption of the intrinsic definition, so the chain should
still be used.

llvm-svn: 287593
2016-11-21 22:56:42 +00:00
Simon Pilgrim 5662074ba3 [VectorLegalizer] Remove EVT::getSizeInBits code duplications. NFCI.
We were calling SVT.getSizeInBits() several times in a row - just call it once and reuse the result.

llvm-svn: 287556
2016-11-21 18:24:44 +00:00
Simon Pilgrim 49d7eda968 [SelectionDAG] Add ComputeNumSignBits support for CONCAT_VECTORS opcode
llvm-svn: 287541
2016-11-21 14:36:19 +00:00
Simon Pilgrim 7a6b6d5656 Fix spelling mistakes in SelectionDAG comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287487
2016-11-20 13:14:57 +00:00
Simon Pilgrim e40900dddd [SelectionDAG] Add knowbits support for CONCAT_VECTOR opcode
llvm-svn: 287387
2016-11-18 22:21:22 +00:00
Matthias Braun 9f15a79e5d Timer: Track name and description.
The previously used "names" are rather descriptions (they use multiple
words and contain spaces), use short programming language identifier
like strings for the "names" which should be used when exporting to
machine parseable formats.

Also removed a unused TimerGroup from Hexxagon.

Differential Revision: https://reviews.llvm.org/D25583

llvm-svn: 287369
2016-11-18 19:43:18 +00:00
Simon Pilgrim c4d733cd6a Fix spelling in comment. NFC.
llvm-svn: 287222
2016-11-17 12:03:05 +00:00
Chris Bieneman 05c279fc4b [CMake] NFC. Updating CMake dependency specifications
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.

llvm-svn: 287206
2016-11-17 04:36:50 +00:00
Ahmed Bougacha bd6ce9a247 [CodeGen] Pass references, not pointers, to MMI helpers. NFC.
While there, rename them to follow the coding style.

llvm-svn: 287169
2016-11-16 22:25:03 +00:00
Ahmed Bougacha 456dce8a84 [CodeGen] Pull MMI helpers from FunctionLoweringInfo to MMI. NFC.
They're not SelectionDAG- or FunctionLoweringInfo-specific.  They
are, however, specific to building MMI from IR.
We could make them members, but it's nice having MMI be a "simple" data
structure and this logic kept separate.

This also lets us reuse them from GlobalISel.

llvm-svn: 287167
2016-11-16 22:24:56 +00:00
Pawel Bylica c3f6c97f71 Integer legalization: fix MUL expansion
Summary:
This fixes the runtime results produces by the fallback multiplication expansion introduced in r270720.

For tests I created a fuzz tester that compares the results with Boost.Multiprecision.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26628

llvm-svn: 286998
2016-11-15 18:29:24 +00:00
Joerg Sonnenberger 1a7eec68a9 Introduce TLI predicative for base-relative Jump Tables.
For 64bit ABIs it is common practice to use relative Jump Tables with
potentially different relocation bases.  As the logic for the jump table
itself doesn't depend on the relocation base, make it easier for targets
to use the generic logic. Start by dropping the now redundant MIPS logic.

Differential Revision: https://reviews.llvm.org/D26578

llvm-svn: 286951
2016-11-15 12:39:46 +00:00
Asaf Badouh b573553424 DAGCombiner: fix combine of trunc and select
bugzilla:
https://llvm.org/bugs/show_bug.cgi?id=29002
pr29002

Differential Revision: https://reviews.llvm.org/D26449


 

llvm-svn: 286938
2016-11-15 07:55:22 +00:00
Simon Pilgrim 807f9cf243 [SelectionDAG] Add support for vector demandedelts in BSWAP opcodes
llvm-svn: 286582
2016-11-11 11:51:29 +00:00
Simon Pilgrim 813721e98a [SelectionDAG] Add support for vector demandedelts in UREM/SREM opcodes
llvm-svn: 286578
2016-11-11 11:23:43 +00:00
Simon Pilgrim 0652227814 [SelectionDAG] Add support for vector demandedelts in UDIV opcodes
llvm-svn: 286576
2016-11-11 10:47:24 +00:00
Evandro Menezes 21f9ce1a0d [DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself.  However, the original code did not properly consider this condition
if returned by a target.  This patch addresses the issues to allow a target
to compute the series on its own.

Differential revision: https://reviews.llvm.org/D22975

llvm-svn: 286523
2016-11-10 23:31:06 +00:00
Simon Pilgrim 38f0045cb0 [SelectionDAG] Add support for vector demandedelts in ADD/SUB opcodes
llvm-svn: 286516
2016-11-10 22:41:49 +00:00
Simon Pilgrim fe3a54371d [SelectionDAG] Add support for splatted vectors in SUB opcode
llvm-svn: 286509
2016-11-10 21:57:42 +00:00
Simon Pilgrim d67af68f06 [SelectionDAG] Add support for vector demandedelts in TRUNCATE opcodes
llvm-svn: 286481
2016-11-10 17:43:52 +00:00
Simon Pilgrim 33fef8e865 Use common SDLoc. NFCI.
llvm-svn: 286473
2016-11-10 16:47:09 +00:00
Simon Pilgrim ee187fd6e7 [SelectionDAG] Add support for vector demandedelts in MUL opcodes
llvm-svn: 286471
2016-11-10 16:27:42 +00:00
Simon Pilgrim ca57e53ded [SelectionDAG] Add support for vector demandedelts in SRA opcodes
llvm-svn: 286461
2016-11-10 15:05:09 +00:00
Simon Pilgrim 37c9034bd6 [DAGCombiner] Correctly extract the ConstOrConstSplat shift value for SHL nodes
We were failing to extract a constant splat shift value if the shifted value was being masked.

The (shl (and (setcc) N01CV) N1CV) -> (and (setcc) N01CV<<N1CV) combine was unnecessarily preventing this.

llvm-svn: 286454
2016-11-10 14:35:09 +00:00
Simon Pilgrim 3bf99c056a [SelectionDAG] Add support for vector demandedelts in SHL/SRL opcodes
llvm-svn: 286448
2016-11-10 13:52:42 +00:00
Simon Pilgrim 778596bf59 [TargetLowering] Fix undef vector element issue with true/false result handling
Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching.

The comment said we shouldn't handle constant splat vectors with undef elements. But the the actual code was returning false if the build vector contained no undef elements....

This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs).

The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed.

Differential Revision: https://reviews.llvm.org/D26031

llvm-svn: 286238
2016-11-08 15:07:01 +00:00
Simon Pilgrim d02c55204b [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233
2016-11-08 14:10:28 +00:00
Richard Smith 857efb0880 Add -O0 support for @llvm.invariant.group.barrier by discarding it if it gets to ISel.
Differential Revision: https://reviews.llvm.org/D26292

llvm-svn: 286119
2016-11-07 16:47:20 +00:00
Simon Pilgrim 39df78e384 [SelectionDAG] Add support for vector demandedelts in XOR opcodes
llvm-svn: 286075
2016-11-06 16:49:19 +00:00
Simon Pilgrim dd4809a603 [SelectionDAG] Add support for vector demandedelts in OR opcodes
llvm-svn: 286071
2016-11-06 16:29:09 +00:00
Nicolai Haehnle bea772c6dc DAGCombiner: fix use-after-free when merging consecutive stores
Summary:
Have MergeConsecutiveStores explicitly return information about the stores
that were merged, so that we can safely determine whether the starting
node has been freed.

Reviewers: chandlerc, bogner, niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25601

llvm-svn: 285916
2016-11-03 14:25:04 +00:00
Elena Demikhovsky caaceef4b3 Expandload and Compressstore intrinsics
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.

llvm-svn: 285876
2016-11-03 03:23:55 +00:00
Simon Pilgrim 93f2f7fb6c Use !operator to test if APInt is zero/non-zero. NFCI.
Avoids APInt construction and slower comparisons.

llvm-svn: 285822
2016-11-02 15:41:15 +00:00
Joerg Sonnenberger 5e31b3ad93 Simplify.
llvm-svn: 285802
2016-11-02 12:45:28 +00:00
Sanjay Patel 70c5f02d25 [DAG] disable nsw/nuw for add/sub/mul when simplifying based on demanded bits (PR30841)
This bug was exposed by using nsw/nuw for more aggressive folds in:
https://reviews.llvm.org/rL284844

The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(),
but we can't just flip flag bits in the DAG; we have to create a new node that has the
bits cleared.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30841 

llvm-svn: 285656
2016-10-31 23:28:45 +00:00
Sanjay Patel 339a51ac13 [DAG] x | x --> x
llvm-svn: 285522
2016-10-30 18:19:35 +00:00
Sanjay Patel 13aee345ca [DAG] x & x --> x
llvm-svn: 285521
2016-10-30 18:13:30 +00:00
Simon Pilgrim 75a697a17e [DAGCombiner] (REAPPLIED) Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

This looked like this had caused compile time regressions on some buildbots (and was reverted in rL285381), but appears to have just been a harmless bystander!

Differential Revision: https://reviews.llvm.org/D25691

llvm-svn: 285494
2016-10-29 11:29:39 +00:00
Davide Italiano 86168b23cf [DAGCombiner] Fix a crash visiting `AND` nodes.
Instead of asserting that the shift count is != 0 we just bail out
as it's not profitable trying to optimize a node which will be
removed anyway.

Differential Revision:  https://reviews.llvm.org/D26098

llvm-svn: 285480
2016-10-28 23:55:32 +00:00
Justin Bogner db6b6a7f0c SDAG: Make sure we use an allocatable reg class when we create this vreg
As per the discussion on r280783, if constrainRegClass fails we need
to call getAllocatableClass like we did before that commit.

llvm-svn: 285467
2016-10-28 22:42:54 +00:00
Simon Pilgrim d9189891fc [SelectionDAG] computeKnownBits - early-out if any BUILD_VECTOR element has no known bits
No need to check the remaining elements - no common known bits are available.

llvm-svn: 285399
2016-10-28 14:07:44 +00:00
Simon Pilgrim 8c043061e5 [SelectionDAG] Tidyup UDIV computeKnownBits implementation
No need to clear KnownOne2/KnownZero2 bits as the next call to computeKnownBits will overwrite them anyway

llvm-svn: 285398
2016-10-28 13:42:23 +00:00
Simon Pilgrim 755cef1ba8 [SelectionDAG] Increment computeKnownBits recursion depth for SMIN/SMAX/UMIN/UMAX like all other ops
llvm-svn: 285397
2016-10-28 13:13:16 +00:00
Juergen Ributzka 5cee232be4 Revert "[DAGCombiner] Add vector demanded elements support to computeKnownBits"
This seems to have increased LTO compile time bejond 2x of previous builds.
See http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto/10676/

llvm-svn: 285381
2016-10-28 04:01:12 +00:00
Simon Pilgrim 01e755eab1 [DAGCombiner] Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

Differential Revision: https://reviews.llvm.org/D25691

llvm-svn: 285296
2016-10-27 14:29:28 +00:00
Nemanja Ivanovic 275853e777 Do not assume that FP vector operands are never legalized by expanding
This patch ensures that if a floating point vector operand is legalized by
expanding, it is legalized through the stack rather than by calling
DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand
is a non-integer type.

This fixes PR 30715.

llvm-svn: 285231
2016-10-26 19:51:35 +00:00
Tom Stellard 284cf32ab4 LegalizeDAG: Support promoting [US]DIV and [US]REM operations
Summary:
AMDGPU will need this one i16 is added as a legal type.  This is tested by:

test/CodeGen/AMDGPU/sdiv.ll
test/CodeGen/AMDGPU/sdivrem24.ll
test/CodeGen/AMDGPU/udiv.ll
test/CodeGen/AMDGPU/udivrem24.ll

Reviewers: bogner, efriedma

Subscribers: efriedma, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D25699

llvm-svn: 285199
2016-10-26 14:52:25 +00:00
Simon Pilgrim de86241a09 [DAGCombiner] Enable (urem x, (shl pow2, y)) -> (and x, (add (shl pow2, y), -1)) combine for splatted vectors
llvm-svn: 285129
2016-10-25 22:01:09 +00:00
Simon Pilgrim f534573e8c [DAGCombiner] Enable srem(x.y) -> urem(x,y) combine for vectors
SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927

llvm-svn: 285123
2016-10-25 21:20:18 +00:00
Simon Pilgrim 4ebb04510a [DAGCombiner] Enable sdiv(x.y) -> udiv(x,y) combine for vectors
SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927

llvm-svn: 285118
2016-10-25 20:56:42 +00:00
Evandro Menezes 601f4cb9f7 Switch lowering: improve partitioning of jump tables
When there's a tie between partitionings of jump tables, consider also cases
that result in no jump tables, but in one or a few cases.  The motivation is
that many contemporary processors typically perform case switches fairly
quickly.

Differential revision: https://reviews.llvm.org/D25212

llvm-svn: 285099
2016-10-25 19:11:43 +00:00
Zvi Rackover 124470a202 [DAGCombine] Preserve shuffles when one of the vector operands is constant
Summary:
Do *not* perform combines such as:

    vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1 C2), scalar_to_vector(X))
    ->
    build_vector(X, C0, C1, C2)

Keeping the shuffle allows lowering the constant build_vector to a materialized
constant vector (such as a vector-load from the constant-pool or some other idiom).

Reviewers: delena, igorb, spatel, mkuper, andreadb, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25524

llvm-svn: 285063
2016-10-25 12:14:19 +00:00
Simon Pilgrim e3e6585c2d [SelectionDAG] Update ComputeNumSignBits SRA/SHL handlers to accept scalar or vector splats
Use isConstOrConstSplat helper.

Also use APInt instead of getZExtValue directly to avoid out of range issues.

llvm-svn: 285033
2016-10-24 21:47:19 +00:00
Simon Pilgrim d8ec09c74f Use SDValue::getConstantOperandVal() helper. NFCI.
llvm-svn: 285025
2016-10-24 20:56:52 +00:00
Peter Collingbourne 16e9b944e9 CodeGen: Do not add a global's address space to the folding set profile.
It is already part of the type (which is part of the global, which is already
being added), so there's no need to do it.

llvm-svn: 285002
2016-10-24 18:56:09 +00:00
Sanjay Patel 9ca028c2d6 [DAG] enhance computeKnownBits to handle SRL/SRA with vector splat constant
llvm-svn: 284953
2016-10-23 23:13:31 +00:00
Simon Pilgrim d06641d3dc Use SDValue::getConstantOperandVal() helper. NFCI.
llvm-svn: 284949
2016-10-23 20:17:21 +00:00
Sanjay Patel ca92c36e01 [DAG] enhance computeKnownBits to handle SHL with vector splat constant
Also, use APInt to avoid crashing on types larger than vNi64.

llvm-svn: 284874
2016-10-21 20:16:27 +00:00
Sanjay Patel 81029f6a76 [DAG] fold negation of sign-bit
0 - X --> 0, if the sub is NUW
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
0 - X --> X, if X is 0 or the minimum signed value

This is the DAG equivalent of:
https://reviews.llvm.org/rL284649

plus the fold for the NUW case which already existed in InstSimplify.

Note that we miss a vector fold because of a deficiency in the DAG version of
computeKnownBits().

llvm-svn: 284844
2016-10-21 17:24:26 +00:00
Sanjay Patel cbaba93ce8 [DAG] use SDNode flags 'nsz' to enable fadd/fsub with zero folds
As discussed in D24815, let's start the process of killing off the broken fast-math global
state housed in TargetOptions and eliminate the need for function-level fast-math attributes.

Here we enable two similar folds that are possible when we don't care about signed-zero:
fadd nsz x, 0 --> x
fsub nsz 0, x --> -x

Note that although the test cases include a 'sin' function call, I'm side-stepping the 
FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these
tests - isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node.

Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't
actually do anything today because Flags are silently dropped for any node that is not a
binary operator.

Differential Revision: https://reviews.llvm.org/D25297

llvm-svn: 284824
2016-10-21 14:36:58 +00:00
Pirama Arumuga Nainar 05b0f93ad3 Fix *_EXTEND_VECTOR_INREG legalization
Summary:
While promoting *_EXTEND_VECTOR_INREG nodes whose inputs are already
promoted, perform the appropriate sign extension for the promoted node
before doing the *_EXTEND_VECTOR_INREG operation.  If not, the undefined
high-order bits of the promoted operand may (a) be garbage inc ase of
zext) or (b) contribute the wrong sign-bit (in case of sext)

Updated the promote-vec3.ll test after this change.  The diff shows
explicit zeroing in case of zext and intermediate sign extension in case
of sext.

Reviewers: RKSimon

Subscribers: llvm-commits, srhines

Differential Revision: https://reviews.llvm.org/D25790

llvm-svn: 284752
2016-10-20 17:56:36 +00:00
Sanjay Patel 0051efcf97 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284746
2016-10-20 16:55:45 +00:00
Simon Pilgrim 618d3aedaf [DAGCombiner] Add general constant vector support to (srl (shl x, c), c) -> (and x, cst2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284717
2016-10-20 11:10:21 +00:00
Simon Pilgrim e32d0f8413 Merged nested ifs. NFCI.
llvm-svn: 284616
2016-10-19 17:30:24 +00:00
Simon Pilgrim a20aeea998 [DAGCombiner] Add general constant vector support to (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284613
2016-10-19 17:12:22 +00:00
Reid Kleckner f7ad5341d0 [WinEH] Allow catchpads to reuse the same catch object
This code used a regular when it should have used a multimap.

llvm-svn: 284612
2016-10-19 17:08:23 +00:00
Sanjay Patel 3a3aaf67e0 [DAG] optimize negation of bool
Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG.
With the mask, this should be no worse than 2 shifts. The mask can be eliminated
in some cases, so that should be better than 2 shifts.

This change exposed some missing folds related to negation:
https://reviews.llvm.org/rL284239
https://reviews.llvm.org/rL284395

There may be others, so please let me know if you see any regressions.

Differential Revision: https://reviews.llvm.org/D25485

llvm-svn: 284611
2016-10-19 16:58:59 +00:00
Simon Pilgrim 4554e161be [DAGCombiner] Add general constant vector support to (shl (sra x, c1), c1) -> (and x, (shl -1, c1))
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284608
2016-10-19 16:15:30 +00:00
Simon Pilgrim c2e9724909 [DAGCombiner] Add general constant vector support to (shl (mul x, c1), c2) -> (mul x, c1 << c2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284607
2016-10-19 15:59:28 +00:00
Simon Pilgrim 7dcb6e572e [DAGCombiner] Just call isConstOrConstSplat directly. NFCI.
This will get the same ConstantSDNode scalar or vector splat value as the current separate dyn_cast<ConstantSDNode> / isVector() approach.

llvm-svn: 284578
2016-10-19 11:28:15 +00:00
Simon Pilgrim b2ca2505cc [DAGCombine] Generalize distributeTruncateThroughAnd to work with any non-opaque constant or constant vector
llvm-svn: 284574
2016-10-19 08:57:37 +00:00
Sanjay Patel 19601fa587 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.

llvm-svn: 284513
2016-10-18 18:36:49 +00:00
Sanjay Patel 08fff9ca81 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284495
2016-10-18 17:05:05 +00:00
Simon Pilgrim 25e9628978 [DAGCombiner] Add splatted vector support to (udiv x, (shl pow2, y)) -> x >>u (log2(pow2)+y)
llvm-svn: 284491
2016-10-18 16:36:00 +00:00
Simon Pilgrim 65e0c73875 Strip trailing whitespace (NFCI)
llvm-svn: 284478
2016-10-18 13:44:00 +00:00
Sanjay Patel 523cd8290a [DAG] use isConstOrConstSplat in ComputeNumSignBits to optimize SRA
The scalar version of this pattern was noted in:
https://reviews.llvm.org/D25485

and fixed with:
https://reviews.llvm.org/rL284395

More refactoring of the constant/splat helpers is needed and will happen in follow-up patches.

Differential Revision: https://reviews.llvm.org/D25685

llvm-svn: 284424
2016-10-17 20:41:39 +00:00
Sanjay Patel a7cab58055 [DAG] make isConstOrConstSplat and isConstOrConstSplatFP more accessible; NFC
As noted in:
https://reviews.llvm.org/D25685

This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch. 
In a minor attempt to keep some structure, we're pulling the FP helper over along with its
integer sibling, but clearly we can and should do more refactoring of the similar helper
functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality.

llvm-svn: 284421
2016-10-17 20:26:46 +00:00
Sanjay Patel 2cf6bfaf73 [DAG] optimize away an arithmetic-right-shift of a 0 or -1 value
This came up as part of:
https://reviews.llvm.org/D25485

Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.

llvm-svn: 284395
2016-10-17 15:58:28 +00:00
James Molloy aa79b19a3e [SDAG] Use ABI type alignment for constant pools when optimizing for size
SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128.

In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.

llvm-svn: 284381
2016-10-17 12:54:07 +00:00
Konstantin Zhuravlyov 8ea0246e93 [MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Differential Revision: https://reviews.llvm.org/D24577

llvm-svn: 284312
2016-10-15 22:01:18 +00:00
Sanjay Patel 72b5ff646d [DAG] avoid creating illegal node when transforming negated shifted sign bit
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.

llvm-svn: 284268
2016-10-14 19:46:31 +00:00
Tom Stellard ab61007914 TargetLowering: Add SimplifyDemandedBits() helper to TargetLoweringOpt
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses.  SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:

and v1, v0, 0xffffff
mul24 v2, v1, v1      ; Multiply ignoring high 8-bits.

To:

mul24 v2, v0, v0

Where before this would not be optimized, because v1 has multiple uses.

Reviewers: bogner, arsenm

Subscribers: nhaehnle, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24964

llvm-svn: 284266
2016-10-14 19:14:26 +00:00
Sanjay Patel 00fc7a6159 [DAG] add folds for negated shifted sign bit
The same folds exist in InstCombine already.

This came up as part of:
https://reviews.llvm.org/D25485

llvm-svn: 284239
2016-10-14 14:26:47 +00:00
Nicolai Haehnle 86e72d98dd Fix use-after-frees
Extracted from D25313, as suggested by Justin Bogner.

llvm-svn: 284220
2016-10-14 09:49:51 +00:00
Craig Topper 40feb7f157 [DAGCombiner] Teach createBuildVecShuffle to handle cases where input vectors are less than half of the output vector size.
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.

llvm-svn: 284204
2016-10-14 06:00:42 +00:00
Sanjay Patel 98d0ea64ca [DAG] hoist DL(N) and fix formatting; NFC
llvm-svn: 284170
2016-10-13 22:27:10 +00:00
Tom Stellard f80c1875a3 LegalizeDAG: Implement PROMOTE for ISD::BITREVERSE
Summary:
This operation is promoted the same way was ISD::BSWAP.  This will
prevent a regression in test/Target/AMDGOU/bitreverse.ll when i16
support is implemented.

Reviewers: bogner, hfinkel

Subscribers: hfinkel, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D25202

llvm-svn: 284163
2016-10-13 21:03:49 +00:00
Nirav Dave a81682aad4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Simon Pilgrim cb59b5257c [DAGCombiner] Add vector support to (mul (shl X, Y), Z) -> (shl (mul X, Z), Y) style combines
llvm-svn: 284122
2016-10-13 14:04:35 +00:00
Simon Pilgrim fa8fadc0e5 [DAGCombiner] Add vector support to C2-(A+C1) -> (C2-C1)-A folding
llvm-svn: 284117
2016-10-13 12:49:31 +00:00
Simon Pilgrim 833b8a2071 [DAGCombiner] Add vector support to (sub -1, x) -> (xor x, -1) canonicalization
Improves commutation potential

llvm-svn: 284113
2016-10-13 12:05:20 +00:00
Albert Gutowski 795d7d6381 Create llvm.addressofreturnaddress intrinsic
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25293

llvm-svn: 284061
2016-10-12 22:13:19 +00:00
Simon Pilgrim 08190943cb [DAGCombiner] Update most ADD combines to support general vector combines
Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.

Differential Revision: https://reviews.llvm.org/D25374

llvm-svn: 284015
2016-10-12 13:48:10 +00:00
Konstantin Zhuravlyov 081385a74e [DAGCombiner] Do not remove the load of stored values when optimizations are disabled
This combiner breaks debug experience and should not be run when optimizations are disabled.

For example:
  int main() {
    int j = 0;
    j += 2;
    if (j == 2)
      return 0;
    return 5;
  }
When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.

Differential Revision: https://reviews.llvm.org/D19268

llvm-svn: 284014
2016-10-12 13:44:24 +00:00
Michael Kuperstein 7adbf6b042 [DAG] Fix crash in build_vector -> vector_shuffle combine
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.

llvm-svn: 283953
2016-10-11 22:44:31 +00:00
Arnold Schwaighofer 9103e268cf Silence -Wunused-but-set-variable warning
llvm-svn: 283927
2016-10-11 19:49:29 +00:00
Sanjay Patel 8253e15ef3 [DAG] add fold for masked negated sign-extended bool
This enhances the fold added with:
https://reviews.llvm.org/rL283900

llvm-svn: 283905
2016-10-11 17:05:52 +00:00
Sanjay Patel 8384703d9b [DAG] add fold for masked negated extended bool
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG 
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction. 

An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect 
different patterns. In this case, the SimplifyDemandedBits fold prevents us from 
performing a zext to sext fold that would then be recognized as a negation of a sext. 

llvm-svn: 283900
2016-10-11 16:26:36 +00:00
Sanjay Patel 38a42e4bfa [DAG] simplify logic; NFC
llvm-svn: 283885
2016-10-11 14:14:30 +00:00
Sanjay Patel 907ae69125 [DAG] hoist DL(N) and fix formatting; NFC
llvm-svn: 283884
2016-10-11 14:04:24 +00:00
Sanjay Patel 9609f3d6c7 [DAG] fix formatting; NFC
llvm-svn: 283878
2016-10-11 13:47:43 +00:00
Hal Finkel fcd2421667 [SelectionDAGBuilder] Support llvm.flt.rounds on targets where i32 is not legal
Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal
type.

Patch by Edward Jones, thanks!

Differential Revision: https://reviews.llvm.org/D24459

llvm-svn: 283797
2016-10-10 20:45:15 +00:00
Elena Demikhovsky 5b10aa1f1e DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Arnold Schwaighofer 3f25658143 swifterror: Don't compute swifterror vregs during instruction selection
The code used llvm basic block predecessors to decided where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping from pred.
llvm basic blocks and machine basic blocks.

Therefore the current approach does not work as it assumes we can mark
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.

Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.

When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.

This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.

rdar://28300923

llvm-svn: 283617
2016-10-07 22:06:55 +00:00
Sanjay Patel 14c02052d6 [DAG] clean up foldSelectOfConstants(); NFCI
Rename variables, simplify logic. 
Not clear yet why we don't handle a target with ZeroOrNegativeOneBooleanContent too.

llvm-svn: 283613
2016-10-07 21:55:42 +00:00
Sanjay Patel ecaf343fe7 [DAG] move fold (select C, 0, 1 -> xor C, 1) to a helper function; NFC
We're missing at least 3 other similar folds based on what we have in InstCombine. 

llvm-svn: 283596
2016-10-07 20:47:51 +00:00
Vedant Kumar 7beb423765 Delete some dead code in SelectionDAG (NFC)
Differential Revision: https://reviews.llvm.org/D24435

llvm-svn: 283505
2016-10-06 22:53:43 +00:00
Pirama Arumuga Nainar cc152ac794 Handle *_EXTEND_VECTOR_INREG during Integer Legalization
Summary:
These nodes need legalization for 3-element vectors.  This commit
handles the legalization and adds tests for zext and sext.

This fixes PR30614.

Reviewers: RKSimon, srhines

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25268

llvm-svn: 283496
2016-10-06 21:27:05 +00:00
Michael Kuperstein 7cc2123847 [DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.

This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.

Differential Revision: https://reviews.llvm.org/D24683

llvm-svn: 283480
2016-10-06 18:58:24 +00:00
Peter Collingbourne d799d28540 FastISel: Remove unused/un-overridden entry points. NFCI.
llvm-svn: 283366
2016-10-05 19:25:20 +00:00
Bjorn Pettersson 12559441bd [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT.
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.

Reviewers: bogner, tstellarAMD, mkuper

Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25007

llvm-svn: 283347
2016-10-05 17:40:27 +00:00
Mehdi Amini 3e021be3b6 Use StringRef in FastISel API (NFC)
llvm-svn: 283291
2016-10-05 01:37:29 +00:00
whitequark 7c4fe0e9a3 [SelectionDAG] Fix calling convention in expansion of ?MULO.
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, an __mul?i3 intrinsic is used. Since the type is
not legal, the legalizer cannot directly call the intrinsic with
the wide arguments; instead, it "pre-lowers" them by splitting them
in halves.

The "pre-lowering" code in essence made assumptions about
the calling convention, specifically that i(N*2) values will be
split into two iN values and passed in consecutive registers in
little-endian order. This, naturally, breaks on a big-endian system,
such as our OR1K out-of-tree backend.

Thanks to James Miller <james@aatch.net> for help in debugging.

Differential Revision: https://reviews.llvm.org/D25223

llvm-svn: 283203
2016-10-04 09:07:49 +00:00
Sanjay Patel f7df85af87 fix formatting; NFC
llvm-svn: 283115
2016-10-03 15:18:36 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Michael Kuperstein 3e06eafc20 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Evandro Menezes e45de8a5ec Add support to optionally limit the size of jump tables.
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables.  As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified.  One such limit is the
number of possible destinations for a given indirect branches that such
branch predictors can handle.

This patch considers a limit that a target may set to the number of
destination addresses in a jump table.

Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.

Differential revision: https://reviews.llvm.org/D21940

llvm-svn: 282412
2016-09-26 15:32:33 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Nirav Dave 9011da3d44 [DAG] Fix incorrect alignment of ext load.
Correctly use alignment size from loaded size not output value size.

Reviewers: jyknight, tstellarAMD, arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23356

llvm-svn: 282177
2016-09-22 17:28:43 +00:00
Arnold Schwaighofer de2490d0dc Disable tail calls if there is an swifterror argument
ISel does not handle them correctly yet i.e we crash trying to emit tail call
code.

radar://28407842

llvm-svn: 282088
2016-09-21 16:53:36 +00:00
Craig Topper af5ee86bc9 [AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision.
Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.

llvm-svn: 281868
2016-09-18 21:49:32 +00:00
Simon Pilgrim 6c21e6a54e [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware).

While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

Differential Revision: https://reviews.llvm.org/D24343

llvm-svn: 281852
2016-09-18 12:45:23 +00:00
Wei Mi ab24cd189f Change the order of the splitted store from high - low to low - high.
It is a trivial change which could make the testcase easier to be reused
for the store splitting in CodeGenPrepare.

llvm-svn: 281846
2016-09-18 06:10:32 +00:00
Matt Arsenault e8e0f5cac6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Sanjay Patel 284582b6d4 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits(), round 2 ; NFCI
llvm-svn: 281498
2016-09-14 16:54:10 +00:00
Sanjay Patel 1ed771f5d7 getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281495
2016-09-14 16:37:15 +00:00
Sanjay Patel b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Sanjay Patel 5f6bb6cd24 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
llvm-svn: 281490
2016-09-14 15:43:44 +00:00
Sanjay Patel bd6fca1419 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
Pawel Bylica c397f0b272 [CodeGen] Fix invalid shift in mul expansion
Summary: When expanding mul in type legalization make sure the type for shift amount can actually fit the value. This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354.

Reviewers: hfinkel, majnemer, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D24478

llvm-svn: 281403
2016-09-13 21:55:41 +00:00
Michael Kuperstein 59f8305305 [DAG] Allow build-to-shuffle combine to combine builds from two wide vectors.
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).

Differential Revision: https://reviews.llvm.org/D24491

llvm-svn: 281402
2016-09-13 21:53:32 +00:00
Simon Pilgrim 4a8eba3e96 [DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.

Followup to D23007

llvm-svn: 281362
2016-09-13 18:33:29 +00:00
Simon Pilgrim bd28a85d14 [DAGCombiner] Use APInt directly in (shl (ext (shl x, c1)), c2) combine
Fix failure to detect out of range shift constants leading to assert in ConstantSDNode::getZExtValue()

Followup to D23007

llvm-svn: 281354
2016-09-13 17:15:28 +00:00
Ayman Musa 0c2da88f82 Remove MVT:i1 xor instruction before SELECT. (Performance improvement).
Differential Revision: https://reviews.llvm.org/D23764

llvm-svn: 281308
2016-09-13 09:12:45 +00:00
Michael Kuperstein efc0667583 [DAG] Refactor BUILD_VECTOR combine to make it easier to extend. NFCI.
This should make it easier to add cases that we currently don't cover,
like supporting more kinds of type mismatches and more than 2 input vectors.

llvm-svn: 281283
2016-09-13 00:57:43 +00:00
Justin Lebar adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Arnold Schwaighofer 7d7b4b4014 Create phi nodes for swifterror values at the end of the phi instructions list
ISel makes assumption about the order of phi nodes.

rdar://28190150

llvm-svn: 281095
2016-09-09 21:18:47 +00:00
Simon Pilgrim 153b408433 [SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type
Fixes issue with rL280927 identified by Mikael Holmén

llvm-svn: 281042
2016-09-09 13:31:52 +00:00
James Molloy c6a6144966 [SDAGBuilder] Don't create a binary tree for switches in minsize mode
This bloats codesize - all of the non-leaf nodes are extra code.

llvm-svn: 280932
2016-09-08 13:12:22 +00:00
Simon Pilgrim cc7b4b511b [SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.

This is an initial step towards determining the sign bits of a vector (PR29079).

Differential Revision: https://reviews.llvm.org/D24253

llvm-svn: 280927
2016-09-08 12:57:51 +00:00
Simon Pilgrim a01ee07a19 [DAGCombiner] Enable AND combines of splatted constant vectors
Allow AND combines to use a vector splatted constant as well as a constant scalar.

Preliminary part of D24253.

llvm-svn: 280926
2016-09-08 12:36:39 +00:00
Elena Demikhovsky dcc86d5bb6 Shift-left (ISD::SHL) operation crashes on "DAG Legalization" phase.
https://llvm.org/bugs/show_bug.cgi?id=29058.

While node legalization we tried to legalize its operands.
If an operand node is replaced during legalization the user node may be destroyed.

Differential Revision: https://reviews.llvm.org/D24244

llvm-svn: 280862
2016-09-07 20:54:33 +00:00