Commit Graph

8347 Commits

Author SHA1 Message Date
Matt Arsenault 5de8dc9cf5 DAG: Do not scalarize fsub if fneg is legal
Tests will be included with future commit.

llvm-svn: 295242
2017-02-15 22:02:42 +00:00
Michael Kuperstein ba80db39d7 [DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.

This fixes PR311956.

Differential Revision: https://reviews.llvm.org/D29961

llvm-svn: 295213
2017-02-15 18:37:26 +00:00
Craig Topper 96ec7a23e3 [SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where inputs are larger than the mask
Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in a extract and creates a properly aligned start index for the extract.

This range finding is unnecessary, we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each indice we can't use an extract.

Reviewers: zvi, RKSimon

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29926

llvm-svn: 295152
2017-02-15 05:57:16 +00:00
Aditya Nandakumar bb0483bc8e [Tablegen] Instrumenting table gen DAGGenISelDAG
To help assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing include source location of the patterns
Enabled through CMake varialbe LLVM_ENABLE_DAGISEL_COV

llvm-svn: 295081
2017-02-14 18:32:41 +00:00
Artyom Skrobov dc66a82dc7 Removing a redundant assignment
llvm-svn: 295055
2017-02-14 14:44:01 +00:00
Arnold Schwaighofer 8f3df731dc swiftcc: Don't emit tail calls from callers with swifterror parameters
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.

rdar://30495920

llvm-svn: 294982
2017-02-13 19:58:28 +00:00
Quentin Colombet fbae5fcb96 [FastISel] Add a diagnostic to warm on fallback.
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.

llvm-svn: 294970
2017-02-13 17:38:59 +00:00
Craig Topper 3668bde371 [DAGCombiner] Teach DAG combine that inserting an extract_subvector result into the same location of a an undef vector can just use the original input to the extract.
llvm-svn: 294932
2017-02-13 04:53:33 +00:00
Craig Topper aa46204ed9 [DAGCombiner] Remove the half vector width check for the combine of EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR.
This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors.

llvm-svn: 294930
2017-02-12 23:49:49 +00:00
Sanjay Patel 0557a44287 [TargetLowering] fix SETCC SETLT folding with FP types
The bug was introduced with:
https://reviews.llvm.org/rL294863

...and manifests as a selection failure in x86, but that's actually
another bug. This fix prevents wrong codegen with -0.0, but in the
more common case when we have NSZ and NNAN (-ffast-math), we should 
still be able to fold this setcc/compare.

llvm-svn: 294924
2017-02-12 23:07:52 +00:00
Craig Topper b633adedc7 [DAGCombiner] Make the combine of INSERT_SUBVECTOR into a CONCAT_VECTOR more generic to support larger concats.
llvm-svn: 294875
2017-02-11 22:57:09 +00:00
Sanjay Patel 63499b61c9 [TargetLowering] check for sign-bit comparisons in SimplifyDemandedBits
I don't know if anything other than x86 vectors is affected by this change, but this may allow 
us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises
from the fact that blendv* instructions only use the sign-bit when deciding which vector element
to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already
exists in x86 lowering; this demanded bits change just enables the transform to fire more often.

The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, 
but I've explained the likely steps in this series here:
https://llvm.org/bugs/show_bug.cgi?id=11210

Differential Revision: https://reviews.llvm.org/D29687

llvm-svn: 294863
2017-02-11 18:01:55 +00:00
Simon Pilgrim bfb1747806 [DAGCombine] Allow vector constant folding of any value type before type legalization
The patch comes in 2 parts:

1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.

2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.

Fix for PR30760

Differential Revision: https://reviews.llvm.org/D29568

llvm-svn: 294749
2017-02-10 14:37:25 +00:00
Craig Topper a9f1121896 [SelectionDAG] Dump the DAG after legalizing vector ops and after the second type legalization
Summary:
With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization.

I'm only dumping if the legalize vector ops changes something since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump happens. So if nothing changed this second dump is unnecessary.

Having said that though, I think we should probably fix legalize vector ops to log what its doing.

Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc

Reviewed By: RKSimon

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D29554

llvm-svn: 294711
2017-02-10 05:05:57 +00:00
Geoff Berry 7e320c2485 [SelectionDAG] Fix bugs in inverted condition splitting code.
Summary:
Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by
Mikael Holmen.  Handle non-canonicalized xor not operation
correctly (was assuming operand 0 was always the non-constant operand)
and check that the negated condition is also in the same block as the
original and/or instruction (as is done for and/or operands already)
before proceeding with optimization.

Reviewers: bogner, MatzeB, qcolombet

Subscribers: mcrosier, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D29680

llvm-svn: 294605
2017-02-09 18:28:17 +00:00
Artur Pilipenko 4a64031954 [DAGCombiner] Support non-zero offset in load combine
Enable folding patterns which load the value from non-zero offset:

  i8 *a = ...
  i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24)
=>
  i32 val = *((i32*)(a+4))

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D29394

llvm-svn: 294582
2017-02-09 12:06:01 +00:00
Artur Pilipenko 045ab08252 [DAGCombiner] NFC. Mark ByteProvider accessors as const
llvm-svn: 294494
2017-02-08 17:59:34 +00:00
Amaury Sechet 4b946916ac [DAGCombiner] Push truncate through adde when the carry isn't used.
Summary: As per title.

Reviewers: mkuper, spatel, bkramer, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29528

llvm-svn: 294394
2017-02-08 00:32:36 +00:00
Reid Kleckner 0887d44a61 [SDAGISel] Simplify some SDAGISel code, NFC
Hoist entry block code for arguments and swift error values out of the
basic block instruction selection loop. Lowering arguments once up front
seems much more readable than doing it conditionally inside the loop. It
also makes it clear that argument lowering can update StaticAllocaMap
because no instructions have been selected yet.

Also use range-based for loops where possible.

llvm-svn: 294329
2017-02-07 18:42:53 +00:00
Sanjay Patel 8c99ca3df0 [TargetLowering] fix formatting and comments for ShrinkDemandedConstant; NFC
llvm-svn: 294325
2017-02-07 18:04:26 +00:00
Daniel Jasper 84b3cc394d Revert "[DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)"
This reverts commit r294186.

On an internal test, this triggers an out-of-memory error on PPC,
presumably because there is another dagcombine that does the exact
opposite triggering and endless loop consuming more and more memory.

Chandler has started at creating a reduced test case and we'll attach it
as soon as possible.

llvm-svn: 294288
2017-02-07 08:57:50 +00:00
Artur Pilipenko d3464bf9ad [DAGCombiner] Support bswap as a part of load combine patterns
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D29397

llvm-svn: 294201
2017-02-06 17:48:08 +00:00
Amaury Sechet e674f5c758 Add ADDC to SelectionDAG::computeKnownBits and ComputeNumSignBits.
Summary: As per title.

Reviewers: bkramer, sunfish, lattner, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29521

llvm-svn: 294188
2017-02-06 14:59:06 +00:00
Amaury Sechet 8a3b32941d [DAGCombiner] Make DAGCombiner smarter about overflow
Summary: Leverage it to transform addc into add.

Reviewers: mkuper, spatel, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29524

llvm-svn: 294187
2017-02-06 14:54:49 +00:00
Amaury Sechet 1d466f598e [DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)
Summary: This is extracted from D29443 .

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29564

llvm-svn: 294186
2017-02-06 14:28:39 +00:00
Simon Pilgrim bfd4495512 [X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined.
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.

We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.

This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.

Differential Revision: https://reviews.llvm.org/D29399

llvm-svn: 294183
2017-02-06 13:44:45 +00:00
Geoff Berry 76ca8c2b34 [SelectionDAG] In InstrEmitter, handle EXTRACT_SUBREG of a physical register.
Summary:
Without this change, the getVR() call would hit an assert since it was
being passed a physical register.

Update the AArch64/ldst-opt.ll test with a case that triggers this
behavior by adding a run with strict-align, which causes an unaligned
STR XZR instruction to be split into byte stores, creating an
EXTRACT_SUBREG of XZR that triggers the original problem.

Reviewers: bogner, qcolombet, MatzeB, atrick

Subscribers: aemerson, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D29495

llvm-svn: 294129
2017-02-05 18:28:14 +00:00
Amaury Sechet 143902c29f [DAGCombiner] Leverage add's commutativity
Summary: This avoid the need to duplicate all pattern and actually end up exposing some opportunity to optimize existing pattern that did not exists in both directions on an existing test case.

Reviewers: mkuper, spatel, bkramer, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29541

llvm-svn: 294125
2017-02-05 14:22:20 +00:00
Craig Topper 42b83f8d6e [DAGCombiner] Canonicalize the order of a chain of INSERT_SUBVECTORs.
Based on similar code for INSERT_VECTOR_ELT.

llvm-svn: 294110
2017-02-04 23:26:39 +00:00
Craig Topper 04dce84ead [DAGCombiner] Use DAG.getAnyExtOrTrunc to simplify some code. NFC
llvm-svn: 294109
2017-02-04 23:26:37 +00:00
Craig Topper ceaf9c1633 [DAGCombiner] In visitINSERT_VECTOR_ELT, move check for BUILD_VECTOR being legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS.
llvm-svn: 294108
2017-02-04 23:26:34 +00:00
Amaury Sechet 6e2d8e49ec Formatting in DAGCombiner. NFC
llvm-svn: 294091
2017-02-04 13:01:53 +00:00
Eugene Zelenko 502d0bc28e [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce TargetInstrInfo.h dependencies.

llvm-svn: 294084
2017-02-04 02:00:53 +00:00
Ahmed Bougacha 9677cc6fb7 [TLI] Robustize SDAG LibFunc proto checking by merging it into TLI.
This re-applies commit r292189, reverted in r292191.

SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to a few tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.
- a handful of SystemZ tests check the SDAG support for lax prototype
  checking: Ulrich agrees on removing them.

I don't think it's worth supporting any of these (IMO) invalid
testcases.  Instead, fix them to be more meaningful.

llvm-svn: 294028
2017-02-03 19:11:19 +00:00
Alexey Bataev a0d9f2582b [SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() !=
ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.

NodeToMatch can be modified during matching, but code does not handle
this situation.

Differential Revision: https://reviews.llvm.org/D29292

llvm-svn: 294003
2017-02-03 12:28:40 +00:00
Nirav Dave 93f9d5ce04 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Amaury Sechet f3e421d6e9 Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC
llvm-svn: 293903
2017-02-02 16:07:44 +00:00
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Florian Hahn 7a5ec55fb3 [legalizetypes] Push fp16 -> fp32 extension node to worklist.
Summary:
This way, the type legalization machinery will take care of registering
the result of this node properly.

This patches fixes all failing fp16 test cases  with expensive checks.
(CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll
CodeGen/X86/soft-fp.ll) 


Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel

Reviewed By: hfinkel

Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28195

llvm-svn: 293765
2017-02-01 13:01:33 +00:00
Nicolai Haehnle 8813d5d221 [DAGCombine] require UnsafeFPMath for re-association of addition
Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD

Subscribers: jholewinski, nemanjai, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28675

llvm-svn: 293635
2017-01-31 14:35:37 +00:00
Simon Pilgrim ffe2535cf6 Use SelectionDAG::getBuildVector helper function where possible. NFCI.
llvm-svn: 293532
2017-01-30 18:53:45 +00:00
Justin Bogner 8f520a73b2 SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced
Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we
happened to replace a node during UpdateChains, because it would be
left in the list we were iterating over. This nulls out the pointer
when that happens so that we can avoid the issue.

Fixes llvm.org/PR31710

llvm-svn: 293522
2017-01-30 18:29:46 +00:00
Simon Pilgrim 0a5ab5c4db Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI.
llvm-svn: 293520
2017-01-30 18:20:42 +00:00
Matt Arsenault 32e6bfa20f DAG: Fold fneg into compare with constant into the constant
fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred)

InstCombine already does this.

llvm-svn: 293512
2017-01-30 17:57:28 +00:00
Matt Arsenault 0c687390fe DAG: Constant fold fp16_to_fp/fp16_to_fp
This fixes emitting conversions of constants on targets
without legal f16 that need to use these for legalization.

llvm-svn: 293499
2017-01-30 16:57:41 +00:00
Craig Topper 135da1faf5 [SelectionDAG] Make SDNode::getConstantOperandVal an inline method.
It's operation already exists manually in many places without using the method.

llvm-svn: 293421
2017-01-29 06:08:02 +00:00
Craig Topper 4753736abf [DAGCombiner] Use unsigned for a constant vector index instead of APInt.
The type system requires that the number of vector elements should fit in 32-bits so this should be safe.

llvm-svn: 293414
2017-01-29 04:38:21 +00:00
Craig Topper d15730902b [DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR.
The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue.

llvm-svn: 293413
2017-01-29 04:38:19 +00:00
Craig Topper 24cdbe8fa6 [DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal.
llvm-svn: 293412
2017-01-29 04:38:16 +00:00
Matthias Braun 8c209aa877 Cleanup dump() functions.
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359
2017-01-28 02:02:38 +00:00
Jonas Paulsson bb0ed3e732 [DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert().
In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a
partial number of available vector elements), WidenVecRes_Convert() used to
resort to scalarization.

This patch adds a handling of the (common) case where an input vector can be
found of same width as the widened result vector, by converting the node to
SIGN/ZERO_EXTEND_VECTOR_INREG.

Review: Eli Friedman
llvm-svn: 293268
2017-01-27 07:46:26 +00:00
Andrew Kaylor a0a1164ce4 Add intrinsics for constrained floating point operations
This commit introduces a set of experimental intrinsics intended to prevent
optimizations that make assumptions about the rounding mode and floating point
exception behavior.  These intrinsics will later be extended to specify
flush-to-zero behavior.  More work is also required to model instruction
dependencies in machine code and to generate these instructions from clang
(when required by pragmas and/or command line options that are not currently
supported).

Differential Revision: https://reviews.llvm.org/D27028

llvm-svn: 293226
2017-01-26 23:27:59 +00:00
Nirav Dave d32a421f75 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
2017-01-26 16:46:13 +00:00
Nirav Dave de6516c466 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184
2017-01-26 16:02:24 +00:00
Craig Topper 001aad7da7 [DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors.
llvm-svn: 293152
2017-01-26 05:38:46 +00:00
Tim Northover 470f070b7d SDag: fix how initial loads are formed when splitting vector ops.
Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.

llvm-svn: 293088
2017-01-25 20:58:26 +00:00
Krzysztof Parzyszek ee9aa3ffee Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFC
llvm-svn: 293077
2017-01-25 19:29:04 +00:00
Artur Pilipenko bc93452420 Fix buildbot failures introduced by 293036
Fix unused variable, specify types explicitly to make VC compiler happy.

llvm-svn: 293039
2017-01-25 09:10:07 +00:00
Artur Pilipenko 41c0005aa3 [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2.
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html

This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used.

From the original commit:

Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: RKSimon, filcab, chandlerc 

Differential Revision: https://reviews.llvm.org/D27861

llvm-svn: 293036
2017-01-25 08:53:31 +00:00
Matt Arsenault 732a531506 DAG: Recognize no-signed-zeros-fp-math attribute
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.

llvm-svn: 293024
2017-01-25 06:08:42 +00:00
Matt Arsenault 8a27aee6ae DAGCombiner: Allow negating ConstantFP after legalize
llvm-svn: 293019
2017-01-25 04:54:34 +00:00
Geoff Berry 92a286ae5a [SelectionDAG] Handle inverted conditions when splitting into multiple branches.
Summary:
When conditional branches with complex conditions are split into
multiple branches in SelectionDAGBuilder::FindMergedConditions, also
handle inverted conditions.  These may sometimes appear without having
been optimized by InstCombine when CodeGenPrepare decides to sink and
duplicate cmp instructions, causing them to have only one use.  This
problem can be increased by e.g. GVNHoist hiding more cmps from
InstCombine by combining equivalent cmps from different blocks.

For example codegen X & !(Y | Z) as:
    jmp_if_X TmpBB
    jmp FBB
  TmpBB:
    jmp_if_notY Tmp2BB
    jmp FBB
  Tmp2BB:
    jmp_if_notZ TBB
    jmp FBB

Reviewers: bogner, MatzeB, qcolombet

Subscribers: llvm-commits, hiraditya, mcrosier, sebpop

Differential Revision: https://reviews.llvm.org/D28380

llvm-svn: 292944
2017-01-24 16:36:07 +00:00
Craig Topper ff272ad4f3 [SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR
Summary:
This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These seem like simple non-target specific optimizations.

For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there.

Ultimately, my motivation here is to simplify extractSubvector and remove custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle undef and BUILD_VECTOR optimizations, but those should be DAG combines.

Reviewers: RKSimon, delena

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29000

llvm-svn: 292876
2017-01-24 02:36:59 +00:00
David L. Jones d21529fa0d [Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).

Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.

The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)

There are additional changes required in clang.

Reviewers: rsmith

Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28476

llvm-svn: 292848
2017-01-23 23:16:46 +00:00
Matt Arsenault 4e305c6c1e DAG: Don't fold vector extract into load if target doesn't want to
Fixes turning a 32-bit scalar load into an extending vector load
for AMDGPU when dynamically indexing a vector.

llvm-svn: 292842
2017-01-23 22:48:53 +00:00
Matt Arsenault 7030661364 DAG: Allow legalization of fcanonicalize vector types
llvm-svn: 292814
2017-01-23 18:52:26 +00:00
Simon Pilgrim fb32eea1b4 [SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293)
This patch improves the knownbits logic for unsigned integer min/max opcodes.

For UMIN we know that the result will have the maximum of the inputs' known leading zero bits in the result, similarly for UMAX the maximum of the inputs' leading one bits.

This is particularly useful for simplifying clamping patterns,. e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set.

Differential Revision: https://reviews.llvm.org/D28853

llvm-svn: 292528
2017-01-19 22:41:22 +00:00
Mikael Holmen 2074e7497b [DAG] Don't increase SDNodeOrder for dbg.value/declare.
Summary:
The SDNodeOrder is saved in the IROrder field in the SDNode, and this
field may affects scheduling. Thus, letting dbg.value/declare increase
the order numbers may in turn affect scheduling.

Because of this change we also need to update the code deciding when
dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues.

Dbg values now have the same order as the SDNode they are connected to,
not the following orders.

Test cases provided by Florian Hahn.

Reviewers: bogner, aprantl, sunfish, atrick

Reviewed By: atrick

Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D25318

llvm-svn: 292485
2017-01-19 13:55:55 +00:00
Matt Arsenault f411071d63 DAG: Consider nnan in isKnownNeverNaN
llvm-svn: 292328
2017-01-18 02:10:08 +00:00
Ahmed Bougacha 9e5a085cf1 Revert "[TLI] Robustize SDAG proto checking by merging it into TLI."
This reverts commit r292189, as it causes issues on SystemZ bots.

llvm-svn: 292191
2017-01-17 03:31:00 +00:00
Ahmed Bougacha c018efd680 [TLI] Robustize SDAG proto checking by merging it into TLI.
SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to two tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.

I don't think it's worth supporting either of these (IMO) broken
testcases.  Instead, fix them to be more correct.

llvm-svn: 292189
2017-01-17 03:10:06 +00:00
Simon Pilgrim 3e91519a1c [SelectionDAG] Add knownbits support for BITREVERSE
llvm-svn: 292130
2017-01-16 14:49:26 +00:00
Simon Pilgrim db73dbcc7c [SelectionDAG] Add support for BITREVERSE constant folding
We were relying on constant folding of the legalized instructions to do what constant folding we had previously

llvm-svn: 292114
2017-01-16 13:39:00 +00:00
Malcolm Parsons 17d266bc96 Remove unused lambda captures. NFC
llvm-svn: 291916
2017-01-13 17:12:16 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Diana Picus 116bbab4e4 [CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.

See https://reviews.llvm.org/D28057 for the whole discussion.

Differential Revision: https://reviews.llvm.org/D28556

llvm-svn: 291891
2017-01-13 09:58:52 +00:00
Craig Topper 7af39837a9 Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent."
Some test appears to be hanging on the build bots.

llvm-svn: 291650
2017-01-11 04:59:25 +00:00
Craig Topper 577d258569 [DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent.
llvm-svn: 291645
2017-01-11 04:02:23 +00:00
Matt Arsenault e482403e1c DAGCombiner: Add hasOneUse checks to fadd/fma combine
Even with aggressive fusion enabled, this requires duplicating
the fmul, or increases an fadd to another fma which is not an
improvement.

llvm-svn: 291642
2017-01-11 02:02:12 +00:00
Eugene Zelenko c4ad1ce068 [Target] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291641
2017-01-11 01:45:03 +00:00
Matt Arsenault def496c04b Remove unused CONVERT_RNDSAT intrinsics
llvm-svn: 291607
2017-01-10 22:38:02 +00:00
Matt Arsenault 0b382a7cb8 DAG: Avoid OOB when legalizing vector indexing
If a vector index is out of bounds, the result is supposed to be
undefined but is not undefined behavior. Change the legalization
for indexing the vector on the stack so that an out of bounds
index does not create an out of bounds memory access.

llvm-svn: 291604
2017-01-10 22:02:30 +00:00
Simon Dardis 548a53f5ee [mips] Fix Mips MSA instrinsics
The usage of some MIPS MSA instrinsics that took immediates could crash LLVM
during lowering. This patch addresses that behaviour. Crucially this patch
also makes the use of intrinsics with out of range immediates as producing an
internal error.

The ld,st instrinsics would trigger an assertion failure for MIPS64 as their
lowering would attempt to add an i32 offset to a i64 pointer.

Reviewers: vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D25438

llvm-svn: 291571
2017-01-10 16:40:57 +00:00
Craig Topper 588c734b0f [DAGCombiner] Merge together duplicate checks for folding fold (select C, 1, X) -> (or C, X) and folding (select C, X, 0) -> (and C, X). Also be consistent about checking that both the condition and the result type are i1. NFC
I guess previously we just assumed if the result type was i1, then the condition type must also be i1?

llvm-svn: 291548
2017-01-10 07:42:57 +00:00
Craig Topper d915d6ba57 [DAGCombiner] Remove code for optimizing select (xor Cond, 0), X, Y -> select Cond, X, Y. Just let combine on the xor itself take care of it.
llvm-svn: 291534
2017-01-10 04:12:19 +00:00
Bjorn Pettersson b14afd452d [SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486.
Summary:
Originally

 i64 = umax t8, Constant:i64<4>

was expanded into

 i32,i32 = umax Constant:i32<0>, Constant:i32<0>
 i32,i32 = umax t7, Constant:i32<4>

Now instead the two produced umax:es return i32 instead of i32, i32.

Thanks to Jan Vesely for help with the test case.

Patch by mikael.holmen at ericsson.com

Reviewers: bogner, jvesely, tstellarAMD, arsenm

Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D28135

llvm-svn: 291441
2017-01-09 12:03:50 +00:00
David Majnemer 9e04befb09 [SelectionDAG] Rework lowerRangeToAssertZExt
Utilize ConstantRange to make it easier to interpret range metadata.

llvm-svn: 291211
2017-01-06 02:43:28 +00:00
David Majnemer eaba06cffa [SelectionDAG] Correctly transform range metadata to AssertZExt
We used the logBase2 of the high instead of the ceilLogBase2 resulting
in the wrong result for certain values.  For example, it resulted in an
i1 AssertZExt when the exclusive portion of the range was 3.

llvm-svn: 291196
2017-01-06 00:11:46 +00:00
Tim Shen 5480eb8445 [Legalizer] Fix fp-to-uint to fp-tosint promotion assertion.
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:

  fp-to-uint16: 65534.0 -> 0xfffe

With the legalization:

  fp-to-sint32: 65534.0 -> 0x0000fffe

Without this patch, legalization wrongly emits a signed extend assertion,
which is consumed by later icmp instruction, and cause miscompile.

Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.

This patch reverts r279223 behavior and adds more tests and
documentations.

In PR29041's context, James Molloy mentioned that:

  We don't need to mask because conversion from float->uint8_t is
  undefined if the integer part of the float value is not representable in
  uint8_t. Therefore we can assume this doesn't happen!

which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
the advantage of this behavior so that we can save unnecessary mask
instructions.

Reviewers: jmolloy, nadav, echristo, kbarton

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28284

llvm-svn: 291015
2017-01-04 22:11:42 +00:00
Evgeny Stupachenko c88697dc16 The patch fixes (base, index, offset) match.
Summary:
Instead of matching:
  (a + i) + 1 -> (a + i, undef, 1)
Now it matches:
  (a + i) + 1 -> (a, i, 1)

Reviewers: rengolin

Differential Revision: http://reviews.llvm.org/D26367

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
2017-01-04 21:43:39 +00:00
Florian Hahn f872d230ad [selectiondag] Check PromotedFloats map during expansive checks.
Summary:
`PromotedFloats` needs to be checked in 
`DAGTypeLegalizer::PerformExpensiveChecks`. This patch fixes a few type
legalization failures with expansive checks for ARM fp16 tests.

Reviewers: baldrick, bogner, arsenm

Subscribers: arsenm, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28187

llvm-svn: 290796
2017-01-01 13:58:27 +00:00
Reid Kleckner 0e7c84c682 Simplify FunctionLoweringInfo.cpp with range for loops
I'm preparing to add some pattern matching code here, so simplify the
code before I do. NFC

llvm-svn: 290731
2016-12-30 00:21:38 +00:00
Igor Laevsky 4f31e52f94 Introduce element-wise atomic memcpy intrinsic
This change adds a new intrinsic which is intended to provide memcpy functionality
with additional atomicity guarantees. Please refer to the review thread
or language reference for further details.

Differential Revision: https://reviews.llvm.org/D27133

llvm-svn: 290708
2016-12-29 14:31:07 +00:00
Simon Pilgrim 0d66d29678 [SelectionDAG] Early out from computeKnownBits when we know we will have no common bits.
Avoid extra (recursive) calls to computeKnownBits if we already know that there are no common known bits.

llvm-svn: 290490
2016-12-24 12:59:35 +00:00
Zijiao Ma bf6007bd1b Make the canonicalisation on shifts benifit to more case.
1.Fix pessimized case in FIXME.
2.Add tests for it.
3.The canonicalisation on shifts results in different sequence for
  tests of machine-licm.Correct some check lines.

Differential Revision: https://reviews.llvm.org/D27916

llvm-svn: 290410
2016-12-23 02:56:07 +00:00
Wei Mi f3f01aba48 Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.
This is for splitMergedValStore in DAG Combine to share the target query interface
with similar logic in CodeGenPrepare.

Differential Revision: https://reviews.llvm.org/D24707

llvm-svn: 290363
2016-12-22 19:38:22 +00:00
Matt Arsenault 485dacd90c DAG: Add helper for testing constant values
There are helpers for testing for constant or constant build_vector,
and for splat ConstantFP vectors, but not for a constantfp or
non-splat ConstantFP vector.

llvm-svn: 290317
2016-12-22 04:39:45 +00:00
Oren Ben Simhon 3b95157090 [X86] Vectorcall Calling Convention - Adding CodeGen Complete Support
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. 
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.

The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it.
This aubmit also includes additional lit tests to cover better HVAs corner cases.

Differential Revision: https://reviews.llvm.org/D27392

llvm-svn: 290240
2016-12-21 08:31:45 +00:00
Joel Jones 8980ba643e Fix name typo in SelectonDAG
llvm-svn: 289969
2016-12-16 18:22:54 +00:00
Chandler Carruth ba5de63bc3 Add extra headers that got deleted by my revert in r289916 but for which
new usage had already grown in the file.

llvm-svn: 289917
2016-12-16 04:08:31 +00:00
Chandler Carruth 4154062b69 Revert patch series introducing the DAG combine to match a load-by-bytes
idiom.

r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...

This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.

However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.

Sorry for any trouble or disruption!

llvm-svn: 289916
2016-12-16 04:05:22 +00:00
Eli Friedman 379294676d Don't combine splats with other shuffles.
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .

Differential Revision: https://reviews.llvm.org/D27793

llvm-svn: 289882
2016-12-15 22:41:40 +00:00
Eli Friedman 34505083c6 Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.

Differential Revision: https://reviews.llvm.org/D27787

llvm-svn: 289874
2016-12-15 21:36:59 +00:00
Sanjay Patel afee21a5b2 [DAG] allow more select folding for targets that have 'and not' (PR31175)
The original motivation for this patch comes from wanting to canonicalize 
more IR to selects and also canonicalizing min/max.

If we're going to do that, we need more backend fixups to undo select codegen 
when simpler ops will do. I chose AArch64 for the tests because that shows the
difference in the simplest way. This should fix:
https://llvm.org/bugs/show_bug.cgi?id=31175

Differential Revision: https://reviews.llvm.org/D27489

llvm-svn: 289738
2016-12-14 22:59:14 +00:00
Nirav Dave f5bf03c7ef Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

llvm-svn: 289667
2016-12-14 16:43:44 +00:00
Nirav Dave 8527ab0ad2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
Simon Pilgrim 05ab8ffc7e [DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of just APInt::isPowerOf2
Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases.

Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value.

Differential Revision: https://reviews.llvm.org/D27714

llvm-svn: 289654
2016-12-14 15:08:13 +00:00
Stephan Bergmann 17c7f70362 Replace APFloatBase static fltSemantics data members with getter functions
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.

Differential Revision: https://reviews.llvm.org/D26671

llvm-svn: 289647
2016-12-14 11:57:17 +00:00
Artur Pilipenko f3ee444010 Add a couple of assertions to the load combine code introduced by r289538
llvm-svn: 289646
2016-12-14 11:55:47 +00:00
Artur Pilipenko 469fcd2afd Use more detailed assertion messages in the code introduced by r289538
llvm-svn: 289545
2016-12-13 16:26:15 +00:00
Artur Pilipenko 79d1255e26 Fix a buildbot failure introduced by r289538
Build failed because of unused variable in product mode.

llvm-svn: 289540
2016-12-13 14:55:31 +00:00
Artur Pilipenko c93cc5955f [DAGCombiner] Match load by bytes idiom and fold it into a single load
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: hfinkel, RKSimon, filcab

Differential Revision: https://reviews.llvm.org/D26149

llvm-svn: 289538
2016-12-13 14:21:14 +00:00
Artur Pilipenko 01e86444a0 Move BaseIndexOffset in DAGCombiner.cpp so it will be available for the upcoming user
llvm-svn: 289537
2016-12-13 14:16:02 +00:00
Simon Pilgrim 9dc67c0101 [SelectionDAG] computeKnownBits - simplified knownbits sign extension. NFCI.
We don't need to extract+test the sign bit of the known ones/zeros, we can use sext which will handle all of this.

llvm-svn: 289534
2016-12-13 13:36:27 +00:00
Philip Reames 51387a8c28 [Statepoints] Reuse stack slots more than once within a basic block
The stack slot reuse code had a really amusing bug. We ended up only reusing a stack slot exact once (initial use + reuse) within a basic block. If we had a third statepoint to process, we ended up allocating a new set of stack slots. If we crossed a basic block boundary, the set got cleared. As a result, code which is invoke heavy doesn't see the problem, but multiple calls within a basic block does. Net result: as we optimize invokes into calls, lowering gets worse.

The root error here is that the bitmap uses by the custom allocator wasn't kept in sync. The result was that we ended up resizing the bitmap on the next statepoint (to handle the cross block case), reset the bit once, but then never reset it again.

Differential Revision: https://reviews.llvm.org/D25243

llvm-svn: 289509
2016-12-13 01:21:15 +00:00
Simon Pilgrim 040a36c176 [SelectionDAG] Add support for EXTRACT_SUBVECTOR to ComputeNumSignBits
Pre-commit as discussed on D27657

llvm-svn: 289425
2016-12-12 10:29:43 +00:00
Simon Pilgrim 54945a12ec [SelectionDAG] Add ability for computeKnownBits to peek through bitcasts from 'large element' scalar/vector to 'small element' vector.
Extension to D27129 which already supported bitcasts from 'small element' vector to 'large element' scalar/vector types.

llvm-svn: 289329
2016-12-10 17:00:00 +00:00
Simon Pilgrim 017b7a71d8 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED)
Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type.

llvm-svn: 289232
2016-12-09 17:53:11 +00:00
Matt Arsenault 38d8ed2b75 AMDGPU: Fix i128 mul
llvm-svn: 289231
2016-12-09 17:49:14 +00:00
Nirav Dave bedb5d906c Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

llvm-svn: 289226
2016-12-09 17:18:24 +00:00
Nirav Dave fd51ff4fd8 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289221
2016-12-09 16:15:12 +00:00
Simon Pilgrim b9eb99f570 Use SelectionDAG.getSplatBuildVector helper. NFCI.
llvm-svn: 289220
2016-12-09 16:01:50 +00:00
Simon Pilgrim bf9c0e7434 [SelectionDAG] Use SelectionDAG.getBuildVector helper. NFCI.
Makes interception of BUILD_VECTOR creation easier for debugging.

llvm-svn: 289218
2016-12-09 15:23:41 +00:00
Simon Pilgrim 15f1f828b5 [SelectionDAG] Add additional checks to CONCAT_VECTORS creation
Part of the work for PR31323 - add extra asserts checking that the input vectors are of consistent type and result in the correct number of vector elements.

llvm-svn: 289214
2016-12-09 14:27:52 +00:00
Simon Pilgrim e4050a2961 [SelectionDAG] Add partial BITCAST support to computeKnownBits
Adds support for bitcasting a little endian 'small element' vector to 'large element' scalar/vector (e.g. v16i8 to v4i32 or v2i32 to i64), which is required for PR30845. We extract the knownbits for each 'small element' part and concatenate the results together.

We can add support for big endian and 'large element' scalar/vector to 'small element' vector bitcasting once we have test cases for them.

Differential Revision: https://reviews.llvm.org/D27129

llvm-svn: 289200
2016-12-09 10:13:45 +00:00
Daniel Jasper f51e05ffbc Revert "[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes"
This reverts commit r288916 as it is currently causing a crasher in
Halide. Reproducer on llvm.org/PR31323. While it might be that halide is
generating invalid IR, llc shouldn't crash.

llvm-svn: 289194
2016-12-09 09:04:51 +00:00
Nicolai Haehnle f08dc90253 [SelectionDAG] Add expansion and promotion of [US]MUL_LOHI
Summary:
Most targets set the action for these nodes to Expand even though there
isn't actually any code for them in ExpandNode. Instead, targets simply
relied on the fact that no code generates these nodes as long as the
nodes aren't legal or custom.

However, generating these nodes can be useful e.g. for divide-by-constant
in wider integer types.

Expand of [US]MUL_LOHI will use MULH[US] when legal or custom, and
a sequence of half-width multiplications otherwise. Promote uses a wider
multiply.

This patch intends to not change the generated code, but indirect effects
are possible since expansions/promotions that were previously done in
DAGCombine may now be done in LegalizeDAG.

See D24822 for a change that actually uses the new expansion.

Reviewers: spatel, bkramer, venkatra, efriedma, hfinkel, ast, nadav, tstellarAMD

Subscribers: arsenm, jyknight, nemanjai, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24956

llvm-svn: 289050
2016-12-08 14:08:14 +00:00
Simon Pilgrim ba05d41095 [SelectionDAG] Add knownbits support for vector demandedelts in SMAX/SMIN/UMAX/UMIN opcodes
llvm-svn: 288926
2016-12-07 17:54:00 +00:00
Simon Pilgrim 967325b373 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes
llvm-svn: 288916
2016-12-07 16:28:21 +00:00
Simon Pilgrim ff79f31328 [SelectionDAG] Removed old knownbits TODO comment. NFCI.
EXTRACT_VECTOR_ELT does support demanded elts if the element index is known and in range.

llvm-svn: 288913
2016-12-07 15:31:12 +00:00
Eli Friedman 0a76e3241f [CodeGen] Fix result type for SMULO/UMULO legalization
On some platforms (like MSP430) the second element of the result
structure for SMULO/UMULO may have a shorter type than the one
returned by SetCC. We need to truncate it to the right type, or
else some incorrect code may be generated later on.

This fixes issue https://github.com/rust-lang/rust/issues/37829

Patch by Vadzim Dambrouski!

Differential Revision: https://reviews.llvm.org/D27154

llvm-svn: 288857
2016-12-06 22:49:36 +00:00
Simon Pilgrim dd6ca639d5 [DAGCombine] Add (sext_in_reg (zext x)) -> (sext x) combine
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.

Differential Revision: https://reviews.llvm.org/D27461

llvm-svn: 288842
2016-12-06 19:09:37 +00:00
Simon Pilgrim 1577b39f51 [SelectionDAG] We can ignore knownbits from an undef shuffle vector index if we don't actually demand that element
llvm-svn: 288839
2016-12-06 18:58:25 +00:00
Simon Pilgrim 29c17f3f58 Avoid repeated calls to Op.getOpcode(). NFCI.
llvm-svn: 288814
2016-12-06 14:50:09 +00:00
Sanjay Patel 1f158d6955 [TargetLowering] add special-case for demanded bits analysis of 'not'
We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask. 
Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get
folded into another logic op if the target has those. However, if we can remove a logic
instruction by changing the xor's constant mask value, that should always be a win.

Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case
currently (although that's marked with a FIXME). So if you run this IR through -instcombine,
you should get the same end result. I'm hoping to add a different backend transform that 
will expose this problem though, so I need to solve this first.

Differential Revision: https://reviews.llvm.org/D27356

llvm-svn: 288676
2016-12-05 15:58:21 +00:00
Matt Arsenault 92fede361f DAG: Fold out out of bounds insert_vector_elt
getNode already prevents formation of out of bounds constant
extract_vector_elts. Do the same for insert_vector_elt.

llvm-svn: 288603
2016-12-03 23:03:26 +00:00
Nicolai Haehnle 33ca182c91 [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.

Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:

  x = 1.01
  y = 111

  x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)

  x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)

The example relies on rounding towards zero at least in the second step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578

Reviewers: RKSimon, tstellarAMD, spatel, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26602

llvm-svn: 288506
2016-12-02 16:06:18 +00:00
Peter Collingbourne ab85225be4 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00
Justin Bogner 35c5e58f8c SDAG: Avoid a large, usually empty SmallVector in a recursive function
This SmallVector is using up 128 bytes on the stack every time despite
almost always being empty[1], and since this function can recurse quite
deeply that adds up to a lot of overhead. We've seen this run afoul of
ulimits in some cases with ASAN on.

Replacing the SmallVector with a std::vector trades an occasional heap
allocation for vastly less stack usage.

[1]: I gathered some stats on an internal test suite and the vector
was non-empty in only 45,000 of 10,000,000 calls to this function.

llvm-svn: 288441
2016-12-02 00:11:01 +00:00
Matthias Braun d0ee66c2e9 Move most EH from MachineModuleInfo to MachineFunction
Recommitting r288293 with some extra fixes for GlobalISel code.

Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.

This is a necessary step to have machine module passes work properly.

Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
  where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
  because the available MachineFunction pointers are const, but the code
  wants to call tidyLandingPads() in between
  (markFunctionEnd()/endFunction()).

Differential Revision: https://reviews.llvm.org/D27227

llvm-svn: 288405
2016-12-01 19:32:15 +00:00
Nicolai Haehnle da7e4017c6 [SelectionDAG] Rename and clarify visitFMULForFMADCombine (NFC)
Summary: Suggested by @spatel in D26602.

Reviewers: spatel, hfinkel

Subscribers: spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D27260

llvm-svn: 288336
2016-12-01 14:04:13 +00:00
Eric Christopher e70b7c3dfb Temporarily Revert "Move most EH from MachineModuleInfo to MachineFunction"
This apprears to have broken the global isel bot:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel_build/5174/console

This reverts commit r288293.

llvm-svn: 288322
2016-12-01 07:50:12 +00:00
Matthias Braun ed14cb0604 Move most EH from MachineModuleInfo to MachineFunction
Most of the exception handling members in MachineModuleInfo is actually
per function data (talks about the "current function") so it is better
to keep it at the function instead of the module.

This is a necessary step to have machine module passes work properly.

Also:
- Rename TidyLandingPads() to tidyLandingPads()
- Use doxygen member groups instead of "//===- EH ---"... so it is clear
  where a group ends.
- I had to add an ugly const_cast at two places in the AsmPrinter
  because the available MachineFunction pointers are const, but the code
  wants to call tidyLandingPads() in between
  (markFunctionEnd()/endFunction()).

Differential Revision: https://reviews.llvm.org/D27227

llvm-svn: 288293
2016-11-30 23:49:01 +00:00
Matthias Braun ef331eff5a Move VariableDbgInfo from MachineModuleInfo to MachineFunction
VariableDbgInfo is per function data, so it makes sense to have it with
the function instead of the module.

This is a necessary step to have machine module passes work properly.

Differential Revision: https://reviews.llvm.org/D27186

llvm-svn: 288292
2016-11-30 23:48:50 +00:00
Nicolai Haehnle 73a9a27b5a [SelectionDAG] Refactor TargetLowering::expandMUL (NFC)
Summary: Further preparation for the expansion of MUL_LOHI added in D24956.

Reviewers: efriedma, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27064

llvm-svn: 288248
2016-11-30 16:26:33 +00:00
Warren Ristow d9777c1dbb Test commit. Comment changes. NFC.
llvm-svn: 288100
2016-11-29 02:37:13 +00:00
Sanjay Patel 2bd32b05fb [DAG] clean up foldSelectCCToShiftAnd(); NFCI
llvm-svn: 288088
2016-11-28 23:05:55 +00:00
Sanjay Patel 1cf9aff659 [DAG] add helper function for selectcc --> and+shift transforms; NFC
llvm-svn: 288073
2016-11-28 21:47:41 +00:00
Nirav Dave a413361798 Revert "[DAG] Improve loads-from-store forwarding to handle TokenFactor"
This reverts commit r287773 which caused issues with ppc64le builds.

llvm-svn: 288035
2016-11-28 14:30:29 +00:00
Simon Pilgrim c5fb167df0 Use SDValue helpers instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287941
2016-11-25 17:25:21 +00:00
Craig Topper 8c4cdf06db [DAGCombine] Teach DAG combine that if both inputs of a vselect are the same, then the condition doesn't matter and the vselect can be removed.
Selects with scalar condition already handle this correctly.

llvm-svn: 287904
2016-11-24 21:48:52 +00:00
Nicolai Haehnle 934470f536 [SelectionDAG] Early-out in TargetLowering::expandMUL (NFC)
Summary: Reduce indentation level; preparation for D24956.

Reviewers: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27063

llvm-svn: 287831
2016-11-23 22:14:20 +00:00
Nirav Dave cf34556330 [DAG] Improve loads-from-store forwarding to handle TokenFactor
Forward store values to matching loads down through token
factors. Factored from D14834.

Reviewers: jyknight, hfinkel

Subscribers: hfinkel, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D26080

llvm-svn: 287773
2016-11-23 16:48:35 +00:00
John Brawn 150addb45c [DAGCombiner] Fix infinite loop in vector mul/shl combining
We have the following DAGCombiner transformations:
 (mul (shl X, c1), c2) -> (mul X, c2 << c1)
 (mul (shl X, C), Y) -> (shl (mul X, Y), C)
 (shl (mul x, c1), c2) -> (mul x, c1 << c2)
Usually the constant shift is optimised by SelectionDAG::getNode when it is
constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing
with vectors and one of those vector constants contains an undef element
FoldConstantArithmetic does not fold and we enter an infinite loop.

Fix this by making FoldConstantArithmetic use getNode to decide how to fold each
vector element, the same as FoldConstantVectorArithmetic does, and rather than
adding the constant shift to the work list instead only apply the transformation
if it's already been folded into a constant, as if it's not we're going to loop
endlessly. Additionally add missing NoOpaques to one of those transformations,
which I noticed when writing the tests for this.

Differential Revision: https://reviews.llvm.org/D26605

llvm-svn: 287766
2016-11-23 16:05:51 +00:00
Elena Demikhovsky 09375d98b8 Type legalization for compressstore and expandload intrinsics.
Implemented widening (v2f32) and splitting (v16f64).
On splitting, I use "popcnt" to calculate memory increment. 
More type legalization work will come in the next patches.

llvm-svn: 287761
2016-11-23 13:58:24 +00:00
Simon Pilgrim 72e43570b7 [SelectionDAG] ComputeNumSignBits of TRUNCATE operations
Add basic ComputeNumSignBits support for TRUNCATE ops for cases where the source's number of sign bits overlaps with the truncated size.

Improves X86 SIGN_EXTEND_IN_REG vector cases which were needlessly sign extending boolean vector results.

Differential Revision: https://reviews.llvm.org/D26851

llvm-svn: 287635
2016-11-22 11:29:19 +00:00
Matt Arsenault b30d2aca58 DAG: Ignore call site attributes when emitting target intrinsic
A target intrinsic may be defined as possibly reading memory,
but the call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic
assumption of the intrinsic definition, so the chain should
still be used.

llvm-svn: 287593
2016-11-21 22:56:42 +00:00
Simon Pilgrim 5662074ba3 [VectorLegalizer] Remove EVT::getSizeInBits code duplications. NFCI.
We were calling SVT.getSizeInBits() several times in a row - just call it once and reuse the result.

llvm-svn: 287556
2016-11-21 18:24:44 +00:00
Simon Pilgrim 49d7eda968 [SelectionDAG] Add ComputeNumSignBits support for CONCAT_VECTORS opcode
llvm-svn: 287541
2016-11-21 14:36:19 +00:00
Simon Pilgrim 7a6b6d5656 Fix spelling mistakes in SelectionDAG comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287487
2016-11-20 13:14:57 +00:00
Simon Pilgrim e40900dddd [SelectionDAG] Add knowbits support for CONCAT_VECTOR opcode
llvm-svn: 287387
2016-11-18 22:21:22 +00:00
Matthias Braun 9f15a79e5d Timer: Track name and description.
The previously used "names" are rather descriptions (they use multiple
words and contain spaces), use short programming language identifier
like strings for the "names" which should be used when exporting to
machine parseable formats.

Also removed a unused TimerGroup from Hexxagon.

Differential Revision: https://reviews.llvm.org/D25583

llvm-svn: 287369
2016-11-18 19:43:18 +00:00
Simon Pilgrim c4d733cd6a Fix spelling in comment. NFC.
llvm-svn: 287222
2016-11-17 12:03:05 +00:00
Chris Bieneman 05c279fc4b [CMake] NFC. Updating CMake dependency specifications
This patch updates a bunch of places where add_dependencies was being explicitly called to add dependencies on intrinsics_gen to instead use the DEPENDS named parameter. This cleanup is needed for a patch I'm working on to add a dependency debugging mode to the build system.

llvm-svn: 287206
2016-11-17 04:36:50 +00:00
Ahmed Bougacha bd6ce9a247 [CodeGen] Pass references, not pointers, to MMI helpers. NFC.
While there, rename them to follow the coding style.

llvm-svn: 287169
2016-11-16 22:25:03 +00:00
Ahmed Bougacha 456dce8a84 [CodeGen] Pull MMI helpers from FunctionLoweringInfo to MMI. NFC.
They're not SelectionDAG- or FunctionLoweringInfo-specific.  They
are, however, specific to building MMI from IR.
We could make them members, but it's nice having MMI be a "simple" data
structure and this logic kept separate.

This also lets us reuse them from GlobalISel.

llvm-svn: 287167
2016-11-16 22:24:56 +00:00
Pawel Bylica c3f6c97f71 Integer legalization: fix MUL expansion
Summary:
This fixes the runtime results produces by the fallback multiplication expansion introduced in r270720.

For tests I created a fuzz tester that compares the results with Boost.Multiprecision.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26628

llvm-svn: 286998
2016-11-15 18:29:24 +00:00
Joerg Sonnenberger 1a7eec68a9 Introduce TLI predicative for base-relative Jump Tables.
For 64bit ABIs it is common practice to use relative Jump Tables with
potentially different relocation bases.  As the logic for the jump table
itself doesn't depend on the relocation base, make it easier for targets
to use the generic logic. Start by dropping the now redundant MIPS logic.

Differential Revision: https://reviews.llvm.org/D26578

llvm-svn: 286951
2016-11-15 12:39:46 +00:00
Asaf Badouh b573553424 DAGCombiner: fix combine of trunc and select
bugzilla:
https://llvm.org/bugs/show_bug.cgi?id=29002
pr29002

Differential Revision: https://reviews.llvm.org/D26449


 

llvm-svn: 286938
2016-11-15 07:55:22 +00:00
Simon Pilgrim 807f9cf243 [SelectionDAG] Add support for vector demandedelts in BSWAP opcodes
llvm-svn: 286582
2016-11-11 11:51:29 +00:00
Simon Pilgrim 813721e98a [SelectionDAG] Add support for vector demandedelts in UREM/SREM opcodes
llvm-svn: 286578
2016-11-11 11:23:43 +00:00
Simon Pilgrim 0652227814 [SelectionDAG] Add support for vector demandedelts in UDIV opcodes
llvm-svn: 286576
2016-11-11 10:47:24 +00:00
Evandro Menezes 21f9ce1a0d [DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself.  However, the original code did not properly consider this condition
if returned by a target.  This patch addresses the issues to allow a target
to compute the series on its own.

Differential revision: https://reviews.llvm.org/D22975

llvm-svn: 286523
2016-11-10 23:31:06 +00:00
Simon Pilgrim 38f0045cb0 [SelectionDAG] Add support for vector demandedelts in ADD/SUB opcodes
llvm-svn: 286516
2016-11-10 22:41:49 +00:00
Simon Pilgrim fe3a54371d [SelectionDAG] Add support for splatted vectors in SUB opcode
llvm-svn: 286509
2016-11-10 21:57:42 +00:00
Simon Pilgrim d67af68f06 [SelectionDAG] Add support for vector demandedelts in TRUNCATE opcodes
llvm-svn: 286481
2016-11-10 17:43:52 +00:00
Simon Pilgrim 33fef8e865 Use common SDLoc. NFCI.
llvm-svn: 286473
2016-11-10 16:47:09 +00:00
Simon Pilgrim ee187fd6e7 [SelectionDAG] Add support for vector demandedelts in MUL opcodes
llvm-svn: 286471
2016-11-10 16:27:42 +00:00
Simon Pilgrim ca57e53ded [SelectionDAG] Add support for vector demandedelts in SRA opcodes
llvm-svn: 286461
2016-11-10 15:05:09 +00:00
Simon Pilgrim 37c9034bd6 [DAGCombiner] Correctly extract the ConstOrConstSplat shift value for SHL nodes
We were failing to extract a constant splat shift value if the shifted value was being masked.

The (shl (and (setcc) N01CV) N1CV) -> (and (setcc) N01CV<<N1CV) combine was unnecessarily preventing this.

llvm-svn: 286454
2016-11-10 14:35:09 +00:00
Simon Pilgrim 3bf99c056a [SelectionDAG] Add support for vector demandedelts in SHL/SRL opcodes
llvm-svn: 286448
2016-11-10 13:52:42 +00:00
Simon Pilgrim 778596bf59 [TargetLowering] Fix undef vector element issue with true/false result handling
Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching.

The comment said we shouldn't handle constant splat vectors with undef elements. But the the actual code was returning false if the build vector contained no undef elements....

This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs).

The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed.

Differential Revision: https://reviews.llvm.org/D26031

llvm-svn: 286238
2016-11-08 15:07:01 +00:00
Simon Pilgrim d02c55204b [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233
2016-11-08 14:10:28 +00:00
Richard Smith 857efb0880 Add -O0 support for @llvm.invariant.group.barrier by discarding it if it gets to ISel.
Differential Revision: https://reviews.llvm.org/D26292

llvm-svn: 286119
2016-11-07 16:47:20 +00:00
Simon Pilgrim 39df78e384 [SelectionDAG] Add support for vector demandedelts in XOR opcodes
llvm-svn: 286075
2016-11-06 16:49:19 +00:00
Simon Pilgrim dd4809a603 [SelectionDAG] Add support for vector demandedelts in OR opcodes
llvm-svn: 286071
2016-11-06 16:29:09 +00:00
Nicolai Haehnle bea772c6dc DAGCombiner: fix use-after-free when merging consecutive stores
Summary:
Have MergeConsecutiveStores explicitly return information about the stores
that were merged, so that we can safely determine whether the starting
node has been freed.

Reviewers: chandlerc, bogner, niravd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25601

llvm-svn: 285916
2016-11-03 14:25:04 +00:00
Elena Demikhovsky caaceef4b3 Expandload and Compressstore intrinsics
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.

llvm-svn: 285876
2016-11-03 03:23:55 +00:00
Simon Pilgrim 93f2f7fb6c Use !operator to test if APInt is zero/non-zero. NFCI.
Avoids APInt construction and slower comparisons.

llvm-svn: 285822
2016-11-02 15:41:15 +00:00
Joerg Sonnenberger 5e31b3ad93 Simplify.
llvm-svn: 285802
2016-11-02 12:45:28 +00:00
Sanjay Patel 70c5f02d25 [DAG] disable nsw/nuw for add/sub/mul when simplifying based on demanded bits (PR30841)
This bug was exposed by using nsw/nuw for more aggressive folds in:
https://reviews.llvm.org/rL284844

The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(),
but we can't just flip flag bits in the DAG; we have to create a new node that has the
bits cleared.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30841 

llvm-svn: 285656
2016-10-31 23:28:45 +00:00
Sanjay Patel 339a51ac13 [DAG] x | x --> x
llvm-svn: 285522
2016-10-30 18:19:35 +00:00
Sanjay Patel 13aee345ca [DAG] x & x --> x
llvm-svn: 285521
2016-10-30 18:13:30 +00:00
Simon Pilgrim 75a697a17e [DAGCombiner] (REAPPLIED) Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

This looked like this had caused compile time regressions on some buildbots (and was reverted in rL285381), but appears to have just been a harmless bystander!

Differential Revision: https://reviews.llvm.org/D25691

llvm-svn: 285494
2016-10-29 11:29:39 +00:00
Davide Italiano 86168b23cf [DAGCombiner] Fix a crash visiting `AND` nodes.
Instead of asserting that the shift count is != 0 we just bail out
as it's not profitable trying to optimize a node which will be
removed anyway.

Differential Revision:  https://reviews.llvm.org/D26098

llvm-svn: 285480
2016-10-28 23:55:32 +00:00
Justin Bogner db6b6a7f0c SDAG: Make sure we use an allocatable reg class when we create this vreg
As per the discussion on r280783, if constrainRegClass fails we need
to call getAllocatableClass like we did before that commit.

llvm-svn: 285467
2016-10-28 22:42:54 +00:00
Simon Pilgrim d9189891fc [SelectionDAG] computeKnownBits - early-out if any BUILD_VECTOR element has no known bits
No need to check the remaining elements - no common known bits are available.

llvm-svn: 285399
2016-10-28 14:07:44 +00:00
Simon Pilgrim 8c043061e5 [SelectionDAG] Tidyup UDIV computeKnownBits implementation
No need to clear KnownOne2/KnownZero2 bits as the next call to computeKnownBits will overwrite them anyway

llvm-svn: 285398
2016-10-28 13:42:23 +00:00
Simon Pilgrim 755cef1ba8 [SelectionDAG] Increment computeKnownBits recursion depth for SMIN/SMAX/UMIN/UMAX like all other ops
llvm-svn: 285397
2016-10-28 13:13:16 +00:00
Juergen Ributzka 5cee232be4 Revert "[DAGCombiner] Add vector demanded elements support to computeKnownBits"
This seems to have increased LTO compile time bejond 2x of previous builds.
See http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto/10676/

llvm-svn: 285381
2016-10-28 04:01:12 +00:00
Simon Pilgrim 01e755eab1 [DAGCombiner] Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.

I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.

DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.

Differential Revision: https://reviews.llvm.org/D25691

llvm-svn: 285296
2016-10-27 14:29:28 +00:00
Nemanja Ivanovic 275853e777 Do not assume that FP vector operands are never legalized by expanding
This patch ensures that if a floating point vector operand is legalized by
expanding, it is legalized through the stack rather than by calling
DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand
is a non-integer type.

This fixes PR 30715.

llvm-svn: 285231
2016-10-26 19:51:35 +00:00
Tom Stellard 284cf32ab4 LegalizeDAG: Support promoting [US]DIV and [US]REM operations
Summary:
AMDGPU will need this one i16 is added as a legal type.  This is tested by:

test/CodeGen/AMDGPU/sdiv.ll
test/CodeGen/AMDGPU/sdivrem24.ll
test/CodeGen/AMDGPU/udiv.ll
test/CodeGen/AMDGPU/udivrem24.ll

Reviewers: bogner, efriedma

Subscribers: efriedma, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D25699

llvm-svn: 285199
2016-10-26 14:52:25 +00:00
Simon Pilgrim de86241a09 [DAGCombiner] Enable (urem x, (shl pow2, y)) -> (and x, (add (shl pow2, y), -1)) combine for splatted vectors
llvm-svn: 285129
2016-10-25 22:01:09 +00:00
Simon Pilgrim f534573e8c [DAGCombiner] Enable srem(x.y) -> urem(x,y) combine for vectors
SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927

llvm-svn: 285123
2016-10-25 21:20:18 +00:00
Simon Pilgrim 4ebb04510a [DAGCombiner] Enable sdiv(x.y) -> udiv(x,y) combine for vectors
SelectionDAG::SignBitIsZero (via SelectionDAG::computeKnownBits) has supported vectors since rL280927

llvm-svn: 285118
2016-10-25 20:56:42 +00:00
Evandro Menezes 601f4cb9f7 Switch lowering: improve partitioning of jump tables
When there's a tie between partitionings of jump tables, consider also cases
that result in no jump tables, but in one or a few cases.  The motivation is
that many contemporary processors typically perform case switches fairly
quickly.

Differential revision: https://reviews.llvm.org/D25212

llvm-svn: 285099
2016-10-25 19:11:43 +00:00
Zvi Rackover 124470a202 [DAGCombine] Preserve shuffles when one of the vector operands is constant
Summary:
Do *not* perform combines such as:

    vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1 C2), scalar_to_vector(X))
    ->
    build_vector(X, C0, C1, C2)

Keeping the shuffle allows lowering the constant build_vector to a materialized
constant vector (such as a vector-load from the constant-pool or some other idiom).

Reviewers: delena, igorb, spatel, mkuper, andreadb, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25524

llvm-svn: 285063
2016-10-25 12:14:19 +00:00
Simon Pilgrim e3e6585c2d [SelectionDAG] Update ComputeNumSignBits SRA/SHL handlers to accept scalar or vector splats
Use isConstOrConstSplat helper.

Also use APInt instead of getZExtValue directly to avoid out of range issues.

llvm-svn: 285033
2016-10-24 21:47:19 +00:00
Simon Pilgrim d8ec09c74f Use SDValue::getConstantOperandVal() helper. NFCI.
llvm-svn: 285025
2016-10-24 20:56:52 +00:00
Peter Collingbourne 16e9b944e9 CodeGen: Do not add a global's address space to the folding set profile.
It is already part of the type (which is part of the global, which is already
being added), so there's no need to do it.

llvm-svn: 285002
2016-10-24 18:56:09 +00:00
Sanjay Patel 9ca028c2d6 [DAG] enhance computeKnownBits to handle SRL/SRA with vector splat constant
llvm-svn: 284953
2016-10-23 23:13:31 +00:00
Simon Pilgrim d06641d3dc Use SDValue::getConstantOperandVal() helper. NFCI.
llvm-svn: 284949
2016-10-23 20:17:21 +00:00
Sanjay Patel ca92c36e01 [DAG] enhance computeKnownBits to handle SHL with vector splat constant
Also, use APInt to avoid crashing on types larger than vNi64.

llvm-svn: 284874
2016-10-21 20:16:27 +00:00
Sanjay Patel 81029f6a76 [DAG] fold negation of sign-bit
0 - X --> 0, if the sub is NUW
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
0 - X --> X, if X is 0 or the minimum signed value

This is the DAG equivalent of:
https://reviews.llvm.org/rL284649

plus the fold for the NUW case which already existed in InstSimplify.

Note that we miss a vector fold because of a deficiency in the DAG version of
computeKnownBits().

llvm-svn: 284844
2016-10-21 17:24:26 +00:00
Sanjay Patel cbaba93ce8 [DAG] use SDNode flags 'nsz' to enable fadd/fsub with zero folds
As discussed in D24815, let's start the process of killing off the broken fast-math global
state housed in TargetOptions and eliminate the need for function-level fast-math attributes.

Here we enable two similar folds that are possible when we don't care about signed-zero:
fadd nsz x, 0 --> x
fsub nsz 0, x --> -x

Note that although the test cases include a 'sin' function call, I'm side-stepping the 
FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these
tests - isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node.

Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't
actually do anything today because Flags are silently dropped for any node that is not a
binary operator.

Differential Revision: https://reviews.llvm.org/D25297

llvm-svn: 284824
2016-10-21 14:36:58 +00:00
Pirama Arumuga Nainar 05b0f93ad3 Fix *_EXTEND_VECTOR_INREG legalization
Summary:
While promoting *_EXTEND_VECTOR_INREG nodes whose inputs are already
promoted, perform the appropriate sign extension for the promoted node
before doing the *_EXTEND_VECTOR_INREG operation.  If not, the undefined
high-order bits of the promoted operand may (a) be garbage inc ase of
zext) or (b) contribute the wrong sign-bit (in case of sext)

Updated the promote-vec3.ll test after this change.  The diff shows
explicit zeroing in case of zext and intermediate sign extension in case
of sext.

Reviewers: RKSimon

Subscribers: llvm-commits, srhines

Differential Revision: https://reviews.llvm.org/D25790

llvm-svn: 284752
2016-10-20 17:56:36 +00:00
Sanjay Patel 0051efcf97 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284746
2016-10-20 16:55:45 +00:00
Simon Pilgrim 618d3aedaf [DAGCombiner] Add general constant vector support to (srl (shl x, c), c) -> (and x, cst2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284717
2016-10-20 11:10:21 +00:00
Simon Pilgrim e32d0f8413 Merged nested ifs. NFCI.
llvm-svn: 284616
2016-10-19 17:30:24 +00:00
Simon Pilgrim a20aeea998 [DAGCombiner] Add general constant vector support to (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284613
2016-10-19 17:12:22 +00:00
Reid Kleckner f7ad5341d0 [WinEH] Allow catchpads to reuse the same catch object
This code used a regular when it should have used a multimap.

llvm-svn: 284612
2016-10-19 17:08:23 +00:00
Sanjay Patel 3a3aaf67e0 [DAG] optimize negation of bool
Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG.
With the mask, this should be no worse than 2 shifts. The mask can be eliminated
in some cases, so that should be better than 2 shifts.

This change exposed some missing folds related to negation:
https://reviews.llvm.org/rL284239
https://reviews.llvm.org/rL284395

There may be others, so please let me know if you see any regressions.

Differential Revision: https://reviews.llvm.org/D25485

llvm-svn: 284611
2016-10-19 16:58:59 +00:00
Simon Pilgrim 4554e161be [DAGCombiner] Add general constant vector support to (shl (sra x, c1), c1) -> (and x, (shl -1, c1))
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284608
2016-10-19 16:15:30 +00:00
Simon Pilgrim c2e9724909 [DAGCombiner] Add general constant vector support to (shl (mul x, c1), c2) -> (mul x, c1 << c2)
We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector

llvm-svn: 284607
2016-10-19 15:59:28 +00:00
Simon Pilgrim 7dcb6e572e [DAGCombiner] Just call isConstOrConstSplat directly. NFCI.
This will get the same ConstantSDNode scalar or vector splat value as the current separate dyn_cast<ConstantSDNode> / isVector() approach.

llvm-svn: 284578
2016-10-19 11:28:15 +00:00
Simon Pilgrim b2ca2505cc [DAGCombine] Generalize distributeTruncateThroughAnd to work with any non-opaque constant or constant vector
llvm-svn: 284574
2016-10-19 08:57:37 +00:00
Sanjay Patel 19601fa587 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.

llvm-svn: 284513
2016-10-18 18:36:49 +00:00
Sanjay Patel 08fff9ca81 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284495
2016-10-18 17:05:05 +00:00
Simon Pilgrim 25e9628978 [DAGCombiner] Add splatted vector support to (udiv x, (shl pow2, y)) -> x >>u (log2(pow2)+y)
llvm-svn: 284491
2016-10-18 16:36:00 +00:00
Simon Pilgrim 65e0c73875 Strip trailing whitespace (NFCI)
llvm-svn: 284478
2016-10-18 13:44:00 +00:00
Sanjay Patel 523cd8290a [DAG] use isConstOrConstSplat in ComputeNumSignBits to optimize SRA
The scalar version of this pattern was noted in:
https://reviews.llvm.org/D25485

and fixed with:
https://reviews.llvm.org/rL284395

More refactoring of the constant/splat helpers is needed and will happen in follow-up patches.

Differential Revision: https://reviews.llvm.org/D25685

llvm-svn: 284424
2016-10-17 20:41:39 +00:00
Sanjay Patel a7cab58055 [DAG] make isConstOrConstSplat and isConstOrConstSplatFP more accessible; NFC
As noted in:
https://reviews.llvm.org/D25685

This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch. 
In a minor attempt to keep some structure, we're pulling the FP helper over along with its
integer sibling, but clearly we can and should do more refactoring of the similar helper
functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality.

llvm-svn: 284421
2016-10-17 20:26:46 +00:00
Sanjay Patel 2cf6bfaf73 [DAG] optimize away an arithmetic-right-shift of a 0 or -1 value
This came up as part of:
https://reviews.llvm.org/D25485

Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.

llvm-svn: 284395
2016-10-17 15:58:28 +00:00
James Molloy aa79b19a3e [SDAG] Use ABI type alignment for constant pools when optimizing for size
SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128.

In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.

llvm-svn: 284381
2016-10-17 12:54:07 +00:00
Konstantin Zhuravlyov 8ea0246e93 [MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Differential Revision: https://reviews.llvm.org/D24577

llvm-svn: 284312
2016-10-15 22:01:18 +00:00
Sanjay Patel 72b5ff646d [DAG] avoid creating illegal node when transforming negated shifted sign bit
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.

llvm-svn: 284268
2016-10-14 19:46:31 +00:00
Tom Stellard ab61007914 TargetLowering: Add SimplifyDemandedBits() helper to TargetLoweringOpt
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses.  SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:

and v1, v0, 0xffffff
mul24 v2, v1, v1      ; Multiply ignoring high 8-bits.

To:

mul24 v2, v0, v0

Where before this would not be optimized, because v1 has multiple uses.

Reviewers: bogner, arsenm

Subscribers: nhaehnle, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24964

llvm-svn: 284266
2016-10-14 19:14:26 +00:00
Sanjay Patel 00fc7a6159 [DAG] add folds for negated shifted sign bit
The same folds exist in InstCombine already.

This came up as part of:
https://reviews.llvm.org/D25485

llvm-svn: 284239
2016-10-14 14:26:47 +00:00
Nicolai Haehnle 86e72d98dd Fix use-after-frees
Extracted from D25313, as suggested by Justin Bogner.

llvm-svn: 284220
2016-10-14 09:49:51 +00:00
Craig Topper 40feb7f157 [DAGCombiner] Teach createBuildVecShuffle to handle cases where input vectors are less than half of the output vector size.
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.

llvm-svn: 284204
2016-10-14 06:00:42 +00:00
Sanjay Patel 98d0ea64ca [DAG] hoist DL(N) and fix formatting; NFC
llvm-svn: 284170
2016-10-13 22:27:10 +00:00
Tom Stellard f80c1875a3 LegalizeDAG: Implement PROMOTE for ISD::BITREVERSE
Summary:
This operation is promoted the same way was ISD::BSWAP.  This will
prevent a regression in test/Target/AMDGOU/bitreverse.ll when i16
support is implemented.

Reviewers: bogner, hfinkel

Subscribers: hfinkel, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D25202

llvm-svn: 284163
2016-10-13 21:03:49 +00:00
Nirav Dave a81682aad4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Simon Pilgrim cb59b5257c [DAGCombiner] Add vector support to (mul (shl X, Y), Z) -> (shl (mul X, Z), Y) style combines
llvm-svn: 284122
2016-10-13 14:04:35 +00:00
Simon Pilgrim fa8fadc0e5 [DAGCombiner] Add vector support to C2-(A+C1) -> (C2-C1)-A folding
llvm-svn: 284117
2016-10-13 12:49:31 +00:00
Simon Pilgrim 833b8a2071 [DAGCombiner] Add vector support to (sub -1, x) -> (xor x, -1) canonicalization
Improves commutation potential

llvm-svn: 284113
2016-10-13 12:05:20 +00:00
Albert Gutowski 795d7d6381 Create llvm.addressofreturnaddress intrinsic
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.

Reviewers: majnemer, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25293

llvm-svn: 284061
2016-10-12 22:13:19 +00:00
Simon Pilgrim 08190943cb [DAGCombiner] Update most ADD combines to support general vector combines
Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.

Differential Revision: https://reviews.llvm.org/D25374

llvm-svn: 284015
2016-10-12 13:48:10 +00:00
Konstantin Zhuravlyov 081385a74e [DAGCombiner] Do not remove the load of stored values when optimizations are disabled
This combiner breaks debug experience and should not be run when optimizations are disabled.

For example:
  int main() {
    int j = 0;
    j += 2;
    if (j == 2)
      return 0;
    return 5;
  }
When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.

Differential Revision: https://reviews.llvm.org/D19268

llvm-svn: 284014
2016-10-12 13:44:24 +00:00
Michael Kuperstein 7adbf6b042 [DAG] Fix crash in build_vector -> vector_shuffle combine
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.

llvm-svn: 283953
2016-10-11 22:44:31 +00:00
Arnold Schwaighofer 9103e268cf Silence -Wunused-but-set-variable warning
llvm-svn: 283927
2016-10-11 19:49:29 +00:00
Sanjay Patel 8253e15ef3 [DAG] add fold for masked negated sign-extended bool
This enhances the fold added with:
https://reviews.llvm.org/rL283900

llvm-svn: 283905
2016-10-11 17:05:52 +00:00
Sanjay Patel 8384703d9b [DAG] add fold for masked negated extended bool
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG 
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction. 

An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect 
different patterns. In this case, the SimplifyDemandedBits fold prevents us from 
performing a zext to sext fold that would then be recognized as a negation of a sext. 

llvm-svn: 283900
2016-10-11 16:26:36 +00:00
Sanjay Patel 38a42e4bfa [DAG] simplify logic; NFC
llvm-svn: 283885
2016-10-11 14:14:30 +00:00
Sanjay Patel 907ae69125 [DAG] hoist DL(N) and fix formatting; NFC
llvm-svn: 283884
2016-10-11 14:04:24 +00:00
Sanjay Patel 9609f3d6c7 [DAG] fix formatting; NFC
llvm-svn: 283878
2016-10-11 13:47:43 +00:00
Hal Finkel fcd2421667 [SelectionDAGBuilder] Support llvm.flt.rounds on targets where i32 is not legal
Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal
type.

Patch by Edward Jones, thanks!

Differential Revision: https://reviews.llvm.org/D24459

llvm-svn: 283797
2016-10-10 20:45:15 +00:00
Elena Demikhovsky 5b10aa1f1e DAG: Setting Masked-Expand-Load as a variant of Masked-Load node
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions. 
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.

Differential Revision: https://reviews.llvm.org/D25322

llvm-svn: 283694
2016-10-09 10:48:52 +00:00
Arnold Schwaighofer 3f25658143 swifterror: Don't compute swifterror vregs during instruction selection
The code used llvm basic block predecessors to decided where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping from pred.
llvm basic blocks and machine basic blocks.

Therefore the current approach does not work as it assumes we can mark
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.

Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.

When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.

This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.

rdar://28300923

llvm-svn: 283617
2016-10-07 22:06:55 +00:00
Sanjay Patel 14c02052d6 [DAG] clean up foldSelectOfConstants(); NFCI
Rename variables, simplify logic. 
Not clear yet why we don't handle a target with ZeroOrNegativeOneBooleanContent too.

llvm-svn: 283613
2016-10-07 21:55:42 +00:00
Sanjay Patel ecaf343fe7 [DAG] move fold (select C, 0, 1 -> xor C, 1) to a helper function; NFC
We're missing at least 3 other similar folds based on what we have in InstCombine. 

llvm-svn: 283596
2016-10-07 20:47:51 +00:00
Vedant Kumar 7beb423765 Delete some dead code in SelectionDAG (NFC)
Differential Revision: https://reviews.llvm.org/D24435

llvm-svn: 283505
2016-10-06 22:53:43 +00:00
Pirama Arumuga Nainar cc152ac794 Handle *_EXTEND_VECTOR_INREG during Integer Legalization
Summary:
These nodes need legalization for 3-element vectors.  This commit
handles the legalization and adds tests for zext and sext.

This fixes PR30614.

Reviewers: RKSimon, srhines

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25268

llvm-svn: 283496
2016-10-06 21:27:05 +00:00
Michael Kuperstein 7cc2123847 [DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.

This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.

Differential Revision: https://reviews.llvm.org/D24683

llvm-svn: 283480
2016-10-06 18:58:24 +00:00
Peter Collingbourne d799d28540 FastISel: Remove unused/un-overridden entry points. NFCI.
llvm-svn: 283366
2016-10-05 19:25:20 +00:00
Bjorn Pettersson 12559441bd [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT.
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.

Reviewers: bogner, tstellarAMD, mkuper

Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25007

llvm-svn: 283347
2016-10-05 17:40:27 +00:00
Mehdi Amini 3e021be3b6 Use StringRef in FastISel API (NFC)
llvm-svn: 283291
2016-10-05 01:37:29 +00:00
whitequark 7c4fe0e9a3 [SelectionDAG] Fix calling convention in expansion of ?MULO.
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, an __mul?i3 intrinsic is used. Since the type is
not legal, the legalizer cannot directly call the intrinsic with
the wide arguments; instead, it "pre-lowers" them by splitting them
in halves.

The "pre-lowering" code in essence made assumptions about
the calling convention, specifically that i(N*2) values will be
split into two iN values and passed in consecutive registers in
little-endian order. This, naturally, breaks on a big-endian system,
such as our OR1K out-of-tree backend.

Thanks to James Miller <james@aatch.net> for help in debugging.

Differential Revision: https://reviews.llvm.org/D25223

llvm-svn: 283203
2016-10-04 09:07:49 +00:00
Sanjay Patel f7df85af87 fix formatting; NFC
llvm-svn: 283115
2016-10-03 15:18:36 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Michael Kuperstein 3e06eafc20 [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.

Differential Revision: https://reviews.llvm.org/D24625

llvm-svn: 282567
2016-09-28 06:13:58 +00:00
Evandro Menezes e45de8a5ec Add support to optionally limit the size of jump tables.
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables.  As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified.  One such limit is the
number of possible destinations for a given indirect branches that such
branch predictors can handle.

This patch considers a limit that a target may set to the number of
destination addresses in a jump table.

Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.

Differential revision: https://reviews.llvm.org/D21940

llvm-svn: 282412
2016-09-26 15:32:33 +00:00
Ayman Musa d7a5ed4141 [X86][avx512] Fix bug in masked compress store.
Differential Revision: https://reviews.llvm.org/D23984

llvm-svn: 282381
2016-09-26 06:22:08 +00:00
Nirav Dave 9011da3d44 [DAG] Fix incorrect alignment of ext load.
Correctly use alignment size from loaded size not output value size.

Reviewers: jyknight, tstellarAMD, arsenm

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23356

llvm-svn: 282177
2016-09-22 17:28:43 +00:00
Arnold Schwaighofer de2490d0dc Disable tail calls if there is an swifterror argument
ISel does not handle them correctly yet i.e we crash trying to emit tail call
code.

radar://28407842

llvm-svn: 282088
2016-09-21 16:53:36 +00:00
Craig Topper af5ee86bc9 [AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision.
Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.

llvm-svn: 281868
2016-09-18 21:49:32 +00:00
Simon Pilgrim 6c21e6a54e [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware).

While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

Differential Revision: https://reviews.llvm.org/D24343

llvm-svn: 281852
2016-09-18 12:45:23 +00:00
Wei Mi ab24cd189f Change the order of the splitted store from high - low to low - high.
It is a trivial change which could make the testcase easier to be reused
for the store splitting in CodeGenPrepare.

llvm-svn: 281846
2016-09-18 06:10:32 +00:00
Matt Arsenault e8e0f5cac6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Sanjay Patel 284582b6d4 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits(), round 2 ; NFCI
llvm-svn: 281498
2016-09-14 16:54:10 +00:00
Sanjay Patel 1ed771f5d7 getVectorElementType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281495
2016-09-14 16:37:15 +00:00
Sanjay Patel b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Sanjay Patel 5f6bb6cd24 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
llvm-svn: 281490
2016-09-14 15:43:44 +00:00
Sanjay Patel bd6fca1419 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
Pawel Bylica c397f0b272 [CodeGen] Fix invalid shift in mul expansion
Summary: When expanding mul in type legalization make sure the type for shift amount can actually fit the value. This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354.

Reviewers: hfinkel, majnemer, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D24478

llvm-svn: 281403
2016-09-13 21:55:41 +00:00
Michael Kuperstein 59f8305305 [DAG] Allow build-to-shuffle combine to combine builds from two wide vectors.
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).

Differential Revision: https://reviews.llvm.org/D24491

llvm-svn: 281402
2016-09-13 21:53:32 +00:00
Simon Pilgrim 4a8eba3e96 [DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.

Followup to D23007

llvm-svn: 281362
2016-09-13 18:33:29 +00:00
Simon Pilgrim bd28a85d14 [DAGCombiner] Use APInt directly in (shl (ext (shl x, c1)), c2) combine
Fix failure to detect out of range shift constants leading to assert in ConstantSDNode::getZExtValue()

Followup to D23007

llvm-svn: 281354
2016-09-13 17:15:28 +00:00
Ayman Musa 0c2da88f82 Remove MVT:i1 xor instruction before SELECT. (Performance improvement).
Differential Revision: https://reviews.llvm.org/D23764

llvm-svn: 281308
2016-09-13 09:12:45 +00:00
Michael Kuperstein efc0667583 [DAG] Refactor BUILD_VECTOR combine to make it easier to extend. NFCI.
This should make it easier to add cases that we currently don't cover,
like supporting more kinds of type mismatches and more than 2 input vectors.

llvm-svn: 281283
2016-09-13 00:57:43 +00:00
Justin Lebar adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Arnold Schwaighofer 7d7b4b4014 Create phi nodes for swifterror values at the end of the phi instructions list
ISel makes assumption about the order of phi nodes.

rdar://28190150

llvm-svn: 281095
2016-09-09 21:18:47 +00:00
Simon Pilgrim 153b408433 [SelectionDAG] Ensure DAG::getZeroExtendInReg is called with a scalar type
Fixes issue with rL280927 identified by Mikael Holmén

llvm-svn: 281042
2016-09-09 13:31:52 +00:00
James Molloy c6a6144966 [SDAGBuilder] Don't create a binary tree for switches in minsize mode
This bloats codesize - all of the non-leaf nodes are extra code.

llvm-svn: 280932
2016-09-08 13:12:22 +00:00
Simon Pilgrim cc7b4b511b [SelectionDAG] Add BUILD_VECTOR support to computeKnownBits and SimplifyDemandedBits
Add the ability to computeKnownBits and SimplifyDemandedBits to extract the known zero/one bits from BUILD_VECTOR, returning the known bits that are shared by every vector element.

This is an initial step towards determining the sign bits of a vector (PR29079).

Differential Revision: https://reviews.llvm.org/D24253

llvm-svn: 280927
2016-09-08 12:57:51 +00:00
Simon Pilgrim a01ee07a19 [DAGCombiner] Enable AND combines of splatted constant vectors
Allow AND combines to use a vector splatted constant as well as a constant scalar.

Preliminary part of D24253.

llvm-svn: 280926
2016-09-08 12:36:39 +00:00
Elena Demikhovsky dcc86d5bb6 Shift-left (ISD::SHL) operation crashes on "DAG Legalization" phase.
https://llvm.org/bugs/show_bug.cgi?id=29058.

While node legalization we tried to legalize its operands.
If an operand node is replaced during legalization the user node may be destroyed.

Differential Revision: https://reviews.llvm.org/D24244

llvm-svn: 280862
2016-09-07 20:54:33 +00:00
Matt Arsenault 6cda10c950 Remove unnecessary call to getAllocatableRegClass
This reapplies r252565 and r252674, effectively reverting r252956.

This allows VS_32/VS_64 to be unallocatable like they should be.

llvm-svn: 280783
2016-09-07 06:16:45 +00:00
Hal Finkel 8ca2ed22b2 [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).

Fixes PR30276.

For convenience, here's the commit message from r246507, which explains the
problem is greater detail:

[DAGCombine] Fixup SETCC legality checking

SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 280767
2016-09-06 23:02:23 +00:00
Simon Pilgrim 1b4462b7c1 [SelectionDAG] Simplify extract_subvector( insert_subvector ( Vec, In, Idx ), Idx ) -> In
If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.

This has come up in certain several x86 shuffle lowering cases where we are crossing 128-bit lanes.

Differential Revision: https://reviews.llvm.org/D24254

llvm-svn: 280715
2016-09-06 16:42:05 +00:00
Wei Mi c54d1298f5 Split the store of a wide value merged from an int-fp pair into multiple stores.
For the store of a wide value merged from a pair of values, especially int-fp pair,
sometimes it is more efficent to split it into separate narrow stores, which can
remove the bitwise instructions or sink them to colder places.

Now the feature is only enabled on x86 target, and only store of int-fp pair is
splitted. It is possible that the application scope gets extended with perf evidence
support in the future.

Differential Revision: https://reviews.llvm.org/D22840

llvm-svn: 280505
2016-09-02 17:17:04 +00:00
Andrea Di Biagio fd503e5af3 [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.
This fixes a regression introduced by revision 268094.
Revision 268094 added the following dag combine rule:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2

That rule converts a truncate of a shift-by-constant into a shift of a truncated
value. We do this only if the shift count is less than half the size in bits of
the truncated value (K < vt.size / 2).

The problem is that the constraint on the shift count is incorrect, so the rule
doesn't work well in some cases involving vector types. The combine rule should
have been written instead like this:
// trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits()

Basically, if K is smaller than the "scalar size in bits" of the truncated value
then we know that by "sinking" the truncate into the operand of the shift we
would never accidentally make the shift undefined.

This patch fixes the check on the shift count, and adds test cases to make sure
that we don't regress the behavior.

Differential Revision: https://reviews.llvm.org/D24154

llvm-svn: 280482
2016-09-02 11:29:09 +00:00
Aditya Kumar 356f79d535 [SelectionDAGBuilder] Add const to relevant places
Reviewers: hans, evandro, sebpop

Differential Revision: https://reviews.llvm.org/D24112

llvm-svn: 280430
2016-09-01 23:35:26 +00:00
Michael Kuperstein 7bc54cebea [Legalizer] Don't throw away false low half when expanding GT/LT SETCC
When expanding a SETCC for which the low half is known to evaluate to false,
we can only throw it away for LT/GT comparisons, not LE/GE.

This fixes PR29170.

Differential Revision: https://reviews.llvm.org/D24151

llvm-svn: 280424
2016-09-01 23:02:32 +00:00
Michael Kuperstein 5f17d08f49 [SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.

This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
                       (concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)

llvm-svn: 280418
2016-09-01 21:32:09 +00:00
Michael Kuperstein b4743597bd Rename some variables to have meaningful names. NFC.
llvm-svn: 280391
2016-09-01 18:24:42 +00:00
Michael Kuperstein 65bc3c89ff [DAGCombine] Don't fold a trunc if it feeds an anyext
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).

To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).

Differential Revision: https://reviews.llvm.org/D23893

llvm-svn: 280386
2016-09-01 17:59:24 +00:00
Hal Finkel 5081ac27c7 Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:

  ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)

where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.

This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.

Fixes PR26761.

Differential Revision: https://reviews.llvm.org/D24038

llvm-svn: 280350
2016-09-01 10:28:47 +00:00
Philip Reames 2b1084ac93 [statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
2016-08-31 15:12:17 +00:00
Krzysztof Parzyszek 354832e585 Propagate TBAA info in SelectionDAG::getIndexedLoad
Patch by Pranav Bhandarkar.

llvm-svn: 279998
2016-08-29 19:50:15 +00:00
Igor Breger 24281b4740 Fixed a bug in type legalizer for masked gather.
The problem occurs when the Node doesn't updated in place , UpdateNodeOperation() return the node that already exist.
In this case assert fail in PromoteIntegerOperand() , N have 2 results ( val + chain).

Differential Revision: http://reviews.llvm.org/D23756

llvm-svn: 279961
2016-08-29 09:12:31 +00:00
Quentin Colombet e063e1f68a [SelectionDAG] Do not run the ISel process on already selected code.
Right now, this cannot happen, but with the fall back path of GlobalISel
it will show up eventually.

llvm-svn: 279877
2016-08-26 22:32:55 +00:00
Michael Kuperstein 260daed147 Reuse an SDLoc throughout a function. NFC.
llvm-svn: 279767
2016-08-25 18:50:56 +00:00
Justin Lebar 1972e222ea [SelectionDAG] Use a union of bitfield structs for SDNode::SubclassData.
Summary:
This greatly simplifies our handling of SDNode::SubclassData.

NFC, hopefully.  :)

See discussion in D23035 for discussion about the design API of these
bitfields.

Reviewers: chandlerc

Subscribers: llvm-commits, rnk

Differential Revision: https://reviews.llvm.org/D23036

llvm-svn: 279537
2016-08-23 17:18:11 +00:00
Pete Cooper 036b94dad3 Fix some more asserts after r279466.
That commit added a new version of Intrinsic::getName which should only
be called when the intrinsic has no overloaded types.  There are several
debugging paths, such as SDNode::dump which are printing the name of the
intrinsic but don't have the overloaded types.  These paths should be ok
to just print the name instead of crashing.

The fix here is ultimately to just add a 'None' second argument as that
calls the overload capable getName, which is less efficient, but this is a
debugging path anyway, and not perf critical.

Thanks to Björn Pettersson for pointing out that there were more crashes.

llvm-svn: 279528
2016-08-23 16:23:45 +00:00
Simon Pilgrim 02b13d4d3c Use SDValue::getOpcode() helper instead of via SDValue::getNode()
llvm-svn: 279381
2016-08-20 20:04:18 +00:00
James Molloy 7ee640f9b6 [CodeGen] Fix a trivial type conversion bug dating back to pre-2008
The heuristic above this code is incredibly suspect, but disregarding that it mutates the cast opcode so we need to check the *mutated* opcode later to see if we need to emit an AssertSext or AssertZext node.

Fixes PR29041.

llvm-svn: 279223
2016-08-19 08:38:50 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Ayman Musa 71b43c5c1d Fix bug in DAGBuilder for getelementptr with expanded vector.
Replacing the usage of MVT with EVT in case the vector type is expanded.
Differential Revision: https://reviews.llvm.org/D23306

llvm-svn: 278913
2016-08-17 07:52:15 +00:00
Ayman Musa c96f421ad4 First commit (test commit) - Adding empty line.
llvm-svn: 278910
2016-08-17 07:37:34 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Pierre Gousseau 051db7d838 [x86] Refactor a PowerPC specific ctlz/srl transformation (NFC).
Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets.

Differential Revision: https://reviews.llvm.org/D23445

llvm-svn: 278799
2016-08-16 13:53:53 +00:00
Eli Friedman 98151d6440 Fix typo in lowering for fp128 ueq.
Regression from r259791.

Differential Revision: https://reviews.llvm.org/D23374

llvm-svn: 278750
2016-08-15 21:46:19 +00:00
Wolfgang Pieb dfad9b20c9 Local variables whose address is taken and passed on to a call are described
in debug info using their stack slots instead of as an indirection of param reg + 0
offset. This is done by detecting FrameIndexSDNodes in SelectionDAG and generating
FrameIndexDbgValues for them. This ultimately generates DBG_VALUEs with stack
location operands.

Differential Revision: http://reviews.llvm.org/D23283

llvm-svn: 278703
2016-08-15 18:18:26 +00:00
Duncan P. N. Exon Smith f197b1f78f ADT: Remove all ilist_iterator => pointer casts, NFC
Remove all ilist_iterator to pointer casts.  There were two reasons for
casts:

  - Checking for an uninitialized (i.e., null) iterator.  I added
    MachineInstrBundleIterator::isValid() to check for that case.

  - Comparing an iterator against the underlying pointer value while
    avoiding converting the pointer value to an iterator.  This is
    occasionally necessary in MachineInstrBundleIterator, since there is
    an assertion in the constructors that the underlying MachineInstr is
    not bundled (but we don't care about that if we're just checking for
    pointer equality).

To support the latter case, I rewrote the == and != operators for
ilist_iterator and MachineInstrBundleIterator.

  - The implicit constructors now use enable_if to exclude
    const-iterator => non-const-iterator conversions from overload
    resolution (previously it was a compiler error on instantiation, now
    it's SFINAE).

  - The == and != operators are now global (friends), and are not
    templated.

  - MachineInstrBundleIterator has overloads to compare against both
    const_pointer and const_reference.  This avoids the implicit
    conversions to MachineInstrBundleIterator that assert, instead just
    checking the address (and I added unit tests to confirm this).

Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked
are in ilist.h, and no code outside of ilist*.h directly relies on this
UB end-iterator-to-pointer conversion anymore.  It's still needed for
ilist_*sentinel_traits, but I'll clean that up soon.

llvm-svn: 278478
2016-08-12 05:05:36 +00:00
David Majnemer 0d955d0bf5 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
David Majnemer 0a16c22846 Use range algorithms instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278417
2016-08-11 21:15:00 +00:00
Simon Pilgrim 85c7ea86ae [DAGCombine] Avoid INSERT_SUBVECTOR reinsertions (PR28678)
If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector.

i.e. 
INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx )

Differential Revision: https://reviews.llvm.org/D23330

llvm-svn: 278211
2016-08-10 10:50:53 +00:00
Simon Pilgrim 76964e3140 [DAGCombiner] Better support for shifting large value type by constants
As detailed on D22726, much of the shift combining code assume constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths.

This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required.

Differential Revision: https://reviews.llvm.org/D23007

llvm-svn: 278141
2016-08-09 17:39:11 +00:00
Diana Picus 4dd6c249ac [SelectionDAG] Refactor visitInlineAsm a bit. NFCI.
This shaves off ~100 lines from visitInlineAsm.

llvm-svn: 277987
2016-08-08 08:54:39 +00:00
Nikolai Bozhenov f679530ba1 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Diana Picus ddddbc2440 Typo fix in comment. NFC
llvm-svn: 277704
2016-08-04 08:25:08 +00:00
Elliot Colp 82b1468a4d Disable shrinking of SNaN constants
When expanding FP constants, we attempt to shrink doubles to floats and perform an extending load.
However, on SystemZ, and possibly on other targets (I've only confirmed the problem on SystemZ), the FP extending load instruction may convert SNaN into QNaN, or may cause an exception. So in the general case, we would still like to shrink FP constants, but SNaNs should be left as doubles.

Differential Revision: https://reviews.llvm.org/D22685

llvm-svn: 277602
2016-08-03 15:09:21 +00:00
Michael Kuperstein c97da7f3a4 [DAGCombine] Make sext(setcc) combine respect getBooleanContents
We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)"
Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value
of T is 1 or -1, depending on the type of the setcc, and getBooleanContents()
for the type if it is not i1.

This fixes PR28504.

llvm-svn: 277371
2016-08-01 19:39:49 +00:00
Weiming Zhao 812fde3603 DAG: avoid duplicated truncating for sign extended operand
Summary:
When performing cmp for EQ/NE and the operand is sign extended, we can
avoid the truncaton if the bits to be tested are no less than origianl
bits.

Reviewers: eli.friedman

Subscribers: eli.friedman, aemerson, nemanjai, t.p.northover, llvm-commits

Differential Revision: https://reviews.llvm.org/D22933

llvm-svn: 277252
2016-07-29 23:33:48 +00:00
Andrew Kaylor b99d1cc7ed Recommitting r275284: add support to inline __builtin_mempcpy
Patch by Sunita Marathe

Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)

llvm-svn: 277189
2016-07-29 18:23:18 +00:00
Nirav Dave 563d6f8614 Cleanup TransferDbgValues
[DAG] Check debug values for invalidation before transferring and mark
old debug values invalid when transferring to another SDValue.

This fixes PR28613.

Reviewers: jyknight, hans, dblaikie, echristo

Subscribers: yaron.keren, ismail, llvm-commits

Differential Revision: https://reviews.llvm.org/D22858

llvm-svn: 277135
2016-07-29 11:49:32 +00:00
Nirav Dave b7c72717c9 Fix DbgValue handling in SelectionDAG.
[DAG] Relocate TransferDbgValues in ReplaceAllUsesWith(SDValue, SDValue)
to before we modify the CSE maps.

llvm-svn: 277027
2016-07-28 19:48:39 +00:00
Matthias Braun 941a705b7b MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

llvm-svn: 277017
2016-07-28 18:40:00 +00:00
Simon Pilgrim 10bf0ff879 [DAGCombiner] Use APInt directly to detect out of range shift constants
Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match

As mentioned on D22726

llvm-svn: 276855
2016-07-27 10:30:55 +00:00
Andrew Kaylor f990fa5f7b Reverting r276771 due to MSan failures.
llvm-svn: 276824
2016-07-27 01:19:24 +00:00
Andrew Kaylor 3104a6bad0 Re-committing r275284: add support to inline __builtin_mempcpy
Patch by Sunita Marathe

Differential Revision: http://reviews.llvm.org/D21920

llvm-svn: 276771
2016-07-26 17:23:13 +00:00
Simon Pilgrim 820f87a72d [SelectionDAG] Optimization of BITREVERSE legalization for power-of-2 integer scalar/vector types
An extension of D19978, this patch replaces the default BITREVERSE evaluation of individual bit masks+shifts with block mask+shifts when we have integer elements of power-of-2 bits in size.

After calling BSWAP to reverse the order of the constituent bytes (which typically follows a similar approach), every neighbouring 4-bits, 2-bits and finally 1-bit pairs are masked off and swapped over with shifts.

In doing so we can significantly reduce the number of operations required.

Differential Revision: https://reviews.llvm.org/D21578

llvm-svn: 276432
2016-07-22 16:46:25 +00:00
Ahmed Bougacha 29333c9de6 [FastISel] Ignore @llvm.assume.
llvm-svn: 276410
2016-07-22 12:54:53 +00:00
Elena Demikhovsky 2c0780b8e5 AVX-512: Fixed BT instruction selection.
The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1).

I added the new sequence (truncate (a >>n) to i1) to the BT pattern.

Differential Revision: https://reviews.llvm.org/D22354

llvm-svn: 275950
2016-07-19 07:14:21 +00:00
Chih-Hung Hsieh 4d9f2c154d [X86] Accept SELECT op code for x86-64 fp128 type
DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow
SELECT op code for x86_64 fp128 type for MME targets,
so SoftenFloatOperand does not abort on SELECT op code.

Differential Revision: http://reviews.llvm.org/D21758

llvm-svn: 275818
2016-07-18 17:20:09 +00:00
Simon Dardis d32a2d30cb [inlineasm] Propagate operand constraints to the backend
When SelectionDAGISel transforms a node representing an inline asm
block, memory constraint information is not preserved. This can cause
constraints to be broken when a memory offset is of the form:

offset + frame index

when the frame is resolved.

By propagating the constraints all the way to the backend, targets can
enforce memory operands of inline assembly to conform to their constraints.

For MIPSR6, some instructions had their offsets reduced to 9 bits from
16 bits such as ll/sc. This becomes problematic when using inline assembly
to perform atomic operations, as an offset can generated that is too big to
encode in the instruction.

Reviewers: dsanders, vkalintris

Differential Review: https://reviews.llvm.org/D21615

llvm-svn: 275786
2016-07-18 13:17:31 +00:00
Justin Lebar 9c375817ac [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

llvm-svn: 275592
2016-07-15 18:27:10 +00:00
Justin Lebar 0af80cd6f0 [CodeGen] Take a MachineMemOperand::Flags in MachineFunction::getMachineMemOperand.
Summary:
Previously we took an unsigned.

Hooray for type-safety.

Reviewers: chandlerc

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D22282

llvm-svn: 275591
2016-07-15 18:26:59 +00:00
Michael Kuperstein 4d36e77048 Fix copy/paste bug in r275340.
llvm-svn: 275343
2016-07-13 23:28:00 +00:00
Michael Kuperstein be837fa40f [DAG] Correctly chain masked loads
If a masked loads is not added to the chain, it should not reset the chain's
root.

This fixes the remaining part of PR28515.

llvm-svn: 275340
2016-07-13 23:23:40 +00:00
Andrew Kaylor 346dd7f1bd Reverting r275284 due to platform-specific test failures
llvm-svn: 275304
2016-07-13 19:09:16 +00:00
Andrew Kaylor 12cccdd731 Fix for Bug 26903, adds support to inline __builtin_mempcpy
Patch by Sunita Marathe

Differential Revision: http://reviews.llvm.org/D21920

llvm-svn: 275284
2016-07-13 17:25:11 +00:00
Sanjay Patel bb7d87ee25 fix documentation comments; NFC
llvm-svn: 275101
2016-07-11 20:50:39 +00:00
Sanjay Patel fedc01ad76 [DAG] make isConstantSplatVector() available to the rest of lowering
llvm-svn: 275025
2016-07-10 21:27:06 +00:00
Sanjay Patel 9bedcdb5f5 fix documentation comments; NFC
llvm-svn: 275021
2016-07-10 21:02:16 +00:00
Sanjay Patel 303326541b reformat, fix comments/names; NFCI
llvm-svn: 275015
2016-07-10 13:05:57 +00:00
Benjamin Kramer 4d09892e9a Give helper classes/functions internal linkage. NFC.
llvm-svn: 275014
2016-07-10 11:28:51 +00:00
Sanjay Patel 6170b4bebd fix documentation comments; NFC
llvm-svn: 274981
2016-07-09 18:52:07 +00:00
Matt Arsenault 3fb8f9eabf Reapply r274829 with fix for FP vectors
llvm-svn: 274937
2016-07-08 21:25:33 +00:00
Nico Weber 28410c6846 Revert r274829, it caused PR28472.
llvm-svn: 274916
2016-07-08 19:52:19 +00:00
Duncan P. N. Exon Smith 1b824c9e43 SelectionDAG: Avoid implicit iterator conversions in SelectionDAGBuilder, NFC
llvm-svn: 274907
2016-07-08 19:23:12 +00:00
Duncan P. N. Exon Smith dca9bffa31 SelectionDAG: Avoid implicit iterator conversions in SelectionDAGISel, NFC
llvm-svn: 274904
2016-07-08 19:11:40 +00:00
Duncan P. N. Exon Smith 6135f3f1cb SelectionDAG: Avoid implicit iterator conversions in ScheduleDAGSDNodes, NFC
llvm-svn: 274903
2016-07-08 19:07:09 +00:00
Duncan P. N. Exon Smith 10383ecd76 SelectionDAG: Avoid implicit iterator conversions in FastISel, NFC
llvm-svn: 274899
2016-07-08 18:36:41 +00:00
Sjoerd Meijer 1ee119f897 Do not expand SDIV when compiling for minimum code size
Differential Revision: http://reviews.llvm.org/D22139

llvm-svn: 274855
2016-07-08 15:32:01 +00:00
Sjoerd Meijer 46c4c3d31c Addressing post-commit comments regarding not expanding UDIV;
we don't expand only when compiling for minimum code size.

llvm-svn: 274847
2016-07-08 14:17:09 +00:00
Sjoerd Meijer a625af3feb Code size optimisation: don't expand a div to a mul and and a shift sequence.
As a result, the urem instruction will not be expanded to a sequence of umull,
lsrs, muls and sub instructions, but just a call to __aeabi_uidivmod.

Differential Revision: http://reviews.llvm.org/D22131

llvm-svn: 274843
2016-07-08 12:54:43 +00:00
Matt Arsenault c3a6fe6ecd Bug 28444: Fix assertion when extract_vector_elt has mismatched type
For some reason extract_vector_elt is sometimes allowed to have
a different result type than the vector element type.

llvm-svn: 274829
2016-07-08 07:05:00 +00:00
Andrew Kaylor 65fa0704aa Include SelectionDAGISel in the opt-bisect process
Differential Revision: http://reviews.llvm.org/D21143

llvm-svn: 274786
2016-07-07 18:55:02 +00:00
Tim Shen 1c3c0afc53 [DAGCombiner] Fix visitSTORE to continue processing current SDNode, if findBetterNeighborChains doesn't actually CombineTo it.
Summary:
findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed.

This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase.

Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner

Subscribers: mehdi_amini, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D21692

llvm-svn: 274644
2016-07-06 17:44:03 +00:00
Balaram Makam d4acd7ed10 Revert r259387: "AArch64: Implement missed conditional compare sequences."
This reverts commit r259387 because it inserts illegal code after legalization
    in some backends where i64 OR type is illegal for example.

llvm-svn: 274573
2016-07-05 20:24:05 +00:00
Matt Arsenault 2d79389508 DAGCombiner: Fold away vector extract of insert with the same index
This only really matters when the index is non-constant since the
constant case already gets taken care of by other combines.

llvm-svn: 274569
2016-07-05 18:25:02 +00:00
Craig Topper d83f818a3e [CodeGen] Make the code that detects a if a shuffle is really a concatenation of the inputs more general purpose.
We can now handle concatenation of each source multiple times. The previous code just checked for each source to appear once in either order.

This also now handles an entire source vector sized piece having undef indices correctly. We now concat with UNDEF instead of using one of the sources. This is responsible for the test case change.

llvm-svn: 274483
2016-07-04 06:19:35 +00:00
Craig Topper d1eca0f32c [CodeGen] Teach OR combine of shuffles involving zero vectors to better handle undef indices.
Undef indices can now be treated as zeros. Or if its undef ORed with zero, we will keep the undef.

llvm-svn: 274472
2016-07-03 19:37:12 +00:00
Craig Topper 90d7664a22 [CodeGen] Cleanup getVectorShuffle a bit to take advantage of its new ArrayRef argument and its begin/end iterators. Also use 'int' type for number of elements and loop iterators to remove several typecasts. No functional change intended.
llvm-svn: 274338
2016-07-01 06:54:51 +00:00
Craig Topper 2bd8b4b180 [CodeGen,Target] Remove the version of DAG.getVectorShuffle that takes a pointer to a mask array. Convert all callers to use the ArrayRef version. No functional change intended.
For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array.

llvm-svn: 274337
2016-07-01 06:54:47 +00:00
Duncan P. N. Exon Smith e4f5e4f4d1 CodeGen: Use MachineInstr& in TargetLowering, NFC
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr.  In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

llvm-svn: 274287
2016-06-30 22:52:52 +00:00
Rafael Espindola db6bd02185 Delete unused includes. NFC.
llvm-svn: 274225
2016-06-30 12:19:16 +00:00
Craig Topper 3a011de10c [DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero vectors where the zero vector is the first operand to the shuffle instead of the second.
llvm-svn: 274097
2016-06-29 03:29:12 +00:00
Craig Topper f067a043fb [CodeGen] Make ShuffleVectorSDNode::commuteMask take a MutableArrayRef instead of SmallVectorImpl. NFC.
llvm-svn: 274095
2016-06-29 03:29:06 +00:00
Rafael Espindola b1556c42ce Use isPositionIndependent in a few more places.
I think this converts all the simple cases that really just care about
the generated code being position independent or not. The remaining
uses are a bit more complicated and are checking things like "is this
a library or executable" or "can this symbol be preempted".

llvm-svn: 274055
2016-06-28 20:13:36 +00:00
Rafael Espindola 3beef8d6db Move shouldAssumeDSOLocal to Target.
Should fix the shared library build.

llvm-svn: 273958
2016-06-27 23:15:57 +00:00
Matt Arsenault f0f721a682 DAGCombiner: Don't narrow volatile vector loads + extract
llvm-svn: 273909
2016-06-27 19:31:04 +00:00
Rafael Espindola 0a68bf9627 Use isPositionIndependent predicate. NFC.
llvm-svn: 273830
2016-06-26 22:38:44 +00:00
Rafael Espindola 12bb38d367 Use isPositionIndependent predicate.
llvm-svn: 273828
2016-06-26 22:30:06 +00:00
Rafael Espindola ae0d866f56 Refactor a duplicated predicate. NFC.
llvm-svn: 273826
2016-06-26 22:13:55 +00:00
Craig Topper 9c809bee1c [SelectionDAG] Use DAG.getCommutedVectorShuffle instead of reimplementing it.
llvm-svn: 273802
2016-06-26 05:10:49 +00:00
Rafael Espindola 88ae09e9be Use shouldAssumeDSOLocal in isOffsetFoldingLegal.
This makes it slightly more powerful for dynamic-no-pic.

llvm-svn: 273704
2016-06-24 18:48:36 +00:00
Nirav Dave bfdb483755 Preserve DebugInfo when replacing values in DAGCombiner
Recommiting after correcting over-eager Debug Value transfer fixing PR28270.

[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 273585
2016-06-23 17:52:57 +00:00
Peter Collingbourne 6717803485 Revert r273456, "Preserve DebugInfo when replacing values in DAGCombiner" as it caused pr28270.
llvm-svn: 273518
2016-06-23 00:06:17 +00:00
Nirav Dave 96beb7dee5 Preserve DebugInfo when replacing values in DAGCombiner
Recommiting after fixing over-aggressive assertion

[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.

Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.

This refixes PR9817 which was being incompletely checked in the
testsuite.

Reviewers: jyknight

Subscribers: dblaikie, llvm-commits

Differential Revision: http://reviews.llvm.org/D21037

llvm-svn: 273456
2016-06-22 19:03:26 +00:00
Wei Ding 0526e7f8d9 AMDGPU: Add convergent flag to INLINEASM instruction.
Differential Revision: http://reviews.llvm.org/D21214

llvm-svn: 273455
2016-06-22 18:51:08 +00:00
Krzysztof Parzyszek e116d500a7 [SDAG] Remove FixedArgs parameter from CallLoweringInfo::setCallee
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.

Differential Revision: http://reviews.llvm.org/D20376

llvm-svn: 273403
2016-06-22 12:54:25 +00:00
Simon Pilgrim bb8a40fdd2 Strip trailing whitespace
llvm-svn: 273264
2016-06-21 14:37:39 +00:00
Daniel Sanders bf2c03ee69 [arm+x86] Make GNU variants behave like GNU w.r.t combining sin+cos into sincos.
Summary:
canCombineSinCosLibcall() would previously combine sin+cos into sincos for
GNUX32/GNUEABI/GNUEABIHF regardless of whether UnsafeFPMath were set or not.
However, GNU would only combine them for UnsafeFPMath because sincos does not
set errno like sin and cos do. It seems likely that this is an oversight.

Reviewers: t.p.northover

Subscribers: t.p.northover, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D21431

llvm-svn: 273259
2016-06-21 12:29:03 +00:00
David Majnemer e61e4bfd87 Replace silly uses of 'signed' with 'int'
llvm-svn: 273244
2016-06-21 05:10:24 +00:00