Commit Graph

7706 Commits

Author SHA1 Message Date
Owen Anderson faf5187ee0 Restore the original behavior of SelectionDAG::getTargetIndex().
It looks like an extra negation snuck in as apart of restoring it.

llvm-svn: 250726
2015-10-19 19:27:40 +00:00
Benjamin Kramer 2002aadaad Put back SelectionDAG::getTargetIndex.
While technically this is untested dead code, it has out-of-tree users.
This reverts a part of r250434.

llvm-svn: 250717
2015-10-19 18:26:16 +00:00
Simon Pilgrim 04d52d26f6 Use SDValue bool check. NFCI.
llvm-svn: 250653
2015-10-18 12:33:54 +00:00
Simon Pilgrim c2c154e078 Move one-use variable inside test. NFC.
llvm-svn: 250651
2015-10-18 11:47:23 +00:00
Simon Pilgrim 24057b9566 [DAG] Ensure vector constant folding uses correct scalar undef types
Minor fix to D13665 found during post-commit review.

llvm-svn: 250616
2015-10-17 16:49:43 +00:00
Joseph Tremoulet 55b51e9dcc [WinEH] Fix eh.exceptionpointer intrinsic lowering
Summary:
Some shared code for handling eh.exceptionpointer and eh.exceptioncode
needs to not share the part that truncates to 32 bits, which is intended
just for exception codes.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13747

llvm-svn: 250588
2015-10-17 00:08:08 +00:00
Benjamin Kramer bacc7ba7aa [SelectionDAG] Remove dead code. NFC.
Carefully selected parts without deleting graph stuff and dumping methods.

llvm-svn: 250434
2015-10-15 17:54:06 +00:00
Artyom Skrobov 4bca0bb010 A doccomment for CombineTo, and some NFC refactorings
Summary:
Caching SDLoc(N), instead of recreating it in every single
function call, keeps the code denser, and allows to unwrap long lines.

Reviewers: sunfish, atrick, sdmitrouk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13726

llvm-svn: 250305
2015-10-14 17:18:35 +00:00
Artyom Skrobov a5b9ad22b3 Merge DAGCombiner::visitSREM and DAGCombiner::visitUREM (NFC)
Summary: The two implementations had more code in common than not.

Reviewers: sunfish, MatzeB, sdmitrouk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13724

llvm-svn: 250302
2015-10-14 16:54:14 +00:00
Duncan P. N. Exon Smith e400a7d412 SelectionDAG: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250214
2015-10-13 19:47:46 +00:00
Matt Arsenault e5d9515fb7 DAGCombiner: Don't stop finding better chain on 2 aliases
The comment says this was stopped because it was unlikely to be
profitable. This is not true if you want to combine vector loads
with multiple components.

For a simple case that looks like

t0 = load t0 ...
t1 = load t0 ...
t2 = load t0 ...
t3 = load t0 ...

t4 = store t0:1, t0:1
  t5 = store t4, t1:0
    t6 = store t5, t2:0
	  t7 = store t6, t3:0

We want to get all of these stores onto a chain
that is a TokenFactor of these N loads. This mostly
solves the AMDGPU merge-stores.ll regressions
with -combiner-alias-analysis for merging vector
stores of vector loads.

llvm-svn: 250138
2015-10-13 00:49:00 +00:00
Matt Arsenault 61dc235f20 DAGCombiner: Combine extract_vector_elt from build_vector
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.

InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.

llvm-svn: 250129
2015-10-12 23:59:50 +00:00
Cong Hou bf22f5063a Assign correct edge weights to unwind destinations when lowering invoke statement.
When lowering invoke statement, all unwind destinations are directly added as successors of call site block, and the weight of those new edges are not assigned properly. Actually, default weight 16 are used for those edges. This patch calculates the proper edge weights for those edges when collecting all unwind destinations.

Differential revision: http://reviews.llvm.org/D13354

llvm-svn: 250119
2015-10-12 23:02:58 +00:00
Simon Pilgrim c8832fc233 [SelectionDAG] Add common vector constant folding helper function
We have a number of functions that implement constant folding of vectors (unary and binary ops) in near identical manners (and the differences don't appear to be critical).

This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases.

After this initial patch I intend to begin enabling vector constant folding for a wider number of opcodes in SelectionDAG::getNode().

Differential Revision: http://reviews.llvm.org/D13665

llvm-svn: 250118
2015-10-12 23:00:11 +00:00
Reid Kleckner 9abb3c06a6 Don't call PrepareEHLandingPad on non EH pads
This was a minor bug in r249492. Calling PrepareEHLandingPad on a
non-landingpad was a no-op, but it attempted to get the generic pointer
register class, which apparently doesn't exist for some targets.

llvm-svn: 250068
2015-10-12 17:42:32 +00:00
David Majnemer 99c1d13e52 [WinEH] Remove CatchObjRecoverIdx
CatchObjRecoverIdx was used for the old scheme, it is no longer
relevant.

llvm-svn: 250065
2015-10-12 16:44:22 +00:00
Oliver Stannard cca893ffac [Debug] Look through bitcasts to find argument registers
On targets where f32 is not legal, we have to look through a BITCAST SDNode to
find the register that an argument is stored in when emitting debug info, or we
will not be able to emit a DW_AT_location for it.

Differential Revision: http://reviews.llvm.org/D13005

llvm-svn: 250056
2015-10-12 15:52:36 +00:00
Simon Pilgrim d45c88bbb5 [DAGCombiner] Improved FMA combine support for vectors
Enabled constant canonicalization for all constants.

Improved combining of constant vectors.

llvm-svn: 249993
2015-10-11 19:48:12 +00:00
Simon Pilgrim 5eac2607b9 [DAGCombiner] Tidyup FMINNUM/FMAXNUM constant folding
Enable constant folding for vector splats as well as scalars.

Enable constant canonicalization for all scalar and vector constants.

llvm-svn: 249978
2015-10-11 16:02:28 +00:00
David Majnemer bfa5b98201 [WinEH] Remove more dead code
wineh-parent is dead, so is ValueOrMBB.

llvm-svn: 249920
2015-10-10 00:04:29 +00:00
Reid Kleckner 14e773500e [WinEH] Delete the old landingpad implementation of Windows EH
The new implementation works at least as well as the old implementation
did.

Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.

llvm-svn: 249918
2015-10-09 23:34:53 +00:00
Reid Kleckner ae44e871cd Revert "Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"""
This reverts commit r249794.

Apparently my checkouts are full of unexpected surprises today.

llvm-svn: 249796
2015-10-09 01:13:17 +00:00
Reid Kleckner b510401785 Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64""
This reverts commit r249032.

TODO write commit msg

llvm-svn: 249794
2015-10-09 01:11:37 +00:00
Reid Kleckner ebef256269 [SEH] Fix llvm.eh.exceptioncode fast register allocation assertion
I called the wrong MachineBasicBlock::addLiveIn() overload.

llvm-svn: 249786
2015-10-09 00:15:13 +00:00
Chad Rosier 169865ffda [ARM] Promote helper function to SelectionDAG.
I'll be using the function in a similar combine for AArch64.  The helper was
also improved to handle undef values.

Part of http://reviews.llvm.org/D13442

llvm-svn: 249572
2015-10-07 17:28:58 +00:00
Joseph Tremoulet bde46c5642 [WinEH] Update CoreCLR EH for catchpad MBBs
Summary:
Set the pad MBB as a funclet entry for CoreCLR as well as MSVCCXX, and
update state numbering to put the catchpad block rather than its normal
successor into the unwind map.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13492

llvm-svn: 249569
2015-10-07 17:16:25 +00:00
Reid Kleckner 72ba70418f [SEH] Add llvm.eh.exceptioncode intrinsic
This will support the Clang __exception_code intrinsic.

llvm-svn: 249492
2015-10-07 00:27:33 +00:00
David Majnemer 7735a6d07a [WinEH] Create a separate MBB for funclet prologues
Our current emission strategy is to emit the funclet prologue in the
CatchPad's normal destination.  This is problematic because
intra-funclet control flow to the normal destination is not erroneous
and results in us reevaluating the prologue if said control flow is
taken.

Instead, use the CatchPad's location for the funclet prologue.  This
correctly models our desire to have unwind edges evaluate the prologue
but edges to the normal destination result in typical control flow.

Differential Revision: http://reviews.llvm.org/D13424

llvm-svn: 249483
2015-10-06 23:31:59 +00:00
Joseph Tremoulet 7f8c1165cd [WinEH] Implement state numbering for CoreCLR
Summary:
Assign one state number per handler/funclet, tracking parent state,
handler type, and catch type token.
State numbers are arranged such that ancestors have lower state numbers
than their descendants.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13450

llvm-svn: 249457
2015-10-06 20:30:33 +00:00
Joseph Tremoulet 2afea5438f [WinEH] Recognize CoreCLR personality function
Summary:
 - Add CoreCLR to if/else ladders and switches as appropriate.
 - Rename isMSVCEHPersonality to isFuncletEHPersonality to better
   reflect what it captures.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13449

llvm-svn: 249455
2015-10-06 20:28:16 +00:00
David Majnemer 429c8eda22 [SelectionDAGBuilder] Remove dead code
We already check for LandingPadInst two lines above.

llvm-svn: 249280
2015-10-04 18:44:47 +00:00
Simon Pilgrim dde63374c5 [DAGCombiner] Generalize FADD constant combines to work with vectors
Updated the FADD combines to work with vectors as well as scalars.

Differential Revision: http://reviews.llvm.org/D13416

llvm-svn: 249251
2015-10-03 22:06:06 +00:00
Simon Pilgrim a38d76a087 [DAGCombiner] Merge SIGN_EXTEND_INREG vector constant folding methods. NCI.
visitSIGN_EXTEND_INREG calls SelectionDAG::getNode to constant fold scalar constants but handles vector constants itself, despite getNode being capable of dealing with them.

This required a minor change to the getNode implementation to actually deal with cases where the scalars of a BUILD_VECTOR were wider integers than the vector type - which was the only extra ability of the visitSIGN_EXTEND_INREG implementation.

No codegen intended and all existing tests remain the same.

llvm-svn: 249236
2015-10-03 16:26:52 +00:00
David Majnemer f828a0ccc7 [WinEH] Make FuncletLayout more robust against catchret
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of.  Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.

llvm-svn: 249052
2015-10-01 18:44:59 +00:00
NAKAMURA Takumi 096492a07b Reformat.
llvm-svn: 249033
2015-10-01 17:01:03 +00:00
NAKAMURA Takumi 1ed20db720 Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"
It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll

llvm-svn: 249032
2015-10-01 17:00:56 +00:00
Reid Kleckner 6dec87a8a0 [WinEH] Emit int3 after noreturn calls on Win64
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call.  Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

llvm-svn: 248959
2015-09-30 23:09:23 +00:00
Evgeniy Stepanov f608111d1b Fix debug info with SafeStack.
llvm-svn: 248933
2015-09-30 19:55:43 +00:00
David Majnemer a80c151286 [WinEH] Teach AsmPrinter about funclets
Summary:
Funclets have been turned into functions by the time they hit the object
file.  Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

llvm-svn: 248824
2015-09-29 20:12:33 +00:00
Reid Kleckner c71d6275ca [WinEH] Fix ip2state table emission with funclets
Previously we were hijacking the old LandingPadInfo data structures to
communicate our state numbers. Now we don't need that anymore.

llvm-svn: 248763
2015-09-28 23:56:30 +00:00
Hal Finkel bd582581b8 [DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking
When AA is being used, non-aliasing stores are canonicalized to use the same
chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of
this by looking only as users of a store's chain operand. However, user
iteration is not result-number specific, we need to check that the use is as a
chain operand, and not via some other operand. It is certainly possible to have
another potentially-aliasing store, which shares the first's base pointer, and
uses the first's chain's node via some other operand.

Failure to catch this situation caused, at least in the included test case, an
assert later because the relative sequence-number ordering caused later
replacement to create a cycle in the DAG.

llvm-svn: 248698
2015-09-28 08:02:14 +00:00
Matthias Braun a3b701f828 SelectionDAGDumper: Print simple operands inline.
Print simple operands inline instead of their pointer/value number.
Simple operands are SDNodes without predecessors like Constant(FP), Register,
UNDEF. This unifies the behaviour with dumpr() which was already doing this.

Previously:
  t0: ch = EntryToken
    t1: i64 = Register %vreg0
  t2: i64,ch = CopyFromReg t0, t1
    t3: i64 = Constant<1>
  t4: i64 = add t2, t3
    t5: i64 = Constant<2>
  t6: i64 = add t2, t5
  t10: i64 = undef
  t11: i8,ch = load t0, t2, t10<LD1[%tmp81]>
  t12: i8,ch = load t0, t4, t10<LD1[%tmp10]>
  t13: i8,ch = load t0, t6, t10<LD1[%tmp12]>

Now:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
  t4: i64 = add t2, Constant:i64<1>
  t6: i64 = add t2, Constant:i64<2>
  t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64
  t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64
  t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64

Differential Revision: http://reviews.llvm.org/D12567

llvm-svn: 248628
2015-09-25 22:27:02 +00:00
Matt Arsenault 3c07e963b8 DAGCombiner: Check if store is volatile first
This is the simpler check. NFC.

llvm-svn: 248625
2015-09-25 22:06:19 +00:00
Sanjay Patel bbbf9a1a34 merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner 
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635
 

llvm-svn: 248622
2015-09-25 21:49:48 +00:00
Mohammad Shahid 13f1dfdf2e Codegen: Fix llvm.*absdiff semantic.
Fixes the overflow case of llvm.*absdiff intrinsic also updats the tests and LangRef.rst accordingly.

Differential Revision: http://reviews.llvm.org/D11678

llvm-svn: 248483
2015-09-24 10:35:03 +00:00
Matt Arsenault c721df0478 Use new TokenFactor chain when merging stores
If the stores are storing values from loads which partially
alias the stores, we could end up placing the merged loads
and stores on the same chain which has the potential to break.
Each store may have a different chain dependency on only some
of the original loads. Create a new TokenFactor to capture all
of the required dependencies of the stores rather than assuming
all stores can use the same chain.

The testcase is a situation where this happens, although
it does not have an observable change from this. The DAG nodes
just happened to not be reordered before despite this missing
chain dependency.

This is based on an off-list report for an out of tree target
which regressed due to r246307 and I haven't managed to find a case
where the nodes do end up reordered with an in tree target.

llvm-svn: 248468
2015-09-24 07:22:38 +00:00
Cong Hou 9def6efd7e Fixed an issue on updating profile data when lowering switch statement.
Fixed the issue that when there is an edge from the jump table to the default statement, we should check it directly instead of checking if the sibling of the jump table header is a successor of the jump table header, which may not be the default statment but a successor of it.

llvm-svn: 248354
2015-09-23 00:20:27 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
NAKAMURA Takumi 70ad98aca4 Reformat.
llvm-svn: 248261
2015-09-22 11:13:55 +00:00
Simon Pilgrim 4003ed2da3 [DAGCombiner] Improve FMA support for interpolation patterns
This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents.

This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t))))

Differential Revision: http://reviews.llvm.org/D13003

llvm-svn: 248210
2015-09-21 20:32:48 +00:00
Simon Pilgrim e8e5a17a12 [DAGCombiner] Tidy up FMA combine helpers. NFCI.
Based on feedback for D13003.

llvm-svn: 248206
2015-09-21 20:15:03 +00:00
Stephen Canon b12db0e42c Remove roundingMode argument in APFloat::mod
Because mod is always exact, this function should have never taken a rounding mode argument.  The actual implementation still has issues, which I'll look at resolving in a subsequent patch.

llvm-svn: 248195
2015-09-21 19:29:25 +00:00
Matt Arsenault 8fb9b94f7f Fix accidentally committed debug printing
llvm-svn: 248190
2015-09-21 18:21:10 +00:00
Matthias Braun b9fe44ddb0 SelectionDAG: Use InsertNode for EntryNode
This fixes problems where two nodes have persistent debug id 0 assigned.

llvm-svn: 248182
2015-09-21 17:41:05 +00:00
Matt Arsenault b774834429 DAGCombiner: Replace store of FP constant after attemping store merges
If storing multiple FP constants, some subset of the stores
would be replaced with integers due to visit order, so
MergeConsecutiveStores would only partially merge
these.

llvm-svn: 248169
2015-09-21 15:59:46 +00:00
Matt Arsenault a30ddb6524 Factor replacement of stores of FP constants into new function
llvm-svn: 248168
2015-09-21 15:59:43 +00:00
Craig Topper 0013be16ff Use makeArrayRef or None to avoid unnecessarily mentioning the ArrayRef type extra times. NFC
llvm-svn: 248140
2015-09-21 05:32:41 +00:00
Matthias Braun 77771cfd97 SelectionDAGDumper: Leave out the <multiple use> markers
They mostly clutter the output while it is still possible to see which
node has multiple users without them.

Differential Revision: http://reviews.llvm.org/D12569

llvm-svn: 248013
2015-09-18 17:57:33 +00:00
Matthias Braun bab3fb45e5 SelectionDAGDumper: Avoid unnecessary newlines
Before:
  t0 = EntryToken:ch

    t0: <multiple use>
        t0: <multiple use>
      t1 = CopyFromReg:v4f32,ch t0, Register:v4f32  %vreg0

      t25 = IMPLICIT_DEF:v4f32

    t26 = HADDPSrr:v4f32 t1, t25

  t23 = CopyToReg:ch,glue t0, Register:v4f32  %XMM0, t26

    t23: <multiple use>
    t23: <multiple use>
  t24 = RETQ:ch Register:v4f32  %XMM0, t23, t23:1

After:
    t0: <multiple use>
        t0: <multiple use>
      t1 = CopyFromReg:v4f32,ch t0, Register:v4f32  %vreg0
    t26 = X86ISD::FHADD:v4f32 t1, undef:v4f32
  t23 = CopyToReg:ch,glue t0, Register:v4f32  %XMM0, t26
    t23: <multiple use>
    t21 = TargetConstant:i16<0>
    t23: <multiple use>
  t24 = X86ISD::RET_FLAG:ch t23, t21, Register:v4f32  %XMM0, t23:1

Differential Revision: http://reviews.llvm.org/D12568

llvm-svn: 248012
2015-09-18 17:57:31 +00:00
Matthias Braun f89b7c7188 SelectionDAGDumper: Hide [ID=X], [ORD=X] and source locations by default.
You can show them with the new -dag-dump-verbose switch.

Differential Revision: http://reviews.llvm.org/D12566

llvm-svn: 248011
2015-09-18 17:57:28 +00:00
Matthias Braun 0b7d6c14c9 SelectionDAG: Introduce PersistentID to SDNode for assert builds.
This gives us more human readable numbers to identify nodes in debug
dumps.

Before:
  0x7fcbd9700160: ch = EntryToken

  0x7fcbd985c7c8: i64 = Register %RAX

   ...

      0x7fcbd9700160: <multiple use>
    0x7fcbd985c578: i64,ch = MOV64rm 0x7fcbd985c6a0, 0x7fcbd985cc68, 0x7fcbd985c200, 0x7fcbd985cd90, 0x7fcbd985ceb8, 0x7fcbd9700160<Mem:LD8[@foo]> [ORD=2]

  0x7fcbd985c8f0: ch,glue = CopyToReg 0x7fcbd9700160, 0x7fcbd985c7c8, 0x7fcbd985c578 [ORD=3]

    0x7fcbd985c7c8: <multiple use>
    0x7fcbd985c8f0: <multiple use>
    0x7fcbd985c8f0: <multiple use>
  0x7fcbd985ca18: ch = RETQ 0x7fcbd985c7c8, 0x7fcbd985c8f0, 0x7fcbd985c8f0:1 [ORD=3]

Now:
  t0: ch = EntryToken

  t5: i64 = Register %RAX

    ...

      t0: <multiple use>
    t3: i64,ch = MOV64rm t10, t12, t11, t13, t14, t0<Mem:LD8[@foo]> [ORD=2]

  t6: ch,glue = CopyToReg t0, t5, t3 [ORD=3]

    t5: <multiple use>
    t6: <multiple use>
    t6: <multiple use>
  t7: ch = RETQ t5, t6, t6:1 [ORD=3]

Differential Revision: http://reviews.llvm.org/D12564

llvm-svn: 248010
2015-09-18 17:41:00 +00:00
Bob Wilson dd0eadce7d Whitespace. Indent with spaces instead of a tab.
llvm-svn: 247969
2015-09-18 05:36:13 +00:00
Reid Kleckner b005d281c3 [WinEH] Pull Adjectives and CatchObj out of the catchpad arg list
Clang now passes the adjectives as an argument to catchpad.

Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.

llvm-svn: 247844
2015-09-16 20:16:27 +00:00
Sanjay Patel a260701bbb propagate fast-math-flags on DAG nodes
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, 
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: 
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.

This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I 
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.

This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.

Differential Revision: http://reviews.llvm.org/D12095

llvm-svn: 247815
2015-09-16 16:31:21 +00:00
Chandler Carruth 2e4ca848f4 Add an explicit 'inline' specifier to these static functions. GCC is
warning on them having always_inline attribute for reasons I don't fully
understand -- static functions are just as inlinable as inline
functions in terms of linkage.

llvm-svn: 247334
2015-09-10 20:34:57 +00:00
Silviu Baranga df9ce8408a [DAGCombine] Truncate BUILD_VECTOR operators if necessary when constant folding vectors
Summary:
The BUILD_VECTOR node will truncate its operators to match the
type. We need to take this into account when constant folding -
we need to perform a truncation before constant folding the elements.
This is because the upper bits can change the result, depending on
the operation type (for example this is the case for min/max).

This change also adds a regression test.

Reviewers: jmolloy

Subscribers: jmolloy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12697

llvm-svn: 247265
2015-09-10 10:34:34 +00:00
Reid Kleckner 7878391208 [WinEH] Add codegen support for cleanuppad and cleanupret
All of the complexity is in cleanupret, and it mostly follows the same
codepaths as catchret, except it doesn't take a return value in RAX.

This small example now compiles and executes successfully on win32:
  extern "C" int printf(const char *, ...) noexcept;
  struct Dtor {
    ~Dtor() { printf("~Dtor\n"); }
  };
  void has_cleanup() {
    Dtor o;
    throw 42;
  }
  int main() {
    try {
      has_cleanup();
    } catch (int) {
      printf("caught it\n");
    }
  }

Don't try to put the cleanup in the same function as the catch, or Bad
Things will happen.

llvm-svn: 247219
2015-09-10 00:25:23 +00:00
Reid Kleckner 94b704c469 [SEH] Emit 32-bit SEH tables for the new EH IR
The 32-bit tables don't actually contain PC range data, so emitting them
is incredibly simple.

The 64-bit tables, on the other hand, use the same table for state
numbering as well as label ranges. This makes things more difficult, so
it will be implemented later.

llvm-svn: 247192
2015-09-09 21:10:03 +00:00
Chandler Carruth 7b560d40bd [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal, this logic should be
  hoisted into FunctionAAResults as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

llvm-svn: 247167
2015-09-09 17:55:00 +00:00
Daniel Sanders 2038747fce Fix vector splitting for extract_vector_elt and vector elements of <8-bits.
Summary:
One of the vector splitting paths for extract_vector_elt tries to lower:
    define i1 @via_stack_bug(i8 signext %idx) {
      %1 = extractelement <2 x i1> <i1 false, i1 true>, i8 %idx
      ret i1 %1
    }
to:
    define i1 @via_stack_bug(i8 signext %idx) {
      %base = alloca <2 x i1>
      store <2 x i1> <i1 false, i1 true>, <2 x i1>* %base
      %2 = getelementptr <2 x i1>, <2 x i1>* %base, i32 %idx
      %3 = load i1, i1* %2
      ret i1 %3
    }
However, the elements of <2 x i1> are not byte-addressible. The result of this
is that the getelementptr expands to '%base + %idx * (1 / 8)' which simplifies
to '%base + %idx * 0', and then simply '%base' causing all values of %idx to
extract element zero.

This commit fixes this by promoting the vector elements of <8-bits to i8 before
splitting the vector.

This fixes a number of test failures in pocl.

Reviewers: pekka.jaaskelainen

Subscribers: pekka.jaaskelainen, llvm-commits

Differential Revision: http://reviews.llvm.org/D12591

llvm-svn: 247128
2015-09-09 09:53:20 +00:00
Matt Arsenault acd68b58ae SelectionDAG: Support Expand of f16 extloads
Currently this hits an assert that extload should
always be supported, which assumes integer extloads.

This moves a hack out of SI's argument lowering and
is covered by existing tests.

llvm-svn: 247113
2015-09-09 01:12:27 +00:00
Reid Kleckner 51189f0a1d [WinEH] Avoid creating MBBs for LLVM BBs that cannot contain code
Typically these are catchpads, which hold data used to decide whether to
catch the exception or continue unwinding. We also shouldn't create MBBs
for catchendpads, cleanupendpads, or terminatepads, since no real code
can live in them.

This fixes a problem where MI passes (like the register allocator) would
try to put code into catchpad blocks, which are not executed by the
runtime. In the new world, blocks ending in invokes now have many
possible successors.

llvm-svn: 247102
2015-09-08 23:28:38 +00:00
Reid Kleckner df1295173f [WinEH] Emit prologues and epilogues for funclets
Summary:
32-bit funclets have short prologues that allocate enough stack for the
largest call in the whole function. The runtime saves CSRs for the
funclet. It doesn't restore CSRs after we finally transfer control back
to the parent funciton via a CATCHRET, but that's a separate issue.
32-bit funclets also have to adjust the incoming EBP value, which is
what llvm.x86.seh.recoverframe does in the old model.

64-bit funclets need to spill CSRs as normal. For simplicity, this just
spills the same set of CSRs as the parent function, rather than trying
to compute different CSR sets for the parent function and each funclet.
64-bit funclets also allocate enough stack space for the largest
outgoing call frame, like 32-bit.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12546

llvm-svn: 247092
2015-09-08 22:44:41 +00:00
Hal Finkel 10aac5fd0e [SelectionDAG] Swap commutative binops before constant-based folding
In searching for a fix for the underlying code-quality bug highlighted by
r246937 (that SDAG simplification can lead to us generating an ISD::OR node
with a constant zero LHS), I ran across this:

We generically canonicalize commutative binary-operation nodes in SDAG getNode
so that, if only one operand is a constant, it will be on the RHS.  However, we
were doing this only after a bunch of constant-based simplification checks that
all assume this canonical form (that any constant will be on the RHS). Moving
the operand-swapping canonicalization prior to these checks seems like the
right thing to do (and, as it turns out, causes SDAG to completely fold away the
computation in test/CodeGen/ARM/2012-11-14-subs_carry.ll, just like InstCombine
would do).

llvm-svn: 246938
2015-09-06 05:42:13 +00:00
Sanjay Patel ce74db9d8d check for fastness before merging in DAGCombiner::MergeConsecutiveStores()
Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time
we have a merged access candidate. Without this patch, we were generating unaligned 
16-byte (SSE) memops for x86 targets where those accesses are slow.

This change was mentioned in:
http://reviews.llvm.org/D10662 and
http://reviews.llvm.org/D10905

and will help solve PR21711.

Differential Revision: http://reviews.llvm.org/D12573

llvm-svn: 246771
2015-09-03 15:03:19 +00:00
Joseph Tremoulet 9ce71f76b9 [WinEH] Add cleanupendpad instruction
Summary:
Add a `cleanupendpad` instruction, used to mark exceptional exits out of
cleanups (for languages/targets that can abort a cleanup with another
exception).  The `cleanupendpad` instruction is similar to the `catchendpad`
instruction in that it is an EH pad which is the target of unwind edges in
the handler and which itself has an unwind edge to the next EH action.
The `cleanupendpad` instruction, similar to `cleanupret` has a `cleanuppad`
argument indicating which cleanup it exits.  The unwind successors of a
`cleanuppad`'s `cleanupendpad`s must agree with each other and with its
`cleanupret`s.

Update WinEHPrepare (and docs/tests) to accomodate `cleanupendpad`.

Reviewers: rnk, andrew.w.kaylor, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12433

llvm-svn: 246751
2015-09-03 09:09:43 +00:00
Sanjay Patel fff7c6dc73 use "unpredictable" metadata in SelectionDAG when splitting compares
This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch.

Differential Revision: http://reviews.llvm.org/D12343

llvm-svn: 246691
2015-09-02 19:17:25 +00:00
Elena Demikhovsky 1b9d6914d3 Optimization for Gather/Scatter with uniform base
Vector 'getelementptr' with scalar base is an opportunity for gather/scatter intrinsic to generate a better sequence.
While looking for uniform base, we want to use the scalar base pointer of GEP, if exists.

Differential Revision: http://reviews.llvm.org/D11121

llvm-svn: 246622
2015-09-02 08:39:13 +00:00
Cong Hou 511298b919 Distribute the weight on the edge from switch to default statement to edges generated in lowering switch.
Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors.

For example, given a switch statement with cases 1,2,3,5,10,11,20, and every edge from switch to each successor has weight 10. If there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10+10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution.

There are some exceptions:

For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it.
For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it.
When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it.
In other cases, the default weight is evenly distributed to successors.

Differential Revision: http://reviews.llvm.org/D12418

llvm-svn: 246522
2015-09-01 01:42:16 +00:00
Hal Finkel 1baec5323b [DAGCombine] Fixup SETCC legality checking
SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

llvm-svn: 246507
2015-08-31 23:15:04 +00:00
Sanjay Patel 719b3e6a3e don't set a legal vector type if we know we can't use that type (NFCI)
Added benefit: the 'if' logic now matches the text of the comment that describes it.

llvm-svn: 246506
2015-08-31 22:59:03 +00:00
Sanjay Patel 218cbd5a48 generalize helper function of MergeConsecutiveStores to handle vector types (NFCI)
This was part of D7208 (r227242), but that commit was reverted because it exposed
a bug in AArch64 lowering. I should have that fixed and the rest of the commit
reinstated soon.

llvm-svn: 246493
2015-08-31 21:50:16 +00:00
Hal Finkel 2483f2060a [DAGCombine] Use getSetCCResultType utility function
DAGCombine has a utility wrapper around TLI's getSetCCResultType; use it in the
one place in DAGCombine still directly calling the TLI function. NFC.

llvm-svn: 246482
2015-08-31 20:42:38 +00:00
Reid Kleckner e00faf8ce1 [EH] Handle non-Function personalities like unknown personalities
Also delete and simplify a lot of MachineModuleInfo code that used to be
needed to handle personalities on landingpads.  Now that the personality
is on the LLVM Function, we no longer need to track it this way on MMI.
Certainly it should not live on LandingPadInfo.

llvm-svn: 246478
2015-08-31 20:02:16 +00:00
Hal Finkel a894266d28 [DAGCombine] Remove some old dead code for forming SETCC nodes
This code was dead when it was committed in r23665 (Oct 7, 2005), and before it
reaches its 10th anniversary, it really should go. We can always bring it back
if we'd like, but it forms more SETCC nodes, and the way we do legality
checking on SETCC nodes is wrong in a number of places, and removing this means
fewer places to fix. NFC.

llvm-svn: 246466
2015-08-31 18:38:55 +00:00
Renato Golin 3b1d3b0d84 Revert "Revert "New interface function is added to VectorUtils Value *getSplatValue(Value *Val);""
This reverts commit r246379. It seems that the commit was not the culprit,
and the bot will be investigated for instability.

llvm-svn: 246380
2015-08-30 10:49:04 +00:00
Renato Golin c7be31736c Revert "New interface function is added to VectorUtils Value *getSplatValue(Value *Val);"
This reverts commit r246371, as it cause a rather obscure bug in AArch64
test-suite paq8p (time outs, seg-faults). I'll investigate it before
reapplying.

llvm-svn: 246379
2015-08-30 10:05:30 +00:00
Elena Demikhovsky a59fcfa56b New interface function is added to VectorUtils
Value *getSplatValue(Value *Val);

It complements the CreateVectorSplat(), which creates 2 instructions - insertelement and shuffle with all-zero mask.

The new function recognizes the pattern - insertelement+shuffle and returns the splat value (or nullptr).
It also returns a splat value form ConstantDataVector, for completeness.

Differential Revision:	http://reviews.llvm.org/D11124

llvm-svn: 246371
2015-08-30 07:28:18 +00:00
Fiona Glaser 934765c1df SelectionDAG: add missing ComputeSignBits case for SELECT_CC
Identical to SELECT, just with different operand numbers.

llvm-svn: 246366
2015-08-29 23:04:38 +00:00
Matt Arsenault d9c830154f Make MergeConsecutiveStores look at other stores on same chain
When combiner AA is enabled, look at stores on the same chain.
Non-aliasing stores are moved to the same chain so the existing
code fails because it expects to find an adajcent store on a consecutive
chain.

Because of how DAGCombiner tries these store combines,
MergeConsecutiveStores doesn't see the correct set of stores on the chain
when it visits the other stores. Each store individually has its chain
fixed before trying to merge consecutive stores, and then tries to merge
stores from that point before the other stores have been processed to
have their chains fixed. To fix this, attempt to use FindBetterChain
on any possibly neighboring stores in visitSTORE.

Suppose you have 4 32-bit stores that should be merged into 1 vector
store. One store would be visited first, fixing the chain. What happens is
because not all of the store chains have yet been fixed, 2 of the stores
are merged. The other 2 stores later have their chains fixed,
but because the other stores were already merged, they have different
memory types and merging the two different sized stores is not
supported and would be more difficult to handle.

llvm-svn: 246307
2015-08-28 17:31:28 +00:00
Ahmed Bougacha f9c19da03a [CodeGen] Support (and default to) expanding READCYCLECOUNTER to 0.
For targets that didn't support this, this will let us respect the
langref instead of failing to select.

Note that we don't need to change the 32-bit x86/PPC lowerings (to
account for the result type/# difference) because they're both
custom and bypass type legalization.

llvm-svn: 246258
2015-08-28 01:49:59 +00:00
Reid Kleckner 0e2882345d [WinEH] Add some support for code generating catchpad
We can now run 32-bit programs with empty catch bodies.  The next step
is to change PEI so that we get funclet prologues and epilogues.

llvm-svn: 246235
2015-08-27 23:27:47 +00:00
Ahmed Bougacha 87166905c8 [CodeGen] Check FoldConstantArithmetic result before using it.
Fixes PR24602: r245689 introduced an unguarded use of
SelectionDAG::FoldConstantArithmetic, which returns 0 when it fails
because of opaque (hoisted) constants.

llvm-svn: 246217
2015-08-27 21:46:04 +00:00
Cong Hou 08cb4fc688 Fixed a bug that edge weights are not assigned correctly when lowering switch statement.
This is a one-line-change patch that moves the update to UnhandledWeights to the correct position: it should be updated for all clusters instead of just range clusters.

Differential Revision: http://reviews.llvm.org/D12391

llvm-svn: 246129
2015-08-27 00:37:40 +00:00
Cong Hou 03127700d5 Assign weights to edges to jump table / bit test header when lowering switch statement.
Currently, when lowering switch statement and a new basic block is built for jump table / bit test header, the edge to this new block is not assigned with a correct weight. This patch collects the edge weight from all its successors and assign this sum of weights to the edge (and also the other fall-through edge). Test cases are adjusted accordingly.

Differential Revision: http://reviews.llvm.org/D12166#fae6eca7

llvm-svn: 246104
2015-08-26 23:15:32 +00:00
Matthias Braun 4e7ded834f SelectionDAGBuilder: Fix SPDescriptor not resetting GuardReg
This was causing problems when some functions use a GuardReg and some
don't as can happen when mixing SelectionDAG and FastISel generated
functions.

llvm-svn: 246075
2015-08-26 20:46:52 +00:00
Matthias Braun 4816b18d86 FastISel: Avoid adding a successor block twice for degenerate IR.
This fixes http://llvm.org/PR24581

Differential Revision: http://reviews.llvm.org/D12350

llvm-svn: 246074
2015-08-26 20:46:49 +00:00
Matthias Braun 17af607796 FastISel: Factor out common code; NFC intended
This should be no functional change but for the record: For three cases
in X86FastISel this will change the order in which the FalseMBB and
TrueMBB of a conditional branch is addedd to the successor/predecessor
lists.

llvm-svn: 245997
2015-08-26 01:38:00 +00:00
Charles Davis 119525914c Make variable argument intrinsics behave correctly in a Win64 CC function.
Summary:
This change makes the variable argument intrinsics, `llvm.va_start` and
`llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows
inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch
I have to add support for GCC's `__builtin_ms_va_list` constructs.

Reviewers: nadav, asl, eugenis

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1622

llvm-svn: 245990
2015-08-25 23:27:41 +00:00
Cong Hou cd59591396 Remove the final bit test during lowering switch statement if all cases in bit test cover a contiguous range.
When lowering switch statement, if bit tests are used then LLVM will always generates a jump to the default statement in the last bit test. However, this is not necessary when all cases in bit tests cover a contiguous range. This is because when generating the bit tests header MBB, there is a range check that guarantees cases in bit tests won't go outside of [low, high], where low and high are minimum and maximum case values in the bit tests. This patch checks if this is the case and then doesn't emit jump to default statement and hence saves a bit test and a branch.

Differential Revision: http://reviews.llvm.org/D12249

llvm-svn: 245976
2015-08-25 21:34:38 +00:00
Steve King 5cdbd20cc3 Pass function attributes instead of boolean in isIntDivCheap().
llvm-svn: 245921
2015-08-25 02:31:21 +00:00
Dan Gohman 7b63484b99 [WebAssembly] Skeleton FastISel support
llvm-svn: 245860
2015-08-24 18:44:37 +00:00
Oliver Stannard 284f2bffc9 Add DAG optimisation for FP16_TO_FP
The FP16_TO_FP node only uses the bottom 16 bits of its input, so the
following pattern can be optimised by removing the AND:

  (FP16_TO_FP (AND op, 0xffff)) -> (FP16_TO_FP op)

This is a common pattern for ARM targets when functions have __fp16
arguments, as they are passed as floats (so that they get passed in the
correct registers), but then bitcast and truncated to ignore the top 16
bits.

llvm-svn: 245832
2015-08-24 09:47:45 +00:00
Simon Pilgrim 2a7049abe0 [DAGCombiner] Fold CONCAT_VECTORS of bitcasted EXTRACT_SUBVECTOR
Minor generalization of D12125 - peek through any bitcast to the original vector that we're extracting from.

llvm-svn: 245814
2015-08-23 15:22:14 +00:00
Mehdi Amini 5aa7bd7d62 Do not use dyn_cast<> after isa<>
Reported by coverity.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245799
2015-08-23 00:27:57 +00:00
Yaron Keren 528d8d6092 Disable Visual C++ 2013 Debug mode assert on null pointer in some STL algorithms,
such as std::equal on the third argument. This reverts previous workarounds.

Predefining _DEBUG_POINTER_IMPL disables Visual C++ 2013 headers from defining
it to a function performing the null pointer check. In practice, it's not that
bad since any function actually using the nullptr will seg fault. The other
iterator sanity checks remain enabled in the headers.

Reviewed by Aaron Ballmanþ and Duncan P. N. Exon Smith.

llvm-svn: 245711
2015-08-21 17:31:03 +00:00
John Brawn eab960c46f [DAGCombiner] Fold together mul and shl when both are by a constant
This is intended to improve code generation for GEPs, as the index value is
shifted by the element size and in GEPs of multi-dimensional arrays the index
of higher dimensions is multiplied by the lower dimension size.

Differential Revision: http://reviews.llvm.org/D12197

llvm-svn: 245689
2015-08-21 10:48:17 +00:00
Simon Pilgrim 35f528262f [DAGCombiner] Added SMAX/SMIN/UMAX/UMIN constant folding
We still need to add constant folding of vector comparisons to fold the tests for targets that don't support the respective min/max nodes

I needed to update 2011-12-06-AVXVectorExtractCombine to load a vector instead of using a constant vector to prevent it folding

Differential Revision: http://reviews.llvm.org/D12118

llvm-svn: 245503
2015-08-19 21:11:58 +00:00
Simon Pilgrim 989cbbd2f5 [DAGCombiner] Fold CONCAT_VECTORS of EXTRACT_SUBVECTOR (or undef) to VECTOR_SHUFFLE.
Check to see if this is a CONCAT_VECTORS of a bunch of EXTRACT_SUBVECTOR operations. If so, and if the EXTRACT_SUBVECTOR vector inputs come from at most two distinct vectors the same size as the result, attempt to turn this into a legal shuffle.

Differential Revision: http://reviews.llvm.org/D12125

llvm-svn: 245490
2015-08-19 20:09:50 +00:00
Michael Kuperstein dcdab4cd3a [TLI] Refactor "is integer division cheap" queries.
This removes the isPow2SDivCheap() query, as it is not currently used in
any meaningful way. isIntDivCheap() no longer relies on a state variable
(as all in-tree target set it to false), but the interface allows querying
based on the type optimization level.

NFC.

Differential Revision: http://reviews.llvm.org/D12082

llvm-svn: 245430
2015-08-19 11:17:59 +00:00
Steve King d4c8f70ce1 Fix backward operands in call to isTruncateFree() and improve comments.
llvm-svn: 245385
2015-08-18 23:02:41 +00:00
Matthias Braun fa3b248a66 DAGCombiner: Improve DAGCombiner select normalization
The current code normalizes select(C0, x, select(C1, x, y)) towards
select(C0|C1, x, y) if the targets prefers that form. This patch adds an
additional rule that if the select(C1, x, y) part already exists in the
function then we want to normalize into the other direction because the
effects of reusing the existing value are bigger than transforming into
the target preferred form.

This addresses regressions following r238793, see also:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150727/290272.html

Differential Revision: http://reviews.llvm.org/D11616

llvm-svn: 245350
2015-08-18 20:48:36 +00:00
Matthias Braun 2e920bd04f DAGCombiner: Optimize SELECTs first before turning them into SELECT_CC
This is part of http://reviews.llvm.org/D11616 - I just decided to split
this up into a separate commit.

llvm-svn: 245349
2015-08-18 20:48:29 +00:00
David Majnemer 0ad363eebc [WinEH] Calculate state numbers for the new EH representation
State numbers are calculated by performing a walk from the innermost
funclet to the outermost funclet.   Rudimentary support for the new EH
constructs has been added to the assembly printer, just enough to test
the new machinery.

Differential Revision: http://reviews.llvm.org/D12098

llvm-svn: 245331
2015-08-18 19:07:12 +00:00
James Molloy ef183397b1 Generate FMINNAN/FMINNUM/FMAXNAN/FMAXNUM from SDAGBuilder.
These only get generated if the target supports them. If one of the variants is not legal and the other is, and it is safe to do so, the other variant will be emitted.

For example on AArch32 (V8), we have scalar fminnm but not fmin.

Fix up a couple of tests while we're here - one now produces better code, and the other was just plain wrong to start with.

llvm-svn: 245196
2015-08-17 07:13:10 +00:00
Sanjay Patel 3ab4a73bac use SDValue bool operator; NFCI
llvm-svn: 245181
2015-08-16 17:54:28 +00:00
Simon Pilgrim 0750c84623 [DAGCombiner] Attempt to mask vectors before zero extension instead of after.
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.

(zext (truncate x)) -> (zext (and(x, m))

Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.

This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.

Differential Revision: http://reviews.llvm.org/D11764

llvm-svn: 245160
2015-08-15 13:27:30 +00:00
Ahmed Bougacha a196661bb0 [CodeGen] Mark the promoted FCOPYSIGN result FP_ROUND as TRUNCating.
Now that we can properly promote mismatched FCOPYSIGNs (r244858), we
can mark the FP_ROUND on the result as truncating, to expose folding.

FCOPYSIGN doesn't change anything but the sign bit, so
  (fp_round (fcopysign (fpext a), b))
is equivalent to (modulo the sign bit):
  (fp_round (fpext a))
which is a no-op.

llvm-svn: 244862
2015-08-13 01:32:30 +00:00
Ahmed Bougacha b5b0cfdff7 [CodeGen] Assert on getNode(FP_EXTEND) with a smaller dst type.
This would have caught the problem in r244858.

llvm-svn: 244859
2015-08-13 01:10:29 +00:00
Ahmed Bougacha 40ded502ff [CodeGen] When Promoting, don't extend the 2nd FCOPYSIGN operand.
We don't care about its type, and there's even a combine that'll fold
away the FP_EXTEND if we let it run. However, until it does, we'll have
something broken like:
  (f32 (fp_extend (f64 v)))

Scalar f16 follow-up to r243924.

llvm-svn: 244858
2015-08-13 01:09:43 +00:00
Ahmed Bougacha 31e0d9a2b1 [CodeGen] Simplify getNode(*EXT/TRUNC) type size assert. NFC.
We already check that vectors have the same number of elements, we
don't need to use the scalar types explicitly: comparing the size of
the whole vector is enough.

llvm-svn: 244857
2015-08-13 01:08:48 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
JF Bastien 0cf74528d0 NFC SelectionDAGDumper: fix typo
Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11959

llvm-svn: 244667
2015-08-11 21:10:07 +00:00
Jingyue Wu 99eb4685ef SelectionDAG: Prefer to combine multiplication with less uses for fma
Summary:
For example:

  s6 = s0*s5;
  s2 = s6*s6 + s6;
  ...
  s4 = s6*s3;

We notice that it is possible for s2 is folded to fma (s0, s5, fmul (s6 s6)).
This only happens when Aggressive is true, otherwise hasOneUse() check
already prevents from folding the multiplication with more uses.

Test Plan: test/CodeGen/NVPTX/fma-assoc.ll

Patch by Xuetian Weng

Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm

Subscribers: arsenm, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D11855

llvm-svn: 244649
2015-08-11 19:21:46 +00:00
Sanjay Patel 070df89928 fix minsize detection: minsize attribute implies optimizing for size
llvm-svn: 244631
2015-08-11 17:04:31 +00:00
James Molloy 01cdeccdc7 Add new ISD nodes: ISD::FMINNAN and ISD::FMAXNAN
The intention of these is to be a corollary to ISD::FMINNUM/FMAXNUM,
differing only on how NaNs are treated. FMINNUM returns the non-NaN
input (when given one NaN and one non-NaN), FMINNAN returns the NaN
input instead.

This patch includes support for scalarizing, widening and splitting
vectors, but not expansion or softening. The reason is that these
should never be needed - FMINNAN nodes are only going to be created
in one place (SDAGBuilder::visitSelect) and there we'll check if the
node is legal or custom. I could preemptively add expand and soften
code, but I'm fairly opposed to adding code I can't test. It's bad
enough I can't create tests with this patch, but at least this code
will be exercised by the ARM and AArch64 backends fairly shortly.

llvm-svn: 244581
2015-08-11 09:13:05 +00:00
James Molloy 134bec2722 Add support for floating-point minnum and maxnum
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.

matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.

C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.

ARM and AArch64 at least have different instructions for these different cases.

llvm-svn: 244580
2015-08-11 09:12:57 +00:00
Alex Lorenz 2f43dd5a12 StackMap: FastISel: Add an appropriate number of immediate operands to the
frame setup instruction.

This commit ensures that the stack map lowering code in FastISel adds an
appropriate number of immediate operands to the frame setup instruction.

The previous code added just one immediate operand, which was fine for a target
like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit
operands. This caused the machine verifier to report an error when the old code
added just one.

Reviewers: Juergen Ributzka

Differential Revision: http://reviews.llvm.org/D11853

llvm-svn: 244508
2015-08-10 21:27:03 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Chandler Carruth 50fee93926 [PM/AA] Simplify the AliasAnalysis interface by removing a wrapper
around a DataLayout interface in favor of directly querying DataLayout.

This wrapper specifically helped handle the case where this no
DataLayout, but LLVM now requires it simplifynig all of this. I've
updated callers to directly query DataLayout. This in turn exposed
a bunch of places where we should have DataLayout readily available but
don't which I've fixed. This then in turn exposed that we were passing
DataLayout around in a bunch of arguments rather than making it readily
available so I've also fixed that.

No functionality changed.

llvm-svn: 244189
2015-08-06 02:05:46 +00:00
Sanjay Patel b6a79f9916 revert r243687: enable fast-math-flag propagation to DAG nodes
We can't propagate FMF partially without breaking DAG-level CSE. We either need to
relax CSE to account for mismatched FMF as a temporary work-around or fully propagate
FMF throughout the DAG.

Surprisingly, there are no existing regression tests for this, but here's an example:

  define float @fmf(float %a, float %b) {
    %mul1 = fmul fast float %a, %b
    %nega = fsub fast float 0.0, %a
    %mul2 = fmul fast float %nega, %b
    %abx2 = fsub fast float %mul1, %mul2
    ret float %abx2
  }


$ llc -o - badflags.ll -march=x86-64 -mattr=fma -enable-unsafe-fp-math -enable-fmf-dag=0
...
    vmulss    %xmm1, %xmm0, %xmm0
    vaddss    %xmm0, %xmm0, %xmm0
    retq

$ llc -o - badflags.ll -march=x86-64 -mattr=fma -enable-unsafe-fp-math -enable-fmf-dag=1
...
    vmulss    %xmm1, %xmm0, %xmm2
    vfmadd213ss    %xmm2, %xmm1, %xmm0  <--- failed to recognize that (a * b) was already calculated
    retq

llvm-svn: 244053
2015-08-05 15:12:03 +00:00
Sanjay Patel 924879ad2c wrap OptSize and MinSize attributes for easier and consistent access (NFCI)
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).

Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.

This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.

Differential Revision: http://reviews.llvm.org/D11734

llvm-svn: 243994
2015-08-04 15:49:57 +00:00
Hal Finkel caf1149b8b [SDAG] Fix a result chain in ExpandUnalignedLoad
On the code path in ExpandUnalignedLoad which expands an unaligned vector/fp
value in terms of a legal integer load of the same size, the ChainResult needs
to be the chain result of the integer load.

No in-tree test case is currently available.

Patch by Jan Hranac!

llvm-svn: 243956
2015-08-04 06:29:12 +00:00
Ahmed Bougacha f65371a235 [CodeGen] Fix FCOPYSIGN legalization to account for mismatched types.
We used to legalize it like it's any other binary operations.  It's not,
because it accepts mismatched operand types.  Because of that, we used
to hit various asserts and miscompiles.

Specialize vector legalizations to, in the worst case, unroll, or, when
possible, to just legalize the operand that needs legalization.

Scalarization isn't covered, because I can't think of a target where
some but not all of the 1-element vector types are to be scalarized.

llvm-svn: 243924
2015-08-04 00:32:55 +00:00
Simon Pilgrim f328fd4441 Remove trailing whitespace. NFCI.
llvm-svn: 243838
2015-08-01 17:06:47 +00:00
Simon Pilgrim b447dc5aaa Use SDValue bool check. NFCI.
llvm-svn: 243837
2015-08-01 17:05:50 +00:00
Simon Pilgrim 503a2594c3 [DAGCombiner] Convert constant AND masks to shuffle clear masks down to the byte level
The XformToShuffleWithZero method currently checks AND masks at the per-lane level for all-one and all-zero constants and attempts to convert them to legal shuffle clear masks.

This patch generalises XformToShuffleWithZero, splitting and checking the sub-lanes of the constants down to the byte level to see if any legal shuffle clear masks are possible. This allows a lot of masks (often from legalization or truncation) to be folded into existing shuffle patterns and removes a lot of constant mask loading.

There are a few examples of poor shuffle lowering that are exposed by this patch that will be cleaned up in future patches (e.g. merging shuffles that are separated by bitcasts, x86 legalized v8i8 zero extension uses PMOVZX+AND+AND instead of AND+PMOVZX, etc.)

Differential Revision: http://reviews.llvm.org/D11518

llvm-svn: 243831
2015-08-01 10:01:46 +00:00
Duncan P. N. Exon Smith ed013cd221 DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.

Most of the testcase updates were generated by the following sed script:

    find test/ -name "*.ll" -o -name "*.mir" |
    xargs grep -l 'DILocalVariable' |
    xargs sed -i '' \
      -e 's/tag: DW_TAG_arg_variable, //' \
      -e 's/tag: DW_TAG_auto_variable, //'

There were only a handful of tests in `test/Assembly` that I needed to
update by hand.

(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`.  I've added a FIXME to that effect.)

llvm-svn: 243774
2015-07-31 18:58:39 +00:00
David Majnemer 654e130b6e New EH representation for MSVC compatibility
This introduces new instructions neccessary to implement MSVC-compatible
exception handling support.  Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.

Differential Revision: http://reviews.llvm.org/D11097

llvm-svn: 243766
2015-07-31 17:58:14 +00:00
Sanjay Patel 1166f2ff9f fix memcpy/memset/memmove lowering when optimizing for size
Fixing MinSize attribute handling was discussed in D11363. 
This is a prerequisite patch to doing that.

The handling of OptSize when lowering mem* functions was broken
on Darwin because it wants to ignore -Os for these cases, but the
existing logic also made it ignore -Oz (MinSize).

The Linux change demonstrates a widespread problem. The backend
doesn't usually recognize the MinSize attribute by itself; it
assumes that if the MinSize attribute exists, then the OptSize 
attribute must also exist. 

Fixing this more generally will be a follow-on patch or two.

Differential Revision: http://reviews.llvm.org/D11568

llvm-svn: 243693
2015-07-30 21:41:50 +00:00
Sanjay Patel a93cf60a77 enable fast-math-flag propagation to DAG nodes
This uncovered latent bugs previously:
http://reviews.llvm.org/D10403

...but it's time to try again because internal tests aren't finding more.

If time passes and no other bugs are reported, we can remove this cl::opt.

llvm-svn: 243687
2015-07-30 21:06:55 +00:00
Sanjay Patel 0f9dcf8b90 move DAGCombiner's allowableAlignment() helper function into the TLI
Making allowableAlignment() more accessible was suggested as a predecessor patch
for D10662, so I've pulled it into TargetLowering. This let's us remove 4 instances
of duplicate logic in LegalizeDAG.

There's a subtle functional change in the implementation: the existing 
allowableAlignment() code was using getPrefTypeAlignment() when checking 
alignment with the DataLayout and assumed that was fast. In this implementation,
we use getABITypeAlignment() and assume that is fast. See the TODO comment or the
discussion in the Phab review for future improvements in this implementation
(don't use the data layout at all).

There are no regression test changes from this difference, and I'm not sure how to
expose it via a test. I think we actually do want to provide the 'Fast' param when
checking this from DAGCombiner::MergeConsecutiveStores(). Ie, we shouldn't merge 
stores if the new stores are not going to be fast. But that change will require 
fixing allowsMisalignedMemoryAccess() overrides as noted in D10662.

Differential Revision: http://reviews.llvm.org/D10905

llvm-svn: 243549
2015-07-29 18:24:18 +00:00
Sanjoy Das cfe41f050c [Statepoints] Let patchable statepoints have a symbolic call target.
Summary:
As added initially, statepoints required their call targets to be a
constant pointer null if ``numPatchBytes`` was non-zero.  This turns out
to be a problem ergonomically, since there is no way to mark patchable
statepoints as calling a (readable) symbolic value.

This change remove the restriction of requiring ``null`` call targets
for patchable statepoints, and changes PlaceSafepoints to maintain the
symbolic call target through its transformation.

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11550

llvm-svn: 243502
2015-07-28 23:50:30 +00:00
Sanjay Patel 133e68b45c ignore duplicate divisor uses when transforming into reciprocal multiplies (PR24141)
PR24141: https://llvm.org/bugs/show_bug.cgi?id=24141
contains a test case where we have duplicate entries in a node's uses() list.

After r241826, we use CombineTo() to delete dead nodes when combining the uses into
reciprocal multiplies, but this fails if we encounter the just-deleted node again in
the list.

The solution in this patch is to not add duplicate entries to the list of users that
we will subsequently iterate over. For the test case, this avoids triggering the
combine divisors logic entirely because there really is only one user of the divisor.

Differential Revision: http://reviews.llvm.org/D11345

llvm-svn: 243500
2015-07-28 23:28:22 +00:00
Sanjay Patel 1dd15598cf fix TLI's combineRepeatedFPDivisors interface to return the minimum user threshold
This fix was suggested as part of D11345 and is part of fixing PR24141.

With this change, we can avoid walking the uses of a divisor node if the target
doesn't want the combineRepeatedFPDivisors transform in the first place.

There is no NFC-intended other than that.

Differential Revision: http://reviews.llvm.org/D11531

llvm-svn: 243498
2015-07-28 23:05:48 +00:00
Chih-Hung Hsieh 1e859582d6 Implement target independent TLS compatible with glibc's emutls.c.
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.

clang and driver changes in http://reviews.llvm.org/D10524

  Added -femulated-tls flag to select the emulated TLS model,
  which will be used for old targets like Android that do not
  support ELF TLS models.

Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.

Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.

TODO: Add proper DIE for emulated TLS variables.
      Added new unit tests with emulated TLS.

Differential Revision: http://reviews.llvm.org/D10522

llvm-svn: 243438
2015-07-28 16:24:05 +00:00
Mehdi Amini b58f8137c1 Move the Target way of overriding DAG Scheduler to a target hook
Summary:
The previous way of overriding it was relying on calling "setDefault"
on the global registry, which implies global mutable state.

Reviewers: echristo, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11538

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243388
2015-07-28 06:18:04 +00:00
Sanjay Patel c1c2b87001 move combineRepeatedFPDivisors logic into a helper function; NFCI
llvm-svn: 243293
2015-07-27 17:58:49 +00:00
Chandler Carruth 96ada25bf3 [PM/AA] Remove all of the dead AliasAnalysis pointers being threaded
through APIs that are no longer necessary now that the update API has
been removed.

This will make changes to the AA interfaces significantly less
disruptive (I hope). Either way, it seems like a really nice cleanup.

llvm-svn: 242882
2015-07-22 09:52:54 +00:00
Simon Pilgrim 4ef0576c40 [DAGCombiner] Fixed minor typo that was missed in D9097.
We don't bitcast the UNDEFs - that is done in visitVECTOR_SHUFFLE, and the getValueType should come from the operand's SDValue not the SDNode.

llvm-svn: 242640
2015-07-19 11:31:40 +00:00
Simon Pilgrim 3aca32ea4a Use SDValue bool check. NFCI.
llvm-svn: 242636
2015-07-19 09:56:36 +00:00
Matt Arsenault cabe02e141 Only do fmul (fadd x, x), c combine if the fadd only has one use
This was increasing the instruction count if the fadd has multiple uses.

llvm-svn: 242498
2015-07-17 01:14:35 +00:00
Matthias Braun 3cd00c1739 Fix __builtin_setjmp in combination with sjlj exception handling.
llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling
style but is also used in clang to implement __builtin_setjmp.  The ARM
backend needs to output additional dispatch tables for the SjLj
exception handling style, these tables however can't be emitted if
llvm.eh.sjlj.setjmp is simply used for __builtin_setjmp and no actual
landing pad blocks exist.

To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is
introduced which is used instead of llvm.eh.sjlj.setjmp in the SjLj
exception handling lowering, so we can differentiate between the case
where we actually need to setup a dispatch table and the case where we
just need the __builtin_setjmp semantic.

Differential Revision: http://reviews.llvm.org/D9313

llvm-svn: 242481
2015-07-16 22:34:16 +00:00
James Molloy 7395a8182c [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
This adds new intrinsics "*absdiff" for absolute difference ops to facilitate efficient code generation for "sum of absolute differences" operation.
The patch also contains the introduction of corresponding SDNodes and basic legalization support.Sanity of the generated code is tested on X86.

This is 1st of the three patches.

Patch by Shahid Asghar-ahmad!

llvm-svn: 242409
2015-07-16 15:22:46 +00:00
Mehdi Amini bd7287ebe5 Move most user of TargetMachine::getDataLayout to the Module one
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

This patch is quite boring overall, except for some uglyness in
ASMPrinter which has a getDataLayout function but has some clients
that use it without a Module (llmv-dsymutil, llvm-dwarfdump), so
some methods are taking a DataLayout as parameter.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11090

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 242386
2015-07-16 06:11:10 +00:00
Cong Hou ab23bfbc0e Create a wrapper pass for BranchProbabilityInfo.
This new wrapper pass is useful when we want to do branch probability analysis conditionally (e.g. only in PGO mode) but don't want to add one more pass dependence.

http://reviews.llvm.org/D11241

llvm-svn: 242349
2015-07-15 22:48:29 +00:00
Alexey Bataev b9288601a3 [SDAG] Optimize unordered comparison in soft-float mode (patch by Anton Nadolskiy)
Current implementation handles unordered comparison poorly in soft-float mode. 
Consider (a ULE b) which is a <= b. It is lowered to (ledf2(a, b) <= 0 || unorddf2(a, b) != 0) (in general). We can do better job by lowering it to (__gtdf2(a, b) <= 0). 
Such replacement is true for other CMP's (ult, ugt, uge). In general, we just call same function as for ordered case but negate comparison against zero.
Differential Revision: http://reviews.llvm.org/D10804

llvm-svn: 242280
2015-07-15 08:39:35 +00:00
Pete Cooper 6923461a16 Use enum instead of unsigned. NFC.
The unsigned opcode argument here was the result of BinaryOperator->getOpcode().
That returns a BinaryOps enum which is more accurate than passing around an
unsigned.

llvm-svn: 242265
2015-07-15 01:31:26 +00:00
Pete Cooper a8127d8c92 Use cast<> instead of dyn_cast to remove llvm_unreachable. NFC.
This code was checking if we are an ICmpInst or FCmpInst then throwing
unreachable if we are neither.  We must be one or the other, so use a
cast on the FCmpInst case to ensure that we are that case.  Then we can
avoid having an unreachable but still catch an error if we ever had another
subclass of CmpInst.

llvm-svn: 242264
2015-07-15 01:31:23 +00:00
Pete Cooper 20dc71b1f1 Use another foreach loop. NFC
llvm-svn: 242263
2015-07-15 01:31:20 +00:00
Pete Cooper 6a96c61659 Use getAnyExtOrTrunc helper instead of manually doing ext/trunc check. NFC.
The code here was doing exactly what is already in getAnyExtOrTrunc().
Just use that method instead.

llvm-svn: 242261
2015-07-15 00:43:57 +00:00
Pete Cooper 8acd386969 Use getZExtOrTrunc helper instead of manually doing zext/trunc check. NFC.
The code here was doing exactly what is already in getZExtOrTrunc().
Just use that method instead.

llvm-svn: 242260
2015-07-15 00:43:54 +00:00
Pete Cooper 7e747d26c5 Use getStoreSize() instead of getStoreSizeInBits()/8. NFC.
The calls here were both to getStoreSizeInBits() which multiplies by 8.
We then immediately divided by 8.  Calling getStoreSize() returns the
values we need without the extra arithmetic.

llvm-svn: 242254
2015-07-15 00:07:55 +00:00
Pete Cooper 7e64ef06e6 Use more foreach loops in SelectionDAG. NFC
llvm-svn: 242249
2015-07-14 23:43:29 +00:00
Pete Cooper 65c69407c8 Add allnodes() iterator range to SelectionDAG. NFC.
SelectionDAG already had begin/end methods for iterating over all
the nodes, but didn't define an iterator_range for us in foreach
loops.

This adds such a method and uses it in some of the eligible places
throughout the backends.

llvm-svn: 242212
2015-07-14 22:10:54 +00:00
Pete Cooper 06e249e713 Constify parameters in SelectionDAG methods. NFC
llvm-svn: 242210
2015-07-14 21:54:52 +00:00
Pete Cooper cf17e18f4e Remove unnecessary .getNode() in SelectionDAG. NFC.
The simplify_type specialisation allows us to cast directly from
SDValue to an SDNode* subclass so we don't need to pass a SDNode*
to cast<>.

llvm-svn: 242209
2015-07-14 21:54:48 +00:00
Pete Cooper e89ba67f72 Use more foreach loops in SelectionDAG. NFC
llvm-svn: 242208
2015-07-14 21:54:45 +00:00
Matthias Braun 75e668ea6e Revert "LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization"
Accidental commit, needs review first.

This reverts commit r242107.

llvm-svn: 242108
2015-07-14 02:09:57 +00:00
Matthias Braun 4ac4ecdadf LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization
- Factor out code to query and modify the sign bit of a floatingpoint
  value as an integer. This also works if none of the targets integer
  types is big enough to hold all bits of the floatingpoint value.

- Legalize FABS(x) as FCOPYSIGN(x, 0.0) if FCOPYSIGN is available,
  otherwise perform bit manipulation on the sign bit. The previous code
  used "x >u 0 ? x : -x" which is incorrect for x being -0.0! It also
  takes 34 instructions on ARM Cortex-M4. With this patch we only
  require 5:
    vldr d0, LCPI0_0
    vmov r2, r3, d0
    lsrs r2, r3, #31
    bfi r1, r2, #31, #1
    bx lr
  (This could be further improved if the compiler would recognize that
   r2, r3 is zero).

- Only lower FCOPYSIGN(x, y) = sign(x) ? -FABS(x) : FABS(x) if FABS is
  available otherwise perform bit manipulation on the sign bit.

- Perform the sign(x) test by masking out the sign bit and comparing
  with 0 rather than shifting the sign bit to the highest position and
  testing for "<s 0". For x86 copysignl (on 80bit values) this gets us:
    testl $32768, %eax
  rather than:
    shlq $48, %rax
    sets %al
    testb %al, %al

llvm-svn: 242107
2015-07-14 02:08:26 +00:00
James Y Knight 46f91c8457 Fix handling of the 'n' asm constraint with invalid operands.
It had accidently accepted a symbol+offset value (and emitted
incorrect code for it, keeping only the offset part) instead of
properly reporting the constraint as invalid.

Differential Revision: http://reviews.llvm.org/D11039

llvm-svn: 242040
2015-07-13 16:36:22 +00:00
Matt Arsenault f54dc2384d DAGCombiner: Assume invariant load cannot alias a store
The motivation is to allow GatherAllAliases / FindBetterChain
to not give up on dependent loads of a pointer from constant memory.

This is important for AMDGPU, because most loads are pointers
derived from a load of a kernel argument from constant memory.

llvm-svn: 241948
2015-07-10 22:17:40 +00:00
Fiona Glaser b08ae7affb ComputeKnownBits: be a bit smarter about ADDs
If our two inputs have known top-zero bit counts M and N, we trivially
know that the output cannot have any bits set in the top (min(M, N)-1)
bits, since nothing could carry past that point.

llvm-svn: 241927
2015-07-10 18:29:02 +00:00
David Majnemer db82d2f338 Revert the new EH instructions
This reverts commits r241888-r241891, I didn't mean to commit them.

llvm-svn: 241893
2015-07-10 07:15:17 +00:00
David Majnemer ae2ffc8a8c New EH representation for MSVC compatibility
Summary:
This introduces new instructions neccessary to implement MSVC-compatible
exception handling support.  Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.

Reviewers: rnk, JosephTremoulet, reames, nlewycky, rjmccall

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11041

llvm-svn: 241888
2015-07-10 07:00:44 +00:00
Reid Kleckner 0f7f8d41f7 Remove dead code from old 64-bit SEH lowering
llvm-svn: 241829
2015-07-09 17:46:39 +00:00
Pat Gavlin a717f255b6 Allow {e,r}bp as the target of {read,write}_register.
This patch allows the read_register and write_register intrinsics to
read/write the RBP/EBP registers on X86 iff the targeted register is
the frame pointer for the containing function.

Differential Revision: http://reviews.llvm.org/D10977

llvm-svn: 241827
2015-07-09 17:40:29 +00:00
Sanjay Patel e2361d4a18 fix an invisible bug when combining repeated FP divisors
This patch fixes bugs that were exposed by the addition of fast-math-flags in the DAG:
r237046 ( http://reviews.llvm.org/rL237046 ):

1. When replacing a division node, it's not enough to RAUW.
   We should call CombineTo() to delete dead nodes and combine again.
2. Because we are changing the DAG, we can't return an empty SDValue
   after the transform. As the code comments say:

    Visitation implementation - Implement dag node combining for different node types.
    The semantics are as follows: Return Value:
      SDValue.getNode() == 0 - No change was made
      SDValue.getNode() == N - N was replaced, is dead and has been handled.
      otherwise - N should be replaced by the returned Operand.

The new test case shows no difference with or without this patch, but it will crash if
we re-apply r237046 or enable FMF via the current -enable-fmf-dag cl::opt.

Differential Revision: http://reviews.llvm.org/D9893

llvm-svn: 241826
2015-07-09 17:28:37 +00:00
Mehdi Amini eaabc51e78 Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user
A documentation for this function would be nice by the way.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241807
2015-07-09 15:12:23 +00:00
Pawel Bylica d1b818bcf4 Reapply fixed r241790: Fix shift legalization and lowering for big constants.
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.

Reviewers: nadav, majnemer, sanjoy, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10767

llvm-svn: 241806
2015-07-09 14:58:04 +00:00
Pawel Bylica 627762fda5 Revert r241790: Fix shift legalization and lowering for big constants.
llvm-svn: 241792
2015-07-09 09:50:54 +00:00
Pawel Bylica eb122f2baf Fix shift legalization and lowering for big constants.
Summary: If shift amount is a constant value > 64 bit it is handled incorrectly during type legalization and X86 lowering. This patch the type of shift amount argument in function DAGTypeLegalizer::ExpandShiftByConstant from unsigned to APInt.

Reviewers: nadav, majnemer, sanjoy, RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10767

llvm-svn: 241790
2015-07-09 08:01:36 +00:00
Elena Demikhovsky 37a4da825f Extended syntax of vector version of getelementptr instruction.
The justification of this change is here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-March/082989.html

According to the current GEP syntax, vector GEP requires that each index must be a vector with the same number of elements.

%A = getelementptr i8, <4 x i8*> %ptrs, <4 x i64> %offsets

In this implementation I let each index be or vector or scalar. All vector indices must have the same number of elements. The scalar value will mean the splat vector value.

(1) %A = getelementptr i8, i8* %ptr, <4 x i64> %offsets
or
(2) %A = getelementptr i8, <4 x i8*> %ptrs, i64 %offset

In all cases the %A type is <4 x i8*>

In the case (2) we add the same offset to all pointers.

The case (1) covers C[B[i]] case, when we have the same base C and different offsets B[i].

The documentation is updated.

http://reviews.llvm.org/D10496

llvm-svn: 241788
2015-07-09 07:42:48 +00:00
Mehdi Amini 157e5a6d10 Remove getDataLayout() from TargetSelectionDAGInfo (had no users)
Summary:
Remove empty subclass in the process.

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren, ted

Differential Revision: http://reviews.llvm.org/D11045

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241780
2015-07-09 02:10:08 +00:00
Mehdi Amini a749f2ad47 Remove getDataLayout() from TargetLowering
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11042

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241779
2015-07-09 02:09:52 +00:00
Mehdi Amini 0cdec1e2ab Make isLegalAddressingMode() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11040

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
2015-07-09 02:09:40 +00:00
Mehdi Amini 5c183d5239 Make getByValTypeAlignment() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11038

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241777
2015-07-09 02:09:28 +00:00
Mehdi Amini 9639d650bb Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11037

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
2015-07-09 02:09:20 +00:00
Mehdi Amini 44ede33a69 Make TargetLowering::getPointerTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11028

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
2015-07-09 02:09:04 +00:00
Mehdi Amini 56228dabfa Redirect DataLayout from TargetMachine to Module in ComputeValueVTs()
Summary:
Avoid using the TargetMachine owned DataLayout and use the Module owned
one instead. This requires passing the DataLayout up the stack to
ComputeValueVTs().

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11019

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241773
2015-07-09 01:57:34 +00:00
Sanjay Patel c1afa95a51 early exits -> less indenting; NFCI
llvm-svn: 241716
2015-07-08 19:32:39 +00:00
Mehdi Amini ffc1402fad Remove IsLittleEndian from TargetLowering and redirect to DataLayout
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11017

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241655
2015-07-08 01:00:38 +00:00
Reid Kleckner e69bdb8619 [WinEH] Make llvm.x86.seh.restoreframe work for stack realignment prologues
The incoming EBP value points to the end of a local stack allocation, so
we can use that to restore ESI, the base pointer. Once we do that, we
can use local stack allocations. If we know we need stack realignment,
spill the original frame pointer in the prologue and reload it after
restoring ESI.

llvm-svn: 241648
2015-07-07 23:45:58 +00:00
Reid Kleckner 60381791b5 Rename llvm.frameescape and llvm.framerecover to localescape and localrecover
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.

These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.

Naming suggestions at this point are welcome, I'm happy to re-run sed.

Reviewers: majnemer, nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11011

llvm-svn: 241633
2015-07-07 22:25:32 +00:00
Mehdi Amini 8ac7a9d57a Redirect DataLayout from TargetMachine to Module in SelectionDAG
Summary:
SelectionDAG itself is not invoking directly the DataLayout in the
TargetMachine, but the "TargetLowering" class is still using it. I'll
address it in a following commit.

This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11000

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241618
2015-07-07 19:07:19 +00:00
Mehdi Amini 7da8b536f4 Redirect DataLayout from TargetMachine to Module in FastISel
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10985

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241613
2015-07-07 18:39:02 +00:00
Peter Collingbourne 6a9d1774d0 IR: Do not consider available_externally linkage to be linker-weak.
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.

Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.

Differential Revision: http://reviews.llvm.org/D10941

llvm-svn: 241413
2015-07-05 20:52:35 +00:00
Benjamin Kramer 9bfb627a0e [TargetLowering] StringRefize asm constraint getters.
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.

llvm-svn: 241411
2015-07-05 19:29:18 +00:00
Nadav Rotem 754eb7c563 Fix an overly aggressive assertion in getCopyFromPartsVector.
The assertion in getCopyFromPartsVector assumed that the vector 'part' must
match the type of argument (arguments are potentially split into multiple
parts). However, in some cases the targets return a 'part' of the right size
but with a different type. We already handle this case correctly later on
and generate a bitcast. This commit just makes sure that we are actually
checking the property that we care about.

llvm-svn: 241312
2015-07-02 23:23:52 +00:00
Akira Hatanaka 56c70441dc Use function attribute "trap-func-name" and remove TargetOptions::TrapFuncName.
This commit changes normal isel and fast isel to read the user-defined trap
function name from function attribute "trap-func-name" attached to llvm.trap or
llvm.debugtrap instead of from TargetOptions::TrapFuncName. This is needed to
use clang's command line option "-ftrap-function" for LTO and enable changing
the trap function name on a per-call-site basis.

Out-of-tree projects currently using TargetOptions::TrapFuncName to specify the
trap function name should attach attribute "trap-func-name" to the call sites
of llvm.trap and llvm.debugtrap instead.

rdar://problem/21225723

Differential Revision: http://reviews.llvm.org/D10832

llvm-svn: 241305
2015-07-02 22:13:27 +00:00
Pawel Bylica c52eabb285 Reapply r240291: Fix shl folding in DAG combiner.
The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.

It has been reverted previously because of some problems with comparing APInt with raw uint64_t. That has been fixed/changed with r241204.

llvm-svn: 241254
2015-07-02 11:44:54 +00:00
Sanjoy Das bbb2e8234c [NFC] Make the Statepoint class more like CallSite
Summary: Rename some methods to make Statepoint look more like CallSite.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10756

llvm-svn: 241235
2015-07-02 02:53:45 +00:00
Benjamin Kramer 85b2815aba [SDAG] Give InstrEmitter hidden visibility
NFC.

llvm-svn: 241165
2015-07-01 14:55:10 +00:00
Pawel Bylica 143ceb6d46 [DAGCombiner] Fix & simplify constant folding of sext/zext.
Summary: This patch fixes the cases of sext/zext constant folding in DAG combiner where constans do not fit 64 bits. The fix simply removes un$

Test Plan: New regression test included.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: RKSimon, llvm-commits

Differential Revision: http://reviews.llvm.org/D10607

llvm-svn: 240991
2015-06-29 20:28:47 +00:00
Benjamin Kramer 5b455f0b62 [SDAG] Now that we have a way to communicate the exact bit on sdiv use it to simplify sdiv by a constant.
We had a hack in SDAGBuilder in place to work around this but now we
can avoid that. Call BuildExactSDIV from BuildSDIV so DAGCombiner can
perform this trick automatically.

The added check in DAGCombiner is necessary to prevent exact sdiv by pow2
from regressing as the target-specific pow2 lowering is not aware of
exact bits yet.

This is mostly covered by existing tests. One side effect is that we
get the better lowering for exact vector sdivs now too :)

llvm-svn: 240891
2015-06-27 20:33:26 +00:00
Pete Cooper 485d1146db Convert a bunch of loops to foreach. NFC.
This uses the new SDNode::op_values() iterator range committed in r240805.

llvm-svn: 240822
2015-06-26 19:37:02 +00:00
Pete Cooper af61ac71e2 Wrap assert loops in #ifndef NDEBUG
The body of the loops here only contained asserts.  This triggered an unused variable
warning on release builds and -Werror on the bots.

llvm-svn: 240819
2015-06-26 19:23:20 +00:00
Pete Cooper 9271ccc345 Convert a bunch of loops to foreach. NFC.
This uses the new SDNode::op_values() iterator range committed in r240805.

llvm-svn: 240817
2015-06-26 19:18:49 +00:00
Pete Cooper 8fc121dfc4 Convert a bunch of loops to foreach. NFC.
This uses the new SDNode::op_values() iterator range committed in r240805.

llvm-svn: 240815
2015-06-26 19:08:33 +00:00
Pete Cooper 8c0a710995 Convert a bunch of loops to foreach. NFC.
This uses the new SDNode::op_values() iterator range committed in r240805.

llvm-svn: 240809
2015-06-26 18:41:54 +00:00
Benjamin Kramer 1dcd8b09b4 [DAGCombine] Fix demanded bits computation for exact shifts.
Fixes a miscompilation of MultiSource/Benchmarks/MallocBench/gs

llvm-svn: 240796
2015-06-26 16:59:31 +00:00
Benjamin Kramer c2ae767377 [DAGCombiner] Preserve the exact bit when simplifying SRA to SRL.
Allows more aggressive folding of ashr/shl pairs.

llvm-svn: 240788
2015-06-26 14:51:49 +00:00
Benjamin Kramer 07e70b4fa4 [DAGCombine] fold (X >>?,exact C1) << C2 --> X << (C2-C1)
Instcombine also does this but many opportunities only become visible
after GEPs are lowered.

llvm-svn: 240787
2015-06-26 14:51:36 +00:00
Matt Arsenault f735cab986 DAGCombiner: Use pop_back_val()
llvm-svn: 240709
2015-06-25 22:15:05 +00:00
Sanjay Patel e4aedb55d6 fix typos; NFC
llvm-svn: 240699
2015-06-25 21:11:08 +00:00
Matt Arsenault c244dcb804 DAGCombiner: Remove redundant check
MemIntrinsicSDNode is already a subclass of MemSDNode,
so the MemSDNode check is sufficient.

llvm-svn: 240672
2015-06-25 18:47:02 +00:00
Pawel Bylica cc35812877 Fix instruction scheduling live register tracking
Summary:
This patch fixes PR23405 (https://llvm.org/bugs/show_bug.cgi?id=23405).

During a node unscheduling an entry in LiveRegGens can be replaced with a new value. That corrupts the live reg tracking and LiveReg* structure is not cleared as should be during unscheduling. Problematic condition that enforces Gen replacement is `I->getSUnit()->getHeight() < LiveRegGens[I->getReg()]->getHeight()`. This condition should be checked only if LiveRegGen was set in current node unscheduling.

Test Plan: Regression test included.

Reviewers: hfinkel, atrick

Reviewed By: atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9993

llvm-svn: 240538
2015-06-24 12:49:42 +00:00
Rafael Espindola ce4c2bc1d6 Use MCSymbols for FastISel.
The summary is that it moves the mangling earlier and replaces a few
calls to .addExternalSymbol with addSym.

I originally wanted to replace all the uses of addExternalSymbol with
addSym, but noticed it was a lot of work and doesn't need to be done
all at once.

llvm-svn: 240395
2015-06-23 12:21:54 +00:00
Alexander Kornienko f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Pawel Bylica e6fd8c4232 Revert r240291: causes problems in self-hosted builds.
llvm-svn: 240343
2015-06-22 21:54:07 +00:00
Rafael Espindola 36b718fc74 Avoid a Symbol -> Name -> Symbol conversion.
Before this we were producing a TargetExternalSymbol from a MCSymbol.
That meant extracting the symbol name and fetching the symbol again
down the pipeline.

This patch adds a DAG.getMCSymbol that lets the MCSymbol pass unchanged on the
DAG.

Doing so removes the need for MO_NOPREFIX and fixes the root cause of pr23900,
allowing r240130 to be committed again.

llvm-svn: 240300
2015-06-22 17:46:53 +00:00
Pawel Bylica 06407c0320 Fix shl folding in DAG combiner.
Summary: The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.

Test Plan: A regression test included.

Reviewers: andreadb

Reviewed By: andreadb

Subscribers: andreadb, test, llvm-commits

Differential Revision: http://reviews.llvm.org/D10602

llvm-svn: 240291
2015-06-22 15:58:11 +00:00
Chandler Carruth c3f49eb451 [PM/AA] Hoist the AliasResult enum out of the AliasAnalysis class.
This will allow classes to implement the AA interface without deriving
from the class or referencing an internal enum of some other class as
their return types.

Also, to a pretty fundamental extent, concepts such as 'NoAlias',
'MayAlias', and 'MustAlias' are first class concepts in LLVM and we
aren't saving anything by scoping them heavily.

My mild preference would have been to use a scoped enum, but that
feature is essentially completely broken AFAICT. I'm extremely
disappointed. For example, we cannot through any reasonable[1] means
construct an enum class (or analog) which has scoped names but converts
to a boolean in order to test for the possibility of aliasing.

[1]: Richard Smith came up with a "solution", but it requires class
templates, and lots of boilerplate setting up the enumeration multiple
times. Something like Boost.PP could potentially bundle this up, but
even that would be quite painful and it doesn't seem realistically worth
it. The enum class solution would probably work without the need for
a bool conversion.

Differential Revision: http://reviews.llvm.org/D10495

llvm-svn: 240255
2015-06-22 02:16:51 +00:00
Hans Wennborg 6ed81cbcdb Switch lowering: add heuristic for filling leaf nodes in the weight-balanced binary search tree
Sparse switches with profile info are lowered as weight-balanced BSTs. For
example, if the node weights are {1,1,1,1,1,1000}, the right-most node would
end up in a tree by itself, bringing it closer to the top.

However, a leaf in this BST can contain up to 3 cases, and having a single
case in a leaf node as in the example means the tree might become
unnecessarily high.

This patch adds a heauristic to the pivot selection algorithm that moves more
cases into leaf nodes unless that would lower their rank. It still doesn't
yield the optimal tree in every case, but I believe it's conservatibely correct.

llvm-svn: 240224
2015-06-20 17:14:07 +00:00
Sanjoy Das d200893741 [Statepoint] Remove unnecessary argument from Statepoint::getRelocates
NFC.

llvm-svn: 240198
2015-06-20 00:01:03 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Hans Wennborg 67d492a544 Switch lowering: enable whole-switch jump tables at -O0.
To same compile time, the analysis to find dense case-clusters in switches is
not done at -O0. However, when the whole switch is dense enough, it is easy to
turn it into a jump table, resulting in much faster code with no extra effort.

llvm-svn: 240071
2015-06-18 22:22:30 +00:00
Sanjay Patel 8730ef78f8 fix typo; NFC
llvm-svn: 240022
2015-06-18 15:53:33 +00:00
Sanjay Patel a3f423b4fc remove unnecessary casts; NFC
llvm-svn: 239942
2015-06-17 20:54:46 +00:00
David Majnemer 7fddeccb8b Move the personality function from LandingPadInst to Function
The personality routine currently lives in the LandingPadInst.

This isn't desirable because:
- All LandingPadInsts in the same function must have the same
  personality routine.  This means that each LandingPadInst beyond the
  first has an operand which produces no additional information.

- There is ongoing work to introduce EH IR constructs other than
  LandingPadInst.  Moving the personality routine off of any one
  particular Instruction and onto the parent function seems a lot better
  than have N different places a personality function can sneak onto an
  exceptional function.

Differential Revision: http://reviews.llvm.org/D10429

llvm-svn: 239940
2015-06-17 20:52:32 +00:00
Sanjay Patel dcaa53791c fix typos in comments; NFC
llvm-svn: 239916
2015-06-17 16:34:48 +00:00
Chandler Carruth ac80dc7532 [PM/AA] Remove the Location typedef from the AliasAnalysis class now
that it is its own entity in the form of MemoryLocation, and update all
the callers.

This is an entirely mechanical change. References to "Location" within
AA subclases become "MemoryLocation", and elsewhere
"AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps
out-of-tree folks update.

llvm-svn: 239885
2015-06-17 07:18:54 +00:00
Sanjay Patel 0fcc53f6d6 rename variables; NFC
...because I see 'StoreBW' and read it as 'store bandwidth'

llvm-svn: 239850
2015-06-16 20:47:19 +00:00
Sanjay Patel bb385ed454 extract some code into a helper function for MergeConsecutiveStores(); NFCI
llvm-svn: 239847
2015-06-16 20:05:00 +00:00
Sanjay Patel f134048b1d propagate IR-level fast-math-flags to DAG nodes, disabled by default
This is an updated version of the patch that was checked in at:
http://reviews.llvm.org/rL237046

but subsequently reverted because it exposed a bug in the DAG Combiner:
http://reviews.llvm.org/D9893

This time, there's an enablement flag ("EnableFMFInDAG") around the code in
SelectionDAGBuilder where we copy the set of FP optimization flags from IR
instructions to DAG nodes. So, in theory, there should be no functional change
from this patch as-is, but it will allow testing with the added functionality
to proceed via "-enable-fmf-dag" passed to llc.

This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997

Differential Revision: http://reviews.llvm.org/D10403

llvm-svn: 239828
2015-06-16 16:25:43 +00:00
Matt Arsenault ed891b5561 Revert "Revert "Fix merges of non-zero vector stores""
Reapply r239539. Don't assume the collected number of
stores is the same vector size. Just take the first N
stores to fill the vector.

llvm-svn: 239825
2015-06-16 15:51:48 +00:00
Simon Pilgrim d3f6427446 [DAGCombiner] Added BSWAP(BSWAP(x)) -> x combine pattern.
llvm-svn: 239682
2015-06-13 16:25:12 +00:00
Simon Pilgrim 2c35e7a264 [SelectionDAG] Added assertions + UNDEF handling for BSWAP node creation.
llvm-svn: 239679
2015-06-13 15:23:58 +00:00
Simon Pilgrim 011381d48b [DAGCombiner] Added BSWAP vector constant folding support.
llvm-svn: 239675
2015-06-13 14:08:15 +00:00
Simon Pilgrim 096cccd01a Stripped trailing whitespace. NFC.
llvm-svn: 239674
2015-06-13 12:57:36 +00:00
Reid Kleckner 2691c59e97 Revert "Fix merges of non-zero vector stores"
This reverts commit r239539.

It was causing SDAG assertions while building freetype.

llvm-svn: 239543
2015-06-11 17:25:24 +00:00
Matt Arsenault e23a063dc3 Fix merges of non-zero vector stores
Now actually stores the non-zero constant instead of 0.
I somehow forgot to include this part of r238108.

The test change was just an independent instruction order swap,
so just add another check line to satisfy CHECK-NEXT.

llvm-svn: 239539
2015-06-11 16:03:52 +00:00
Igor Laevsky 346ff628f7 [StatepointLowering] Reuse stack slots across basic blocks
During statepoint lowering we can sometimes avoid spilling of the value if we know that it was already spilled for previous statepoint.
We were doing this by checking if incoming statepoint value was lowered into load from stack slot. This was working only in boundaries of one basic block.

But instead of looking at the lowered node we can look directly at the llvm-ir value and if it was gc.relocate (or some simple modification of it) look up stack slot for it's derived pointer and reuse stack slot from it. This allows us to look across basic block boundaries.

Differential Revision: http://reviews.llvm.org/D10251

llvm-svn: 239472
2015-06-10 12:31:53 +00:00
Reid Kleckner f12c030f48 [WinEH] Add 32-bit SEH state table emission prototype
This gets all the handler info through to the asm printer and we can
look at the .xdata tables now. I've convinced one small catch-all test
case to work, but other than that, it would be a stretch to say this is
functional.

The state numbering algorithm avoids doing any scope reconstruction as
we do for C++ to simplify the implementation.

llvm-svn: 239433
2015-06-09 21:42:19 +00:00
Matt Arsenault 705eb8f6b1 Implement computeKnownBits for min/max nodes
llvm-svn: 239378
2015-06-09 00:52:41 +00:00
Simon Pilgrim 4791f6d89b [DAGCombiner] Added CTLZ vector constant folding support.
llvm-svn: 239305
2015-06-08 16:19:00 +00:00
Simon Pilgrim c789e1d57b [DAGCombiner] Added CTTZ vector constant folding support.
llvm-svn: 239293
2015-06-08 09:57:09 +00:00
Simon Pilgrim 68cd237f57 [DAGCombiner] Added CTPOP vector constant folding support.
Added tests to the existing SSE/AVX test files.

llvm-svn: 239252
2015-06-07 15:37:14 +00:00
Fiona Glaser 666e352440 DAGCombiner: don't duplicate (fmul x, c) in visitFNEG if fneg is free
For targets with a free fneg, this fold is always a net loss if it
ends up duplicating the multiply, so definitely avoid it.

This might be true for some targets without a free fneg too, but
I'll leave that for future investigation.

llvm-svn: 239167
2015-06-05 17:52:34 +00:00
Andrea Di Biagio eb33134ce7 Simplify code; NFC.
Also, moved test cases from CodeGen/X86/fold-buildvector-bug.ll into
CodeGen/X86/buildvec-insertvec.ll and regenerated CHECK lines using
update_llc_test_checks.py.

llvm-svn: 239142
2015-06-05 10:29:55 +00:00
Swaroop Sridhar 70d18df18f Statepoint: Fix handling of Far Immediate calls
gc.statepoint intrinsics with a far immediate call target 
were lowered incorrectly as pc-rel32 calls.

This change fixes the problem, and generates an indirect call 
via a scratch register.

For example: 

Intrinsic:
  %safepoint_token = call i32 (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* inttoptr (i64 140727162896504 to void ()*), i32 0, i32 0, i32 0, i32 0)

Old Incorrect Lowering:
  callq 140727162896504

New Correct Lowering:
  movabsq $140727162896504, %rax 
  callq *%rax

In lowerCallFromStatepoint(), the callee-target was modified and 
represented as a "TargetConstant" node, rather than a "Constant" node.
Undoing this modification enabled LowerCall() to generate the 
correct CALL instruction.

llvm-svn: 239114
2015-06-04 23:03:21 +00:00
Benjamin Kramer ff0fb6936b [SDAG switch lowering] Fix switch case -> or merging for 0 and INT_MIN
The big/small ordering here is based on signed values so SmallValue will
be INT_MIN and BigValue 0. This shouldn't be a problem but the code
assumed that BigValue always had more bits set than SmallValue.

We used to just miss the transformation, but a recent refactoring of
mine turned this into an assertion failure.

llvm-svn: 239105
2015-06-04 22:05:51 +00:00
Sergey Dmitrouk 3160d02b5b Erase constant dbgloc on reuse in PHI node
Basic block selection involves checking successor BBs for PHI nodes
that depend on the current BB.  In case such BBs are found, the value
being selected is a constant and such constant already exists in
current BB, it's value is reused.

This might lead to wrong locations in some situations, especially if
same constant value ends up being materialized twice in two different
ways, which discards that sharing and leaves us with wrong debug
location in the successor BB.

In code this involves the following sequence of calls:

 SelectionDAGBuilder::HandlePHINodesInSuccessorBlocks ->
 SelectionDAGBuilder::CopyValueToVirtualRegister ->
 SelectionDAGBuilder::getNonRegisterValue

llvm-svn: 239089
2015-06-04 20:48:40 +00:00
Andrea Di Biagio 9ac8a6b13d [DAGCombiner] Fix wrong folding of a build_vector into a blend with zero.
Method 'visitBUILD_VECTOR' in the DAGCombiner knows how to combine a
build_vector of a bunch of extract_vector_elt nodes and constant zero nodes
into a shuffle blend with a zero vector.

However, method 'visitBUILD_VECTOR' forgot that a floating point
build_vector may contain negative zero as well as positive zero.

Example:

define <2 x double> @example(<2 x double> %A) {
entry:
  %0 = extractelement <2 x double> %A, i32 0
  %1 = insertelement <2 x double> undef, double %0, i32 0
  %2 = insertelement <2 x double> %1, double -0.0, i32 1
  ret <2 x double> %2
}

Before this patch, llc (with -mattr=+sse4.1) wrongly generated
  movq   %xmm0, %xmm0  # xmm0 = xmm0[0],zero

So, the sign bit of the negative zero was effectively lost.

This patch fixes the problem by adding explicit checks for positive zero.

With this patch, llc produces the following code for the example above:
  movhpd .LCPI0_0(%rip), %xmm0

where .LCPI0_0 referes to a 'double -0'.

llvm-svn: 239070
2015-06-04 19:15:01 +00:00
Benjamin Kramer 185579bf0c [SDag switch lowering] Simplify code a bit. No functional change intended.
llvm-svn: 239056
2015-06-04 17:07:59 +00:00
Matt Arsenault ca519dc28b Pass address space to isLegalAddressingMode in DAGCombiner
No test because I don't know of a target that makes use
of address spaces and indexed load / store.

llvm-svn: 239051
2015-06-04 16:17:34 +00:00
Hans Wennborg d922915685 Switch lowering: fix assert in buildBitTests (PR23738)
When checking (High - Low + 1).sle(BitWidth), BitWidth would be truncated
to the size of the left-hand side. In the case of this PR, the left-hand
side was i4, so BitWidth=64 got truncated to 0 and the assert failed.

llvm-svn: 239048
2015-06-04 15:55:00 +00:00
James Molloy 37593732a4 Don't create a MIN/MAX node if the underlying compare has more than one use.
If the compare in a select pattern has another use then it can't be removed, so we'd just
be creating repeated code if we created a min/max node.

Spotted by Matt Arsenault!

llvm-svn: 239037
2015-06-04 13:48:23 +00:00
Sanjoy Das 513aadecac [SelectionDAG] Fix PR23603.
Summary:
LLVM's MI level notion of invariant_load is different from LLVM's IR
level notion of invariant_load with respect to dereferenceability.  The
IR notion of invariant_load only guarantees that all *non-faulting*
invariant loads result in the same value.  The MI notion of invariant
load guarantees that the load can be legally moved to any location
within its containing function.  The MI notion of invariant_load is
stronger than the IR notion of invariant_load -- an MI invariant_load is
an IR invariant_load + a guarantee that the location being loaded from
is dereferenceable throughout the function's lifetime.

Reviewers: hfinkel, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10075

llvm-svn: 238881
2015-06-02 22:33:30 +00:00
Chandler Carruth 502b23a7a9 [sdag] Add the helper I most want to the DAG -- building a bitcast
around a value using its existing SDLoc.

Start using this in just one function to save omg lines of code.

llvm-svn: 238638
2015-05-30 04:14:10 +00:00
Jim Grosbach 13760bd152 MC: Clean up MCExpr naming. NFC.
llvm-svn: 238634
2015-05-30 01:25:56 +00:00
Fiona Glaser b82e33106b SelectionDAG: fix logic for promoting shift types
r238503 fixed the problem of too-small shift types by promoting them
during legalization, but the correct solution is to promote only the
operands that actually demand promotion.

This fixes a crash on an out-of-tree target caused by trying to
promote an operand that can't be promoted.

llvm-svn: 238632
2015-05-29 23:37:22 +00:00
Benjamin Kramer f5e2fc474d Replace push_back(Constructor(foo)) with emplace_back(foo) for non-trivial types
If the type isn't trivially moveable emplace can skip a potentially
expensive move. It also saves a couple of characters.


Call sites were found with the ASTMatcher + some semi-automated cleanup.

memberCallExpr(
    argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
    on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
    hasArgument(0, bindTemporaryExpr(
                       hasType(recordDecl(hasNonTrivialDestructor())),
                       has(constructExpr()))),
    unless(isInTemplateInstantiation()))

No functional change intended.

llvm-svn: 238602
2015-05-29 19:43:39 +00:00
Reid Kleckner fe4d491bd9 [WinEH] Start inserting state number stores for C++ EH
This moves all the state numbering code for C++ EH to WinEHPrepare so
that we can call it from the X86 state numbering IR pass that runs
before isel.

Now we just call the same state numbering machinery and insert a bunch
of stores. It also populates MachineModuleInfo with information about
the current function.

llvm-svn: 238514
2015-05-28 22:00:24 +00:00
David Majnemer 22d2b02706 [SelectionDAG] Scalar shift amounts may require legalization
The shift amount may be too small to cope with promoted left hand side,
make sure to promote it as well.

This fixes PR23664.

llvm-svn: 238503
2015-05-28 21:29:59 +00:00
Jan Vesely 86f2fda623 SelectionDAG: Don't do libcall on div/rem if divrem is custom
v2: TargetLoweringBase:: -> TargetLowering::
    Use Ops array
v3: Explicitly use value 0 for ?DIV
    Remove redundant newline

Differential revision: http://reviews.llvm.org/D7803
reviewer: ab

llvm-svn: 238336
2015-05-27 16:54:09 +00:00
Elena Demikhovsky 1c1391ba24 Added promotion to EXTRACT_SUBVECTOR operand.
I encountered with this case in one of KNL tests for i1 vectors.
v16i1 = EXTRACT_SUBVECTOR v32i1, x

llvm-svn: 238130
2015-05-25 11:33:13 +00:00
Matt Arsenault 65ad1602b0 Add target hook to allow merging stores of nonzero constants
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.

Most of the new testcases aren't optimally merged, and are for
later improvements.

llvm-svn: 238108
2015-05-24 00:51:27 +00:00
Duncan P. N. Exon Smith 0c54197d31 SDAG: Give SDDbgValues their own allocator (and reset it)
Previously `SDDbgValue`s used the general allocator that lives for all
of `SelectionDAG`.  Instead, give them their own allocator, and reset it
whenever `SDDbgInfo::clear()` is called, plugging a spiritual leak.

This drops `SelectionDAGBuilder::visitIntrinsicCall()` off of my heap
profile (was at around 2% of `llc` for codegen of `-flto -g`).  Thanks
to Pete Cooper for spotting the problem and suggesting the fix.

llvm-svn: 237998
2015-05-22 05:45:19 +00:00
Duncan P. N. Exon Smith 1f0c1c4f47 SDAG: Cleanup initialization of SDDbgValue, NFC
Cleanup how `SDDbgValue` is initialized, and rearrange the fields to
save two pointers in the struct layout.  No real functionality change
though (and I doubt the memory savings would show up in a profile).

llvm-svn: 237997
2015-05-22 05:35:53 +00:00
Simon Pilgrim e054199354 [X86][SSE] Improve support for 128-bit vector sign extension
This patch improves support for sign extension of the lower lanes of vectors of integers by making use of the SSE41 pmovsx* sign extension instructions where possible, and optimizing the sign extension by shifts on pre-SSE41 targets (avoiding the use of i64 arithmetic shifts which require scalarization).

It converts SIGN_EXTEND nodes to SIGN_EXTEND_VECTOR_INREG where necessary, that more closely matches the pmovsx* instruction than the default approach of using SIGN_EXTEND_INREG which splits the operation (into an ANY_EXTEND lowered to a shuffle followed by shifts) making instruction matching difficult during lowering. Necessary support for SIGN_EXTEND_VECTOR_INREG has been added to the DAGCombiner.

Differential Revision: http://reviews.llvm.org/D9848

llvm-svn: 237885
2015-05-21 10:05:03 +00:00
Andrew Kaylor 69fc4418ab Fix build warning
llvm-svn: 237855
2015-05-20 23:28:03 +00:00
Andrew Kaylor a6c5b9682e [WinEH] C++ EH state numbering fixes
Differential Revision: http://reviews.llvm.org/D9787

llvm-svn: 237854
2015-05-20 23:22:24 +00:00
Matthias Braun 56a781495a DAGCombiner: Continue combining if FoldConstantArithmetic() fails.
DAG.FoldConstantArithmetic() can fail even though both operands are
Constants if OpaqueConstants are involved. Continue trying other combine
possibilities in tis case.

Differential Revision: http://reviews.llvm.org/D6946

Somewhat related to PR21801 / rdar://19211454

llvm-svn: 237822
2015-05-20 18:54:02 +00:00
Pawel Bylica 8011da9628 Fix icmp lowering
Summary:
During icmp lowering it can happen that a constant value can be larger than expected (see the code around the change).
APInt::getMinSignedBits() must be checked again as the shift before can change the constant sign to positive.
I'm not sure it is the best fix possible though.

Test Plan: Regression test included.

Reviewers: resistor, chandlerc, spatel, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D9147

llvm-svn: 237812
2015-05-20 17:21:09 +00:00
Pete Cooper 9e1d335697 Change Function::getIntrinsicID() to return an Intrinsic::ID. NFC.
Now that Intrinsic::ID is a typed enum, we can forward declare it and so return it from this method.

This updates all users which were either using an unsigned to store it, or had a now unnecessary cast.

llvm-svn: 237810
2015-05-20 17:16:39 +00:00
Igor Laevsky 423bc9ec4c [StatepointLowering] Support of the gc.relocates for invoke statepoints.
This change implements support for lowering of the gc.relocates tied to the invoke statepoint.
This is acomplished by storing frame indices of the lowered values in "StatepointRelocatedValues" map inside FunctionLoweringInfo instead of storing them in per-basic block structure StatepointLowering.
After this change StatepointLowering is used only during "LowerStatepoint" call and it is not necessary to store it as a field in SelectionDAGBuilder anymore.

Differential Revision: http://reviews.llvm.org/D7798

llvm-svn: 237786
2015-05-20 11:37:25 +00:00
Philip Reames 7738dd68cf Remove a stale comment
The todo was implemented a while ago; I just forgot to remove the comment.  

llvm-svn: 237736
2015-05-19 22:26:33 +00:00
Sanjay Patel 03abbb48a4 use 'auto *' for pointers; clearer usage, no deep copying
llvm-svn: 237719
2015-05-19 20:10:16 +00:00
Sanjay Patel ad11415962 tidy up
1. remove duplicate local variable
2. add local variable with name to match comment
3. remove useless comment

llvm-svn: 237715
2015-05-19 19:10:57 +00:00
Sanjay Patel 64a6da947a use range-based for-loop
llvm-svn: 237711
2015-05-19 18:24:33 +00:00
Sanjay Patel 3c9e370ec0 use range-based for loop
llvm-svn: 237705
2015-05-19 17:49:14 +00:00
Matthias Braun 20683efd47 SelectionDAG: Cleanup and simplify FoldConstantArithmetic
This cleans up the FoldConstantArithmetic code by factoring out the case
of two ConstantSDNodes into an own function. This avoids unnecessary
complexity for many callers who already have ConstantSDNode arguments.

This also avoids an intermeidate SmallVector datastructure and a loop
over that datastructure.

llvm-svn: 237651
2015-05-19 01:40:21 +00:00
Matthias Braun 887fdfb759 DAGCombiner: Factor common pattern into isOneConstant() function. NFC
llvm-svn: 237645
2015-05-19 00:25:21 +00:00
Matthias Braun 033121981d DAGCombiner: Factor common pattern into isAllOnesConstant() function. NFC
llvm-svn: 237644
2015-05-19 00:25:20 +00:00
Matthias Braun 0542b5d1db DAGCombiner: Use isNullConstant() where possible
llvm-svn: 237643
2015-05-19 00:25:17 +00:00
Matthias Braun c545234772 Revert accidental change in r237633
llvm-svn: 237635
2015-05-18 23:18:13 +00:00
Matthias Braun 1505efb0bb DAGCombiner: Factor common pattern into isNullConstant() function. NFC
llvm-svn: 237633
2015-05-18 23:07:27 +00:00
Jim Grosbach 6f482000e9 MC: Clean up method names in MCContext.
The naming was a mish-mash of old and new style. Update to be consistent
with the new. NFC.

llvm-svn: 237594
2015-05-18 18:43:14 +00:00
Hal Finkel 44b81ee40b Preserve the order of READ_REGISTER and WRITE_REGISTER
At the present time, we don't have a way to represent general dependency
relationships, so everything is represented using memory dependency. In order
to preserve the data dependency of a READ_REGISTER on WRITE_REGISTER, we need
to model WRITE_REGISTER as writing (which we had been doing) and model
READ_REGISTER as reading (which we had not been doing). Fix this, and also the
way that the chain operands were generated at the SDAG level.

Patch by Nicholas Paul Johnson, thanks! Test case by me.

llvm-svn: 237584
2015-05-18 16:42:10 +00:00
Oliver Stannard 6cb23465e0 Revert r237579, as it broke windows buildbots
llvm-svn: 237583
2015-05-18 16:39:16 +00:00
Oliver Stannard 0c553afe6a [LLVM - ARM/AArch64] Add ACLE special register intrinsics
This patch implements LLVM support for the ACLE special register intrinsics in
section 10.1, __arm_{w,r}sr{,p,64}.

This patch is intended to lower the read/write_register instrinsics, used to
implement the special register intrinsics in the clang patch for special
register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific
instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor
registers in AArch32 and AArch64. This is done by inspecting the register
string passed to the intrinsic and then lowering to the appropriate
instruction.

Patch by Luke Cheeseman.

Differential Revision: http://reviews.llvm.org/D9699

llvm-svn: 237579
2015-05-18 16:23:33 +00:00
Hal Finkel a60e633fdd [DAGCombine] Be more pedantic about use iteration in CombineToPreIndexedLoadStore
In CombineToPreIndexedLoadStore, when the offset is a constant, we have code
that looks for other uses of the pointer which are constant offset computations
so that they can be rewritten in terms of the updated pointer so that we don't
need to keep a copy of the base pointer to compute these constant offsets.

Unfortunately, when it iterated over the uses, it did so by SDNodes, and so we
could confuse ourselves if the base pointer was produced by a node that had
multiple results (because we would not immediately exclude uses of the other
node results). This was reported as PR22755. Unfortunately, we don't have a
test case (and I've also been unable to produce one thus far), but at least the
mistake is clear. The right way to fix this problem is to make use of the information
contained in the use iterators to filter out any uses of other results of the
node producing the base pointer.

This should be mostly NFC, but should also fix PR22755 (for which,
unfortunately, we have no in-tree test case).

llvm-svn: 237576
2015-05-18 15:46:02 +00:00
Benjamin Kramer a48e0656b6 [WinEH] Push unique_ptr through the Action interface.
This was the source of many leaks in the past, this should fix them once and
for all.

llvm-svn: 237524
2015-05-16 15:40:03 +00:00
James Molloy 7307cd57c5 [SDAGBuilder] Make the AArch64 builder happier.
I intended this loop to only unwrap SplitVector actions, but it
was more broad than that, such as unwrapping WidenVector actions,
which makes operations seem legal when they're not.

llvm-svn: 237457
2015-05-15 17:41:29 +00:00
James Molloy 7e9776b559 Add SDNodes for umin, umax, smin and smax.
This adds new SDNodes for signed/unsigned min/max. These nodes are built from
select/icmp pairs matched at SDAGBuilder stage.

This patch adds the nodes, as well as legalization support and sets them to
be "expand" for all targets.

NFC for now; this will be tested when I switch AArch64 to using these new
nodes.

llvm-svn: 237423
2015-05-15 09:03:15 +00:00
Nick Lewycky 37a175007b Revert r237046. See the testcase on the thread where r237046 was committed.
llvm-svn: 237317
2015-05-13 23:41:47 +00:00
Sergey Dmitrouk 46c4f02848 [DebugInfo] Debug locations for constant SD nodes
Several updates for [DebugInfo] Add debug locations to constant SD nodes (r235989).
Includes:

 *  re-enabling the change (disabled recently);
 *  missing change for FP constants;
 *  resetting debug location of constant node if it's used more than at one place
    to prevent emission of wrong locations in case of coalesced constants;
 *  a couple of additional tests.

Now all look ups in CSEMap are wrapped by additional method.

Comment in D9084 suggests that debug locations aren't useful for "target constants",
so there might be one more change related to this API (namely, dropping debug
locations for getTarget*Constant methods).

Differential Revision: http://reviews.llvm.org/D9604

llvm-svn: 237237
2015-05-13 08:58:03 +00:00
Sanjoy Das a1d39ba940 [Statepoints] Support for "patchable" statepoints.
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`.  `id` gets propagated to the ID field
in the generated StackMap section.  If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.

This change brings statepoints one step closer to patchpoints.  With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.

PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`.  This can be made more sophisticated
later.

Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9546

llvm-svn: 237214
2015-05-12 23:52:24 +00:00
Pat Gavlin 08d7027cc1 [Statepoints] Clean up statepoint argument accessors.
Differential Revision: http://reviews.llvm.org/D9622

llvm-svn: 237191
2015-05-12 21:33:48 +00:00
Pat Gavlin c7dc6d6ee7 [Statepoints] Split the calling convention and statepoint flags operand to STATEPOINT into two separate operands.
Differential Revision: http://reviews.llvm.org/D9623

llvm-svn: 237166
2015-05-12 19:50:19 +00:00
Igor Laevsky 87ef5eaf46 Reverse ordering of base and derived pointer during safepoint lowering.
According to the documentation in StackMap section for the safepoint we should have:
"The first Location in each pair describes the base pointer for the object. The second is the derived pointer actually being relocated."
But before this change we emitted them in reverse order - derived pointer first, base pointer second.

llvm-svn: 237126
2015-05-12 13:12:14 +00:00
Eric Christopher 824f42f209 Migrate existing backends that care about software floating point
to use the information in the module rather than TargetOptions.

We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.

For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.

Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.

llvm-svn: 237079
2015-05-12 01:26:05 +00:00
Sanjay Patel 5b202966f5 propagate IR-level fast-math-flags to DAG nodes; 2nd try; NFC
This is a less ambitious version of:
http://reviews.llvm.org/rL236546

because that was reverted in:
http://reviews.llvm.org/rL236600

because it caused memory corruption that wasn't related to FMF
but was actually due to making nodes with 2 operands derive from a
plain SDNode rather than a BinarySDNode. 

This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997

...which split the existing nsw / nuw / exact flags and FMF
into their own struct.
 

llvm-svn: 237046
2015-05-11 21:07:09 +00:00
Andrew Kaylor ce6f907e2f Fixing build warnings
llvm-svn: 237042
2015-05-11 20:45:11 +00:00
Andrew Kaylor 762a6bea1f [WinEH] Update exception numbering to give handlers their own base state.
Differential Revision: http://reviews.llvm.org/D9512

llvm-svn: 237014
2015-05-11 19:41:19 +00:00
Simon Pilgrim e09584ca95 [SelectionDAG] Fixed constant folding issue when legalised types are smaller then the folded type.
Found when testing with llvm-stress on i686 targets.

llvm-svn: 236954
2015-05-10 14:14:51 +00:00
James Y Knight fca02be3c1 Fix MergeConsecutiveStore for non-byte-sized memory accesses.
The bug showed up as a compile-time assertion failure:
  Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed
when building msan tests on x86-64.

Prior to r236850, this bug was masked due to a bogus alignment check,
which also accidentally rejected non-byte-sized accesses. Afterwards,
an invalid ElementSizeBytes == 0 got further into the function, and
triggered the assertion failure.

It would probably be a good idea to allow it to handle merging stores
of unusual widths as well, but for now, to un-break it, I'm just
making the minimal fix.

Differential Revision: http://reviews.llvm.org/D9626

llvm-svn: 236927
2015-05-09 03:13:37 +00:00
Pete Cooper d54fb89901 [Fast-ISel] Don't mark the first use of a remat constant as killed.
When emitting something like 'add x, 1000' if we remat the 1000 then we should be able to
mark the vreg containing 1000 as killed.  Given that we go bottom up in fast-isel, a later
use of 1000 will be higher up in the BB and won't kill it, or be impacted by the lower kill.

However, rematerialised constant expressions aren't generated bottom up.  The local value save area
grows downwards.  This means that if you remat 2 constant expressions which both use 1000 then the
first will kill it, then the second, which is *lower* in the BB will read a killed register.

This is the case in the attached test where the 2 GEPs both need to generate 'add x, 6680' for the constant offset.

Note that this commit only makes kill flag generation conservative.  There's nothing else obviously wrong with
the local value save area growing downwards, and in fact it needs to for handling arbitrarily complex constant expressions.

However, it would be nice if there was a solution which would let us generate more accurate kill flags, or just kill flags completely.

llvm-svn: 236922
2015-05-09 00:51:03 +00:00
Hans Wennborg ae0254dabc Switch lowering: cluster adjacent fall-through cases even at -O0
It's cheap to do, and codegen is much faster if cases can be merged
into clusters.

llvm-svn: 236905
2015-05-08 21:23:39 +00:00
Pete Cooper e4bb07ecff [Fast-ISel] Clear kill flags on registers replaced by updateValueMap.
When selecting an extract instruction, we don't actually generate code but instead work out which register we are reading, and rewrite uses of the extract def to the source register.  This is done via updateValueMap,.

However, its possible that the source register we are rewriting *to* to also have uses.  If those uses are after a kill of the value we are rewriting *from* then we have uses after a kill and the verifier fails.

This code checks for the case where the to register is also used, and if so it clears all kill on the from register.  This is conservative, but better that always clearing kills on the from register.

llvm-svn: 236897
2015-05-08 20:46:54 +00:00
Pat Gavlin cc0431d1c0 Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware.
This changes the shape of the statepoint intrinsic from:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)

to:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)

This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.

In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.

Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.

Differential Revision: http://reviews.llvm.org/D9501

llvm-svn: 236888
2015-05-08 18:07:42 +00:00
James Y Knight 284e7b3d6c Fix alignment checks in MergeConsecutiveStores.
1) check whether the alignment of the memory is sufficient for the
*merged* store or load to be efficient.

Not doing so can result in some ridiculously poor code generation, if
merging creates a vector operation which must be aligned but isn't.

2) DON'T check that the alignment of each load/store is equal. If
you're merging 2 4-byte stores, the first *might* have 8-byte
alignment, but the second certainly will have 4-byte alignment. We do
want to allow those to be merged.

llvm-svn: 236850
2015-05-08 13:47:01 +00:00
Igor Laevsky 9d3932bf96 Fix coding standart based on post submit comments.
Differential Revision: http://reviews.llvm.org/D7760

llvm-svn: 236849
2015-05-08 13:17:22 +00:00
Hans Wennborg 44faaa7aa4 Switch lowering: handle zero-weight branch probabilities
After r236617, branch probabilities are no longer guaranteed to be >= 1. This
patch makes the swich lowering code handle that correctly, without bumping the
branch weights by 1 which might cause overflow and skews the probabilities.

Covered by @zero_weight_tree in test/CodeGen/X86/switch.ll.

llvm-svn: 236739
2015-05-07 15:47:15 +00:00
Pete Cooper 54085cdc7b Fix incorrect kill flags in fastisel.
If called twice in the same BB on the same constant, FastISel::fastEmit_ri_ was marking the materialized vreg as killed on each use, instead of only the last use.

Change this to only mark the last use as killed by making earlier uses check if the vreg is already used elsewhere.

llvm-svn: 236650
2015-05-06 22:09:29 +00:00
Sanjoy Das 6c0fe24bd1 [SelectionDAG] Delete SelectionDAGBuilder::removeValue. NFC.
SelectionDAGBuilder::removeValue is dead now, after rL236563.

llvm-svn: 236618
2015-05-06 18:02:10 +00:00
Diego Novillo 14f94de1ee Allow 0-weight branches in BranchProbabilityInfo.
Summary:
When computing branch weights in BPI, we used to disallow branches with
weight 0. This is a minor nuisance, because a branch with weight 0 is
different to "don't have information". In the context of
instrumentation, it may mean "never executed", in the context of
sampling, it means "never or seldom executed".

In allowing 0 weight branches, I ran into issues with the switch
expansion code in selection DAG. It is currently hardwired to not handle
branches with weight 0. To maintain the current behaviour, I changed it
to use 1 when it finds 0, but perhaps the algorithm needs changes to
tolerate branches with weight zero.

Reviewers: hansw

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9533

llvm-svn: 236617
2015-05-06 17:55:11 +00:00
NAKAMURA Takumi e452998b4b Reformat.
llvm-svn: 236601
2015-05-06 14:03:22 +00:00
NAKAMURA Takumi d7c0be9c42 Revert r236546, "propagate IR-level fast-math-flags to DAG nodes (NFC)"
It caused undefined behavior.

llvm-svn: 236600
2015-05-06 14:03:12 +00:00
Pawel Bylica 9f1fb9d1ef SelectionDAG: Handle out-of-bounds index in extract vector element
Summary: This patch correctly handles undef case of EXTRACT_VECTOR_ELT node where the element index is constant and not less than vector size.

Test Plan:
CodeGen for X86 test included.
Also one incorrect regression test fixed.

Reviewers: qcolombet, chandlerc, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D9250

llvm-svn: 236584
2015-05-06 10:19:14 +00:00
Sanjoy Das 4bfb472072 [Statepoint] Clean up StatepointLowering: symbolic constants.
For accessors in the `Statepoint` class, use symbolic constants for
offsets into the argument vector instead of literals.  This makes the
code intent clearer and simpler to change.

llvm-svn: 236566
2015-05-06 02:36:31 +00:00
Sanjoy Das 499d703f52 [Statepoint] Clean up Statepoint.h: accessor names.
Use getFoo() as accessors consistently and some other naming changes.

llvm-svn: 236564
2015-05-06 02:36:26 +00:00
Sanjoy Das c6bf3e9f12 [StatepointLowering] Don't create temporary instructions. NFCI.
Summary:
Instead of creating a temporary call instruction and lowering that, use
SelectionDAGBuilder::lowerCallOperands.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9480

llvm-svn: 236563
2015-05-06 02:36:20 +00:00
Sanjoy Das 1194d1e799 [SelectionDAG] Make an argument optional in RFV::getCopyToRegs. NFC.
Summary:
We default the value argument to nullptr.  The only use of the value is
in diagnosePossiblyInvalidConstraint and that seems to be resilient to
it being nullptr.

Reviewers: atrick, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9479

llvm-svn: 236555
2015-05-05 23:06:57 +00:00
Sanjoy Das 3936a97f11 [SelectionDAG] Move RegsForValue into SelectionDAGBuilder.h. NFC.
Summary:
The exported class will be used in later change, in
StatepointLowering.cpp.  It is still internal to SelectionDAG (not
exported via include/).

Reviewers: reames, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9478

llvm-svn: 236554
2015-05-05 23:06:54 +00:00
Sanjoy Das 84153c450a [SelectionDAG] Pass explicit type to lowerCallOperands. NFC.
Summary:
Currently this does not change anything, but change will be used in a
later change to StatepointLowering.cpp

Reviewers: reames, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9477

llvm-svn: 236553
2015-05-05 23:06:52 +00:00
Sanjoy Das 3fb91c0a0d [StatepointLowering] Rename variable, NFC.
Rename LoweredArgs to LoweredMetaArgs to clarify intent.

llvm-svn: 236552
2015-05-05 23:06:49 +00:00
Sanjay Patel 801caff64d propagate IR-level fast-math-flags to DAG nodes (NFC)
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997

...which split the existing nsw / nuw / exact flags and FMF
into their own struct.

There are 2 structural changes here:

1. The main diff is that we're preparing to extend the optimization
flags to affect more than just binary SDNodes. Eg, IR intrinsics 
( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes
that don't even exist in IR such as FMA, FNEG, etc.

2. The other change is that we're actually copying the FP fast-math-flags
from the IR instructions to SDNodes. 

Differential Revision: http://reviews.llvm.org/D8900

llvm-svn: 236546
2015-05-05 21:40:38 +00:00
Ulrich Weigand 9958c489bb [DAGCombiner] Account for getVectorIdxTy() when narrowing vector load
This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
the element number from getVectorIdxTy() to PtrTy before doing pointer
arithmetic on it.  This is needed on z, where element numbers are i32
but pointers are i64.

Original patch by Richard Sandiford.

llvm-svn: 236530
2015-05-05 19:34:10 +00:00
Ulrich Weigand af2c618e2b [DAGCombiner] Fix ReplaceExtractVectorEltOfLoadWithNarrowedLoad for BE
For little-endian, the function would convert (extract_vector_elt (load X), Y)
to X + Y*sizeof(elt).  For big-endian it would instead use
X + sizeof(vec) - Y*sizeof(elt).  The big-endian case wasn't right since
vector index order always follows memory/array order, even for big-endian.
(Note that the current handling has to be wrong for Y==0 since it would
access beyond the end of the vector.)

Original patch by Richard Sandiford.

llvm-svn: 236529
2015-05-05 19:33:37 +00:00
Ulrich Weigand 2693c0a491 [LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal.
E.g. it would load a v4i8 as an i32 if i32 was legal.

This patch extends that behavior to promoted integers as well as legal ones.
If the integer type for the full vector width is TypePromoteInteger,
the element type is going to be TypePromoteInteger too, and it's still
better to use a single promoting load or truncating store rather than N
individual promoting loads or truncating stores.  E.g. if you have a v2i8
on a target where i16 is promoted to i32, it's better to load the v2i8 as
an i16 rather than load both i8s individually.

Original patch by Richard Sandiford.

llvm-svn: 236528
2015-05-05 19:32:57 +00:00
Elena Demikhovsky 1b60ed7069 Masked gather and scatter intrinsics - enabled codegen for KNL.
llvm-svn: 236394
2015-05-03 07:12:25 +00:00
Simon Pilgrim 017ca19384 [DAGCombiner] Enabled vector float/double -> int constant folding
llvm-svn: 236387
2015-05-02 13:04:07 +00:00
Simon Pilgrim 9fb06bca67 [SelectionDAG] Unary vector constant folding integer legality fixes
This patch fixes issues with vector constant folding not correctly handling scalar input operands if they require implicit truncation - this was tested with llvm-stress as recommended by Patrik H Hagglund.

The patch ensures that integer input scalars from a build vector are correctly truncated before folding, and that constant integer scalar results are promoted to a legal type before inclusion in the new folded build vector.

I have added another crash test case and also a test for UINT_TO_FP / SINT_TO_FP using an non-truncated scalar input, which was failing before this patch.

Differential Revision: http://reviews.llvm.org/D9282

llvm-svn: 236308
2015-05-01 08:20:04 +00:00
Jan Vesely 808fff585b Reinstate revisions r234755, r234759, r234760
changes:
  Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
  Add location to getConstant
  Drop comment about the ops being turned into expand

llvm-svn: 236240
2015-04-30 17:15:56 +00:00
Daniel Jasper 0366cd23ac Inline local variable to silence unused warning.
llvm-svn: 236212
2015-04-30 08:51:13 +00:00
Elena Demikhovsky e1eda8a9e6 Masked gather and scatter - added DAGCombine visitors
and AVX-512 instruction selection patterns.
All other patches, including tests will follow.

http://reviews.llvm.org/D7665

llvm-svn: 236211
2015-04-30 08:38:48 +00:00
Owen Anderson d8a029c81b Semantically revert r236031, which is not a good idea for in-order targets.
At the least it should be guarded by some kind of target hook.
It also introduced catastrophic compile time and code quality
regressions on some out of tree targets (test case still being
reduced/sanitized).

Sanjay agreed with reverting this patch until these issues can be
resolved.

llvm-svn: 236199
2015-04-30 04:06:32 +00:00
Hans Wennborg 4b828d35fd Switch lowering: use profile info to build weight-balanced binary search trees
This will cause hot nodes to appear closer to the root.

The literature says building the tree like this makes it a near-optimal (in
terms of search time given key frequencies) binary search tree. In LLVM's case,
we can do up to 3 comparisons in each leaf node, so it might be better to opt
for lower tree height in some cases; that's something to look into in the
future.

Differential Revision: http://reviews.llvm.org/D9318

llvm-svn: 236192
2015-04-30 00:57:37 +00:00
Sanjay Patel 04b0e92766 generalize binop reassociation; NFC
Move the fold introduced in r236031:
http://reviews.llvm.org/rL236031

to its own helper function, so we can use it for other binops.

This is a preliminary step before partially solving:
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116

llvm-svn: 236171
2015-04-29 22:30:02 +00:00
Pat Gavlin 022c5acad8 Run StatepointLowering.{cpp,h} through clang-format.
llvm-svn: 236166
2015-04-29 21:52:45 +00:00
Sanjay Patel caf5180ff7 tidy up; NFC
llvm-svn: 236156
2015-04-29 21:01:41 +00:00
Sanjay Patel ee6678119d too much space again; NFC
llvm-svn: 236150
2015-04-29 20:38:02 +00:00
Sanjay Patel 435efaadff too much space; NFC
llvm-svn: 236147
2015-04-29 20:32:57 +00:00
Duncan P. N. Exon Smith a9308c49ef IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Jan Vesely 7539548738 CodeGen: Default overflow operations to expand so we don't have to assume targets are lying
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: ab
Differential Revision: http://reviews.llvm.org/D9265

llvm-svn: 236119
2015-04-29 16:30:46 +00:00
Elena Demikhovsky ac969012ef Fixed masked gather/scatter switch-case
llvm-svn: 236092
2015-04-29 08:38:53 +00:00
Elena Demikhovsky 744fe0de33 fixed comments, blanks, nullptr; NFC
llvm-svn: 236086
2015-04-29 06:49:50 +00:00
Sanjay Patel 2fbc4e5c49 transform fadd chains to increase parallelism
This is a compromise: with this simple patch, we should always handle a chain of exactly 3
operations optimally, but we're not generating the optimal balanced binary tree for a longer
sequence.

In general, this transform will reduce the dependency chain for a sequence of instructions
using N operands from a worst case N-1 dependent operations to N/2 dependent operations. 
The optimal balanced binary tree would reduce the chain to log2(N).

The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
don't need to worry about the increased register pressure required to parallelize longer
sequences. It also seems unlikely that we would ever encounter really long strings of
dependent ops like that in the wild, but I'm not sure how to verify that speculation.
FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math
and this patch.

We can extend this patch to cover other associative operations such as fmul, fmax, fmin, 
integer add, integer mul.

This is a partial fix for:
https://llvm.org/bugs/show_bug.cgi?id=17305

and if extended:
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116

The issue also came up in:
http://reviews.llvm.org/D8941

Differential Revision: http://reviews.llvm.org/D9232

llvm-svn: 236031
2015-04-28 21:03:22 +00:00
Sanjay Patel ba55804ea3 move IR-level optimization flags into their own struct
This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900).

In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment, 
we should eventually share this data between the IR passes and the backend.

We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to
use the new struct.

The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a
data member to the subclass. This means we don't have to repeat all of the get/set methods per flag,
but we're potentially adding size to all nodes of this subclassi type.

In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an
SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at
least one free byte to play with due to struct alignment.

Differential Revision: http://reviews.llvm.org/D9325

llvm-svn: 235997
2015-04-28 16:39:12 +00:00
Sergey Dmitrouk 842a51bad8 Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"
[DebugInfo] Add debug locations to constant SD nodes

This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235989
2015-04-28 14:05:47 +00:00
Daniel Jasper 48e93f7181 Revert "[DebugInfo] Add debug locations to constant SD nodes"
This breaks a test:
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870

llvm-svn: 235987
2015-04-28 13:38:35 +00:00
Sergey Dmitrouk adb4c69d5c [DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235977
2015-04-28 11:56:37 +00:00
Elena Demikhovsky 584ce378ab Masked gather and scatter: Added code for SelectionDAG.
All other patches, including tests will follow.

http://reviews.llvm.org/D7665

llvm-svn: 235970
2015-04-28 07:57:37 +00:00
Hans Wennborg 7bf4d4eee0 Switch lowering: use uint32_t for weights everywhere
I previously thought switch clusters would need to use uint64_t in case
the weights of multiple cases overflowed a 32-bit int. It turns
out that the weights on a terminator instruction are capped to allow for
being added together, so using a uint32_t should be safe.

llvm-svn: 235945
2015-04-27 23:52:19 +00:00
Hans Wennborg 67c03759e4 Switch lowering: Take branch weight into account when ordering for fall-through
Previously, the code would try to put a fall-through case last,
even if that meant moving a case with much higher branch weight
further down the chain.

Ordering by branch weight is most important, putting a fall-through
block last is secondary.

llvm-svn: 235942
2015-04-27 23:35:22 +00:00
Hans Wennborg ba6d2568f9 Switch lowering: order bit tests by branch weight.
llvm-svn: 235912
2015-04-27 20:21:17 +00:00
Quentin Colombet 8229145961 [DAGCombiner] Fix the type used in canFoldInAddressingMode to account for the
right scaling.

In the function canFoldInAddressingMode, VT is computed as the type of the
destination/source of a LOAD/STORE operations, instead of the memory type of the
operation.
On targets with a scaling factor on the offset of the LOAD/STORE operations, the
function may return false for actually valid cases. This may then prevent the
selection of profitable pre or post indexed load/store operations, and instead
select pre or post indexed load/store for unprofitable cases.

Patch by Francois de Ferriere <francois.de-ferriere@st.com>!

Differential Revision: http://reviews.llvm.org/D9146

llvm-svn: 235780
2015-04-24 21:28:00 +00:00
Reid Kleckner cfbfe6f29c [SEH] Implement GetExceptionCode in __except blocks
This introduces an intrinsic called llvm.eh.exceptioncode. It is lowered
by copying the EAX value live into whatever basic block it is called
from. Obviously, this only works if you insert it late during codegen,
because otherwise mid-level passes might reschedule it.

llvm-svn: 235768
2015-04-24 20:25:05 +00:00
Hans Wennborg ec679a8b3b Switch lowering: fix APInt overflow causing infinite loop / OOM
llvm-svn: 235729
2015-04-24 16:53:55 +00:00
Reid Kleckner 5c5facc2ce Re-commit "[SEH] Remove the old __C_specific_handler code now that WinEHPrepare works"
This reverts commit r235617.

r235649 should have addressed the problems.

llvm-svn: 235667
2015-04-23 23:22:33 +00:00
Reid Kleckner 909ea7e6b8 Revert "[SEH] Remove the old __C_specific_handler code now that WinEHPrepare works"
We still have some "uses remain after removal" issues in -O0 builds.

This reverts commit r235557.

llvm-svn: 235617
2015-04-23 18:34:01 +00:00
Hans Wennborg 0867b151c9 Re-commit r235560: Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.

llvm-svn: 235608
2015-04-23 16:45:24 +00:00
Aaron Ballman 0be238cebd Revert r235560; this commit was causing several failed assertions in Debug builds using MSVC's STL. The iterator is being used outside of its valid range.
llvm-svn: 235597
2015-04-23 13:41:59 +00:00
Simon Pilgrim 86b034bae9 [DAGCombiner] Remove extra bitcasts surrounding vector shuffles
Patch to remove extra bitcasts from shuffles, this is often a legacy of XformToShuffleWithZero being used to combine bitmaskings (of float vectors bitcast to integer vectors) into shuffles: bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1)

Differential Revision: http://reviews.llvm.org/D9097

llvm-svn: 235578
2015-04-23 08:43:13 +00:00
Hans Wennborg 15823d49b6 Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
This is a re-commit of r235101, which also fixes the problems with the previous patch:

- Switches with only a default case and non-fallthrough were handled incorrectly

- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.

> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference.  If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649

llvm-svn: 235560
2015-04-22 23:14:56 +00:00
Reid Kleckner 64a2a6a473 [SEH] Remove the old __C_specific_handler code now that WinEHPrepare works
This removes the -sehprepare flag and makes __C_specific_handler
functions always to use WinEHPrepare.

This was tested by building all of chromium_builder_tests and running a
few tests that use SEH, but if something breaks, we can revert this.

llvm-svn: 235557
2015-04-22 22:13:09 +00:00
Olivier Sallenave c587bee405 Fixed logic to enable complex FMA formation.
llvm-svn: 235508
2015-04-22 14:07:26 +00:00
Hal Finkel 0d49cf2645 [DAGCombine] Disable select(c, load,load) for indexed loads
This turned up after r235333, but was a pre-existing bug. The optimization
which transforms select(c, load, load) into a load of a select of the addresses
does not handle indexed loads (pre/post inc/dec). However, it did not check for
them either, leading to a crash if it tried to transform one of them.

llvm-svn: 235497
2015-04-22 11:32:25 +00:00
Lang Hames 65613a634a [patchpoint] Add support for symbolic patchpoint targets to SelectionDAG and the
X86 backend.

The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.

llvm-svn: 235483
2015-04-22 06:02:31 +00:00
Duncan P. N. Exon Smith 60635e39b6 DebugInfo: Drop rest of DIDescriptor subclasses
Delete the remaining subclasses of (the already deleted) `DIDescriptor`.
Part of PR23080.

llvm-svn: 235404
2015-04-21 18:44:06 +00:00
Duncan P. N. Exon Smith d4a19a396d DebugInfo: Assert dbg.declare/value insts are valid
Remove early returns for when `getVariable()` is null, and just assert
that it never happens.  The Verifier already confirms that there's a
valid variable on these intrinsics, so we should assume the debug info
isn't broken.  I also updated a check for a `!dbg` attachment, which the
Verifier similarly guarantees.

llvm-svn: 235400
2015-04-21 18:24:23 +00:00
Reid Kleckner d2a1a51996 Re-land r235154-r235156 under the existing -sehprepare flag
Keep the old SEH fan-in lowering on by default for now, since projects
rely on it.  This will make it easy to test this change with a simple
flag flip.

llvm-svn: 235399
2015-04-21 18:23:57 +00:00
Simon Pilgrim 860f08779c CONCAT_VECTOR of BUILD_VECTOR - minor fix
Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation.

Test case spotted by Patrik Hagglund

llvm-svn: 235371
2015-04-21 08:05:43 +00:00
Pawel Bylica 57c2f7c756 Fix generic shift expansion when shift amount is 0
Summary:
This fixes http://llvm.org/bugs/show_bug.cgi?id=16439. 

This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type. 

Patch by Keno Fischer!

Test Plan: regression test from http://reviews.llvm.org/D7752

Reviewers: chfast, resistor

Reviewed By: chfast, resistor

Subscribers: sanjoy, resistor, chfast, llvm-commits

Differential Revision: http://reviews.llvm.org/D4978

llvm-svn: 235370
2015-04-21 06:28:36 +00:00
Olivier Sallenave b99c2eb0f0 Refactoring and enhancement to FMA combine.
llvm-svn: 235344
2015-04-20 20:29:40 +00:00
Tom Stellard 69a7b91e95 DAGCombine: Remove redundant NaN checks around ISD::FSQRT
This folds:

(select (setcc x, -0.0, *lt), NaN, (fsqrt x)) -> ( fsqrt x)

llvm-svn: 235333
2015-04-20 19:38:27 +00:00
Hal Finkel 1e5733bbed [InlineAsm] Remove EarlyClobber on registers that are also inputs
When an inline asm call has an output register marked as early-clobber, but
that same register is also an input operand, what should we do? GCC accepts
this, and is documented to accept this for read/write operands saying,
"Furthermore, if the earlyclobber operand is also a read/write operand, then
that operand is written only after it's used." For write-only operands, the
situation seems less clear, but I have at least one existing codebase that
assumes this will work, in part because it has syscall macros like this:

({                                                                         \
  register uint64_t r0 __asm__ ("r0") = (__NR_ ## name);                   \
  register uint64_t r3 __asm__ ("r3") = ((uint64_t) (arg0));               \
  register uint64_t r4 __asm__ ("r4") = ((uint64_t) (arg1));               \
  register uint64_t r5 __asm__ ("r5") = ((uint64_t) (arg2));               \
  __asm__ __volatile__                                                     \
  ("sc"                                                                    \
   : "=&r"(r0),"=&r"(r3),"=&r"(r4),"=&r"(r5)                               \
   :   "0"(r0),  "1"(r3),  "2"(r4),  "3"(r5)                               \
   : "r6","r7","r8","r9","r10","r11","r12","cr0","memory");                \
  r3;                                                                      \
})

Furthermore, with register aliases and subregister relationships that only the
backend knows about, rejecting this in the frontend seems like a difficult
proposition (if we wanted to do so). However, keeping the early-clobber flag on
the INLINEASM MI does not work for us, because it will cause the register's
live interval to end to soon (so it will not appear defined to be used as an
input).

Fortunately, fixing this does not seem hard: When forming the INLINEASM MI,
check to see if any of the early-clobber outputs are also inputs, and if so,
remove the early-clobber flag.

llvm-svn: 235283
2015-04-20 00:01:30 +00:00
Pirama Arumuga Nainar 50604a69e9 Fix build errors introduced by r235215
Summary:
- Handle TypePromoteFloat in switch statements
- Move an expression into an assert to avoid unused variable in
  non-assert builds.

Reviewers: srhines, ab

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9086

llvm-svn: 235220
2015-04-17 19:51:44 +00:00
Pirama Arumuga Nainar db7c07e2bf Add support to promote f16 to f32
Summary:
This patch adds legalization support to operate on FP16 as a load/store type
and do operations on it as floats.

Tests for ARM are added to test/CodeGen/ARM/fp16-promote.ll

Reviewers: srhines, t.p.northover

Differential Revision: http://reviews.llvm.org/D8755

llvm-svn: 235215
2015-04-17 18:36:25 +00:00
James Molloy a4ff7b2713 Fix TRUNCATE splitting helper logic.
This is a followon to r233681 - I'd misunderstood the semantics of FTRUNC,
and had confused it with (FP_ROUND ..., 0).

Thanks for Ahmed Bougacha for his post-commit review!

llvm-svn: 235191
2015-04-17 13:51:40 +00:00
Nico Weber a762fa6c98 Revert r235154-r235156, they cause asserts when building win64 code (http://crbug.com/477988)
llvm-svn: 235170
2015-04-17 09:10:43 +00:00
Reid Kleckner d4523e3c51 [SEH] Reimplement x64 SEH using WinEHPrepare
This now emits simple, unoptimized xdata tables for __C_specific_handler
based on the handlers listed in @llvm.eh.actions calls produced by
WinEHPrepare.

This adds support for running __finally blocks when exceptions are
thrown, and removes the old landingpad fan-in codepath.

I ran some manual execution tests on small basic test cases with and
without optimization, as well as on Chrome base_unittests, which uses a
small amount of SEH.  I'm sure there are bugs, and we may need to
revert.

llvm-svn: 235154
2015-04-17 01:01:27 +00:00
Hans Wennborg a9e2057416 Revert the switch lowering change (r235101, r235103, r235106)
Looks like it broke the sanitizer-ppc64-linux1 build. Reverting for now.

llvm-svn: 235108
2015-04-16 15:43:26 +00:00
Hans Wennborg d403664ed8 Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)
This is a major rewrite of the SelectionDAG switch lowering. The previous code
would lower switches as a binary tre, discovering clusters of cases
suitable for lowering by jump tables or bit tests as it went along. To increase
the likelihood of finding jump tables, the binary tree pivot was selected to
maximize case density on both sides of the pivot.

By not selecting the pivot in the middle, the binary trees would not always
be balanced, leading to performance problems in the generated code.

This patch rewrites the lowering to search for clusters of cases
suitable for jump tables or bit tests first, and then builds the binary
tree around those clusters. This way, the binary tree will always be balanced.

This has the added benefit of decoupling the different aspects of the lowering:
tree building and jump table or bit tests finding are now easier to tweak
separately.

For example, this will enable us to balance the tree based on profile info
in the future.

The algorithm for finding jump tables is O(n^2), whereas the previous algorithm
was O(n log n) for common cases, and quadratic only in the worst-case. This
doesn't seem to be major problem in practice, e.g. compiling a file consisting
of a 10k-case switch was only 30% slower, and such large switches should be rare
in practice. Compiling e.g. gcc.c showed no compile-time difference.  If this
does turn out to be a problem, we could limit the search space of the algorithm.

This commit also disables all optimizations during switch lowering in -O0.

Differential Revision: http://reviews.llvm.org/D8649

llvm-svn: 235101
2015-04-16 14:49:23 +00:00
Simon Pilgrim 6bd5d3caa9 TRUNCATE constant folding - minor fix for rL233224
Fix for test case found by James Molloy - TRUNCATE of constant build vectors can be more simply achieved by simply replacing with a new build vector node with the truncated value type - no need to touch the scalar operands at all.

llvm-svn: 235079
2015-04-16 08:21:09 +00:00
Ahmed Bougacha c984b90c86 [CodeGen] Re-apply r234809 (concat of scalars), with an x86_mmx fix.
The only type that isn't an integer, isn't floating point, and isn't
a vector; ladies and gentlemen, the gift that keeps on giving: x86_mmx!

Fixes PR23246.

Original message (reverted in r235062):
[CodeGen] Combine concat_vectors of scalars into build_vector.

Combine something like:
  (v8i8 concat_vectors (v2i8 bitcast (i16)) x4)
into:
  (v8i8 (bitcast (v4i16 BUILD_VECTOR (i16) x4)))

If any of the scalars are floating point, use that throughout.

Differential Revision: http://reviews.llvm.org/D8948

llvm-svn: 235072
2015-04-16 02:39:14 +00:00
Duncan P. N. Exon Smith b273d06b63 DebugInfo: Gut DIScope, DIEnumerator and DISubrange
The only class the still has API left is `DIDescriptor` itself.

llvm-svn: 235067
2015-04-16 01:37:00 +00:00
Nick Lewycky b8557a972f Revert r234809 because it caused PR23246.
llvm-svn: 235062
2015-04-16 00:56:20 +00:00
Reid Kleckner 3e9fadfbc8 [WinEH] Try to make the MachineFunction CFG more accurate
This avoids emitting code for unreachable landingpad blocks that contain
calls to llvm.eh.actions and indirectbr.

It's also a first step towards unifying the SEH and WinEH lowering
codepaths. I'm keeping the old fan-in lowering of SEH around until the
preparation version works well enough that we can switch over without
breaking existing users.

llvm-svn: 235037
2015-04-15 18:48:15 +00:00
Daniel Berlin 25db4f4141 Add range iterators for post order and inverse post order. Use them
llvm-svn: 235026
2015-04-15 17:41:42 +00:00
Duncan P. N. Exon Smith 7348ddaa74 DebugInfo: Gut DIVariable and DIGlobalVariable
Gut all the non-pointer API from the variable wrappers, except an
implicit conversion from `DIGlobalVariable` to `DIDescriptor`.  Note
that if you're updating out-of-tree code, `DIVariable` wraps
`MDLocalVariable` (`MDVariable` is a common base class shared with
`MDGlobalVariable`).

llvm-svn: 234840
2015-04-14 02:22:36 +00:00
Ahmed Bougacha 8ebcdb3bc3 [CodeGen] Combine concat_vectors of scalars into build_vector.
Combine something like:
  (v8i8 concat_vectors (v2i8 bitcast (i16)) x4)
into:
  (v8i8 (bitcast (v4i16 BUILD_VECTOR (i16) x4)))

If any of the scalars are floating point, use that throughout.

Differential Revision: http://reviews.llvm.org/D8948

llvm-svn: 234809
2015-04-13 22:57:21 +00:00
Duncan P. N. Exon Smith 745a5db444 SelectionDAG: Stop using DIVariable::isInlinedFnArgument()
Instead of calling the somewhat confusingly-named
`DIVariable::isInlinedFnArgument()`, do the check directly here.
There's possibly a small functionality change here: instead of
`dyn_cast<>`'ing `DV->getScope()` to `MDSubprogram`, I'm looking up the
scope chain for the actual subprogram.  I suspect that this is a no-op
for function arguments so in practise there isn't a real difference.

I've also added a `FIXME` to check the `inlinedAt:` chain instead, since
I wonder if that would be more reliable than the
`MDSubprogram::describes()` function.

Since this was the only user of `DIVariable::isInlinedFnArgument()`,
delete it.

llvm-svn: 234799
2015-04-13 21:38:48 +00:00
Jan Vesely ffcd968647 Revert revisions r234755, r234759, r234760
Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)"
Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO"
Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB"

Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll
on hexagon, nvptx, and r600. Revert while I investigate.

llvm-svn: 234768
2015-04-13 17:47:15 +00:00
Krzysztof Parzyszek a46c36b8f4 Allow memory intrinsics to be tail calls
llvm-svn: 234764
2015-04-13 17:16:45 +00:00
Matthias Braun a283cb3265 DAGCombiner: Fix crash in select(select) opt.
In case of different types used for the condition of the selects the
select(select) -> select(and) normalisation cannot be performed.

See also: http://reviews.llvm.org/D7622

llvm-svn: 234763
2015-04-13 17:16:33 +00:00
David Blaikie 155f38e0d8 Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)
llvm-svn: 234760
2015-04-13 16:37:50 +00:00
Jan Vesely a835555e40 LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB
v2: consider BooleanContents when processing overflow

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewers: resistor, jholewinsky (nvidia parts)
Differential Revision: http://reviews.llvm.org/D6340

llvm-svn: 234755
2015-04-13 15:32:01 +00:00
Benjamin Kramer dd0ff85701 Remove empty non-virtual destructors or mark them =default when non-public
These add no value but can make a class non-trivially copyable. NFC.

llvm-svn: 234688
2015-04-11 15:32:26 +00:00
Alexander Kornienko f817c1cb9a Use 'override/final' instead of 'virtual' for overridden methods
The patch is generated using clang-tidy misc-use-override check.

This command was used:

  tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
    -checks='-*,misc-use-override' -header-filter='llvm|clang' \
    -j=32 -fix -format

http://reviews.llvm.org/D8925

llvm-svn: 234679
2015-04-11 02:11:45 +00:00
Benjamin Kramer 619c4e57ba Reduce dyn_cast<> to isa<> or cast<> where possible.
No functional change intended.

llvm-svn: 234586
2015-04-10 11:24:51 +00:00
David Majnemer 5c65f58f64 [WinEHPrepare] Don't rely on the order of IR
The IPToState table must be emitted after we have generated labels for
all functions in the table.  Don't rely on the order of the list of
globals.  Instead, utilize WinEHFuncInfo to tell us how many catch
handlers we expect to outline.  Once we know we've visited all the catch
handlers, emit the cppxdata.

llvm-svn: 234566
2015-04-10 04:56:17 +00:00
Ahmed Bougacha 1ffe7c7d36 [AArch64] Promote f16 operations to f32.
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.

Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.

f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.

While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.

Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755

llvm-svn: 234550
2015-04-10 00:08:48 +00:00
Ahmed Bougacha df43737782 [CodeGen] Combine concat_vector of trunc'd scalar to scalar_to_vector.
We already do:
  concat_vectors(scalar, undef) -> scalar_to_vector(scalar)
When the scalar is legal.
When it's not, but is a truncated legal scalar, we can also do:
  concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
Which is equivalent, since the upper lanes are undef anyway.
While there, teach the combine to look at more than 2 operands.

Differential Revision: http://reviews.llvm.org/D8883

llvm-svn: 234530
2015-04-09 20:04:47 +00:00
Rafael Espindola 1c84271694 Revert "Refactoring and enhancement to FMA combine."
This reverts commit r234513. It was failing on the bots.

llvm-svn: 234518
2015-04-09 18:29:32 +00:00
Olivier Sallenave 53703d0862 Refactoring and enhancement to FMA combine.
llvm-svn: 234513
2015-04-09 17:55:26 +00:00
Akira Hatanaka c6fab80536 [DAGCombine] Fix a bug in MergeConsecutiveStores.
The bug manifests when there are two loads and two stores chained as follows in
a DAG,

(ld v3f32) -> (st f32) -> (ld v3f32) -> (st f32)

and the stores' values are extracted from the preceding vector loads.

MergeConsecutiveStores would replace the first store in the chain with the
merged vector store, which would create a cycle between the merged store node
and the last load node that appears in the chain.

This commits fixes the bug by replacing the last store in the chain instead.

rdar://problem/20275084

Differential Revision: http://reviews.llvm.org/D8849

llvm-svn: 234430
2015-04-08 20:34:53 +00:00
Duncan P. N. Exon Smith e686f1591f CodeGen: Stop using DIDescriptor::is*() and auto-casting
Same as r234255, but for lib/CodeGen and lib/Target.

llvm-svn: 234258
2015-04-06 23:27:40 +00:00
Rafael Espindola d58de064b8 Use sext in fast isel.
Fast isel used to zero extends immediates to 64 bits. This normally goes
unnoticed because the value is truncated to 32 bits for output.

Two cases were it is noticed:

* We fail to use smaller encodings.
* If the original constant was smaller than i32.

In the tests using i1 constants, codegen would change to use -1, which is fine
(and matches what regular isel does) since only the lowest bit is then used.

Instead, this patch then changes the ir to use i8 constants, which looks more
like what clang produces.

llvm-svn: 234249
2015-04-06 22:29:07 +00:00
Reid Kleckner b401941f3d [WinEH] Don't sink allocas into child handlers
The uselist isn't enough to infer anything about the lifetime of such
allocas. If we want to re-add this optimization, we will need to
leverage lifetime markers to do it.

Fixes PR23122.

llvm-svn: 234196
2015-04-06 18:50:38 +00:00
Simon Pilgrim 07e063e44c [DAGCombiner] Add support for FCEIL, FFLOOR and FTRUNC vector constant folding
Differential Revision: http://reviews.llvm.org/D8715

llvm-svn: 234179
2015-04-06 17:15:41 +00:00
Simon Pilgrim bcf3bc2757 [DAGCombiner] Merge FMUL Scalar and Vector constant canonicalization to RHS. NFCI.
llvm-svn: 234118
2015-04-05 14:30:37 +00:00
Sanjay Patel 59f60a91b8 less space; NFC
llvm-svn: 234106
2015-04-04 21:05:52 +00:00
Simon Pilgrim 20b7aba04a [DAGCombiner] Canonicalize vector constants for ADD/MUL/AND/OR/XOR re-association
Scalar integers are commuted to move constants to the RHS for re-association - this ensures vectors do the same.

llvm-svn: 234092
2015-04-04 10:20:31 +00:00
David Majnemer 7f5e714406 [WinEH] Fill out CatchHigh in the TryBlockMap
Now all fields in the WinEH xdata have been filled out.

llvm-svn: 234067
2015-04-03 23:37:34 +00:00
David Majnemer 69132a7fb2 [WinEH] Fill out .xdata for catch objects
This add support for catching an exception such that an exception object
available to the catch handler will be initialized by the runtime.

llvm-svn: 234062
2015-04-03 22:49:05 +00:00
David Majnemer 3337064a47 [WinEH] Sink UnwindHelp completely out of IR
We don't need to represent UnwindHelp in IR.  Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.

llvm-svn: 234061
2015-04-03 22:32:26 +00:00
Andrew Kaylor aa92ab069c [WinEH] Handle nested landing pads in outlined catch handlers
Differential Revision: http://reviews.llvm.org/D8596

llvm-svn: 234041
2015-04-03 19:37:50 +00:00
Duncan P. N. Exon Smith 3bef6a3803 CodeGen: Assert that inlined-at locations agree
As a follow-up to r234021, assert that a debug info intrinsic variable's
`MDLocalVariable::getInlinedAt()` always matches the
`MDLocation::getInlinedAt()` of its `!dbg` attachment.

The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), but I'll let these assertions bake for a while
first.

If you have an out-of-tree backend that just broke, you're probably
attaching the wrong `DebugLoc` to a `DBG_VALUE` instruction.  The one
you want is the location that was attached to the corresponding
`@llvm.dbg.declare` or `@llvm.dbg.value` call that you started with.

llvm-svn: 234038
2015-04-03 19:20:26 +00:00
Duncan P. N. Exon Smith 66463cc5dc SelectionDAG: Use specialized metadata nodes in EmitFuncArgumentDbgValue(), NFC
Use `MDLocalVariable` and `MDExpression` directly for the arguments of
`EmitFuncArgumentDbgValue()` to simplify a follow-up patch.

llvm-svn: 234026
2015-04-03 17:11:42 +00:00
Simon Pilgrim ed2ba33ba0 [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR
This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.

Differential Revision: http://reviews.llvm.org/D8516

llvm-svn: 234004
2015-04-03 10:02:21 +00:00
David Majnemer e8eb9e6de3 [WinEH] Implement support for catch-all
A catch (...) doesn't have a type descriptor.  Instead, the 'adjectives'
field has bit six set.

llvm-svn: 233788
2015-04-01 05:20:42 +00:00
Jiangning Liu b0f076910b Fix PR23065. Avoid optimizing bitcast of build_vector with constant input to scalar_to_vector.
llvm-svn: 233778
2015-04-01 01:52:38 +00:00
David Majnemer d1079bf27a [WinEH] ExitingScope is vacuously true if !PoppedCatches.empty()
Remove a redundant condition, no functional change intended.

llvm-svn: 233770
2015-03-31 22:43:56 +00:00
David Majnemer a225a19dd0 [WinEH] Generate .xdata for catch handlers
This lets us catch exceptions in simple cases.

N.B. Things that do not work include (but are not limited to):
- Throwing from within a catch handler.
- Catching an object with a named catch parameter.
- 'CatchHigh' is fictitious, we aren't sure of its purpose.
- We aren't entirely efficient with regards to the number of EH states
  that we generate.
- IP-to-State tables are sensitive to the order of emission.

llvm-svn: 233767
2015-03-31 22:35:44 +00:00
Hal Finkel 17b6d77a5f [SDAG] Handle non-integer preferred memset types for non-constant values
The existing code in getMemsetValue only handled integer-preferred types when
the fill value was not a constant. Make this more robust in two ways:

  1. If the preferred type is a floating-point value, do the mul-splat trick on
     the corresponding integer type and then bitcast.
  2. If the preferred type is a vector, do the mul-splat trick on one vector
     element, and then build a vector out of them.

Fixes PR22754 (although, we should also turn off use of vector types at -O0).

llvm-svn: 233749
2015-03-31 20:35:26 +00:00
Sanjay Patel d399d94837 typos; NFC
llvm-svn: 233701
2015-03-31 16:17:51 +00:00
James Molloy 4c1b746771 [SDAG] Move TRUNCATE splitting logic into a helper, and use
it more liberally.

SplitVecOp_TRUNCATE has logic for recursively splitting oversize vectors
that need more than one round of splitting to become legal. There are many
other ISD nodes that could benefit from this logic, so factor it out and
use it for FP_TO_UINT,FP_TO_SINT,SINT_TO_FP,UINT_TO_FP and FTRUNC.

llvm-svn: 233681
2015-03-31 10:20:58 +00:00
David Majnemer 9a55539bef Silence an unused variable warning.
No functional change intended.

llvm-svn: 233639
2015-03-30 23:14:45 +00:00
David Majnemer cde33036ed [WinEH] Run cleanup handlers when an exception is thrown
Generate tables in the .xdata section representing what actions to take
when an exception is thrown.  This currently fills in state for
cleanups, catch handlers are still unfinished.

llvm-svn: 233636
2015-03-30 22:58:10 +00:00
Duncan P. N. Exon Smith 9dffcd04f7 CodeGen: Use the new DebugLoc API, NFC
Update lib/CodeGen (and lib/Target) to use the new `DebugLoc` API.

llvm-svn: 233582
2015-03-30 19:14:47 +00:00
Duncan P. N. Exon Smith b525e1c07c SelectionDAG: Reflow code to use early returns, NFC
llvm-svn: 233577
2015-03-30 18:23:28 +00:00
Simon Pilgrim dcbe1213c8 Use SDValue bool check to tidyup some possible vector folding ops. NFC.
llvm-svn: 233498
2015-03-29 19:13:40 +00:00
Simon Pilgrim d15c2805ab Use SDValue bool check to tidyup some possible ReassociateOps. NFC.
llvm-svn: 233495
2015-03-29 16:49:51 +00:00
Simon Pilgrim 7fdcc30e93 [DAGCombiner] Fixed incorrect test for buildvector of constant integers.
DAGCombiner::ReassociateOps was correctly testing for an constant integer scalar but failed to correctly test for constant integer vectors (it was testing for any constant vector).

llvm-svn: 233482
2015-03-28 18:31:31 +00:00
Sanjay Patel f176566a00 fix typo and 80-col; NFC
llvm-svn: 233427
2015-03-27 21:45:18 +00:00
Ahmed Bougacha faf8065a99 [CodeGen] Don't attempt a tail-call with a non-forwarded explicit sret.
Tailcalls are only OK with forwarded sret pointers. With explicit sret,
one approximation is to check that the pointer isn't an Instruction, as
in that case it might point into some local memory (alloca). That's not
OK with tailcalls.

Explicit sret counterpart to r233409.
Differential Revison: http://reviews.llvm.org/D8510

llvm-svn: 233410
2015-03-27 20:35:49 +00:00
Ahmed Bougacha e2bd5d36b3 [CodeGen] Don't attempt a tail-call with implicit sret.
Tailcalls are only OK with forwarded sret pointers. With sret demotion,
they're not, as we'd have a pointer into a soon-to-be-dead stack frame.

Differential Revison: http://reviews.llvm.org/D8510

llvm-svn: 233409
2015-03-27 20:28:30 +00:00
Yaron Keren 75e0c4b060 Remove superfluous .str() and replace std::string concatenation with Twine.
llvm-svn: 233392
2015-03-27 17:51:30 +00:00
Philip Reames e1bf27045d Require a GC strategy be specified for functions which use gc.statepoint
This was discussed a while back and I left it optional for migration.  Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory.  

llvm-svn: 233357
2015-03-27 05:09:33 +00:00
Philip Reames f8f0933b48 Allow explicit spill slots to be specified for a gc.statepoint
This patch adds support for explicitly provided spill slots in the GC arguments of a gc.statepoint.  This is somewhat analogous to gcroot, but leverages the STATEPOINT MI node and StackMap infrastructure.  The motivation for this is:
1) The stack spilling code for gc.statepoints hasn't advanced as fast as I'd like.  One major option is to give up on doing spilling in the backend and do it at the IR level instead.  We'd give up the ability to have gc values in registers, but that's a minor cost in practice.  We are not neccessarily moving in that direction, but having the ability to prototype such a thing cheaply is interesting.
2) I want to port the gcroot lowering to use the statepoint infastructure.  Given the metadata printers for gcroot expect a fixed set of stack roots, it's easiest to just reuse the explicit stack slots and pass them directly to the underlying statepoint.  

I'm holding off on the documentation for the new feature until I'm reasonable sure this is going to stick around.

llvm-svn: 233356
2015-03-27 04:52:48 +00:00
David Majnemer b919dd693f WinEH: Create a parent frame alloca for HandlerType xdata tables
We don't have any logic to emit those tables yet, so the SDAG lowering
of this intrinsic is just a stub.  We can see the intrinsic in the
prepared IR, though.

llvm-svn: 233354
2015-03-27 04:17:07 +00:00
Andrew Trick e97ff5a2ad Fix a bug in SelectionDAG scheduling backtracking code: PR22304.
It can happen (by line CurSU->isPending = true; // This SU is not in
AvailableQueue right now.) that a SUnit is mark as available but is
not in the AvailableQueue. For SUnit being selected for scheduling
both conditions must be met.

This patch mainly defensively protects from invalid removing a node
from a queue. Sometimes nodes are marked isAvailable but are not in
the queue because they have been defered due to some hazard.

Patch by Pawel Bylica!

llvm-svn: 233351
2015-03-27 03:44:13 +00:00
Ahmed Bougacha e85a2d34c6 [CodeGen] Report error rather than crash when unable to makeLibCall.
Also, make the assumption explicit in the header.

llvm-svn: 233329
2015-03-26 22:46:58 +00:00
Sanjay Patel 5b305d2d66 revert inadvertent change
llvm-svn: 233294
2015-03-26 17:19:24 +00:00
Sanjay Patel 4fa4a886d7 comment cleanup; NFC
llvm-svn: 233293
2015-03-26 17:18:17 +00:00
Sanjay Patel d95dd9e5fb fix indent; NFC
llvm-svn: 233288
2015-03-26 16:55:17 +00:00
Simon Pilgrim 09f3ff9a0a [DAGCombiner] Add support for TRUNCATE + FP_EXTEND vector constant folding
This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.

It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.

Differential Revision: http://reviews.llvm.org/D8593

llvm-svn: 233224
2015-03-25 22:30:31 +00:00
Reid Kleckner 7e9546b378 WinEH: Create an unwind help alloca for __CxxFrameHandler3 xdata tables
We don't have any logic to emit those tables yet, so the sdag lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.

llvm-svn: 233209
2015-03-25 20:10:36 +00:00
Paul Robinson 284f0451cf 'optnone' should not disable DAG combiner.
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.

llvm-svn: 233153
2015-03-25 00:10:24 +00:00
Simon Pilgrim 481f4146cd [SelectionDAG] Fixed issue with uitofp vector constant folding being treated as sitofp
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):

%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0

The vector constant folding was always using sitofp:

%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>

This patch fixes this so that the correct opcode is used for sitofp and uitofp.

%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>

Differential Revision: http://reviews.llvm.org/D8560

llvm-svn: 233033
2015-03-23 22:44:55 +00:00
Benjamin Kramer 799003bf8c Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.
llvm-svn: 232998
2015-03-23 19:32:43 +00:00
Benjamin Kramer 51f6096cf8 Move private classes into anonymous namespaces
NFC.

llvm-svn: 232944
2015-03-23 12:30:58 +00:00
Petar Jovanovic 5b4362276b Fix sign extension for MIPS64 in makeLibCall function
Fixing sign extension in makeLibCall for MIPS64. In MIPS64 architecture all
32 bit arguments (int, unsigned int, float 32 (soft float)) must be sign
extended. This fixes test "MultiSource/Applications/oggenc/".

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D7791

llvm-svn: 232943
2015-03-23 12:28:13 +00:00
Hal Finkel 8f7c5a7f18 [SDAG] Don't widen VSETCC during type legalization for split operands
Because the operands of a vector SETCC node can be of a different type from the
result (and often are), it can happen that even if we'd prefer to widen the
result type of the SETCC, the operands have been split instead. In this case,
the SETCC result also must be split. This mirrors what is done in
WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not
widened the following calls to GetWidenedVector will assert (which is what was
happening in the test case).

llvm-svn: 232935
2015-03-23 08:22:43 +00:00
Hans Wennborg 90aa1a9653 SelectionDAGBuilder: Rangeify a loop. NFC.
llvm-svn: 232831
2015-03-20 18:48:40 +00:00
Hans Wennborg 2bdc4cf35f SelectionDAGBuilder::handleJTSwitchCase, simplify loop; NFC
llvm-svn: 232830
2015-03-20 18:48:31 +00:00
Hans Wennborg 077845eb81 Rewrite SelectionDAGBuilder::Clusterify to run in linear time. NFC.
It was previously repeatedly erasing elements from the middle of a vector,
causing O(n^2) worst-case run-time.

llvm-svn: 232789
2015-03-20 00:41:03 +00:00
Owen Anderson db4201235b Fix a nasty bug in DAGCombine of STORE nodes.
This is very related to the bug fixed in r174431.  The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores.  When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.

I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.

llvm-svn: 232780
2015-03-19 22:48:57 +00:00
Hans Wennborg b4db1420c2 Switch lowering: extract NextBlock function. NFC.
llvm-svn: 232759
2015-03-19 20:41:48 +00:00
Hans Wennborg 783254386e Switch lowering: remove unnecessary ConstantInt casts. NFC.
llvm-svn: 232729
2015-03-19 16:42:21 +00:00
Hans Wennborg 5b64657e36 SelectionDAGBuilder: update comment in HandlePHINodesInSuccessorBlocks.
From what I can tell, the code is checking for PHIs that expect any value from
this block, not just constants.

llvm-svn: 232697
2015-03-19 00:57:51 +00:00
Hans Wennborg 81cfeb1999 SelectionDAGIsel: Fix comment about terminators being "handled below".
That changed in r102128.

llvm-svn: 232692
2015-03-19 00:02:22 +00:00
David Majnemer e48237df95 DAGCombiner: fold (xor (shl 1, x), -1) -> (rotl ~1, x)
Targets which provide a rotate make it possible to replace a sequence of
(XOR (SHL 1, x), -1) with (ROTL ~1, x).  This saves an instruction on
architectures like X86 and POWER(64).

Differential Revision: http://reviews.llvm.org/D8350

llvm-svn: 232572
2015-03-18 00:03:36 +00:00
Simon Pilgrim 257849f11a XformToShuffleWithZero - Added clearer early outs and general tidy up. NFCI
llvm-svn: 232557
2015-03-17 22:19:08 +00:00
Daniel Sanders 60f1db0525 Recommit r232027 with PR22883 fixed: Add infrastructure for support of multiple memory constraints.
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break
anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate
Constraint_* values.

PR22883 was caused the matching operands copying the whole of the operand flags
for the matched operand. This included the constraint id which needed to be
replaced with the operand number. This has been fixed with a conversion
function. Following on from this, matching operands also used the operand
number as the constraint id. This has been fixed by looking up the matched
operand and taking it from there. 

llvm-svn: 232165
2015-03-13 12:45:09 +00:00
Sanjay Patel 4339abe66f [X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shuffles
This should complete the job started in r231794 and continued in r232045:
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

AVX2 introduced proper integer variants of the hacked integer insert/extract
C intrinsics that were created for this same functionality with AVX1.

This should complete the removal of insert/extract128 intrinsics.

The Clang precursor patch for this change was checked in at r232109.

llvm-svn: 232120
2015-03-12 23:16:18 +00:00
Hal Finkel e78e52ba9b Revert "r232027 - Add infrastructure for support of multiple memory constraints"
This (r232027) has caused PR22883; so it seems those bits might be used by
something else after all. Reverting until we can figure out what else to do.

Original commit message:

The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.

llvm-svn: 232093
2015-03-12 20:09:39 +00:00
Sanjay Patel af1846c097 [X86, AVX] replace vextractf128 intrinsics with generic shuffles
Now that we've replaced the vinsertf128 intrinsics, 
do the same for their extract twins.

This is very much like D8086 (checked in at r231794):
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

This is also the LLVM sibling to the cfe D8275 patch.

Differential Revision: http://reviews.llvm.org/D8276

llvm-svn: 232045
2015-03-12 15:15:19 +00:00
Daniel Sanders 41c072e63b Add infrastructure for support of multiple memory constraints.
Summary:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8171

llvm-svn: 232027
2015-03-12 11:00:48 +00:00
Reid Kleckner 016c6b2104 Handle big index in getelementptr instruction
CodeGen incorrectly ignores (assert from APInt) constant index bigger
than 2^64 in getelementptr instruction. This is a test and fix for that.

Patch by Paweł Bylica!

Reviewed By: rnk

Subscribers: majnemer, rnk, mcrosier, resistor, llvm-commits

Differential Revision: http://reviews.llvm.org/D8219

llvm-svn: 231984
2015-03-11 23:36:10 +00:00
Eric Christopher 5f141b03fa Remove useMachineScheduler and replace it with subtarget options
that control, individually, all of the disparate things it was
controlling.

At the same time move a FIXME in the Hexagon port to a new
subtarget function that will enable a user of the machine
scheduler to avoid using the source scheduler for pre-RA-scheduling.
The FIXME would have this removed, but involves either testcase
changes or adding -pre-RA-sched=source to a few testcases.

llvm-svn: 231980
2015-03-11 22:56:10 +00:00
Eric Christopher 9deb75d176 Have getCallPreservedMask and getThisCallPreservedMask take a
MachineFunction argument so that we can grab subtarget specific
features off of it.

llvm-svn: 231979
2015-03-11 22:42:13 +00:00
Igor Laevsky 85f7f727d3 Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints.
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
(Resubmitting this change after not being able to reproduce buildbot failure)

Differential Revision: http://reviews.llvm.org/D7760

llvm-svn: 231800
2015-03-10 16:26:48 +00:00
Sanjay Patel 19792fb270 [X86, AVX] replace vinsertf128 intrinsics with generic shuffles
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

This is the sibling patch for the Clang half of this change:
http://reviews.llvm.org/D8088

Differential Revision: http://reviews.llvm.org/D8086

llvm-svn: 231794
2015-03-10 16:08:36 +00:00
Daniel Sanders 2db94ba0bc The operand flag word used in ISD::INLINEASM is an i32 not a pointer. NFC.
Summary:
This is part of the work to support memory constraints that behave
differently to 'm'. The subsequent patches will expand on the existing
encoding (which is a 32-bit int) and as a result in some flag words will no
longer fit into an i16. This problem only affected the MSP430 target which
appears to have 16-bit pointers.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D8168

llvm-svn: 231783
2015-03-10 10:42:59 +00:00
Mehdi Amini a28d91d81b DataLayout is mandatory, update the API to reflect it with references.
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.

This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.

I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.

I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.

Test Plan:

Reviewers: echristo

Subscribers: llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
2015-03-10 02:37:25 +00:00
Ahmed Bougacha c809761dc0 [CodeGen] Replace the reused stores' chain for extractelt expansion.
This fixes a subtle issue that was introduced in r205153.

When reusing a store for the extractelement expansion (to load directly
from it, inserting of going through the stack), later stores to the
same location might have overwritten the data we were expecting to
extract from.

To fix that, we need to explicitly replace the chain going out of the
reused store, so that later stores also have an explicit dependency on
the generated element-extracting loads, and can't clobber them.

rdar://20066785
Differential Revision: http://reviews.llvm.org/D8180

llvm-svn: 231721
2015-03-09 22:51:05 +00:00
Simon Pilgrim 8c58c066b7 [DAGCombiner] Add a shuffle mask commutation helper function. NFCI.
We have an increasing number of cases where we are creating commuted shuffle masks - all implementing nearly the same code.

This patch adds a static helper function - ShuffleVectorSDNode::commuteMask() and replaces a number of cases to use it.

Differential Revision: http://reviews.llvm.org/D8139

llvm-svn: 231581
2015-03-07 22:33:11 +00:00
Benjamin Kramer 867bfc53ee Make constant arrays that are passed to functions as const.
In theory this allows the compiler to skip materializing the array on
the stack. In practice clang often fails to do that, but that's a
different story. NFC.

llvm-svn: 231571
2015-03-07 17:41:00 +00:00
Simon Pilgrim 2dcbe74dfd Use SDValue bool check to tidyup some possible combines. NFC.
llvm-svn: 231569
2015-03-07 16:34:55 +00:00
Andrea Di Biagio c9d79e8103 [DAGCombiner] Fix wrong folding of AND dag nodes.
This patch fixes the logic in the DAGCombiner that folds an AND node according
to rule: (and (X (load V)), C) -> (X (load V))

An AND between a vector load 'X' and a constant build_vector 'C' can be folded
into the load itself only if we can prove that the AND operation is redundant.
The algorithm implemented by 'visitAND' firstly computes the splat value 'S'
from C, and then checks if S has the lower 'B' bits set (where B is the size in
bits of the vector element type). The algorithm takes into account also the
'undef' bits in the splat mask.

Unfortunately, the algorithm only worked under the assumption that the size of S
is a multiple of the vector element type. With this patch, we conservatively
avoid folding the AND if the splat bits are not compatible with the vector
element type.

Added X86 test and-load-fold.ll

Differential Revision: http://reviews.llvm.org/D8085

llvm-svn: 231563
2015-03-07 12:24:55 +00:00
Simon Pilgrim bede80a440 [DAGCombiner] SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(V,C)) -> VECTOR_SHUFFLE
This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE.

This prevents many cases of spilling scalar data between the gpr + simd registers. 

At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match).

Differential Revision: http://reviews.llvm.org/D8132

llvm-svn: 231554
2015-03-07 05:52:42 +00:00
Matthias Braun 898d11e864 DAGCombiner: Canonicalize select(and/or,x,y) depending on target.
This is based on the following equivalences:
select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))

Many target cannot perform and/or on the CPU flags and therefore the
right side should be choosen to avoid materializign the i1 flags in an
integer register. If the target can perform this operation efficiently
we normalize to the left form.

Differential Revision: http://reviews.llvm.org/D7622

llvm-svn: 231507
2015-03-06 19:49:10 +00:00
Matthias Braun 3ecb557739 DAGCombiner: Factor out some and/or combines.
This is in preparation for changing visitSELECT to normalize towards
select(Cond0, select(Cond1, X, Y), Y);
select(Cond0, X, select(Cond1, X, Y)) which perfom an implicit and/or of
the conditions.

The factored function contains all DAGCombine rules which reduce two values
combined by an And/Or operation to a single value. This does not include rules
involving constants as visitSELECT already handles that case.

Differential Revision: http://reviews.llvm.org/D8026

llvm-svn: 231506
2015-03-06 19:49:06 +00:00
Michael Zolotukhin 03dd1082ad LegalizeTypes: Handle shift by 0 in ExpandShiftByConstant.
Though such shifts are usually optimized away by combiner, we still can
encounter them after a vector shift is legalized.

llvm-svn: 231443
2015-03-06 01:13:01 +00:00
Benjamin Kramer fb0abceb5c SelectionDAGBuilder: Merge 3 copies of the limited precision exp2 emission code.
NFC intended.

llvm-svn: 231406
2015-03-05 21:13:08 +00:00
Benjamin Kramer c54c38e090 SDAG: Merge the meat of two ExpandAtomic implementations.
The copies already diverged, don't let them become any worse. Reduce
redundancy in code with a little macro metaprogramming.

llvm-svn: 231401
2015-03-05 20:04:29 +00:00
David Majnemer 71b9b6be1b X86: Optimize address mode matching for FRAME_ALLOC_RECOVER nodes
We know that the absolute symbol will be less than 2GB and thus will
always fit.

llvm-svn: 231389
2015-03-05 18:50:12 +00:00
Reid Kleckner cfb9ce53c1 Replace llvm.frameallocate with llvm.frameescape
Turns out it's pretty straightforward and simplifies the implementation.

Reviewers: andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D8051

llvm-svn: 231386
2015-03-05 18:26:34 +00:00
Simon Pilgrim 7189084bef [DagCombiner] Allow shuffles to merge through bitcasts
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).

This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.

Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.

Differential Revision: http://reviews.llvm.org/D7939

llvm-svn: 231380
2015-03-05 17:14:04 +00:00
Igor Laevsky 8d0851f509 Revert change r231366 as it broke clang-native-arm-cortex-a9 Analysis/properties.m test.
llvm-svn: 231374
2015-03-05 15:41:14 +00:00
Elena Demikhovsky de05f10de2 AVX-512, SKX: Enabled masked_load/store operations for this target.
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.

llvm-svn: 231371
2015-03-05 15:11:35 +00:00
Igor Laevsky 1725997f14 Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints.
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.

llvm-svn: 231366
2015-03-05 14:11:21 +00:00
Michael Kuperstein fb95697c88 [DAGCombine] Fix a bug in a BUILD_VECTOR combine
When trying to convert a BUILD_VECTOR into a shuffle, we try to split a single source vector that is twice as wide as the destination vector. 
We can not do this when we also need the zero vector to create a blend.
This fixes PR22774.

Differential Revision: http://reviews.llvm.org/D8040

llvm-svn: 231219
2015-03-04 07:27:39 +00:00
Mehdi Amini 367bfa42d8 Use report_fatal_error instead of unreachable for -fast-isel-abort
Suggestion by Andrea Di Biagio

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231201
2015-03-04 01:48:39 +00:00
David Blaikie 49cfb81665 DAGCombiner::LoadedSlice: Remove explicit copy ctor in favor of the Rule of Zero
This way, the copy assignment operator can be used without hitting the
deprecated case in C++11.

llvm-svn: 231144
2015-03-03 21:50:47 +00:00